Job Description:
- 10+ years of professional experience in DevOps, Site Reliability Engineering (SRE), or infrastructure roles, with at least 3 years in a principal or lead capacity.
- Extensive experience with cloud environments (AWS, Azure, GCP), including infrastructure automation and management.
- Expertise in containerization and orchestration technologies (Docker, Kubernetes, OpenShift).
- Advanced proficiency with infrastructure as code tools (Terraform, Ansible, CloudFormation).
- Proven experience implementing comprehensive observability and monitoring platforms (Prometheus, Grafana, Datadog, Dynatrace, Splunk).
- Demonstrated knowledge and application of AIOps tools and practices (Dynatrace, Moogsoft, Splunk ITSI, BigPanda).
- Strong scripting and programming abilities (Python, Bash, Ruby, Go).
- Exceptional leadership, analytical, problem-solving, and interpersonal skills.
Preferred Qualifications:
- Experience leading large-scale system deployments in high-availability environments.
- Familiarity with security best practices and DevSecOps methodologies.
- Active involvement in industry communities and contributions to DevOps innovation.
- Professional certifications in cloud platforms (AWS, Azure, GCP) or Kubernetes (e.g., CKA).