Site Reliability Engineering (SRE) Lead

VDart
United States

Description

We are looking for an experienced Site Reliability Engineering (SRE) Lead to manage and improve the reliability, performance, and scalability of our systems. You will lead a small team, work with developers and operations, and keep production environments running smoothly.

Key Responsibilities:

Lead and guide the SRE team.
Monitor, maintain, and improve system reliability and uptime.
Automate operational processes wherever possible.
Troubleshoot production issues and perform root cause analysis.
Collaborate with development and DevOps teams for deployments and upgrades.
Create and maintain documentation.

Required Skills:

Proven experience as an SRE or in a similar role.
Strong skills in cloud platforms (AWS, Azure, or GCP).
Proficiency in automation tools (Terraform, Ansible, etc.).
Knowledge of CI/CD pipelines.
Strong scripting skills (Python, Bash, etc.).
Good understanding of monitoring tools (Prometheus, Grafana, Datadog, etc.).
Excellent problem-solving and communication skills.

Key Skills

AWS, Azure, GCP, Terraform, Ansible, Python, Bash, SRE, CI/CD Pipelines, Prometheus

Education

Any Graduate

Posted On: Today
Experience: 5+ years
Availability: Remote
Openings: 1
Category: Site Reliability Engineering
Tenure: Contract - Corp-to-Corp Position