We are looking for a highly skilled Site Reliability Engineer (SRE) with a strong background in performance engineering to lead and champion best practices across a cross-functional team.
This role goes beyond traditional automation and dashboarding: we need an SRE evangelist who is hands-on in defining and implementing SLIs, SLOs, and distributed tracing while collaborating with senior managers and application teams.
Key Responsibilities:
- Lead program-level initiatives within a cross-functional SRE team.
- Define and implement Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to enhance reliability and performance.
- Educate teams on SRE maturity, progressing from basic monitoring to distributed tracing and beyond.
- Partner with senior managers and application teams to define, refine, and implement SLOs that align with business needs.
- Act as an SRE evangelist, advocating best practices and ensuring adherence across teams.
- Build and optimize New Relic dashboards for performance insights.
- Drive the evolution of performance engineering into modern SRE practices.
Required Skills:
- Strong background in performance engineering with hands-on SRE experience.
- Expertise in defining SLIs, SLOs, and error budgets.
- Experience with monitoring, distributed tracing, and observability tools (e.g., New Relic, Datadog, Prometheus).
- Ability to influence and educate teams on SRE best practices.
- Proven ability to work with senior stakeholders to define and implement reliability objectives.
- Hands-on experience in site reliability engineering, incident management, and system resilience.
- Strong programming and scripting skills.
- Strong experience with AWS cloud services.