Job Description

Responsibilities

· Lead and manage the root cause analysis (RCA) process for all site reliability engineering (SRE) incidents, ensuring thorough and timely investigations.
· Facilitate RCA workshops, guiding teams through structured analysis to identify the root cause of each incident.
· Document RCA findings and recommendations clearly and concisely.
· Work with site reliability engineers and developers to implement corrective actions and preventive measures based on RCA findings.
· Analyze trends in incident data to identify areas for improvement in system design, monitoring, and automation.
· Develop and implement RCA best practices within the SRE organization.
· Stay up to date on the latest SRE practices and incident response methodologies.
· Collaborate with other teams (e.g., security, product) to ensure a holistic approach to incident management.
· Mentor and coach site reliability engineers on effective RCA techniques.
· Track and report on key metrics related to incident management and RCA effectiveness.

Education

Graduate in any discipline