Description

Job Summary:
We seek an experienced SRE Lead to guide our team in ensuring system reliability, performance, and scalability. The successful candidate will drive infrastructure automation, optimize performance, and lead incident management while fostering a culture of continuous improvement.

Key Responsibilities:
·       Technical Leadership: Build and mentor a team of SREs; set goals, conduct reviews, and drive SRE best practices.
·       System Reliability: Oversee the design and maintenance of high-availability systems; lead performance monitoring and issue resolution.
·       Automation & CI/CD: Lead the development of automation scripts and enhance CI/CD pipelines using tools such as Terraform and Ansible.
·       Observability: Deploy and manage monitoring tools (e.g., New Relic); develop dashboards and alerts.
·       Incident Management: Lead Root Cause Analysis (RCA) and refine incident response processes.
·       Performance Optimization: Provide strategic insights to improve application and database performance (Java, Kafka, SQL).

Qualifications:
·       Proven experience managing SRE or related teams in an eCommerce or highly distributed systems environment.
·       Strong skills in automation tools (Terraform, Ansible) and observability solutions (New Relic), with an emphasis on managing large-scale distributed systems.
·       Experience working with SAP modules in conjunction with custom applications or microservices architectures.
·       Good understanding of storage technologies (SAN/NAS), network infrastructure (load balancers, firewalls), and their impact on system performance in high-throughput environments.
·       Background in optimizing performance for Java-based applications, Spring Boot services, Kafka message brokers, SQL/NoSQL databases, and middleware components.

Education

Bachelor's degree in any discipline