Description

Key Responsibilities:
·        Technical Leadership: Build and mentor a team of SREs; set goals, conduct reviews, and drive SRE best practices.
·        System Reliability: Oversee the design and maintenance of high-availability systems; lead performance monitoring and issue resolution.
·        Automation & CI/CD: Lead development of automation scripts and enhance CI/CD pipelines using tools such as Terraform and Ansible.
·        Observability: Deploy and manage tools (e.g., New Relic) for system monitoring; develop dashboards and alerts.
·        Incident Management: Lead Root Cause Analysis (RCA) and refine incident response processes.
·        Performance Optimization: Provide strategic insights to enhance application and database performance (Java, Kafka, SQL).

Qualifications:
·        Proven experience managing SRE or related teams in an eCommerce or highly distributed systems environment.
·        Strong skills in automation tools (Terraform, Ansible) and observability solutions (New Relic), with an emphasis on managing large-scale distributed systems.
·        Experience working with SAP modules in conjunction with custom applications or microservices architectures.
·        Good understanding of storage technologies (SAN/NAS), network infrastructure (load balancers, firewalls), and their impact on system performance in high-throughput environments.
·        Background in optimizing performance for Java-based applications, Spring Boot services, Kafka message brokers, SQL/NoSQL databases, and middleware components.
·        Familiarity with middleware technologies such as Kafka in distributed environments.
·        Excellent leadership, problem-solving, and communication skills, with experience working cross-functionally with development teams, infrastructure teams, and business stakeholders.

Education

Any Graduate