Description

Role Description

This is a full-time on-site role for a Site Reliability Engineer located in Bengaluru. The Site Reliability Engineer will be responsible for day-to-day tasks such as troubleshooting, software development, system administration, and infrastructure management.

Responsibilities

· 8+ years of overall technical experience
· Incorporating SRE principles and practices into reliability and observability frameworks, with the goal of building resilient platforms.
· Streamlining active service management processes and practices related to change management, continuous deployments, incident management, blameless post-mortems, and problem management.
· Developing operational playbooks and frameworks to standardize operating procedures, keep data platform services highly available, reduce failures, and eliminate toil.
· Partnering with business application and delivery teams to orchestrate and automate web application deployments.
· Helping build the automation and processes that enable self-serve production deployments.
· Helping code, configure, integrate, manage, and enhance various automation tools.
· Standardizing and rationalizing the tools and techniques adopted for observability (SLI/SLO blueprints, instrumentation and collection of system health data, monitoring, alerting frameworks, and dashboarding), and partnering with observability engineers to build out end-to-end platform observability, for example using OpenTelemetry.
· Establishing a readily available knowledge base, issue trend analysis, a runbook/toil-automation backlog, and a transformation roadmap and approach.
· Providing data-driven insights into system health, self-healing capabilities, and real-time visibility of critical workloads via business dashboards.
· Leading training and capability-development initiatives around Site Reliability Engineering within the organization.
· Engineers who have built SRE frameworks, transformed traditional IT support to SRE, or adopted MLOps/AIOps are most preferred.

Qualifications

· A mix of software development and system administration experience.
· Proficiency in Java and/or Python: a Java development background with strong coding skills, or L3 support experience for applications and scaled infrastructure on any cloud platform.
· Experience working with Oracle Database/PostgreSQL, including query optimization and performance tuning.
· Strong understanding of microservices.
· Experience with JMS/messaging and middleware such as IBM MQ and Apache Kafka.
· AIX system administration experience, plus experience with the scheduling, monitoring, and reporting functions of AutoSys job scheduling.
· Experience working with Source Control Management Systems (GitLab, GitHub)
· Experience working with build, test, and deployment tools (Maven, Gradle, Jenkins)
· Hands-on experience configuring observability tools: OpenTelemetry, AppDynamics/Prometheus/Datadog/New Relic/Dynatrace, Splunk/Kibana, Grafana, and Alertmanager. Should be able to build and configure custom metric exporters and dashboards.
· Hands-on experience with Docker and Kubernetes and the corresponding managed services from cloud providers such as Azure, GCP, and AWS (e.g., AKS, GKE).
· Hands-on experience with infrastructure-as-code tools (e.g., CloudFormation, Puppet, Chef, Ansible, Terraform).
· Hands-on experience with toil automation or SRE implementation at large banks or financial organizations with legacy infrastructure is an advantage.
· A strong ability to design and execute cutting-edge system testing strategies (smoke tests, performance/load tests, regression tests, capacity tests).
· Hands-on experience administering high-availability, high-performance environments and managing large-scale deployments of traffic-heavy applications.
· Excellent understanding of scalability processes and techniques.
· Proven ability to work remotely with teams of various sizes, in the same or different time zones, while remaining highly motivated, productive, and organized.

Education

Any Graduate