Description

Key Responsibilities

Production Application Management:

- Monitor and maintain the health of production applications.

- Respond to system alerts and review logs to maintain high availability and performance.

 

Code Troubleshooting and Bug Fixing:

- Analyze, troubleshoot, and resolve code issues in Go and Kotlin.

- Collaborate with the development team to implement fixes and improvements.

 

Infrastructure and Monitoring:

- Design, implement, and manage infrastructure using Terraform.

- Set up and maintain monitoring, logging, and alerting systems to proactively identify and address issues.

 

Collaboration and Communication:

- Work closely with cross-functional teams to ensure seamless integration and deployment of applications.

- Participate in on-call rotations and provide support as needed.

 

Required Qualifications

Technical Skills:

- Proficiency in Go and/or Kotlin is preferred.

- Experience with Google Cloud Platform (GCP) services and architecture is required.

- Strong understanding of infrastructure as code (IaC) principles, particularly with Terraform.

- Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).

 

Experience:

- Previous experience in a DevOps or Site Reliability Engineering role with a focus on cloud environments.

- Demonstrated ability to troubleshoot complex systems and code issues.

Education

Graduate in any discipline.