Key Responsibilities
Production Application Management:
- Monitor and maintain the health of production applications.
- Respond to system alerts and review logs to ensure high availability and performance.
Code Troubleshooting and Bug Fixing:
- Analyze, troubleshoot, and resolve code issues in Go and Kotlin.
- Collaborate with the development team to implement fixes and improvements.
Infrastructure and Monitoring:
- Design, implement, and manage infrastructure using Terraform.
- Set up and maintain monitoring, logging, and alerting systems to proactively identify and address issues.
Collaboration and Communication:
- Work closely with cross-functional teams to ensure seamless integration and deployment of applications.
- Participate in on-call rotations and provide support as needed.
Required Qualifications
Technical Skills:
- Proficiency in Go and/or Kotlin (preferred).
- Experience with Google Cloud Platform (GCP) services and architecture (required).
- Strong understanding of infrastructure as code (IaC) principles, particularly with Terraform.
- Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
Experience:
- Previous experience in a DevOps or Site Reliability Engineering role with a focus on cloud environments.
- Demonstrated ability to troubleshoot complex systems and code issues.
Education:
- Graduate in any discipline.