Responsibilities:
- Design, build, and maintain reliable infrastructure and CI/CD pipelines
- Implement and manage observability tools (monitoring, alerting, logging)
- Automate routine tasks using scripting and configuration management tools
- Collaborate with development and operations teams to ensure high availability and performance
- Identify and resolve system reliability issues and participate in incident response
- Ensure security and compliance standards are integrated into DevOps practices
- Participate in on-call rotations as needed

Required Skills & Qualifications:
- 5+ years of experience in DevOps/SRE roles
- Strong knowledge of cloud platforms (AWS, Azure, or GCP)
- Expertise with tools like Docker, Kubernetes, Terraform, Jenkins, Git, and Helm
- Experience with monitoring/logging tools (Prometheus, Grafana, ELK, etc.)
- Proficiency in scripting languages (Python, Bash, etc.)
- Solid understanding of networking, system administration, and security best practices
- Excellent problem-solving and communication skills