- Provide operational support and troubleshooting for cloud infrastructure deployed on AWS.
- Maintain and support Kubernetes clusters (EKS or self-managed), ensuring high availability and performance.
- Support infrastructure-as-code configurations and environments built using Terraform.
- Respond to incidents and service requests, ensuring timely resolution or escalation.
- Monitor system performance using tools such as CloudWatch, Splunk, or Datadog.
- Assist in deploying new environments, services, and applications using CI/CD pipelines.
- Collaborate with DevOps and engineering teams to improve automation and reduce manual intervention.
- Document runbooks, procedures, and known issues to improve operational readiness.
Required Skills and Qualifications:
- 4+ years of hands-on experience with AWS, especially in support and SRE roles.
- Experience with Kubernetes (EKS) for managing containerized workloads.
- Familiarity with Terraform for managing infrastructure-as-code.
- Basic scripting and automation skills with Python and Ansible.
- Understanding of networking concepts such as VPCs, DNS, Load Balancing, and Security Groups.
- Strong troubleshooting and problem-solving skills.
- Familiarity with monitoring, alerting, and logging tools.
- Good communication skills and ability to work in a collaborative environment.
Nice to Have:
- AWS Certified Solutions Architect - Associate.
- Certified Kubernetes Administrator (CKA).
- Experience with ITIL practices or ticketing systems (e.g., Jira, ServiceNow).
- Exposure to CI/CD tools such as Jenkins, Harness, and AWS CodePipeline.