Key Responsibilities
Design, implement, and maintain scalable, reliable cloud solutions on AWS.
Develop automation scripts and tools using Python to improve system reliability and efficiency.
Monitor system performance and reliability using New Relic, Grafana, or similar monitoring tools.
Implement proactive monitoring, alerting, and observability best practices.
Collaborate with development and operations teams to ensure seamless deployment and system resilience.
Troubleshoot, analyze, and resolve performance issues and incidents.
Optimize infrastructure to improve availability, scalability, and cost-efficiency.
Contribute to CI/CD pipeline enhancements and automation efforts.
Ensure adherence to security best practices and compliance requirements.
Required Skills & Qualifications
Proficiency in Python for scripting and automation.
Strong experience with AWS services such as EC2, S3, Lambda, CloudWatch, and IAM.
Hands-on experience with monitoring and observability tools such as New Relic, Grafana, or Prometheus.
Solid understanding of DevOps principles and Site Reliability Engineering (SRE) practices.
Experience with containerization tools like Docker and orchestration tools like Kubernetes.
Knowledge of infrastructure-as-code (IaC) tools such as Terraform or CloudFormation.
Familiarity with CI/CD pipelines and automation frameworks.
Strong troubleshooting and problem-solving skills.
Excellent communication and collaboration abilities.
Nice to Have
Experience with logging and analytics tools such as ELK Stack (Elasticsearch, Logstash, Kibana).
Knowledge of networking, security best practices, and compliance frameworks.
Exposure to incident management and on-call support processes.
Familiarity with GitOps practices.
Education: Graduate in any discipline.