Required experience and skills:
• 8+ years of experience in DevOps and Site Reliability Engineering.
• Hands-on experience with containerization and orchestration: Docker, Kubernetes/EKS.
• Proficiency in infrastructure as code tools: Terraform, Ansible, or CloudFormation.
• Experience setting up and managing services running on Kubernetes.
• In-depth understanding of SRE principles, including monitoring, alerting, error budgets, fault analysis, and automation.
• In-depth knowledge of monitoring and observability tools: Splunk.
• Knowledge of Linux operating system principles, networking fundamentals, and systems management.
• Demonstrable fluency in at least one of the following languages: Java or Python.
• Ability to identify and communicate technical and architectural problems, while working with partners and their team to iteratively find solutions.
• Building and managing CI/CD pipelines: gatekeeping production deployments, developing and implementing Git branching strategies and branch protection rules, defining network policies, and scaling workloads up and down on AWS.
• Strong problem-solving and analytical skills.
• Ability to resolve performance and scalability issues in the system.
Education: Any graduate.