Infrastructure Management: Design, implement, and manage scalable and secure infrastructure using Infrastructure as Code (IaC) tools like Terraform, CloudFormation, or Ansible.
CI/CD Pipelines: Develop and maintain continuous integration and continuous deployment (CI/CD) pipelines using tools like Jenkins, GitLab CI, or CircleCI to ensure efficient and reliable software delivery.
Monitoring and Alerting: Implement robust monitoring and alerting systems using tools such as Prometheus, Grafana, Datadog, or New Relic to ensure system reliability and performance.
Incident Management: Lead incident response efforts, conduct post-mortem analyses, and implement improvements to prevent recurrence.
Automation: Automate routine tasks and processes to enhance operational efficiency and reduce manual intervention.
Security: Apply security best practices and tools to secure infrastructure and applications.
Collaboration: Work closely with development teams to ensure that systems are designed for reliability and scalability from the ground up.
Performance Optimization: Identify and resolve performance bottlenecks in applications and infrastructure.
Documentation: Create and maintain comprehensive documentation for infrastructure, processes, and procedures.
Requirements:
Experience: 5 to 7 years in DevOps or SRE roles, with a strong understanding of both disciplines.
Cloud Platforms: Proficiency in public cloud platforms such as AWS, Azure, or Google Cloud.
Scripting: Strong scripting skills in languages like Python, Bash, or Ruby.
Configuration Management: Experience with configuration management tools like Chef, Puppet, or Ansible.
Containerization: Expertise in container technologies such as Docker and in container orchestration with Kubernetes.
Problem Solving: Excellent problem-solving skills and the ability to work under tight deadlines.
Communication: Strong communication skills and the ability to work collaboratively with cross-functional teams.