Azure Site Reliability Engineer

Job Description:

Monitoring and Alerting: Implement and maintain monitoring systems to proactively identify potential issues and alert engineers to problems before they impact users.
Incident Response: Respond to incidents and outages, diagnose problems, and implement solutions to minimize downtime and restore service.
Automation: Automate repetitive tasks and processes to improve efficiency and reduce manual effort.
Performance Optimization: Identify and address performance bottlenecks to ensure systems run efficiently and effectively.
Infrastructure Management: Manage and maintain the underlying infrastructure, including servers, networks, and cloud resources.
Capacity Planning: Plan for future capacity needs to ensure systems can handle anticipated workloads.
Release Engineering: Develop and maintain processes for deploying software updates and releases.
Collaboration: Work closely with developers, operations teams, and other stakeholders to ensure system reliability and availability.
Documentation: Maintain clear and concise documentation of systems, processes, and procedures.
Continuous Improvement: Identify areas for improvement and implement changes to enhance system reliability and performance.

Skills and Qualifications:

Any Graduate

Back To Jobs