Key Skills: Azure, Site Reliability Engineer.
Roles & Responsibilities:
- Implement and manage cloud infrastructure using Azure services.
- Develop and maintain scripts in Bash, PowerShell, Python, and other programming languages.
- Collaborate with multiple support groups to ensure seamless service delivery.
- Troubleshoot complex issues across cloud services and application stacks.
- Ensure security and compliance of cloud infrastructure.
- Document processes and changes, adhering to agile development practices.
- Continuously seek improvements in processes and infrastructure management.
- Monitor system performance, reliability, and availability using tools like Azure Monitor, Application Insights, and Log Analytics.
- Automate operational tasks and deployments using Azure DevOps, ARM templates, or Terraform.
- Conduct regular performance tuning and capacity planning for cloud resources.
- Participate in incident response, root cause analysis, and post-incident reviews to enhance system resilience.
Experience Required:
- 5 - 8 years of experience managing and maintaining production workloads on Azure.
- Proven expertise in writing infrastructure-as-code using Azure Resource Manager (ARM), Bicep, or Terraform.
- Experience with CI/CD pipelines, release management, and automation in Azure DevOps.
- In-depth understanding of cloud security practices, including identity management and network security in Azure.
- Familiarity with monitoring, logging, and alerting best practices using Azure-native and third-party tools.
- Demonstrated ability to work in Agile/Scrum teams and contribute to continuous integration and delivery cycles.
- Experience with incident management, root cause analysis, and improving MTTR in production environments.
Education: Any Graduation