Job Description
• Participate in on-call rotations for 24/7 Azure Production support
• Respond to and resolve production incidents and outages
• Conduct root cause analysis for major incidents and implement preventive measures.
• Provide advanced technical support for Azure Databricks, including troubleshooting and resolving complex issues.
• Monitor system logs, detect issues, and resolve technical problems related to the Kubernetes infrastructure.
• Maintain and optimize Azure infrastructure for production environments.
• Monitor system health, performance, and security.
• Implement and manage backup and disaster recovery solutions
• Perform capacity planning and cost optimization
• Automate routine tasks and create self-healing systems
• Implement and maintain security controls and compliance measures
• Collaborate with development teams to improve application performance and scalability
• Manage and optimize Azure resources utilization
• Stay updated with new Azure features and best practices
• Manage and optimize costs through FinOps practices
• Assist in the design and implementation of new Azure-based solutions
• Document processes, configurations, and system architectures
• Strong understanding of Azure services, including Azure Kubernetes Service (AKS)
• Create / Maintain Scripts: Bash and PowerShell, to maintain the infrastructure.
• Create / Maintain Terraform Scripts
• Mentor junior team members and share knowledge
• Azure Administrator Associate (AZ-104) & (AZ-204)
• Azure Network Engineer Associate (AZ-700)
Any Graduate