We are seeking a skilled and proactive DevOps Lead to join our team and play a critical role in shaping, maintaining, and scaling our infrastructure and deployment pipelines. This role requires hands-on experience with cloud platforms (Azure and AWS), a deep understanding of DevOps principles, and a focus on enabling seamless AI/ML deployments in a secure, cost-effective, and scalable environment.
Minimum Qualifications
- 15+ years of experience in DevOps practices, CI/CD processes, and cloud-native technologies.
- Strong hands-on experience with Azure and AWS, including services such as ECS, EKS, EC2, and IAM on AWS, AKS on Azure, and cloud networking.
- Experience with containerization and orchestration tools such as Docker and Kubernetes.
- Familiarity with AI/ML deployment pipelines, including versioning, rollbacks, and automation.
- Proficiency with monitoring tools (e.g., Prometheus, Grafana, CloudWatch) and log aggregation systems.
- Strong problem-solving skills and the ability to troubleshoot and optimize complex systems.
- Excellent communication and collaboration skills for working effectively with cross-functional teams and clients.
Key Responsibilities
- Design, develop, and maintain infrastructure-as-code (IaC) using tools like Terraform, Helm, and ARM templates across Azure and AWS platforms.
- Provision and manage cloud infrastructure to support scalable, high-performance deployments.
- Optimize cloud resources for cost efficiency, performance, and resilience.
- Ensure security, compliance, and governance of infrastructure supporting GenAI and AI/ML workloads.
- Collaborate closely with the Customer Success Team to resolve infrastructure-related technical issues and provide support for client deployments.
- Build and maintain monitoring, logging, and analytics tools to ensure system health and rapid issue resolution.
- Develop and manage CI/CD pipelines that support AI/ML model deployment workflows.