Key Responsibilities:
Cloud Infrastructure Leadership
- Lead the design, implementation, and governance of cloud infrastructure across Azure and AWS.
- Establish and maintain multi-account/multi-subscription architecture, landing zones, and security baselines.
- Define and enforce cloud architecture standards, best practices, and reusable templates.
Project & Operations Management
- Own and lead the cloud operations project from initiation through steady-state BAU support.
- Define and manage SLAs, KPIs, and operational metrics for cloud services.
- Coordinate with internal teams and external partners to ensure timely delivery of milestones.
- Conduct governance reviews, risk assessments, and capacity planning.
BAU Support & Service Delivery
- Oversee day-to-day cloud operations, including incident management, change control, and problem resolution.
- Ensure 24x7 availability and performance of cloud services through monitoring and alerting systems.
- Maintain and evolve runbooks, SOPs, and operational documentation.
- Lead root cause analysis (RCA) and implement preventive measures for recurring issues.
Automation & Continuous Improvement
- Drive infrastructure automation using Terraform and CI/CD pipelines.
- Implement AI-based autonomous operations and self-healing capabilities.
- Continuously improve operational efficiency through automation, tooling, and process optimization.
Disaster Recovery & Resilience
- Design and implement cloud-native disaster recovery (DR) and business continuity strategies.
- Define and test RTO/RPO objectives and ensure DR readiness across environments.
- Integrate DR into infrastructure blueprints and operational workflows.
Security, Compliance & Cost Governance
- Apply and enforce security and compliance policies across cloud platforms.
- Leverage tools like ServiceNow, Dynatrace, Flexera, and Azure Cost Management for governance.
- Monitor and optimize cloud spend, resource utilization, and license management.
Team Leadership & Stakeholder Engagement
- Lead a cross-functional cloud operations team with skills spanning compute, network, storage, and DevSecOps.
- Collaborate with application, security, and infrastructure teams to support migrations and deployments.
- Act as the primary point of contact for cloud operations with business and technical stakeholders.
Required Skills and Qualifications:
- Master's degree in Computer Science, Engineering, or related field.
- 10+ years of experience in IT infrastructure, with 3+ years in cloud architecture and operations leadership.
- Proven experience managing cloud operations projects and delivering against defined SLAs.
- Deep expertise in Azure and AWS cloud platforms.
- Strong hands-on experience with Terraform, ServiceNow, Dynatrace, and Flexera.
- Proficiency in DevSecOps, ITSM, and governance frameworks.
- Excellent leadership, communication, and stakeholder management skills.
- Familiarity with immutable infrastructure, auto-scaling, and release strategies.
- Ability to work in a cross-functional, agile environment.
- Experience architecting programmable infrastructure interfaces that enable developer self-service provisioning.
- Experience implementing and managing infrastructure as code (IaC) with tools such as Terraform.
- Experience integrating and optimizing monitoring and automation tools (e.g., ServiceNow, Dynatrace, Flexera).
- Experience overseeing configuration management, state orchestration, and deployment pipelines.
- Background in supporting cloud operations at scale with unified skillsets across compute, network, storage, and security.
Preferred Qualifications:
- Certifications: AWS Solutions Architect, Azure Solutions Architect Expert, ITIL Foundation.
- Experience leading cloud transformation or managed services engagements.
- Familiarity with application performance monitoring, immutable infrastructure, and cost optimization.