Description

Key Responsibilities:

Cloud Infrastructure Leadership

  • Lead the design, implementation, and governance of cloud infrastructure across Azure and AWS.
  • Establish and maintain multi-account/multi-subscription architecture, landing zones, and security baselines.
  • Define and enforce cloud architecture standards, best practices, and reusable templates.

Project & Operations Management

  • Own and lead the cloud operations project from initiation through steady-state BAU support.
  • Define and manage SLAs, KPIs, and operational metrics for cloud services.
  • Coordinate with internal teams and external partners to ensure timely delivery of milestones.
  • Conduct governance reviews, risk assessments, and capacity planning.

BAU Support & Service Delivery

  • Oversee day-to-day cloud operations, including incident management, change control, and problem resolution.
  • Ensure 24x7 availability and performance of cloud services through monitoring and alerting systems.
  • Maintain and evolve runbooks, SOPs, and operational documentation.
  • Lead root cause analysis (RCA) and implement preventive measures for recurring issues.

Automation & Continuous Improvement

  • Drive infrastructure automation using Terraform and CI/CD pipelines.
  • Implement AI-based autonomous operations and self-healing capabilities.
  • Continuously improve operational efficiency through automation, tooling, and process optimization.

Disaster Recovery & Resilience

  • Design and implement cloud-native disaster recovery (DR) and business continuity strategies.
  • Define and test RTO/RPO objectives and ensure DR readiness across environments.
  • Integrate DR into infrastructure blueprints and operational workflows.

Security, Compliance & Cost Governance

  • Apply and enforce security and compliance policies across cloud platforms.
  • Use tools such as ServiceNow, Dynatrace, Flexera, and Azure Cost Management for governance.
  • Monitor and optimize cloud spend, resource utilization, and license management.

Team Leadership & Stakeholder Engagement

  • Lead a cross-functional cloud operations team with unified skills across compute, network, storage, and DevSecOps.
  • Collaborate with application, security, and infrastructure teams to support migrations and deployments.
  • Act as the primary point of contact for cloud operations with business and technical stakeholders.

Required Skills and Qualifications:

  • Master's degree in Computer Science, Engineering, or related field.
  • 10+ years of experience in IT infrastructure, including 3+ years in cloud architecture and operations leadership.
  • Proven experience managing cloud operations projects and delivering against defined SLAs.
  • Deep expertise in Azure and AWS cloud platforms.
  • Strong hands-on experience with Terraform, ServiceNow, Dynatrace, and Flexera.
  • Proficiency in DevSecOps, ITSM, and governance frameworks.
  • Excellent leadership, communication, and stakeholder management skills.
  • Familiarity with immutable infrastructure, auto-scaling, and release strategies.
  • Ability to work in a cross-functional, agile environment.
  • Experience architecting programmable infrastructure interfaces that enable developer self-service provisioning.
  • Experience implementing and managing infrastructure as code (IaC) with tools such as Terraform.
  • Experience integrating and optimizing monitoring and automation tools (e.g., ServiceNow, Dynatrace, Flexera).
  • Experience overseeing configuration management, state orchestration, and deployment pipelines.
  • Background in supporting cloud operations at scale with unified skillsets across compute, network, storage, and security.

Preferred Qualifications:

  • Certifications: AWS Solutions Architect, Azure Solutions Architect Expert, ITIL Foundation.
  • Experience leading cloud transformation or managed services engagements.
  • Familiarity with application performance monitoring, immutable infrastructure, and cost optimization.

Education

Master's degree