Description

We are seeking a highly skilled and proactive Infrastructure Architect to lead incident management and technical problem-solving efforts across our enterprise systems. This role requires a hands-on leader with deep technical expertise, strong communication skills, and the ability to operate under pressure in a fast-paced production environment.


Key Responsibilities:

  • Incident Management & Resolution
    • Lead and coordinate high-severity incident response and root cause analysis.
    • Facilitate technical war rooms and drive resolution across cross-functional teams.
    • Provide clear, timely updates to stakeholders and leadership during outages or critical issues.
  • Technical Leadership
    • Guide troubleshooting sessions involving AWS Cloud, Salesforce, databases, and networking infrastructure.
    • Analyze complex infrastructure issues and propose creative, actionable solutions.
    • Collaborate with vendors to evaluate options and recommend the best course of action.
  • Infrastructure & Cloud Expertise
    • Design and support scalable, secure, and resilient cloud infrastructure (primarily AWS).
    • Understand and troubleshoot issues across systems, including:
      • AWS Cloud Services (Required)
      • Azure Cloud Services (Preferred)
      • Snowflake (Nice to have)
      • Salesforce platform (Preferred)
      • Relational and NoSQL Databases
      • Datadog monitoring and observability tools
      • Cisco networking (switches, routing, connectivity)
  • Operational Excellence
    • Provide strong production support, including after-hours availability when needed.
    • Monitor system health and performance, and proactively address potential issues.
    • Maintain and improve incident response playbooks and escalation procedures.
  • Communication & Leadership
    • Communicate effectively with technical and non-technical stakeholders.
    • Provide leadership in planning, prioritizing, and executing infrastructure initiatives.
    • Mentor junior engineers and foster a culture of accountability and continuous improvement.

Required Qualifications:

  • 10+ years of experience in IT infrastructure, cloud operations, or related roles.
  • Proven experience leading incident response and technical troubleshooting.
  • Strong hands-on knowledge of:
    • AWS (EC2, VPC, S3, CloudWatch, etc.)
    • Salesforce administration and integration
    • Databases (SQL, NoSQL)
    • Datadog or similar observability platforms
    • Cisco networking (switches, VLANs, routing)
  • Familiarity with programming/scripting (Python, Bash, etc.) and infrastructure-as-code tools.
  • Excellent analytical, problem-solving, and decision-making skills.
  • Strong leadership and stakeholder management capabilities.
  • Willingness to work extended hours during critical incidents.

Preferred Qualifications:

  • Experience working in a hybrid cloud environment.
  • Exposure to DevOps practices and CI/CD pipelines.
  • Prior experience in a consulting or vendor-facing role.

Education

Bachelor's degree in any discipline.