Qualifications:
10+ years of technical support experience in enterprise environments.
AWS Certified Solutions Architect or Certified Kubernetes Administrator (CKA) certification required.
Strong expertise in one or more monitoring tools: Datadog, Nagios, Prometheus, AWS CloudWatch, or Splunk.
Proficiency in scripting and automation (Python, PowerShell, or Bash).
Experience with cloud networking, security, IAM policies, and infrastructure optimization.
Familiarity with CI/CD pipelines, DevOps methodologies, and infrastructure as code (Terraform, CloudFormation).
ITIL Foundation certification (preferred).
Hands-on experience with ServiceNow or similar ITSM platforms (nice to have).
Strong analytical thinking and problem-solving skills with a proactive mindset.
Excellent communication skills and ability to collaborate effectively across teams.
Responsibilities:
Resolve complex technical incidents related to AWS infrastructure, networking, and applications within SLA targets.
Perform root cause analysis (RCA) and implement long-term solutions to prevent recurring issues.
Monitor system health using Datadog, Prometheus, AWS CloudWatch, and Splunk, responding proactively to alerts.
Automate operational tasks and incident response using Python, PowerShell, or Bash scripting.
Optimize AWS resources and configurations for cost efficiency while ensuring reliability and security.
Collaborate with DevOps and engineering teams to enhance CI/CD pipelines and automate deployments.
Maintain operational runbooks, SOPs, and knowledge base articles for efficient troubleshooting.
Mentor junior engineers and drive continuous service improvement through SRE best practices.
Education: Graduate in any discipline.