Responsibilities:
- Monitor infrastructure using Datadog, Splunk, and AWS-native tools.
- Ensure uptime and performance SLAs are met for cloud and on-premises systems.
- Troubleshoot and resolve incidents across applications, services, and infrastructure.
- Perform impact assessments and root cause analysis.
- Implement remediation and escalate complex issues to engineering teams.
- Follow ITIL processes: incident, change, release, and problem management.
- Support disaster recovery, patching, and automation tasks.
- Assist with onboarding and connectivity for new hotel properties.
- Coordinate technical releases and contribute to project work.
Qualifications & Requirements:
- Minimum of 4 years of IT/IS experience.
- 2+ years in incident management and infrastructure monitoring.
- Hands-on Linux administration experience.
- Familiarity with AWS services (EC2, CloudWatch, Lambda, VPC, Logs, etc.).
- Strong problem-solving and analytical skills.
- Self-driven with a passion for learning and innovation.
- Effective communicator and team collaborator.