Description

Responsibilities
System Monitoring and Maintenance:

Continuously monitor system performance and availability.
Identify and resolve issues proactively to ensure system reliability.

Incident Management:

Respond to and resolve incidents promptly.
Conduct post-incident reviews to prevent future occurrences.

Automation and Tooling:

Develop and maintain automation tools to streamline operations.
Implement automated solutions for repetitive tasks.

Performance Optimization:

Analyze system performance metrics.
Optimize system performance to meet service-level objectives.

Qualifications
Proficiency in Unix systems, including shell scripting and Python programming.
Strong SQL skills, with experience in both transactional and analytical (big data) databases.
Familiarity with monitoring tools such as AppDynamics, Splunk, Prometheus/Grafana, and Moogsoft.
Experience with service management tools like ServiceNow and Remedy.
Knowledge of cloud technologies and AI is a plus.
Highly important: Demonstrated ownership and the ability to work independently, approaching work with a problem-solving mindset rather than simply following instructions. Must be intellectually curious and committed to continuous learning.

Education

Bachelor's degree in any discipline