Key Skills: ITIL, ITSM, Production Support, Linux, UNIX.
Roles & Responsibilities:
- Manage production incidents to resolution in a 24/7/365 environment, utilizing incident management processes and keeping management informed of status, impact, and resolution actions.
- Lead and guide incident triage calls from a technical perspective, analyzing infrastructure and application components using event monitoring solutions such as application performance monitoring (APM) tools.
- Influence technical teams during calls and articulate troubleshooting steps effectively.
- Conduct technical follow-up calls for high-profile incidents.
- Ensure proper functional and management escalation as per standards and procedures.
- Follow up on items that may negatively impact production operations, assist with post-mortem activities, and support operational improvements.
- Implement new and improved processes based on management recommendations, create reports, and address ad-hoc requests.
- Communicate effectively with all management levels, translating technical issues into non-technical terms, and manage large conference calls during incidents.
- Track and manage incidents in ServiceNow or another ticketing tool; hands-on experience is required.
Experience Requirement:
- 6 to 9 years of experience in incident management and ITSM practices.
- Proven track record of managing critical production incidents in a high-pressure 24/7 environment.
- Strong technical acumen in infrastructure and applications, with the ability to lead troubleshooting calls effectively.
- Experience working with cross-functional technical teams and coordinating escalations efficiently.
- A background in production support and working knowledge of Linux/UNIX systems are preferred.
Education: Bachelor's degree in any discipline