The primary function of an Incident Commander is to direct Subject Matter Experts (SMEs) and service leaders to restore service as quickly as possible during Major Incidents. The Incident Commander also keeps accurate, timely records of each incident's progress and keeps senior leaders, stakeholders, customers, and end users informed.
Incident Commanders are also responsible for building and evolving the practice of Incident Management across the Cloud Success Service Org. This includes running Post Incident Reviews and developing the processes, frameworks, and systems that drive continual service improvement globally. They work mainly with the Oracle Cloud Command Center Team.
The ideal candidate for this engaging, highly visible technical and situational leadership role has experience in Systems/Network Engineering. The candidate also has either hands-on knowledge of at least one Oracle product or in-depth, hands-on experience with Oracle Cloud Infrastructure (OCI), along with the "wits of a systems and infrastructure whiz".
Candidates are expected to have at least 5 to 6 years of experience, including around 2 to 4 years in an Incident Commander and/or Incident Scribe role.
Essential Job Functions:
Champion service reliability and prevention
You will be part of the Cloud Infrastructure team, whose mission is to improve the efficiency of hosting operations and cloud-based business services, in partnership with our service partners.
Outage Resolution
You will perform initial triage of events and incidents raised by monitoring and investigate the resulting alerts. Where needed, you will engage the next-level team to deep dive and work toward resolution.
Prevention
Once you have expertly resolved an issue, you will immediately work out how to resolve it more quickly next time. The goal is to eventually prevent the problem from happening again.
Qualifications:
Successful Incident Commanders are a rare mix of sysadmin and application engineer. They can understand and explain the effects of complex architecture decisions.
You are driven by professional curiosity and a desire to develop a deep understanding of the services and the technologies they depend upon.
You are passionate about automation and can demonstrate practical knowledge of various aspects of distributed service design, including messaging protocols, caching strategies, persistence technologies, and queuing.
BS or MS in Computer Science, or equivalent, is a must
Managing and triaging tickets.
Driving prioritization and execution of work based on impact
Passionate about cloud and customer-focused, with experience in incident management or problem management and the ability to thrive in a dynamic team culture
Experience driving change within an organization, pushing through resistance, and successfully adopting new ways of working
Systematic problem-solving approach, strong communication skills, a sense of ownership and drive
Knowledge of standard Internet services, such as DNS and HTTP, is a must
Working knowledge of OCI architecture and components such as LBaaS, VCN, and hypervisors is an added advantage
Infrastructure Security and Compliance knowledge
Able to work unsupervised and independently, as well as within a global team
Strong leadership skills to direct service teams during Major Incidents
Exceptional written and verbal communication skills with meticulous attention to detail
Willingness to work 24x7 shifts, including holidays and weekends
Ability to follow standard engineering principles using agile development methodology and automation practices
Coordinate with Functional, Infrastructure, Product Support, and client business units
Strong technical background with the ability to troubleshoot issues impacting large-scale service architectures and application stacks.
Able to "think on their feet" and analyze problems effectively. Strong troubleshooting skills combined with good communication are paramount.
Resolving Oracle Cloud Services customer outages by identifying, analyzing, and resolving technical problems related to Oracle software systems. Troubleshooting skills are key to success in this position.