Daily sanity checks, health checks, and monitoring to ensure high availability of Java applications.
Monitoring alerts from Enterprise Monitoring Operations - act on valid alerts, suppress false alerts, and work on removing unwanted alerts
Observability - proactively investigate system component interactions and identify root causes of problems based on alert analysis
Incident Resolution - resolve incidents arising from Operation Command Center alerts
Work Orders - resolve work orders
Change Implementation - deploy application-related artifacts to the production environments within the approved release window
Raise new change tickets and arrange for approvals, including CAB approvals
Work with the Development/Testing teams on defect analysis (using production-simulated data)
Build automation scripts that reduce the number of incidents and/or improve the processes followed
Report deployment issues and coordinate with the Development teams to fix them
Qualifications:
7 to 10 years of experience
Primary Skill:
Java - Application Support + PL/SQL + Linux
Technical proficiency in troubleshooting Java, Unix shell scripting, SQL queries, networking, and cloud technologies
Application/production support for web-based Java applications.
Must understand the different stages - Build, Unit Test, Functional Test, Sonar Scan, Deployment, Artifactory Upload
Excellent analytical and problem-solving skills, with the ability to diagnose and resolve complex issues quickly
Knowledge of building CI/CD pipelines; familiarity with DevOps tools such as Jenkins, Git, and Maven
Familiarity with monitoring and logging tools such as Splunk, Dynatrace, Prometheus, and Grafana
Must have created an application deployment pipeline using Jenkins.
ITSM Process knowledge is essential – Incident, Change and Problem Management
Experience with configuration management tools such as Chef or Ansible
Good-to-have skill - PCF Cloud knowledge
Excellent verbal and written communication skills.
Ability to convey technical information to non-technical stakeholders and collaborate effectively within a team
Exhibit SRE culture - explore all means to improve the reliability of the application by reviewing and managing all change requests applied to it, so that it meets its SLOs