Description

  • Lead and manage L1/L2 production support activities for critical applications and systems.
  • Monitor key dashboards and proactively alert relevant teams of potential production issues.
  • Perform root cause analysis (RCA) and assist in debugging production problems.
  • Represent SRE (Site Reliability Engineering) in client and daily standup calls, providing updates on tickets and issues.
  • Work with tools such as GCP, Kubernetes, Dynatrace, and log monitoring solutions (Splunk, Sumo Logic).
  • Manage and update JIRA tickets with findings and resolutions.
  • Create alerts and dashboards, especially for new features or changes.
  • Support onsite teams by supplying data from various monitoring and diagnostic tools.
  • Ensure adherence to SOPs and processes for incident management and alert handling.
  • Coordinate with external vendors for integration issues and track front-end site metrics using appropriate tools.
  • Experience in support projects is required; knowledge of log monitoring tools is a plus.

Education

Any Graduate