Description

Key Responsibilities
• Design and implement observability strategies using OpenTelemetry for distributed tracing, metrics, and logging.
• Instrument microservices written in Java and Python using OpenTelemetry (OTel) SDKs and auto-instrumentation tools.
• Develop and maintain Splunk dashboards, alerts, and reports to provide actionable insights into system performance and reliability.
• Collaborate with development and operations teams to ensure consistent and effective telemetry across services.
• Automate monitoring and alerting pipelines to proactively detect and resolve issues.
• Participate in on-call rotations, incident response, and postmortem analysis to improve system resilience.
• Drive adoption of SRE best practices, including SLIs, SLOs, and error budgets.
• Continuously evaluate and improve observability tools and practices.

Required Qualifications
• Certification: Splunk Certified Developer or Admin (at least one is required)
• 3+ years of experience in Splunk development (creating dashboards, visualizations, statistical reports, scheduled searches, alerts, custom applications using Python, and knowledge objects)
• Experience developing dashboards in both XML and Dashboard Studio is a must
• Expert-level knowledge of Splunk's Search Processing Language (SPL) and of building complex queries
• Experience implementing KV stores, lookups, and data model acceleration to optimize search performance and reporting
• Knowledge of how to customize dashboards via Simple XML, advanced XML source, JavaScript, CSS, and advanced HTML
• Expert-level capabilities with regular expressions and statistical functions
• Experience with creating Splunk knowledge objects (field extractions, macros, event types, etc.)
• Strong problem-solving, logic, and analytical skills
• Prior experience as a web developer using Java, XML, JavaScript, AJAX, or other programming languages is a plus

Education

Any Graduate