Description

Qualifications

· Must have hands-on experience with Datadog, or a comparable product such as Dynatrace, deployed in critical production environments

· Experience working with AWS-hosted applications and services

· Experience using observability dashboards, particularly those centered on APM

· Experience with Ansible automation in production and non-production environments

· Working knowledge of .NET application development and logging frameworks

 

Core Capabilities:

· Expert-level knowledge of Datadog agent integrations as well as APM and RUM

· Ability to convert existing Elasticsearch Grok patterns and filters to their Datadog equivalents and set up a new forwarder

· Proficient in AWS, particularly CloudWatch, CloudTrail and EC2

· Ability to deploy Datadog agents across on-prem and cloud-hosted instances

· Understanding of how to build observability into monolithic applications so that they expose telemetry via the OpenTelemetry SDK or Zipkin traces

· Ability to understand customer service requirements and use them to define and create service level objectives (SLOs) and corresponding dashboards

· Strong knowledge of Ansible and PowerShell to automate the deployment of Datadog agents

· Experienced in running R&D labs as well as incubating new solutions

· Strong grasp of the Four Golden Signals, MELT, and RED approaches to observability

· Ability to create actionable dashboards and associated threshold-based alerts in Datadog

· Ability to write IaC using Terraform or Packer

 

Certifications:

· Datadog Fundamentals certification or an equivalent certification from any other product vendor

· AWS associate-level certifications are a bonus, but not mandatory

 

Role & Responsibilities:

· Design the entire observability solution using Datadog for a .NET application and its infrastructure

· Conduct R&D activities in the form of labs on Datadog RUM, APM setup, and Log consolidation

· Build integrations for observability into on-prem and cloud-hosted applications using Datadog and ensure the deployment as well as the continuous running of agents

· Instrument and expose traces from monolithic .NET applications

· Connect telemetry data from cloud and on-prem applications into a single source of truth

· Liaise directly with the application architecture/design team and take guidance on the application and its ongoing migration to AWS

· Contribute to improving the new logging standards set out by the design team, based on feedback from the Datadog integration and alignment with observability standards

· Document the observability solution, guidance for new agent deployment, dashboard amendments, and alert modifications

· Train a new team and hand over in-life maintenance of the completed Datadog solution

· Test the entire solution for proper data flow and visibility of telemetry, using the dashboards and the common observability schema

· Perform an analysis of the uplifted ecosystem and suggest any improvements to the application/deployment architecture for self-healing capability via AIOps

· Build a fault-tolerant solution that automatically deploys agents to any newly provisioned instances

Education

Any Graduate