Qualifications:
· MUST have hands-on experience setting up Datadog, or a comparable product such as Dynatrace, in critical production environments
· Experience working with AWS-hosted applications and services
· Experience using observability dashboards, particularly those centered on APM
· Experience with Ansible automation in production and non-production environments
· Working knowledge of .NET application development and logging frameworks
Core Capabilities:
· Expert-level knowledge of Datadog agent integrations, APM, and RUM
· Ability to convert existing Elasticsearch Grok patterns and filters to their Datadog equivalents and set up a new forwarder
· Proficient in AWS, particularly CloudWatch, CloudTrail and EC2
· Ability to deploy Datadog agents across on-prem and cloud-hosted instances
· Understand how to build observability into monolithic applications to expose telemetry via the OpenTelemetry SDK or Zipkin traces
· Ability to understand customer service requirements in order to define and create service level objectives (SLOs) and corresponding dashboards
· Strong knowledge of Ansible and PowerShell to automate the deployment of Datadog agents
· Experienced in running R&D labs as well as incubating new solutions
· Strong grasp of the Four Golden Signals, MELT, and RED approaches to observability
· Ability to create actionable dashboards and associated threshold-based alerts in Datadog
· Ability to write IaC using Terraform or Packer
Certifications:
· Datadog Fundamentals certification or certification from any other product vendor
· AWS associate-level certifications are a bonus, but not mandatory
Role & Responsibilities:
· Design the entire observability solution using Datadog for a .NET application and its infrastructure
· Conduct R&D activities in the form of labs on Datadog RUM, APM setup, and Log consolidation
· Build integrations for observability into on-prem and cloud-hosted applications using Datadog and ensure the deployment as well as the continuous running of agents
· Instrument and expose traces from monolithic .NET applications
· Connect telemetry data from cloud and on-prem applications into a single source of truth
· Liaise directly with the application architecture/design team to receive guidance on the application and its ongoing migration to AWS
· Contribute towards improving the new logging standards set out by the design team based on feedback after integration with Datadog and alignment to observability standards
· Document the observability solution, guidance for new agent deployment, dashboard amendments, and alert modifications
· Train a new team and hand over in-life maintenance of the completed Datadog solution
· Test the entire solution for proper data flow and visibility of telemetry using the dashboards and a common observability schema
· Perform an analysis of the uplifted ecosystem and suggest any improvements to the application/deployment architecture for self-healing capability via AIOps
Education: Any graduate