Description

  • Serve as Lead Systems Engineer specializing in Datadog within an AWS environment for CareFirst BCBS.
  • Oversee architecture, design, and implementation of end-to-end monitoring and observability solutions using Datadog.
  • Provide backend administration and engineering support for the Datadog tool, including dashboards, monitors, custom metrics, and integrations.
  • Manage deployment, configuration, and optimization of AWS resources (EC2, RDS, Lambda, ECS/EKS, S3, etc.) with a focus on scalability, security, and cost efficiency.
  • Define monitoring strategies and best practices for cloud infrastructure and applications.
  • Architect and manage integration of Datadog with ServiceNow for automated incident management, event correlation, and CMDB synchronization.
  • Lead and mentor junior engineers in monitoring, logging, and observability best practices.
  • Collaborate with cross-functional teams to integrate monitoring and logging into CI/CD pipelines.
  • Drive continuous improvement in system reliability (SLO/SLI definitions, synthetic monitoring, anomaly detection).
  • Contribute to Infrastructure as Code (IaC) standards using tools like Terraform or CloudFormation.
  • Participate in high-severity incident management and root cause analysis, implementing corrective actions.
  • Requires 5+ years of AWS cloud experience and 3+ years of hands-on Datadog backend administration experience.
  • Requires strong scripting/automation (Python, Bash), cloud infrastructure, and troubleshooting skills.
  • Preferred: IaC experience, AWS certifications, Kubernetes monitoring, observability tools (Prometheus, Grafana), ServiceNow ITOM experience/certifications, and ServiceNow-Datadog integration experience.
  • Must have current/recent health insurance industry experience.
  • Remote role with occasional required meetings; candidates must be from approved states.

Education

Any Graduate