Serve as Lead Systems Engineer specializing in Datadog within an AWS environment for CareFirst BCBS.
Oversee architecture, design, and implementation of end-to-end monitoring and observability solutions using Datadog.
Provide backend administration and engineering support for the Datadog tool, including dashboards, monitors, custom metrics, and integrations.
Manage deployment, configuration, and optimization of AWS resources (EC2, RDS, Lambda, ECS/EKS, S3, etc.) with a focus on scalability, security, and cost efficiency.
Define monitoring strategies and best practices for cloud infrastructure and applications.
Architect and manage integration of Datadog with ServiceNow for automated incident management, event correlation, and CMDB synchronization.
Lead and mentor junior engineers in monitoring, logging, and observability best practices.
Collaborate with cross-functional teams to integrate monitoring and logging into CI/CD pipelines.
Drive continuous improvement in system reliability through SLO/SLI definitions, synthetic monitoring, and anomaly detection.
Contribute to Infrastructure as Code (IaC) standards using tools like Terraform or CloudFormation.
Participate in high-severity incident management and root cause analysis, and implement corrective actions.
Requires 5+ years of AWS cloud experience and 3+ years of hands-on Datadog backend administration experience.
Requires strong scripting and automation skills (Python, Bash), along with solid cloud infrastructure and troubleshooting skills.