Description

  • Serve as Lead Systems Engineer specializing in Datadog within AWS cloud environments.
  • Oversee backend administration, engineering support, design, and implementation of Datadog-based monitoring and observability solutions.
  • Ensure high availability, performance, and reliability of cloud-based services and infrastructure.
  • Architect, deploy, and manage AWS resources (EC2, RDS, Lambda, ECS/EKS, S3, etc.) following best practices for scalability, security, and cost optimization.
  • Define and enforce monitoring strategies, dashboards, alerts, custom metrics, and best practices using Datadog.
  • Lead integration of Datadog with ServiceNow for automated incident management, event correlation, and CMDB synchronization.
  • Provide technical leadership, mentorship, and guidance to junior engineers and cross-functional teams.
  • Collaborate with IT and engineering teams to integrate monitoring/logging into CI/CD pipelines and cloud infrastructure.
  • Drive continuous improvement in system reliability (including SLO/SLI definitions, synthetic monitoring, and anomaly detection).
  • Contribute to and enforce Infrastructure as Code (IaC) standards using tools like Terraform or CloudFormation.
  • Participate in high-severity incident management, root cause analysis, and implementation of corrective actions.
  • Work remotely, with possible occasional meetings; must be located in approved states.
  • Report to and support health insurance customers; must have relevant industry experience.

Education

Any Graduate