Description

The Observability Platform team is building a state-of-the-art system for logging, monitoring, and tracing across cloud and on-prem data centers. We’re looking for an experienced Senior DevOps Engineer to lead our logging and monitoring efforts, ensuring robust, scalable solutions within our Google Cloud Platform environment. In this role, you will help bring to life systems that give superpowers to an entire organization of software developers.

You will:

  • Lead the planning, execution, and management of our observability infrastructure, which processes trillions of observability events (logs, traces, metrics) daily.
  • Create and manage monitoring, logging, and alerting systems using technologies such as GrafanaLab, CaptainHook, Zabbix, Fluentd, Filebeat, ELK, Kafka, Prometheus, OpenTelemetry, and related tools.
  • Design and develop components of a highly scalable software observability platform that manages trillions of observability events (logs, traces, metrics) per day.
  • Develop and maintain Kubernetes Helm charts that deploy hundreds of pods across nodes every day.
  • Collaborate closely with DevOps teams in delivering cloud solutions aligned with our observability platform.
  • Ensure high availability and performance of observability platforms and tools.
  • Design and develop end-to-end Synthetic Tests Monitoring solutions on GCP with self-service capabilities for engineering teams.
  • Participate in on-call rotations.

You have:

  • Bachelor's degree in Computer Science or Engineering, or equivalent work experience.
  • 3+ years as a DevOps Engineer (or in an equivalent role), with a passion for technology and a strong sense of motivation and responsibility for high reliability and service levels.
  • Proficiency in Kubernetes and containerization technologies (Docker, etc.).
  • Extensive experience with observability tools such as GrafanaLab, CaptainHook, Zabbix, Fluentd, ELK, Kafka, and Prometheus.
  • Familiarity with infrastructure as code (IaC) tools like Terraform, Ansible, or CloudFormation.
  • Experience with cloud platforms (AWS, Azure, GCP) and their computing, storage, and networking services; GCP preferred.
  • Strong programming skills in one or more languages (Bash, Python, Go, etc.).
  • Ideally, experience with OpenTelemetry Collector and Grafana Agent.

Education

Bachelor's degree