Description

What you’ll do

Develop and maintain observability using AWS/GCP tools and Datadog.

Keep monitoring tool software current across the cloud and legacy landscape

Keep Engineering informed of logging and tracing standards

Good knowledge of Splunk or other logging tools such as the ELK stack

Have a good understanding of Application Performance Management

Implement best practices for observability, including metrics, logging, and tracing.

Collaborate with engineering and operations teams to troubleshoot and resolve performance issues.

Automate observability processes and integrate them into CI/CD pipelines.

Analyze and interpret monitoring data to provide actionable insights and recommendations.

Stay updated with the latest advancements in GCP and Datadog to continuously improve our observability capabilities

Good knowledge of Linux/Windows environments

Work within the Scaled Agile Framework

Solve problems and triage issues across complex distributed architecture service maps. Be on call for high-severity application incidents and improve runbooks to reduce MTTR

Lead blameless availability postmortems and own the action items to prevent recurrences

What experience you need

BS degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics), or equivalent job experience required

7-10 years of experience with monitoring tools such as Google/AWS Cloud Monitoring, AppDynamics, Datadog, Splunk, Elasticsearch, or similar

7+ years’ experience in system support, coding or operations.

Hands-on experience with Windows/Linux environments

Excellent problem-solving and communication skills

Provide step-by-step technical help, both written and verbal

Experience in languages such as Python, Bash, Java, Go, JavaScript, and/or Node.js

Demonstrable cross-functional knowledge with systems, storage, networking, security and databases

System administration skills, including automation and orchestration of Linux/Windows using Terraform, Chef, Ansible, and/or containers (Docker, Kubernetes, etc.)

Proficiency with continuous integration and continuous delivery tooling and practices

Cloud certification strongly preferred


What could set you apart

You take a systems approach to problem solving, coupled with strong communication skills and a sense of ownership and drive

Experience managing infrastructure as code with tools such as Terraform or CloudFormation

Passion for automation with a desire to eliminate toil whenever possible

You’ve built software or maintained systems in a highly secure, regulated or compliant industry

Experience and passion for working within a DevOps culture and as part of a team

Education

Any Graduate