What you’ll do
Develop and maintain observability using AWS/GCP tools and Datadog.
Keep monitoring tool software versions current across the cloud and legacy landscape
Keep Engineering informed of logging and tracing standards
Apply good knowledge of Splunk or other logging tools such as the ELK stack
Have a good understanding of Application Performance Management (APM)
Implement best practices for observability, including metrics, logging, and tracing.
Collaborate with engineering and operations teams to troubleshoot and resolve performance issues.
Automate observability processes and integrate them into CI/CD pipelines.
Analyze and interpret monitoring data to provide actionable insights and recommendations.
Stay updated with the latest advancements in GCP and Datadog to continuously improve our observability capabilities
Good knowledge of Linux/Windows environments
Work within the Scaled Agile Framework (SAFe)
Solve problems and triage issues across complex distributed architecture service maps. Serve on call for high-severity application incidents and improve runbooks to reduce MTTR
Lead blameless availability postmortems and own the action items to prevent recurrences
What experience you need
BS degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics), or equivalent job experience, required
7-10 years of experience with monitoring tools such as Google/AWS Cloud Monitoring, AppDynamics, Datadog, Splunk, Elasticsearch, or similar
7+ years’ experience in system support, coding or operations.
Hands-on experience with Windows/Linux environments
Excellent problem-solving and communication skills
Ability to provide step-by-step technical help, both written and verbal
Experience in languages such as Python, Bash, Java, Go, JavaScript, and/or Node.js
Demonstrable cross-functional knowledge with systems, storage, networking, security and databases
System administration skills, including automation and orchestration of Linux/Windows using Terraform, Chef, Ansible, and/or containers (Docker, Kubernetes, etc.)
Proficiency with continuous integration and continuous delivery tooling and practices
Cloud certification strongly preferred
What could set you apart
You take a system problem-solving approach, coupled with strong communication skills and a sense of ownership and drive
Experience managing Infrastructure as Code via tools such as Terraform or CloudFormation
Passion for automation with a desire to eliminate toil whenever possible
You’ve built software or maintained systems in a highly secure, regulated or compliant industry
Experience and passion for working within a DevOps culture and as part of a team