Job Requirements:
• 4-5+ years of production-level experience with distributed applications at scale in public and/or private cloud
• Experience architecting and implementing large-scale observability platforms
• B.S. degree in Computer Science or related technical field
• Ability to work in a diverse and distributed team environment
Must Have
• Programming experience with languages like Go, Python, or Java; experience building integrations and applications for large-scale environments.
• Experience with UI technologies like JavaScript, React, and Backstage.
• Experience with internally hosted observability and tooling systems like Splunk, Prometheus, GitHub, Jenkins, and Artifactory, including assisting clients and improving environment performance and stability
• Experience with container platforms like Kubernetes
• Experience designing and implementing systems for fault tolerance, scalability and stability.
• Experience developing, deploying and running distributed applications on cloud platforms. Experience with container and orchestration technologies (Docker, Kubernetes)
• Ensure the highest level of uptime and Quality of Service (QoS) for the Client’s customers through operational excellence
• Knowledge of how to define service level objectives (SLOs) and service level indicators (SLIs) to represent and measure service quality
• Knowledge of public and/or private cloud platforms
• Collaborate with SRE and Engineering/Product teams in driving critical initiatives.
• Experience in solving performance and stability issues using a wide variety of tools
• Exceptional communicator within and across teams, driving projects to completion
• Impacts the organization through contribution to technical direction and strategic decisions.
Good to Have
• Experience with other observability tooling like Grafana, Cortex, Tempo, or Jaeger
• Experience with open-source products and communities like OpenTelemetry
• Familiarity with a variety of cloud security and automation concepts, practices, and procedures.
• Promote the DevOps/SRE approach
Any Graduate