Design, implement, and maintain complex data systems that support millions of customers, applying cloud-native principles and best practices to keep database systems highly available, secure, performant, and scalable
- Build and maintain CI/CD pipelines in Jenkins
- Build and deploy services to Kubernetes clusters using Helm, Kustomize, etc.
- Contribute to AWS infrastructure changes, drawing on a deep understanding of AWS services
- Take part in on-call rotations for pre-production and production systems that support millions of users
- Write and review RCA documents to prevent incidents from recurring, and share the learnings
- Contribute to major system upgrades, deployment automation, monitoring enhancements, and production changes
- Create operational playbooks, contribute to how-to articles, and build domain knowledge to drive changes in the team
- Participate in and contribute to FMEA and chaos testing, security remediation, etc.
- Share best practices and patterns for operational excellence and cost optimization
- Reduce or eliminate manual steps by automating as much as possible
- Continuously look for opportunities to increase developer velocity and productivity
Qualifications
- Bachelor's or master's degree in computer science or a related technical field; equivalent experience will be considered
- 4+ years of hands-on development and operational experience building and maintaining infrastructure in AWS
- Extensive experience with performance monitoring, troubleshooting, and tuning
- Experience with AWS services and hands-on knowledge of cloud hosting
- Experience with scripting languages for DevOps automation
- Experience with at least one of the following programming languages: Java, Python, or Ruby
- Knowledge of Docker, Kubernetes, and ArgoCD
- Experience with monitoring and observability using Splunk, Wavefront, AppDynamics, Prometheus, tracing, etc.
Desired Skills and Experience
SITE RELIABILITY ENGINEER, JENKINS, KUBERNETES, AWS, PERFORMANCE MONITORING, TUNING, SPLUNK, WAVEFRONT, APPDYNAMICS
Any Graduate