We are seeking a highly skilled Senior Site Reliability Engineer (SRE) to join our dynamic team. The ideal candidate will have extensive experience in maintaining production systems, particularly in cloud environments such as AWS or GCP. You will be responsible for ensuring the reliability, availability, and performance of our services while collaborating with development teams to enhance our infrastructure and deployment processes.
Responsibilities
Maintain and optimize production systems on AWS and/or GCP.
Develop and implement automation scripts using Python, Shell, or other scripting languages.
Manage and troubleshoot Kubernetes clusters and containerized applications.
Monitor system performance and reliability using tools like Prometheus and Grafana.
Collaborate with development teams to improve application performance and reliability.
Implement infrastructure as code using tools such as Terraform or CloudFormation.
Participate in incident response and postmortem analysis to improve system reliability.
Stay current with industry trends and best practices in site reliability engineering.
Mandatory Skills
Strong experience with Linux operating systems.
Proficiency with AWS or GCP cloud platforms.
Hands-on experience with Kubernetes for managing containerized applications.
Proficiency in Python, Linux/Unix shell scripting, or another scripting language.
Experience in maintaining production systems as an SRE.
Preferred Skills
Familiarity with monitoring solutions such as AWS CloudWatch, Stackdriver, Prometheus, and Grafana.
Experience with continuous integration tools like Jenkins, Travis CI, or CircleCI.
Knowledge of Kafka, Spark, Storm, Cassandra, AWS Elasticsearch Service, PostgreSQL, Redis, and Nginx.
Understanding of logging service solutions.
Experience with infrastructure as code tools (e.g., Terraform, CloudFormation).
Qualifications
Bachelor's degree in Computer Science, Engineering, or a related field.
8 to 15 years of relevant experience in Site Reliability Engineering or a similar role.
Strong problem-solving skills and the ability to work under pressure.
Excellent communication and collaboration skills.