Senior Site Reliability/DevOps Engineer

AutoRABIT
San Francisco, CA, USA

Description

Broadly, you are a Site Reliability or DevSecOps engineer with a passion for automation, reliability, scalability, monitoring, and capacity planning, and you have the breadth of knowledge necessary to support a wide variety of software and systems.

Contribute to the development and maintenance of frameworks for monitoring, automation and code to increase the scalability and reliability of the service

Assist both internal and customer-facing teams with deploying new software releases and with interfacing to VPN and other related security infrastructure.

Assist with resolution of AutoRABIT service or customer issues as required

Participate in and practice sustainable incident response and blameless postmortems

Contribute to the automation of manual tasks, such as the provisioning of users in production and test environments.

Help develop peers’ capabilities through knowledge sharing, mentoring, and collaboration

Work within a small agile team to develop and improve SRE software, support your peers, plan and self-improve

Participate in a regular on-call or rotational schedule, including weekends and holidays, as needed to support AutoRABIT servers

Adhere to established internal controls

Design, implement, and maintain scalable, resilient, and secure infrastructure using AWS.

Develop and manage infrastructure as code using Terraform.

Implement and manage CI/CD pipelines to automate deployments and ensure smooth delivery of applications.

Monitor system performance, identify bottlenecks, and implement solutions to improve reliability and performance.

Troubleshoot, resolve, and perform RCAs for incidents, while ensuring minimal disruption to services.

Collaborate with development teams to ensure applications are designed for reliability and performance.

Working experience with shell scripting (Bash), Python, or equivalent is required

Good knowledge of programming languages such as Python, Go, or Java.

Working experience with configuration management tools such as Ansible or Chef.

Implement and maintain monitoring, logging, and alerting systems to ensure the health and performance of our infrastructure.

Ensure security best practices are followed and compliance requirements are met.

Can-do attitude: challenging the status quo, leading, and contributing to key improvements and innovations, while maintaining accountability

Excellent written and verbal US English communication skills for working across a global team environment

Bachelor's in Computer Science, Engineering, or an equivalent degree or experience

5+ years of experience in site reliability engineering, DevOps, or a related field.

AWS, GCP, and/or Azure certification

3+ years of Kubernetes experience

3+ years of experience managing Linux-based systems in a public cloud such as AWS, GCP, or Azure

3+ years of experience with systems monitoring and logging; knowledge of ELK is preferred

Solid understanding of standard TCP/IP networking, load balancers, and common protocols such as DNS and HTTP.

Key Skills

AWS, Terraform, CI/CD Pipelines, Python, Go, Java, Ansible, Chef, Bash

Education

Any Graduate

Posted On: 15+ Days Ago
Experience: 5+ years
Availability: Remote
Openings: 1
Category: Senior Site Reliability/DevOps Engineer
Tenure: Flexible Position