Description

Serve as a primary point of contact responsible for the security, health, performance, and capacity of our business systems.

Develop tools to optimize our ability to rapidly deploy and effectively monitor our application stack. 

Define and update SLAs aligned with our Service Model.

Lead incident response for live issues and identify areas for enhancement in tools and software.

Work closely with software engineers to ensure our applications are designed with "operability" and scalability in mind.

Develop, implement, and maintain highly scalable Kubernetes-based infrastructure that meets our business needs and is managed via Infrastructure as Code (IaC) tools such as Terraform, Helm, Kustomize, and the AWS Cloud Development Kit with TypeScript (for SaaS infrastructure).

We are looking for: 

Prior experience in an enterprise-facing technical operations role. 

Experience defining, setting up, and owning monitoring and alerts.

Experience with cloud technologies such as AWS 

5 to 8 years of experience supporting deployments and managing AWS

Strong troubleshooting skills that span systems, network, and code 

Demonstrated programming skills in one or more of: Python, Node.js, Java, C, Shell

Experience with AWS Inspector, AWS Detective, Lacework, or similar security tools is nice to have 

5+ years in a UNIX-based operations role 

Deep UNIX/Linux systems knowledge and/or a systems administration background managing large, business-critical deployments

Ability to work in the office

Education

Any Graduate