Serve as a primary point of responsibility for the security, health, performance, and capacity of our business systems.
Develop tools to optimize our ability to rapidly deploy and effectively monitor our application stack.
Define and update SLAs aligned with our Service Model.
Lead incident response for live issues and identify areas for improvement in tools and software.
Work closely with software engineers to ensure our applications are designed with "operability" and scalability in mind
Develop, implement, and maintain highly scalable Kubernetes-based infrastructure that meets our business needs and is managed via Infrastructure as Code (IaC) tools such as Terraform, Helm, Kustomize, and the AWS Cloud Development Kit with TypeScript (for SaaS infrastructure).
We are looking for:
Prior experience in an enterprise-facing technical operations role.
Experience defining, setting up, and owning monitoring and alerts.
Experience with cloud technologies such as AWS
5 to 8 years of experience supporting deployments and managing AWS environments
Strong troubleshooting skills that span systems, network, and code
Demonstrated programming skills in one or more of: Python, Node.js, Java, C, shell scripting
Experience with Amazon Inspector, Amazon Detective, Lacework, or similar security tools is nice to have
5+ years in a UNIX-based operations role
Deep UNIX/Linux systems knowledge and/or a systems administration background managing large, business-critical deployments
Ability to work in the office
Any Graduate