Job Description:
We are looking for a talented and driven Cloud Security Engineer – SRE to join our dynamic team.
The ideal candidate will bring a strong foundation in systems administration, cloud technologies, and infrastructure as code, with a focus on solution engineering and site reliability.
In this role, you will collaborate with cross-functional teams to strengthen our security posture and optimize operations through automation.
- Programming and Scripting: Strong proficiency in languages like Python, Go, Bash, or Ruby. SREs often need to write automation scripts and build tooling.
- Systems Administration: Deep understanding of operating systems (Linux/Unix), file systems, processes, and system configurations.
- Infrastructure as Code (IaC): Experience with IaC tools like Terraform, Ansible, or Chef to manage infrastructure.
- Cloud Computing: Knowledge of cloud platforms such as AWS, Azure, or Google Cloud Platform, including virtual machines, object storage, managed Kubernetes, and serverless functions (for example, EC2 and S3 on AWS).
- Containers and Orchestration: Expertise in containerization (Docker) and container orchestration (Kubernetes, OpenShift).
- Networking: Understanding of networking concepts, including DNS, firewalls, load balancing, and VPNs.
- Monitoring and Observability: Experience with monitoring and observability tools like Prometheus, Grafana, Datadog, or New Relic. Ability to set up and maintain monitoring dashboards, alerts, and logs.
- Continuous Integration/Continuous Deployment (CI/CD): Familiarity with CI/CD tools like Jenkins, GitLab CI, GitHub Actions, or CircleCI. A strong understanding of HashiCorp Vault and Terraform will make you stand out.
- Incident Management: Ability to manage and respond to incidents, perform root cause analysis, and conduct post-mortem reviews.
- Automation: Focus on automating repetitive tasks to improve efficiency and reduce human error.
- Performance Tuning: Skills in identifying and resolving performance bottlenecks in systems and applications.
- Documentation: Skill in creating clear and comprehensive documentation for systems, processes, and incident reports.
Reliability and Scalability:
- Service-Level Objectives (SLOs) and Service-Level Agreements (SLAs): Understanding of setting, monitoring, and maintaining SLOs and SLAs for system reliability.
- Scalability: Knowledge of best practices for designing and scaling systems to handle increased loads and demands.
- Redundancy and Resilience: Experience in designing systems with redundancy and fault tolerance to minimize downtime.
Security and Compliance:
- Security Best Practices: Understanding of security principles, such as access control, data encryption, and secure coding practices.
- Compliance: Familiarity with compliance standards like GDPR, HIPAA, or PCI-DSS, depending on the industry.
Minimum Job Qualifications:
- Bachelor’s degree in business or equivalent work experience.
- 10 years of program leadership and/or relevant consulting experience.
- Knowledge of, and demonstrated experience with, program management frameworks, knowledge groups, and life cycles.