Description
We are looking for an experienced Site Reliability Engineer (SRE) with Kubernetes operator expertise to join our team. The ideal candidate will have hands-on experience with Kubernetes operator-based applications, deep knowledge of CRD-based deployments, and the ability to optimize and troubleshoot complex cloud-native environments. The SRE will be responsible for ensuring the high availability, performance, and scalability of our infrastructure while working closely with development and operations teams.
Job Responsibilities:
- Kubernetes Operator Expertise – Deploy, manage, and maintain Kubernetes operator-based applications in cloud and on-prem environments.
- CRD-Based Deployments – Implement and troubleshoot Custom Resource Definition (CRD)-based deployments to enhance automation and operational efficiency.
- Region Awareness & Pod Topology Spread Constraints – Configure Kubernetes workloads with pod topology spread constraints to achieve high availability and fault tolerance across multiple regions.
- Node Affinity & Scheduling Policies – Apply node selector and affinity rules to optimize pod scheduling and resource allocation across nodes.
- Cluster Deployment & Upgrades – Troubleshoot and optimize cluster deployments, operator installations, and rolling updates to ensure smooth and reliable system upgrades.
- Incident Management & Troubleshooting – Diagnose and resolve infrastructure and application issues by analyzing logs, metrics, and alerts.
- Customer Support & Ticket Handling – Work on customer tickets, provide effective solutions, and collaborate with development teams to resolve issues efficiently.
- Application Monitoring & Optimization – Utilize monitoring tools to analyse application performance and implement improvements.
- Documentation & Knowledge Sharing – Create and maintain technical documentation, troubleshooting guides, and best practices for internal teams and customers.
- Automation & CI/CD Integration – Improve deployment efficiency by implementing automation, Infrastructure as Code (IaC), and CI/CD pipelines using tools such as Terraform or Helm.
Requirements
Must Have Skills:
- Education: B.Tech in Computer Engineering, Information Technology, or a related field.
- Experience: 5+ years of experience with Kubernetes operators, including in-depth knowledge of deploying, managing, and maintaining operator-based applications and of pod topology spread constraints.
- CRD-Based Deployments: 3+ years of in-depth experience implementing and troubleshooting CRD-based deployments.
- Application Monitoring & Optimization: 3+ years of experience using monitoring tools such as Grafana and Prometheus.
- Terraform or Helm: 2+ years of experience using Terraform or Helm for infrastructure automation and CI/CD integration.
- Bash, Python, or Golang: 2+ years of experience with, and an in-depth understanding of, scripting in these languages.
Nice-to-Have Skills:
- CKA (Certified Kubernetes Administrator) certification.
- Familiarity with incident response and disaster recovery planning.
- Strong understanding of container security and best practices for securing Kubernetes workloads.
- Experience working with log aggregation tools such as the ELK Stack.