Skills and Key Responsibilities:
- Design and implement scalable, fault-tolerant streaming data platforms using Apache Kafka and Apache Flink.
- Lead architectural decisions and establish best practices for real-time data processing and delivery.
- Develop and maintain self-service infrastructure patterns and tools to enable internal teams to consume, process, and produce streaming data efficiently.
- Optimize performance, reliability, and observability within a Kubernetes-based environment.
- Drive infrastructure-as-code practices and automate deployment workflows using tools such as Terraform, Helm, and CI/CD pipelines.
- Collaborate with data and engineering teams to support analytics, machine learning, and operational use cases.
- Champion reliability, scalability, and cost-efficiency across public cloud providers (AWS, GCP, or Azure).
- Mentor junior engineers and contribute to shaping the platform’s technical roadmap.
Required Qualifications:
- 7+ years of backend/platform engineering experience, with a strong focus on distributed systems.
- Deep expertise in Apache Kafka (Kafka Streams, Kafka Connect) and Apache Flink (DataStream API, state management, CEP, etc.).
- Hands-on experience running and managing workloads in Kubernetes.
- Solid experience with cloud-native technologies and services in AWS, Google Cloud, or Azure.
- Strong programming skills in Java, Scala, or Python.
- Proficiency with observability tools such as Prometheus, Grafana, and OpenTelemetry, and strong debugging skills for distributed systems.
- Familiarity with infrastructure-as-code tools like Terraform, Pulumi, or similar.
- Excellent communication skills with the ability to drive technical initiatives across teams.