Role Overview
Partner with enterprise clients to architect, build, and optimize data-lakehouse solutions on Databricks. Leverage Java/Scala, Apache Spark, and Delta Lake to deliver high-performance, secure, and cost-effective pipelines while leading workshops, troubleshooting production issues, and mentoring client teams.
Key Responsibilities
Data Pipeline Architecture: Design and implement end-to-end ETL/ELT workflows combining Java/Scala with Spark, both batch and streaming (see the sketch after this list).
Performance Tuning: Use Spark UI metrics, AQE, and partitioning to eliminate bottlenecks and scale to multi-TB datasets.
Delta Lake & Governance: Configure OPTIMIZE, Z-Ordering, time travel, and Unity Catalog for secure, compliant data access.
Client Engagement: Lead discovery sessions, deliver demos/PoCs, and translate business requirements into robust technical solutions.
Incident Resolution: Rapidly triage and fix production issues (data skew, executor failures, memory errors), ensuring SLA adherence.
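To make the day-to-day work concrete, here is a minimal Scala sketch of the kind of pipeline the role involves: a batch Delta Lake ETL job with AQE enabled for skew handling, date partitioning for pruning, and a post-load OPTIMIZE with Z-Ordering. All paths, table names, and columns are illustrative assumptions, not part of this posting.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object OrdersPipeline {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("orders-etl")
      // Adaptive Query Execution coalesces shuffle partitions and splits
      // skewed join partitions at runtime -- a first defense against data skew.
      .config("spark.sql.adaptive.enabled", "true")
      .config("spark.sql.adaptive.skewJoin.enabled", "true")
      .getOrCreate()

    // Batch ingest: read raw JSON (hypothetical landing path), clean it,
    // and append into a curated Delta table.
    val raw = spark.read.json("/mnt/raw/orders/")
    val cleaned = raw
      .filter(col("order_id").isNotNull)
      .withColumn("order_date", to_date(col("order_ts")))

    cleaned.write
      .format("delta")
      .mode("append")
      .partitionBy("order_date") // enables partition pruning for date-bounded queries
      .save("/mnt/curated/orders")

    // Compact small files and co-locate rows on a common filter column
    // (OPTIMIZE ... ZORDER BY is supported on Databricks).
    spark.sql("OPTIMIZE delta.`/mnt/curated/orders` ZORDER BY (customer_id)")

    spark.stop()
  }
}

The same DataFrame logic carries over to streaming by swapping spark.read for spark.readStream and writing with writeStream; the governance pieces (Unity Catalog grants, time travel) layer on top of the table rather than the job code.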
Qualifications
4+ years Java (Spring, Maven/Gradle)
5+ years Apache Spark (DataFrame/SQL/RDD) at scale
5+ years Databricks (Jobs, Workflows, Unity Catalog)
Strong performance-tuning and Spark-internals expertise
Excellent communication and client-facing skills
Certifications: Databricks Certified Data Engineer Professional (mandatory)
Education: Bachelor's degree in any discipline