We are looking for a highly skilled Data Engineer with deep expertise in Azure Databricks, PySpark, and data warehouse design. The ideal candidate will have strong experience in building scalable data pipelines, managing data lakes, and implementing medallion architecture (Bronze, Silver, Gold). You will play a key role in architecting, developing, and optimizing our data platforms while ensuring high performance, security, and governance.
Key Responsibilities
- Data Warehouse Design – Architect and implement scalable data warehouses on Azure Databricks, applying data modeling and dimensional modeling techniques to support analytics and reporting.
- Pipeline Development – Design, develop, and optimize ETL/ELT pipelines using Python, PySpark, and Databricks for large-scale data processing.
- Medallion Architecture – Establish best practices for ingestion, transformation, and storage following the Bronze, Silver, Gold architecture.
- Scalable Data Applications – Build and deploy distributed data applications on Azure Databricks, engineered for performance and reliability.
- Performance Optimization – Optimize Databricks clusters, jobs, and ETL workflows for efficiency and scalability.
- Data Storage & Governance – Manage Azure Data Lake Storage (ADLS) and Delta Lake with Unity Catalog for data governance, security, and compliance.
- Automation & Scheduling – Develop and schedule Databricks notebooks/jobs, with monitoring, alerting, and automated recovery for failures.
- Code Optimization – Troubleshoot, resolve bottlenecks, and follow best coding practices to improve performance and maintainability.
- Version Control & DevOps – Use GitHub for version control and implement CI/CD pipelines in Azure DevOps for automated testing and deployment.
- Documentation – Create comprehensive documentation for data architecture, pipelines, and business logic.
Required Skills
- 8+ years of experience as a Data Engineer with strong expertise in Azure Databricks & PySpark.
- Proven experience in ETL/ELT pipeline development and large-scale data processing.
- Strong understanding of Medallion architecture (Bronze, Silver, Gold).
- Hands-on with ADLS, Delta Lake, and Unity Catalog.
- Strong knowledge of Python, Spark optimization, and distributed computing.
- Experience in cluster optimization and performance tuning.
- Proficiency in GitHub, CI/CD pipelines, and Azure DevOps.
- Excellent problem-solving skills with the ability to troubleshoot and optimize code.
- Strong communication and documentation skills.