Design and develop ETL/ELT data pipelines using Azure Databricks and PySpark to ingest, transform, and load data from diverse sources.
Write efficient, optimized SQL, including advanced constructs such as window functions and CTEs, for data transformation, aggregation, and analysis.
Work with structured and unstructured data stored in Azure Data Lake (ADLS Gen2) and integrate it with Azure services like Azure SQL DB, Synapse Analytics, and Data Factory.
Ensure data quality, consistency, and reliability across development, testing, and production environments.
Collaborate with data architects, analysts, and stakeholders to translate business requirements into technical specifications.
Optimize pipeline performance and troubleshoot data issues, bottlenecks, and failures.
Implement best practices for data governance, security, and performance tuning.
Maintain documentation for data pipelines, workflows, and schemas.
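The extract-transform-load pattern behind the pipeline responsibilities above can be sketched as follows. This is a minimal, framework-agnostic illustration, not the team's actual code: in production this would run as PySpark on Azure Databricks reading from ADLS Gen2, but here pure Python with the stdlib sqlite3 module stands in, and all table, column, and function names are hypothetical.

```python
import csv
import io
import sqlite3

# Extract: parse raw CSV (an in-memory string stands in for a file in the data lake).
RAW_CSV = """order_id,region,amount
1,east,120.50
2,west,75.00
3,east,not-a-number
"""

def extract(raw: str) -> list[dict]:
    return list(csv.DictReader(io.StringIO(raw)))

# Transform: enforce data quality by coercing types and dropping malformed rows.
def transform(rows: list[dict]) -> list[tuple]:
    clean = []
    for row in rows:
        try:
            clean.append((int(row["order_id"]), row["region"], float(row["amount"])))
        except ValueError:
            continue  # in a real pipeline, quarantine bad records for review instead
    return clean

# Load: write validated rows to a warehouse table (SQLite stands in for Azure SQL).
def load(rows: list[tuple], conn: sqlite3.Connection) -> int:
    conn.execute("CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, region TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

conn = sqlite3.connect(":memory:")
loaded = load(transform(extract(RAW_CSV)), conn)
print(loaded)  # the malformed third row is dropped, so 2 rows load
```

The same three-stage shape carries over to PySpark, where extract becomes `spark.read`, transform becomes DataFrame operations, and load becomes a write to the sink table.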
Required Skills:
6+ years of hands-on experience with Azure Databricks and PySpark.
Strong knowledge of SQL, including advanced techniques such as window functions, CTEs, indexing, and partitioning.
Solid experience with Azure Data Services including Data Lake Storage Gen2, Data Factory, Synapse Analytics, and Azure SQL.
Experience with performance tuning and data quality frameworks in cloud-based data pipelines.
Good understanding of data modeling, big data architectures, and DevOps practices in the Azure ecosystem.
Familiarity with Git, CI/CD pipelines, and Agile methodologies.
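As a concrete instance of the advanced SQL skills listed above, the query below combines a CTE with a window function: the CTE pre-aggregates per-region totals, and RANK() orders reps within each region. This is an illustrative sketch only; the schema and data are invented, and SQLite (3.25+, via Python's stdlib sqlite3) stands in for Azure SQL or Synapse, where the same syntax applies.

```python
import sqlite3

# Hypothetical sales table; SQLite stands in for the Azure SQL / Synapse warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, rep TEXT, amount REAL);
INSERT INTO sales VALUES
  ('east', 'ana', 300), ('east', 'bo',  500),
  ('west', 'cai', 400), ('west', 'dee', 200);
""")

# CTE: per-region totals. Window function: rank reps by amount within each region.
query = """
WITH region_totals AS (
    SELECT region, SUM(amount) AS region_total
    FROM sales
    GROUP BY region
)
SELECT s.region, s.rep, s.amount,
       RANK() OVER (PARTITION BY s.region ORDER BY s.amount DESC) AS rank_in_region,
       t.region_total
FROM sales AS s
JOIN region_totals AS t USING (region)
ORDER BY s.region, rank_in_region;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)  # e.g. ('east', 'bo', 500.0, 1, 800.0)
```

The CTE keeps the aggregation readable and reusable, while the window function avoids a self-join that a naive per-group ranking would otherwise require.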