Description

  • Design and develop ETL/ELT data pipelines using Azure Databricks and PySpark to ingest, transform, and load data from diverse sources.
  • Write efficient, well-optimized SQL queries, including advanced SQL, for data transformation, aggregation, and analysis.
  • Work with structured and unstructured data stored in Azure Data Lake (ADLS Gen2) and integrate it with Azure services like Azure SQL DB, Synapse Analytics, and Data Factory.
  • Ensure data quality, consistency, and reliability across development, testing, and production environments.
  • Collaborate with data architects, analysts, and stakeholders to translate business requirements into technical specifications.
  • Optimize pipeline performance and troubleshoot data issues, bottlenecks, and failures.
  • Implement best practices for data governance, security, and performance tuning.
  • Maintain documentation for data pipelines, workflows, and schemas.

Required Skills:

  • 6+ years of hands-on experience with Azure Databricks and PySpark.
  • Strong knowledge of SQL, including advanced techniques (window functions, CTEs, indexing, partitioning, etc.).
  • Solid experience with Azure Data Services including Data Lake Storage Gen2, Data Factory, Synapse Analytics, and Azure SQL.
  • Experience with performance tuning and data quality frameworks in cloud-based data pipelines.
  • Good understanding of data modeling, big data architectures, and DevOps practices in the Azure ecosystem.
  • Familiarity with Git, CI/CD pipelines, and Agile methodologies.

Education

Any Graduate