We are seeking a skilled Data Engineer proficient in Python, SQL, and Azure Databricks to design, develop, and maintain scalable data pipelines and ETL processes.
The ideal candidate will work closely with cross-functional teams to deliver high-quality, efficient data integration and transformation solutions in a cloud environment.
This role demands strong problem-solving skills, a solid understanding of data governance, and hands-on experience with Azure cloud services.
Key Responsibilities:
- Design, develop, and maintain scalable data pipelines and ETL processes using Azure Databricks, Data Factory, and other Azure services.
- Implement and optimize Spark jobs, data transformations, and workflows within Databricks.
- Develop and maintain data models and data dictionaries using Python and SQL.
- Establish and enforce data quality checks, governance policies, and security procedures.
- Build ETL processes that deliver data to downstream destinations, including data warehouses.
- Integrate data from various sources into Azure Databricks.
- Collaborate with data engineers, data scientists, and analysts to ensure data quality and consistency.
- Implement monitoring processes to track performance and optimize workflows.
- Contribute to the design and implementation of data lakehouse solutions using Databricks.
Required Qualifications:
- Proficiency with Azure Databricks, including Apache Spark and its PySpark API.
- Strong programming skills in Python, SQL, and Scala.
- Solid understanding of ETL processes, data warehousing concepts, and data modeling.
- Experience working with cloud platforms, particularly Microsoft Azure.
- Proven experience in data engineering, including data pipelines and data integration.
- Knowledge of data governance policies and procedures.
- Excellent problem-solving and debugging skills.
- Strong communication and teamwork skills.
Preferred Qualifications:
- Experience with Azure Data Factory and other Azure analytics services.
- Familiarity with DevOps tools and CI/CD pipelines for data workflows.
- Exposure to big data technologies such as Hadoop or Kafka.
- Experience with containerization tools like Docker and orchestration with Kubernetes.
- Knowledge of machine learning pipelines and integration with data engineering workflows.
- Prior experience working in Agile or Scrum environments.