Job Description
About the Role
We are looking for a skilled Python Data Engineer with expertise in Test-Driven Development (TDD) to build, optimize, and maintain scalable data pipelines. The ideal candidate will have strong experience in Python, SQL, data engineering workflows, and automated testing to ensure high-quality, reliable data solutions.
Key Responsibilities
- Design, develop, and maintain ETL/ELT pipelines using Python and SQL.
- Apply TDD practices, writing unit tests before implementing data pipeline code.
- Develop scalable, efficient, and reusable data processing solutions.
- Work with data warehouses (e.g., Azure Databricks) and databases (PostgreSQL, MySQL).
- Ensure data quality, validation, and integrity through rigorous testing.
- Collaborate with data scientists, analysts, and engineers to define data requirements.
- Automate data workflows using orchestration tools (Airflow, Prefect, Dagster).
- Optimize data pipelines for performance, reliability, and scalability.
- Document data models, pipelines, and test cases for maintainability.
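To illustrate the TDD practice listed above, here is a minimal sketch of the test-first workflow with pytest: the unit test is written before the transformation it exercises. The `clean_records` function and its validation rules are hypothetical examples, not part of this role's actual codebase.

```python
# Hypothetical pipeline step used to illustrate TDD; under a test-first
# workflow the test below would be written before this implementation.

def clean_records(rows):
    """Drop rows missing an 'id' and strip whitespace from 'name' values."""
    cleaned = []
    for row in rows:
        if row.get("id") is None:
            continue  # reject records without a primary key
        cleaned.append({**row, "name": row.get("name", "").strip()})
    return cleaned


# pytest-style unit test (plain asserts; pytest discovers test_* functions)
def test_clean_records_drops_missing_ids_and_strips_names():
    raw = [
        {"id": 1, "name": "  alice "},
        {"id": None, "name": "bob"},  # no primary key: should be dropped
        {"id": 2, "name": "carol"},
    ]
    result = clean_records(raw)
    assert [r["id"] for r in result] == [1, 2]
    assert result[0]["name"] == "alice"
```

In practice this test would run under pytest in CI on every commit, which is how the TDD and CI/CD expectations in this posting fit together.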
Required Skills & Qualifications
- 8+ years of experience in data engineering.
- Strong proficiency in Python and SQL.
- Expertise in Test-Driven Development (TDD) using pytest, unittest, or other frameworks.
- Hands-on experience with ETL/ELT processes and data pipeline automation.
- Experience working with data warehouses (e.g., Azure Databricks).
- Familiarity with data modeling principles (e.g., star schema, normalization).
- Experience with version control (Git) and CI/CD practices.
- Strong problem-solving skills and ability to optimize database performance.
Preferred Qualifications
- Experience with orchestration tools (Airflow, Prefect, Dagster).
- Knowledge of cloud platforms (Azure).
- Exposure to containerization tools (Docker, Kubernetes).
- Understanding of data governance, security, and compliance.