We are seeking a highly skilled Senior Data Engineer to join our team and help build scalable data pipelines, integrate machine learning workflows, and optimize data platforms for actionable insights. This role plays a critical part in enabling data-driven solutions for sustainability initiatives and innovation.
Responsibilities:
- Design, implement, and optimize scalable data pipelines using SQL, Python, and PySpark for efficient data processing.
- Collaborate with data scientists to integrate machine learning models into production pipelines, optimizing for performance.
- Manage and enhance ETL workflows to ensure timely and accurate transformation of raw data into structured formats.
- Work with cloud platforms such as AWS and Azure, and data platforms such as Databricks and Snowflake, to manage and scale data infrastructure.
- Implement and maintain data orchestration tools such as Apache Airflow to automate ETL processes.
- Utilize Terraform to manage infrastructure as code for scalable cloud solutions.
- Design and maintain data warehousing solutions to optimize data storage and retrieval.
- Ensure strong data governance practices, including data quality, compliance, and cataloging using tools like Unity Catalog or Hive Metastore.
Primary Skills:
- Strong expertise in SQL and Python for data manipulation and pipeline creation.
- Proficiency in PySpark and hands-on experience with ETL processes.
- Hands-on experience with cloud platforms (AWS, Azure) and data platforms such as Databricks and Snowflake.
- Experience integrating machine learning and AI models into data pipelines.
- Proficiency with orchestration tools such as Apache Airflow.
- Experience using Terraform for infrastructure as code.
- Expertise in data warehousing concepts and solutions.