- Experience in software development / software engineering
- 5+ years with Python
- Experience with Data Build Tool (DBT)
- Knowledge of Data Lakehouse technologies such as Apache Iceberg or Delta Lake
- Experience working with S3 object storage
Nice to Haves:
- Python UI development experience (e.g., Dash)
- Dremio experience
- Kubernetes/AWS EKS
- AWS Cloud experience
Responsibilities include:
- Design and implement reliable data pipelines to integrate disparate data sources into a single Data Lakehouse
- Design and implement data quality pipelines to ensure data correctness and build trusted datasets
- Design and implement a Data Lakehouse solution which accurately reflects business operations
- Assist with data platform performance tuning and physical data model support including partitioning and compaction
- Provide guidance in data visualizations and reporting efforts to ensure solutions are aligned to business objectives
The successful candidate will meet the following qualifications:
- 5+ years of experience as a Data Engineer designing and maintaining data pipeline architectures
- 5+ years of programming experience in Python and ANSI SQL
- 2+ years of development experience with DBT (Data Build Tool)
- Experience with various data modelling methods such as Star Schema, Snowflake Schema, and Data Vault design
- Experience in implementing a Data Lakehouse using a Medallion Architecture with Apache Iceberg on S3 Object Storage
- Experience in various data integration patterns including ELT, Pub/Sub, and Change Data Capture
- Experience with common Python Data Engineering packages including Pandas, NumPy, PyArrow, Pytest, Scikit-Learn, and Boto3
- Excellent communication skills with experience presenting complex concepts to technical and non-technical stakeholders
- Experience in software development practices such as Design Principles and Patterns, Testing, Refactoring, CI/CD, and version control
- Experience with Dremio, Apache Airflow, and Airbyte is preferred