Summary:
Lead-level candidate with 12+ years of experience, including recent multi-year experience in the healthcare domain.
Work in an agile environment with a development team to:
Design, develop, and maintain scalable data pipelines using software development patterns.
Implement data processing solutions using Python and PySpark on a cloud-native Lakehouse data platform (a minimal sketch follows this list).
Write efficient SQL queries to extract, transform, and load data.
Collaborate with product management and analysts to understand data requirements and deliver solutions.
Optimize and troubleshoot data pipelines for performance and reliability.
Ensure data quality and integrity through comprehensive testing and validation processes.
Follow DevOps principles and use CI/CD to deploy and operate data pipelines.
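To make the pipeline work above concrete, here is a minimal, illustrative PySpark sketch of an extract-transform-load step on a Lakehouse platform. The paths, layer names, and claims columns are hypothetical, and reading and writing Delta tables assumes a Delta Lake-enabled Spark environment; this is a sketch of the kind of work involved, not a prescribed implementation.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical example: cleanse raw healthcare claims into a curated table.
spark = SparkSession.builder.appName("claims-pipeline").getOrCreate()

# Extract: read raw claims from the bronze layer (path is illustrative).
raw = spark.read.format("delta").load("/lakehouse/bronze/claims")

# Transform: deduplicate, drop invalid rows, derive a partition column.
curated = (
    raw.dropDuplicates(["claim_id"])
       .filter(F.col("claim_amount") > 0)
       .withColumn("service_year", F.year(F.col("service_date")))
)

# Load: write the curated table to the silver layer, partitioned for
# query performance and easier incremental reprocessing.
(curated.write.format("delta")
        .mode("overwrite")
        .partitionBy("service_year")
        .save("/lakehouse/silver/claims"))
```

Partitioning the curated output by a query-relevant column such as service_year is one common way the pipeline optimization mentioned above shows up in practice.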
Required Skills:
Proficiency in Python and PySpark.
Strong experience with SQL and database management.
Knowledge of software development patterns and best practices in data engineering.
Experience with ETL/ELT processes and data pipeline orchestration.
Proficiency with version control, automated testing, and automated deployments using Git-based tools such as GitHub and GitHub Actions (a test sketch follows this list).
Bachelor's degree in Computer Science.
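As a hedged illustration of the automated testing expected in this role, a unit test for a small pipeline transformation might look like the sketch below. The add_service_year helper, the column names, and the local Spark session are all assumptions made for the example; in practice, such tests would typically run in a GitHub Actions workflow on each pull request.

```python
import pytest
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

def add_service_year(df):
    # Hypothetical transformation under test: derive service_year
    # from the service_date column.
    return df.withColumn("service_year", F.year(F.col("service_date")))

@pytest.fixture(scope="session")
def spark():
    # Small local session so the test suite can run in CI without a cluster.
    return SparkSession.builder.master("local[1]").appName("tests").getOrCreate()

def test_add_service_year(spark):
    df = spark.createDataFrame([("c1", "2023-06-15")], ["claim_id", "service_date"])
    df = df.withColumn("service_date", F.to_date("service_date"))
    result = add_service_year(df).collect()[0]
    assert result["service_year"] == 2023
```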