Description

We are looking for a Senior Data Engineer to design, build, and optimize large-scale data processing systems supporting healthcare analytics and operational reporting. In this role you will work closely with DataOps, DevOps, and QA teams to deliver scalable, reliable data pipelines.

Key Responsibilities:

- Design and implement ETL/ELT pipelines using Python and PySpark (a minimal sketch follows this list)
- Develop scalable data workflows using Apache Spark and AWS Glue
- Collaborate with QA and DevOps to integrate CI/CD and testing automation
- Manage data lake structures and ensure data quality, lineage, and auditability
- Optimize and monitor performance of batch and streaming pipelines
- Build infrastructure as code (IaC) using tools like Terraform and GitHub Actions
- Work across structured, semi-structured, and unstructured healthcare datasets
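By way of illustration, here is a minimal sketch of the kind of PySpark ETL work the first responsibility describes. The bucket paths, column names, and cleansing rules are hypothetical, not taken from an actual pipeline at this company:

```python
from pyspark.sql import SparkSession, functions as F

# Hypothetical example: read raw claims data from S3, apply basic
# data-quality rules, and write it to a curated zone. All paths and
# column names below are illustrative assumptions.
spark = SparkSession.builder.appName("claims-etl").getOrCreate()

raw = spark.read.json("s3://example-raw-zone/claims/")  # hypothetical bucket

curated = (
    raw
    .filter(F.col("claim_id").isNotNull())           # drop malformed records
    .withColumn("service_date", F.to_date("service_date"))
    .dropDuplicates(["claim_id"])                     # basic de-duplication
)

# Partitioning by date keeps downstream reads and audits cheap.
curated.write.mode("overwrite").partitionBy("service_date") \
    .parquet("s3://example-curated-zone/claims/")
```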

Required Technical Skills:

Core & Deep Knowledge Assessment:

- Python
- PySpark
- SQL, including window functions and CASE (see the example after this list)
- AWS Glue, S3, Lambda
- Apache Spark
- Apache Airflow
- Delta Lake / Data Lakehouse architecture
- CI/CD (Terraform, GitHub Actions)
- ETL/ELT pipeline design and optimization

Basic Overall Knowledge Assessment:

- Kafka
- Data modeling and normalization
- Unix/Linux
- Infrastructure as Code (IaC)
- Cloud storage, IAM, and networking fundamentals (AWS)
- Git version control
- Healthcare data domain knowledge
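As an indication of the SQL depth assessed, the sketch below runs a window-function and CASE query through PySpark; the claims table and its columns are hypothetical examples, not a real schema:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-assessment-demo").getOrCreate()

# Hypothetical claims table; in practice this would be a registered
# Delta Lake or Glue catalog table rather than an inline DataFrame.
spark.createDataFrame(
    [("p1", "2024-01-05", 120.0), ("p1", "2024-02-10", 300.0),
     ("p2", "2024-01-20", 80.0)],
    ["patient_id", "service_date", "claim_amount"],
).createOrReplaceTempView("claims")

# The window function ranks each patient's claims by recency; CASE
# buckets the claim amount into simple cost tiers.
spark.sql("""
    SELECT patient_id,
           service_date,
           claim_amount,
           ROW_NUMBER() OVER (
               PARTITION BY patient_id ORDER BY service_date DESC
           ) AS claim_recency_rank,
           CASE WHEN claim_amount >= 200 THEN 'high'
                WHEN claim_amount >= 100 THEN 'medium'
                ELSE 'low'
           END AS cost_tier
    FROM claims
""").show()
```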

Education

Any Graduate