Description

Required Skills:

  • PySpark, Python (Pandas), and Advanced SQL (Window Functions, CASE)
  • Experience with AWS services, including Lakehouse components (S3, Glue, Redshift, Step Functions; limited Lambda).
  • Ability to develop and maintain automation for data validation, pipeline testing, and transformation logic.
  • Comfortable working without strict Dev/QA separation — able to code, test, and troubleshoot independently.
  • Proficient in handling nested JSON structures, large datasets, and local data processing with Pandas (a brief illustrative sketch follows this list).
  • ETL/ELT testing and data pipeline validation, including data lake testing on AWS (S3, Glue, Lambda).
  • Familiarity with Apache Spark, Kafka, Airflow, and Delta Lake
  • CI/CD with Terraform, GitHub Actions, and TDD practices
  • Shell scripting, Linux, and multi-threaded debugging
  • Experience with data modeling, streaming/batch architectures, and healthcare data compliance.
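For illustration only: below is a minimal, hypothetical Python (Pandas) sketch of the kind of work several of these bullets describe, flattening nested JSON and applying SQL-style window logic as a lightweight validation step. All data, column names, and checks are invented for this example and are not part of any actual pipeline for the role.

  import pandas as pd

  # Hypothetical nested JSON records, as they might arrive from an upstream feed.
  raw = [
      {"order_id": 1, "customer": {"id": "C1", "region": "EU"}, "amount": 120.0},
      {"order_id": 2, "customer": {"id": "C1", "region": "EU"}, "amount": 80.0},
      {"order_id": 3, "customer": {"id": "C2", "region": "US"}, "amount": None},
  ]

  # Flatten the nested "customer" struct into top-level columns
  # (produces "customer.id" and "customer.region").
  df = pd.json_normalize(raw)

  # Simple data-quality checks: null amounts and duplicate order IDs.
  null_amounts = df[df["amount"].isna()]
  dup_orders = df[df.duplicated(subset="order_id", keep=False)]

  # Window-style logic, roughly SQL ROW_NUMBER() OVER
  # (PARTITION BY customer ORDER BY amount DESC): rank each customer's orders.
  df["order_rank"] = (
      df.groupby("customer.id")["amount"]
        .rank(method="first", ascending=False)
  )

  print(f"rows with null amount: {len(null_amounts)}")
  print(f"duplicate order ids: {len(dup_orders)}")
  print(df[["order_id", "customer.id", "customer.region", "amount", "order_rank"]])

The same checks would typically be expressed in PySpark (Window.partitionBy plus row_number) when the data is too large to process locally.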

Education

Any Graduate