Description

Key Skills: Python, SQL, PySpark, Databricks, AWS, Data Pipeline, Data Integration, Airflow, Delta Lake, Redshift, S3, Data Security, Cloud Platforms, Life Sciences.

Roles & Responsibilities:

  • Develop and maintain robust, scalable data pipelines for ingesting, transforming, and optimizing large datasets from diverse sources.
  • Integrate multi-source data into performant, query-optimized storage layers and formats such as Delta Lake, Redshift, and S3.
  • Tune data processing jobs and storage layers to ensure cost efficiency and high throughput.
  • Automate data workflows using orchestration tools like Airflow and Databricks APIs for ingestion, transformation, and reporting.
  • Implement data validation and quality checks to ensure reliable and accurate data.
  • Manage and optimize AWS and Databricks infrastructure to support scalable data operations.
  • Lead cloud platform migrations and upgrades, transitioning legacy systems to modern, cloud-native solutions.
  • Enforce security best practices, such as IAM access controls and data encryption, to ensure compliance with regulatory standards.
  • Collaborate with cross-functional teams, including data scientists, analysts, and business stakeholders, to deliver data solutions.

Experience Requirement:

  • 4-6 years of hands-on experience in data engineering with expertise in Python, SQL, PySpark, Databricks, and AWS.
  • Strong background in designing and building data pipelines, and optimizing data storage and processing.
  • Proficiency with AWS cloud services (S3, Redshift, Lambda) for building scalable data solutions.
  • Hands-on experience with containerized environments and orchestration tools like Airflow for automating data workflows.
  • Expertise in data migration strategies and transitioning legacy data systems to modern cloud platforms.
  • Experience with performance tuning, cost optimization, and lifecycle management of cloud data solutions.
  • Familiarity with regulatory compliance (GDPR, HIPAA) and security practices (IAM, encryption).
  • Experience in the Life Sciences or Pharma domain is highly preferred, with an understanding of industry-specific data requirements.
  • Strong problem-solving abilities with a focus on delivering high-quality data solutions that meet business needs.

Education: Any Graduate
