We are seeking a highly skilled Databricks Pipeline Developer with a strong background in Epic EMR (Electronic Medical Records) integration to design, develop, and optimize data pipelines that support healthcare analytics and operational reporting. The ideal candidate will have hands-on experience with Databricks and Apache Spark and will be familiar with healthcare datasets, especially those sourced from Epic EMR systems.
This role is ideal for someone who thrives in a data-driven healthcare environment, has a passion for improving patient outcomes through data, and understands the complexities of working with regulated healthcare data.
Key Responsibilities:
Design, develop, and deploy end-to-end data pipelines using Databricks and Apache Spark to extract, transform, and load (ETL) data from Epic EMR and other healthcare systems (a minimal sketch of this kind of pipeline follows this list).
Build reusable, scalable, and secure pipelines to support analytics, dashboards, and real-time reporting.
Collaborate with data engineers, data analysts, clinical informaticists, and other stakeholders to define and refine data requirements.
Create and maintain documentation for pipeline design, data models, data dictionaries, and technical specifications.
Work with FHIR, HL7, Clarity, Caboodle, and other Epic data sources to ensure high-quality data integration.
Optimize performance of data processing jobs and tune Spark clusters in Databricks for efficiency and cost management.
Ensure data governance, privacy, and security requirements are met according to HIPAA and other healthcare regulations.
Participate in Agile development cycles, including story grooming, estimation, development, testing, and deployment.
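For illustration only, here is a minimal sketch of the kind of batch ETL pipeline described above, assuming Epic Clarity data is exposed over JDBC. The hostname, credentials, table, and column names are hypothetical placeholders, not references to any specific environment:

```python
# Minimal batch ETL sketch for a Databricks notebook (PySpark).
# All connection details and table/column names below are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()  # provided automatically in Databricks notebooks

# Extract: read a Clarity table over JDBC (Clarity is a relational reporting
# database, so a JDBC read is a common integration pattern).
encounters = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://clarity-host:1433;databaseName=Clarity")  # placeholder
    .option("dbtable", "dbo.PAT_ENC")          # hypothetical encounter table
    .option("user", "svc_databricks")          # placeholder; store secrets in a secret scope
    .option("password", "<from-secret-scope>")
    .load()
)

# Transform: light cleanup plus a derived column.
cleaned = (
    encounters
    .filter(F.col("CONTACT_DATE").isNotNull())
    .withColumn("contact_year", F.year("CONTACT_DATE"))
)

# Load: write to a Delta table for downstream analytics and dashboards.
(
    cleaned.write.format("delta")
    .mode("overwrite")
    .saveAsTable("analytics.pat_encounters")   # hypothetical target table
)
```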
Required Skills & Experience:
5+ years: Strong experience in building data pipelines using Databricks and Apache Spark
3+ years: Hands-on experience working with Epic EMR data (Clarity, Caboodle, FHIR APIs, HL7, etc.)
5+ years: Proficiency in PySpark or Scala, SQL, and notebook development in Databricks
3+ years: Experience with healthcare data formats, terminology (ICD, CPT, LOINC, etc.), and compliance
2+ years: Experience with cloud platforms (Azure, AWS, or GCP) for data pipeline deployments
2+ years: Familiarity with DevOps practices, CI/CD, and version control (Git, Azure DevOps)
Strong understanding of ETL frameworks, data modeling, and performance optimization
Preferred Qualifications:
Epic certification (Clarity, Caboodle, or Bridges) is a strong plus
Experience with Delta Lake, Unity Catalog, or Databricks SQL
Knowledge of real-time streaming data pipelines using Structured Streaming (a brief sketch appears after this list)
Experience with Databricks Lakehouse architecture
Familiarity with dbt (data build tool) or other transformation frameworks
Prior experience working in a HIPAA-regulated environment
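For candidates curious about the Structured Streaming item above, a minimal sketch follows, assuming HL7-derived events have already been parsed to JSON files landing in cloud storage. The paths, schema, and table names are hypothetical placeholders:

```python
# Minimal Structured Streaming sketch (PySpark). Paths and schema are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.getOrCreate()

# Hypothetical schema for ADT-style event messages parsed to JSON upstream.
event_schema = StructType([
    StructField("patient_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_time", TimestampType()),
])

# Read a stream of JSON files as they arrive in cloud storage.
events = (
    spark.readStream.format("json")
    .schema(event_schema)
    .load("/mnt/landing/adt_events/")  # placeholder path
)

# Write incrementally to a Delta table; the checkpoint location lets the
# stream recover and avoid reprocessing after a restart.
query = (
    events.writeStream.format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/adt_events/")  # placeholder path
    .outputMode("append")
    .toTable("analytics.adt_events")   # hypothetical target table
)
```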