Role Overview
We are seeking a highly skilled Solution Architect with extensive experience in the healthcare domain and expertise in designing and implementing data solutions on the GCP data stack. This role involves leading the development of a data lake for a hospital organization, integrating data from on-premises and cloud sources. The candidate must have a strong understanding of healthcare codes, terminologies (e.g., ICD, CPT, LOINC), and data standards like HL7 and FHIR. They will be responsible for designing the destination data model, building ingestion and transformation pipelines, and creating cloud-native infrastructure using Terraform.
Key Responsibilities
1. Solution Architecture:
· Design the architecture for a robust and scalable data lake on GCP.
· Develop a detailed and efficient data model tailored for healthcare use cases, including clinical and operational analytics.
· Architect pipelines for batch and streaming data ingestion and transformation.
2. Healthcare Data Expertise:
· Ensure seamless integration of healthcare data standards such as HL7 and FHIR.
· Work with healthcare terminologies and coding systems, including ICD, CPT, and LOINC, ensuring accurate representation and mapping of clinical data.
· Address healthcare-specific challenges, such as patient data privacy and regulatory compliance (HIPAA).
3. Infrastructure Development:
· Design and implement infrastructure for DataStream, Dataflow, BigQuery, Cloud Composer, and Cloud Storage using Terraform.
· Ensure the infrastructure is optimized for scalability, security, and cost-efficiency.
4. Pipeline and Workflow Design:
· Develop and implement Dataflow pipelines for data transformation.
· Use DataStream for real-time data replication from on-premises systems to GCP.
· Orchestrate workflows using Cloud Composer and Apache Airflow.
· Leverage Dataform for scheduled data transformations and query automation.
5. Collaboration and Leadership:
· Collaborate with hospital stakeholders, clinicians, and IT teams to understand requirements and propose tailored solutions.
· Guide data engineers and developers in implementing best practices for data modeling and pipeline development.
6. Documentation and Governance:
· Document architectural designs, data models, and pipeline workflows.
· Define governance policies to ensure compliance with healthcare regulations and maintain data integrity.
Required Skills and Qualifications
Healthcare Domain Expertise:
· Strong knowledge of healthcare data standards, including HL7, FHIR, and terminologies like ICD, CPT, and LOINC.
· Experience working with hospital data, including clinical, operational, and financial datasets.
· Familiarity with healthcare privacy regulations such as HIPAA.
GCP Expertise:
· Proficiency in BigQuery, DataStream, Dataflow, Cloud Composer, and Cloud Storage.
· Experience in designing scalable, secure solutions on the GCP platform.
Data Engineering and Modeling:
· Strong expertise in creating and managing data models for healthcare use cases.
· Proven ability to design data lakes and implement streaming and batch pipelines.
Terraform and Automation:
· Hands-on experience with Terraform for provisioning cloud infrastructure.
· Understanding of Infrastructure as Code (IaC) principles.
Programming and Tools:
· Proficiency in Python and SQL for data processing and orchestration.
· Experience with workflow management tools like Apache Airflow.
Soft Skills:
· Strong communication skills to engage with clinical and technical teams effectively.
· Leadership capabilities to guide teams and drive project success
Any Graduate