Description

About the Role

We are seeking a highly skilled Databricks Architect with deep expertise in designing and building real-time, high-throughput data pipelines in Databricks across Azure, AWS, and GCP. The ideal candidate will have strong hands-on experience with Scala and Apache Spark, along with deep familiarity with modern data lake architectures. Experience working with data requirements from financial institutions, particularly around security, governance, and performance, is highly preferred.

Key Responsibilities

  • Design and implement real-time streaming and batch data pipelines using Databricks + Spark (Scala preferred; see the illustrative sketch after this list)
  • Architect and optimize data lakehouse platforms on Azure (primary), AWS, and GCP
  • Collaborate with data engineering, analytics, DevOps, and governance teams to ensure secure and performant data architectures
  • Lead end-to-end architecture for ingestion, transformation, enrichment, and publishing of large volumes of structured and unstructured data
  • Design for low-latency processing, auto-scaling, and fault tolerance for high-throughput pipelines
  • Drive infrastructure-as-code adoption (Terraform, CI/CD) to automate deployments and platform provisioning
  • Apply best practices for data quality, lineage, governance (Unity Catalog / Purview), and security (RBAC, encryption, private networking)
  • Interface with financial clients to understand regulatory constraints, auditability, and secure access needs
  • Mentor and guide engineering teams on best practices for Databricks, Spark, and real-time data solutions
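
By way of illustration, the sketch below shows a minimal Scala Structured Streaming job of the kind described in the first responsibility above (Kafka source, Delta sink, checkpointing for fault tolerance). The broker address, topic, event schema, paths, and table name are hypothetical placeholders, and the example assumes a Databricks runtime with the Kafka connector and Delta Lake available.

// Minimal illustrative sketch: Kafka -> Structured Streaming -> Delta table.
// All names (broker, topic, schema, paths, table) are hypothetical placeholders.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object TransactionStreamSketch {
  def main(args: Array[String]): Unit = {
    // On Databricks an active SparkSession already exists; getOrCreate reuses it
    val spark = SparkSession.builder().appName("transaction-stream").getOrCreate()

    // Hypothetical schema for incoming JSON events
    val eventSchema = new StructType()
      .add("txn_id", StringType)
      .add("amount", DoubleType)
      .add("event_time", TimestampType)

    // Read a stream of JSON events from a Kafka topic (placeholder broker/topic)
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "transactions")
      .load()
      .select(from_json(col("value").cast("string"), eventSchema).as("e"))
      .select("e.*")

    // Append to a Delta table; the checkpoint provides fault-tolerant,
    // exactly-once writes to the sink
    events.writeStream
      .format("delta")
      .option("checkpointLocation", "/mnt/checkpoints/transactions")
      .outputMode("append")
      .toTable("raw.transactions")
      .awaitTermination()
  }
}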

Required Skills and Experience

Core Technical Expertise:

  • Databricks (Azure, AWS, GCP) – deep understanding of cluster management, workspace security, Delta Lake, Unity Catalog
  • Scala (strong proficiency) – for high-performance data processing jobs
  • Apache Spark (Structured Streaming, Core APIs)
  • Real-time Data Pipelines – using Kafka, Delta Live Tables, or Apache Flink
  • Cloud Platforms:
      • Azure: ADF, Synapse, Event Hubs, Azure Data Lake Gen2, Azure Key Vault, Private Link, Azure Monitor
      • AWS: S3, Glue, Kinesis, Redshift, IAM, CloudWatch
      • GCP: GCS, BigQuery, Dataflow, Pub/Sub

Data Engineering & Orchestration:

  • Delta Lake architecture (ACID transactions, schema evolution, time travel; illustrated in the sketch after this list)
  • Data modeling (Star, Snowflake, Data Vault)
  • Orchestration with Azure Data Factory, Airflow, dbt, or Dagster
  • Proficiency in SQL and Python for scripting and notebooks
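
A brief, notebook-style Scala sketch of the Delta Lake capabilities listed above (ACID upserts via MERGE, schema evolution on write, and time travel reads). The table names and paths are hypothetical placeholders, and each snippet is shown independently of the others.

// Illustrative only: ACID upsert (MERGE), schema evolution, and time travel.
// Table names and paths are hypothetical placeholders.
import io.delta.tables.DeltaTable
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// ACID upsert: merge a batch of staged updates into an existing Delta table
val target  = DeltaTable.forName(spark, "silver.customers")
val updates = spark.read.format("delta").load("/mnt/staging/customer_updates")

target.as("t")
  .merge(updates.as("u"), "t.customer_id = u.customer_id")
  .whenMatched().updateAll()
  .whenNotMatched().insertAll()
  .execute()

// Schema evolution: new columns in the source are added to the table on append
updates.write
  .format("delta")
  .mode("append")
  .option("mergeSchema", "true")
  .saveAsTable("silver.customers")

// Time travel: read the table as it was at an earlier version
val customersV10 = spark.read
  .format("delta")
  .option("versionAsOf", "10")
  .load("/mnt/delta/silver/customers")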

Security and Governance:

  • Role-based access control (RBAC) and token-based authentication
  • Data encryption (in transit and at rest), network security (Private Link, VNet/Security Groups)
  • Unity Catalog, Azure Purview, or Alation for data governance

DevOps & Infra:

  • CI/CD pipelines using GitHub Actions, Azure DevOps, or Jenkins
  • Infrastructure-as-Code (IaC) using Terraform, Pulumi, or ARM templates
  • Monitoring with Datadog, Azure Monitor, or Prometheus/Grafana

Preferred Qualifications

  • Experience working with financial institutions or in regulated environments
  • Strong understanding of data compliance (GDPR, CCPA, SOX, PCI, etc.)
  • Familiarity with machine learning pipelines on Databricks
  • Knowledge of event-driven architectures and microservices

Soft Skills

  • Excellent communication and stakeholder management skills
  • Leadership experience in mentoring and guiding engineering teams
  • Ability to translate complex requirements into scalable designs
  • Strong documentation skills and a solution-oriented mindset

Education

Bachelor's or Master's degree in Computer Science, Engineering, or a related field

Certifications (Preferred)

  • Databricks Certified Data Engineer Professional
  • Azure Solutions Architect / AWS Certified Solutions Architect / GCP Cloud Architect
  • Terraform Associate / HashiCorp Certified

 
