- Databricks Platform: Act as a subject matter expert for the Databricks platform within the Digital Capital team, providing technical guidance, best practices, and innovative solutions.
- Databricks Workflows and Orchestration: Design and implement complex data pipelines orchestrated with Databricks Workflows together with Azure Data Factory or Qlik Replicate.
- End-to-End Data Pipeline Development: Design, develop, and implement highly scalable and efficient ETL/ELT processes using Databricks notebooks (Python/Spark or SQL) and other Databricks-native tools.
- Delta Lake Expertise: Use Delta Lake to build reliable data lake architectures, applying ACID transactions, schema enforcement, and time travel, and optimizing data storage for performance (see the illustrative sketch after this list).
- Spark Optimization: Optimize Spark jobs and queries for performance and cost efficiency within the Databricks environment. Demonstrate a deep understanding of Spark architecture, partitioning, caching, and shuffle operations.
- Data Governance and Security: Implement and enforce data governance policies, access controls, and security measures within the Databricks environment using Unity Catalog and other Databricks security features.
- Collaborative Development: Work closely with data scientists, data analysts, and business stakeholders to understand data requirements and translate them into Databricks-based data solutions.
- Monitoring and Troubleshooting: Establish and maintain monitoring, alerting, and logging for Databricks jobs and clusters, proactively identifying and resolving data pipeline issues.
- Code Quality and Best Practices: Champion best practices for Databricks development, including version control (Git), code reviews, testing frameworks, and documentation.
- Performance Tuning: Continuously identify and implement performance improvements for existing Databricks data pipelines and data models.
- Cloud Integration: Integrate Databricks with other cloud services (e.g., Azure Data Lake Storage Gen2, Azure Synapse Analytics, Azure Key Vault) to deliver a seamless data ecosystem.
- Traditional Data Warehousing & SQL: Design, develop, and maintain schemas and ETL processes for traditional enterprise data warehouses. Demonstrate expert-level proficiency in SQL for complex data manipulation, querying, and optimization within relational database systems.
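
The following minimal sketch illustrates the kind of Delta Lake pipeline work described above. It assumes a Databricks notebook where `spark` is predefined; the storage path, the `sales.orders` table, and all column names are hypothetical placeholders.

```python
# Minimal Delta Lake ETL sketch. Assumes a Databricks notebook (`spark` predefined)
# with Delta Lake available; the path, table, and column names are hypothetical.
from delta.tables import DeltaTable
from pyspark.sql import functions as F

# Incremental extract from a raw landing zone in ADLS Gen2.
updates = (
    spark.read.format("json")
    .load("abfss://landing@examplestorage.dfs.core.windows.net/orders/")
    .withColumn("ingested_at", F.current_timestamp())
)

# ACID upsert (MERGE) into a managed Delta table; Delta's schema enforcement
# rejects writes whose columns do not match the target table's schema.
target = DeltaTable.forName(spark, "sales.orders")
(
    target.alias("t")
    .merge(updates.alias("s"), "t.order_id = s.order_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)

# Time travel: read an earlier version of the table, e.g. for reconciliation.
previous = spark.read.option("versionAsOf", 0).table("sales.orders")

# Storage optimization: compact small files and co-locate a commonly filtered column.
spark.sql("OPTIMIZE sales.orders ZORDER BY (order_date)")
```

Using MERGE keeps the load idempotent, which simplifies reruns when upstream files arrive late.
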
Qualifications:
- Bachelor's degree in Computer Science, Engineering, Information Technology, or a related quantitative field.
- Minimum of 6 years of relevant experience in data engineering, with a significant portion dedicated to building and managing data solutions.
- Demonstrable expert-level proficiency with Databricks, including:
  - Extensive experience with Spark (PySpark, Spark SQL) for large-scale data processing.
  - Deep understanding and practical application of Delta Lake.
  - Hands-on experience with Databricks Notebooks, Jobs, and Workflows.
  - Experience with Unity Catalog for data governance and security (see the governance sketch after this list).
  - Proficiency in optimizing Databricks cluster configurations and Spark job performance.
- Strong programming skills in Python.
- Expert-level SQL proficiency with a strong understanding of relational databases, data warehousing concepts, and data modeling techniques (e.g., Kimball, Inmon).
- Solid understanding of relational and NoSQL databases.
- Experience with cloud platforms (preferably Azure, but AWS or GCP with Databricks experience is also valuable).
- Excellent problem-solving, analytical, and communication skills.
- Ability to work independently and collaboratively in a fast-paced environment.
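
For the Unity Catalog item above, a minimal governance sketch is shown below. It assumes a Unity Catalog-enabled workspace and a Databricks notebook where `spark` and `display` are predefined; the catalog, schema, table, and group names are hypothetical.

```python
# Minimal Unity Catalog governance sketch; all object and group names are hypothetical.
# Assumes sufficient privileges to create catalogs and schemas.

# Unity Catalog uses a three-level namespace: catalog.schema.table.
spark.sql("CREATE CATALOG IF NOT EXISTS finance")
spark.sql("CREATE SCHEMA IF NOT EXISTS finance.reporting")

# Least-privilege grants to an account-level group.
spark.sql("GRANT USE CATALOG ON CATALOG finance TO `data_analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA finance.reporting TO `data_analysts`")
spark.sql("GRANT SELECT ON TABLE finance.reporting.orders TO `data_analysts`")

# Review effective grants as part of routine access audits.
display(spark.sql("SHOW GRANTS ON SCHEMA finance.reporting"))
```
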
Mandatory Skills:
- Azure Databricks
- Azure Data Factory (ADF)
- GitHub
- SQL
- PySpark
- CI/CD
- Data Modelling - 3NF and Dimensional
- Lakehouse/Medallion Architecture (see the medallion sketch after this list)
- ADLS Gen2
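
A minimal medallion (bronze/silver/gold) sketch tying together several of the mandatory skills above, assuming a Databricks notebook with `spark` predefined, ADLS Gen2 access, and a hypothetical `lakehouse` catalog; all paths, tables, and columns are placeholders.

```python
# Minimal bronze -> silver -> gold sketch; paths, tables, and columns are hypothetical.
from pyspark.sql import functions as F

# Bronze: land raw CSV files from ADLS Gen2 as-is, with load metadata.
bronze = (
    spark.read.format("csv").option("header", "true")
    .load("abfss://raw@examplestorage.dfs.core.windows.net/customers/")
    .withColumn("_loaded_at", F.current_timestamp())
)
bronze.write.format("delta").mode("append").saveAsTable("lakehouse.bronze.customers")

# Silver: cleanse and conform (deduplicate, type, apply basic quality rules).
silver = (
    spark.table("lakehouse.bronze.customers")
    .dropDuplicates(["customer_id"])
    .withColumn("signup_date", F.to_date("signup_date"))
    .filter(F.col("customer_id").isNotNull())
)
silver.write.format("delta").mode("overwrite").saveAsTable("lakehouse.silver.customers")

# Gold: business-level aggregate ready for reporting or a dimensional model.
gold = (
    spark.table("lakehouse.silver.customers")
    .groupBy("country")
    .agg(F.countDistinct("customer_id").alias("active_customers"))
)
gold.write.format("delta").mode("overwrite").saveAsTable("lakehouse.gold.customers_by_country")
```
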
Nice to Have Skills:
- Scala
- Qlik Replicate
- Talend Data Integration
- AWS Aurora Postgres/Redshift/S3
- DevOps
- Agile
- ServiceNow
Preferred Qualifications:
- Databricks Certifications (e.g., Databricks Certified Data Engineer Associate/Professional).
- Experience with CI/CD pipelines for data engineering projects.