Job Duties:
- Design, develop, and optimize scalable, robust ETL data pipelines using Azure Databricks and Python to enable efficient extraction, transformation, and loading of large-scale structured and semi-structured datasets from various sources into cloud-based data lakes and analytics platforms.
- Build and maintain automated, modular ETL workflows within Azure to support batch and near real-time data integration, leveraging technologies such as Azure Data Factory, Databricks notebooks, and Delta Lake for high-performance data processing and storage.
- Implement data solutions aligned with enterprise data strategies.
- Develop and implement best practices for data ingestion and processing from external systems, including integration with SAP systems using middleware solutions such as MuleSoft for seamless data transfer and harmonization.
- Design and optimize backend data lake architectures to support analytics and reporting requirements, ensuring data lineage, traceability, and reusability across multiple downstream applications.
- Implement scalable data transformation and automation processes in Databricks using PySpark and SQL, ensuring performance optimization through effective use of cluster configurations, partitioning strategies, and caching techniques.
- Utilize Azure services including Azure Data Lake Storage, Azure Data Factory, Azure Event Hubs, and Azure Key Vault in combination with Databricks to create secure, scalable, and compliant data ecosystems.
- Ensure data security and privacy compliance, particularly with PHI/PII in healthcare datasets, by applying techniques such as encryption, tokenization, and role-based access control within Azure and Databricks.
- Establish monitoring, alerting, and error-handling mechanisms using tools such as Azure Monitor, Log Analytics, and custom Python scripts to track pipeline performance, failures, and metrics in real time.
- Participate in code reviews, architectural discussions, and agile ceremonies, contributing to technical decision-making, performance tuning, and continuous improvement of data engineering practices and processes.
All of the responsibilities listed above are in line with the professional background described and require, at an absolute minimum, a Bachelor's degree in computer science, computer information systems, or information technology, or a combination of education and experience equating to the U.S. equivalent of a Bachelor's degree in one of the aforementioned fields.