Description

Job Duties:

  • Design, develop, and optimize scalable, robust ETL data pipelines using Azure Databricks and Python to enable efficient extraction, transformation, and loading of large-scale structured and semi-structured datasets from various sources into cloud-based data lakes and analytics platforms.
  • Build and maintain automated, modular ETL workflows within Azure to support batch and near real-time data integration, leveraging technologies such as Azure Data Factory, Databricks notebooks, and Delta Lake for high-performance data processing and storage.
  • Implement data solutions aligned with enterprise data strategies.
  • Develop and implement best practices for data ingestion and processing from external systems, including integration with SAP systems using middleware solutions such as MuleSoft for seamless data transfer and harmonization.
  • Design and optimize backend data lake architectures to support analytics and reporting requirements, ensuring data lineage, traceability, and reusability across multiple downstream applications.
  • Implement scalable data transformation and automation processes in Databricks using PySpark and SQL, ensuring performance optimization through effective use of cluster configurations, partitioning strategies, and caching techniques.
  • Utilize Azure services including Azure Data Lake Storage, Azure Data Factory, Azure Event Hubs, and Azure Key Vault in combination with Databricks to create secure, scalable, and compliant data ecosystems.
  • Ensure data security and privacy compliance, particularly with PHI/PII in healthcare datasets, by applying techniques such as encryption, tokenization, and role-based access control within Azure and Databricks.
  • Establish monitoring, alerting, and error-handling mechanisms using tools such as Azure Monitor, Log Analytics, and custom Python scripts to track pipeline performance, failures, and metrics in real time.
  • Participate in code reviews, architectural discussions, and agile ceremonies, contributing to technical decision-making, performance tuning, and continuous improvement of data engineering practices and processes.

All of the responsibilities described above require, at an absolute minimum, a Bachelor’s degree in computer science, computer information systems, or information technology, or a combination of education and experience equating to the U.S. equivalent of a Bachelor’s degree in one of the aforementioned subjects.

Education

Any Graduate