Description

Key Responsibilities

  • Lead and mentor a team of data engineers in designing, building, and maintaining scalable data pipelines.
  • Develop, optimize, and maintain ETL/ELT processes using PySpark and Apache Spark.
  • Architect and implement Databricks-based solutions for big data processing and analytics.
  • Work with cloud platforms (AWS, Azure, or GCP) to build robust, scalable, and cost-effective data solutions.
  • Collaborate with data scientists, analysts, and business stakeholders to understand data needs and deliver high-quality solutions.
  • Ensure data security, governance, and compliance with industry standards.
  • Optimize data processing performance and troubleshoot data pipeline issues.
  • Drive best practices in data engineering, including CI/CD, automation, and monitoring.

Required Qualifications

  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
  • 6+ years of experience in data engineering with a focus on big data technologies.
  • Strong expertise in PySpark, Apache Spark, and Databricks.
  • Hands-on experience with cloud platforms such as AWS (Glue, EMR, Redshift), Azure (Data Factory, Synapse, Databricks), or GCP (BigQuery, Dataflow).
  • Proficiency in SQL, Python, and Scala for data processing.
  • Experience in building scalable ETL/ELT data pipelines.
  • Knowledge of CI/CD for data pipelines and automation tools.
  • Strong understanding of data governance, security, and compliance.
  • Experience in leading and mentoring data engineering teams.

Preferred Qualifications

  • Experience with Kafka, Airflow, or other data orchestration tools.
  • Knowledge of machine learning model deployment in a big data environment.
  • Familiarity with containerization (Docker, Kubernetes).
  • Certifications in cloud technologies (AWS, Azure, or GCP).

Education

Bachelor's or Master's degree