Description

  • Design and develop ETL/data pipelines using Databricks and Apache Spark.
  • Optimize and manage Spark-based workloads for scalability and performance.
  • Work with structured and unstructured data from multiple sources.
  • Implement Delta Lake for data reliability and ACID transactions, and Unity Catalog for centralized governance and cataloging in Databricks.
  • Develop and maintain SQL-based transformations and queries, including query performance tuning.
  • Collaborate with data engineers, analysts, and business teams to meet data requirements.
  • Implement job orchestration using Airflow, Databricks Workflows, or other scheduling tools.
  • Ensure adherence to data security, governance, and compliance best practices.
  • Monitor, debug, and resolve performance bottlenecks in Databricks jobs.
  • Work with cloud storage solutions (AWS S3).

Qualifications:

  • 4 to 6 years of experience as a Data Engineer.

Required Skills & Experience:

  • Strong experience in Databricks and AWS.
  • Hands-on experience with Python, PySpark, Scala, SQL.
  • Experience with Spark performance tuning and optimization.
  • Knowledge of Delta Lake, Lakehouse Architecture, and Medallion Architecture.
  • Familiarity with orchestration tools (Airflow, Databricks Workflows).
  • Hands-on experience with data modeling and transformation techniques.
  • Experience with the AWS cloud platform.
  • Proficiency in CI/CD for Databricks using Git and DevOps tools.
  • Strong understanding of data security, governance, and access control in Databricks.
  • Good knowledge of APIs, REST services, and integrating Databricks with external systems.

Preferred Qualifications:

  • Databricks Certification (Databricks Certified Associate/Professional).
  • Knowledge of BI tools like Power BI, Looker, ThoughtSpot.
  • Bachelor's or Master's degree in Computer Science, Information Systems, Engineering, or equivalent.

Education

Bachelor's or Master's degree