- Design and develop ETL/data pipelines using Databricks and Apache Spark.
- Optimize and manage Spark-based workloads for scalability and performance.
- Work with structured and unstructured data from multiple sources.
- Implement Delta Lake for data reliability and ACID transactions, and Unity Catalog for centralized governance and cataloging in Databricks (a minimal sketch follows this list).
- Develop and maintain SQL-based transformations, queries, and performance tuning.
- Collaborate with data engineers, analysts, and business teams to meet data requirements.
- Implement job orchestration using Airflow, Databricks Workflows, or other scheduling tools.
- Ensure data security, governance, and compliance best practices.
- Monitor, debug, and resolve performance bottlenecks in Databricks jobs.
- Work with cloud storage solutions (AWS S3).
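For illustration only: a minimal PySpark sketch of the kind of Delta Lake pipeline work described above, assuming a Databricks cluster where `spark` is preconfigured; the S3 path and table name are hypothetical placeholders.

```python
from pyspark.sql import functions as F

# Read raw JSON events from cloud storage (bronze layer).
# The bucket path is a hypothetical placeholder.
raw = spark.read.json("s3://example-bucket/raw/events/")

# Light transformation: typed columns plus a load timestamp (silver layer).
silver = (
    raw.select(
        F.col("event_id").cast("string"),
        F.col("event_ts").cast("timestamp"),
        F.col("payload"),
    )
    .withColumn("ingested_at", F.current_timestamp())
)

# Write as a Delta table to get ACID guarantees and schema enforcement;
# the catalog/schema/table name is a hypothetical placeholder.
(
    silver.write.format("delta")
    .mode("append")
    .saveAsTable("main.analytics.events_silver")
)
```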
Qualifications:
- 4 to 6 years of experience as a Data Engineer.
Required Skills & Experience:
- Strong experience in Databricks and AWS.
- Hands-on experience with Python, PySpark, Scala, and SQL.
- Experience with Spark performance tuning and optimization.
- Knowledge of Delta Lake, Lakehouse Architecture, and Medallion Architecture.
- Familiarity with orchestration tools (Airflow, Databricks Workflows).
- Hands-on experience with data modeling and transformation techniques.
- Experience with the AWS cloud platform.
- Proficiency in CI/CD for Databricks using Git and DevOps tools.
- Strong understanding of data security, governance, and access control in Databricks.
- Good knowledge of APIs, REST services, and integrating Databricks with external systems.
Preferred Qualifications:
- Databricks Certification (Databricks Certified Associate/Professional).
- Knowledge of BI tools such as Power BI, Looker, or ThoughtSpot.
- Bachelor's or Master's degree in Computer Science, Information Systems, Engineering, or an equivalent field.