Description

We are looking for an experienced Data Engineer to provide data engineering expertise and support across LivePerson's analytical products, and to help migrate our existing data processing ecosystem from Hadoop (Spark, MapReduce, Java, and Scala) to Databricks on GCP. The goal is to leverage Databricks' scalability, performance, and ease of use to enhance our current workflows.

You will:

  • Assessment and Planning:
    • Review the existing Hadoop infrastructure, including Spark and MapReduce jobs.
    • Analyze Java and Scala codebases for compatibility with Databricks.
    • Identify dependencies, libraries, and configurations that may require modification.
    • Propose a migration plan with clear timelines and milestones.
  • Code Migration:
    • Refactor Spark jobs to run efficiently on Databricks.
    • Migrate MapReduce jobs where applicable or rewrite them using Spark DataFrame/Dataset API.
    • Update Java and Scala code to comply with Databricks' runtime environment.
  • Testing and Validation:
    • Develop unit and integration tests to ensure parity between the existing and new systems.
    • Compare performance metrics before and after migration.
    • Implement error handling and logging consistent with best practices in Databricks.
  • Optimization and Performance Tuning:
    • Fine-tune Spark configurations for performance improvements on Databricks.
    • Optimize data ingestion and transformation processes.
  • Deployment and Documentation:
    • Deploy migrated jobs to production in Databricks.
    • Document changes, configurations, and processes thoroughly.
    • Provide knowledge transfer to internal teams if required.
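As a flavor of the code-migration work above, a classic MapReduce word-count job can be collapsed into a single Spark DataFrame expression. The sketch below is illustrative only; the paths and job name are hypothetical placeholders, not actual LivePerson datasets:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object WordCountMigration {
  def main(args: Array[String]): Unit = {
    // On Databricks a SparkSession is normally provided as `spark`;
    // building one explicitly keeps this sketch self-contained.
    val spark = SparkSession.builder()
      .appName("wordcount-migrated")
      .getOrCreate()

    // Hypothetical input path -- replace with the real dataset location.
    val lines = spark.read.textFile("/mnt/raw/events.txt")

    // The map (tokenize) and reduce (sum per key) phases of the old
    // MapReduce job become one declarative DataFrame pipeline.
    val counts = lines
      .select(explode(split(col("value"), "\\s+")).as("word"))
      .filter(col("word") =!= "")
      .groupBy("word")
      .count()

    // Writing to Delta Lake is the idiomatic sink on Databricks.
    counts.write.format("delta").mode("overwrite").save("/mnt/curated/word_counts")
    spark.stop()
  }
}
```

Rewrites like this typically let Catalyst optimize the whole pipeline, which is where much of the post-migration performance gain comes from.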

Required skills:

  • 6+ years of experience in Data Engineering with a focus on building data pipelines, data platforms, and ETL (Extract, Transform, Load) processes on Hadoop and Databricks.
  • Strong expertise in Databricks (Spark on Databricks, Delta Lake, etc.), preferably on GCP.
  • Strong expertise in the Hadoop ecosystem (Spark, MapReduce, HDFS) with solid foundations of Spark and its internals.
  • Proficiency in Scala and Java.
  • Strong SQL knowledge.
  • Strong understanding of data engineering and optimization techniques.
  • Solid understanding of data governance, data modeling, and enterprise-scale data lakehouse platforms.
  • Experience with data quality/testing frameworks such as Great Expectations.
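Tying the testing duties above to these skills, a coarse row-level parity check between a legacy Hadoop job's output and its migrated Databricks equivalent can be sketched with plain Spark APIs (paths here are hypothetical; a fuller suite would use a framework such as Great Expectations):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ParityCheck {
  // True when both result sets contain exactly the same rows,
  // including duplicates (exceptAll is multiset difference).
  def sameRows(legacy: DataFrame, migrated: DataFrame): Boolean =
    legacy.exceptAll(migrated).isEmpty && migrated.exceptAll(legacy).isEmpty

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("parity-check").getOrCreate()
    // Hypothetical output locations -- adjust to the real tables.
    val legacy   = spark.read.parquet("/mnt/legacy/word_counts")
    val migrated = spark.read.format("delta").load("/mnt/curated/word_counts")
    assert(sameRows(legacy, migrated), "Row-level parity check failed")
    spark.stop()
  }
}
```

Comparing full row sets is stricter than comparing aggregates, but for large tables a checksum- or sample-based comparison may be preferable.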

Minimum Qualifications:

  • Bachelor's degree in Computer Science or a related field
  • Certified Databricks Engineer (preferred)

You should be an expert in:

  • Databricks with Spark and its internals (3 years) - MUST
  • Data engineering in the Hadoop ecosystem (5 years) - MUST
  • Scala and Java (5 years) - MUST
  • SQL - MUST


Education

Any Graduate