Description

Key Responsibilities

  • Design, implement, and maintain Data Lakehouse solutions, integrating structured and unstructured data sources.
  • Develop scalable ETL/ELT pipelines using tools like Apache Iceberg, Trino, Apache Spark, Delta Lake, Databricks, or Snowflake.
  • Optimize data storage formats and query performance across large datasets.
  • Implement security and compliance best practices in data management (role-based access control, data masking, etc.).
  • Collaborate with cloud and DevOps teams to support data infrastructure automation and monitoring.

Required Skills & Qualifications

  • 8+ years of hands-on experience with Apache Iceberg, Trino, Databricks, Delta Lake, or Snowflake.
  • Proficiency in Apache Spark, Python/Scala, and SQL.
  • Strong working knowledge of data modeling, data partitioning, and performance tuning.
  • Familiarity with data governance, data lineage, and metadata management tools.
  • Experience working in Agile/Scrum teams.
  • Experience working with structured and semi-structured data stored in object storage systems such as Amazon S3 or Google Cloud Storage (GCS).
  • Familiarity with data orchestration tools like Apache Airflow.
  • Must be eligible for a security clearance up to the Top Secret level.

Education

Any Graduate