Key Responsibilities
- Design, implement, and maintain Data Lakehouse solutions, integrating structured and unstructured data sources.
- Develop scalable ETL/ELT pipelines using technologies such as Apache Iceberg, Trino, Apache Spark, Delta Lake, Databricks, or Snowflake (see the sketch after this list).
- Optimize data storage formats and query performance across large datasets.
- Implement security and compliance best practices in data management (role-based access control, data masking, etc.).
- Collaborate with cloud and DevOps teams to support data infrastructure automation and monitoring.
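
For illustration only, a minimal PySpark sketch of the kind of ETL/ELT work described above: reading semi-structured data from object storage and writing it to a partitioned Apache Iceberg table. The catalog name, S3 paths, table name, and column names below are hypothetical placeholders, not part of this posting, and assume the Iceberg Spark runtime jar is available in the environment.

```python
# Illustrative sketch only: catalog name, S3 paths, table, and columns are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = (
    SparkSession.builder
    .appName("orders-elt-sketch")
    # Iceberg catalog configuration is environment-specific; values are examples.
    .config("spark.sql.catalog.lakehouse", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lakehouse.type", "hadoop")
    .config("spark.sql.catalog.lakehouse.warehouse", "s3a://example-bucket/warehouse")
    .getOrCreate()
)

# Read semi-structured JSON landed in object storage.
raw = spark.read.json("s3a://example-bucket/raw/orders/")

# Basic cleanup, then write to a partitioned Iceberg table for downstream queries.
cleaned = raw.dropDuplicates(["order_id"]).filter(col("order_total") > 0)
cleaned.writeTo("lakehouse.sales.orders").partitionedBy(col("order_date")).createOrReplace()
```
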
Required Skills & Qualifications
- 8+ years of hands-on experience with Apache Iceberg, Trino, Databricks, Delta Lake, or Snowflake.
- Proficiency in Apache Spark, Python/Scala, and SQL.
- Strong hands-on experience with data modeling, data partitioning, and performance tuning.
- Familiarity with data governance, data lineage, and metadata management tools.
- Experience working in Agile/Scrum teams.
- Experience working with structured and semi-structured data stored in object storage systems such as S3 or GCS.
- Familiarity with data orchestration tools such as Apache Airflow (a minimal DAG sketch follows this list).
- Must be eligible for up to a Top Secret security clearance.
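
For illustration only, a minimal Apache Airflow sketch of the kind of orchestration mentioned above, assuming Airflow 2.x with the TaskFlow API. The DAG id, schedule, and task logic are hypothetical placeholders.

```python
# Illustrative sketch only: DAG id, schedule, paths, and task bodies are placeholders.
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def orders_pipeline():
    @task
    def extract() -> str:
        # Placeholder: in practice this would locate raw files in S3/GCS.
        return "s3://example-bucket/raw/orders/"

    @task
    def load(path: str) -> None:
        # Placeholder: in practice this would trigger the Spark/Iceberg job shown earlier.
        print(f"Loading {path} into the lakehouse")

    load(extract())


orders_pipeline()
```
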