Description

  • Design, build, and maintain reliable, high-volume data pipelines and ETL processes.
  • Develop and optimize complex SQL queries for analytics, reporting, and application support.
  • Implement robust data architectures using star schemas, fact tables, and dimension tables.
  • Process massive datasets efficiently using Apache Spark and the Hadoop ecosystem.
  • Build and manage real-time streaming pipelines using Kafka.
  • Leverage AWS services — S3, EMR, Redshift, DynamoDB — for storage and processing.
  • Automate workflows with bash scripting.
  • Use Python to support various data engineering tasks as needed.
  • Collaborate with analysts, engineers, and stakeholders to deliver secure, reliable data solutions.
  • (Nice to have) Work with Databricks and Delta tables for advanced analytics.


Must-Have Skills — No Exceptions

Candidates must have ALL of the following:


  • Minimum of 7 years of professional data engineering experience.
  • U.S. citizenship and proof that you currently hold, or have previously held, at minimum a Public Trust clearance.
  • Commitment to work onsite in Ashburn, VA 2–3 days per week (non-negotiable).
  • Strong Java programming experience in production.
  • Advanced SQL skills with a track record of performance tuning on large datasets.
  • Proficiency in bash scripting for automation.
  • Extensive AWS experience — S3, EMR, Redshift, DynamoDB.
  • Strong understanding of star schemas and data warehousing best practices.
  • Proven experience with Apache Spark and the Hadoop ecosystem.
  • Hands-on experience with Kafka for real-time streaming pipelines.
  • Demonstrated success building and maintaining production-grade data pipelines.
  • Proficiency with Python.
  • B.S. or M.S. in Computer Science, Engineering, or a related field.

Education

Any Graduate