- Design, build, and maintain reliable, high-volume data pipelines and ETL processes.
- Develop and optimize complex SQL queries for analytics, reporting, and application support.
- Implement robust data architectures using star schemas, fact tables, and dimension tables.
- Process massive datasets efficiently using Apache Spark and the Hadoop ecosystem.
- Build and manage real-time streaming pipelines using Kafka.
- Leverage AWS services — S3, EMR, Redshift, DynamoDB — for storage and processing.
- Automate workflows with bash scripting.
- Use Python to support various data engineering tasks as needed.
- Collaborate with analysts, engineers, and stakeholders to deliver secure, reliable data solutions.
- (Nice to have) Work with Databricks and Delta tables for advanced analytics.
Must-Have Skills — No Exceptions
Candidates must have ALL of the following:
- Minimum of 7 years of professional data engineering experience.
- U.S. citizenship and proof that you currently hold, or have previously held, a Public Trust clearance or higher.
- Commitment to work onsite in Ashburn, VA 2–3 days per week (non-negotiable).
- Strong Java programming experience in production.
- Advanced SQL skills with a track record of performance tuning on large datasets.
- Proficiency in bash scripting for automation.
- Extensive AWS experience — S3, EMR, Redshift, DynamoDB.
- Strong understanding of star schemas and data warehousing best practices.
- Proven experience with Apache Spark and the Hadoop ecosystem.
- Hands-on experience with Kafka for real-time streaming pipelines.
- Demonstrated success building and maintaining production-grade data pipelines.
- Proficiency with Python.
- B.S. or M.S. in Computer Science, Engineering, or a related field.