Description

We are looking for an experienced ML Data Infrastructure Engineer to build and maintain the data foundations that power machine learning capabilities. This role is focused on designing and developing scalable data pipelines, feature stores, and dataset management systems to support efficient and reproducible ML workflows. The ideal candidate has deep experience with GCP, data processing frameworks, and ML infrastructure components.


 

Job Responsibilities:

  • Design and implement scalable data pipelines for ML training and validation
  • Build and manage feature stores supporting both batch and real-time use cases
  • Develop frameworks for data quality monitoring, validation, and testing
  • Create tools for dataset versioning, lineage tracking, and reproducibility
  • Implement automated data documentation and data discovery solutions
  • Optimize data storage and access patterns for machine learning workflows
  • Collaborate with data scientists to streamline and enhance data preparation processes


 

Required Skills:

  • 7+ years of software engineering experience, including 3+ years in data infrastructure
  • Strong experience with Google Cloud Platform (GCP) services:
      • BigQuery, Dataflow, Cloud Storage
      • Vertex AI Feature Store
      • Cloud Composer (Airflow), Dataproc
  • Expertise in data processing frameworks: Spark, Beam, or Flink
  • Experience with feature stores like Feast or Tecton
  • Knowledge of data versioning tools and systems
  • Strong proficiency in Python and SQL
  • Experience with data quality frameworks and data testing
  • Familiarity with orchestration tools like Airflow or Dagster

Preferred Skills:

  • Experience with streaming systems (Kafka, Kinesis, Pub/Sub)
  • Familiarity with GCP-specific IAM and security best practices
  • Experience with Cloud Logging and Monitoring for pipeline observability
  • Understanding of CI/CD tools like Cloud Build and Cloud Deploy
  • Familiarity with ML metadata management systems
  • Exposure to data governance and security practices
  • Experience with dbt or similar data transformation tools


 

Education:

Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field (or equivalent experience)

 
