Description

Responsibilities:

  • Build and maintain CI/CD pipelines for ML model development, testing, and deployment.
  • Develop reusable tools and frameworks for data processing, model training, validation, and monitoring.
  • Collaborate closely with data scientists to operationalize models, ensuring they are scalable, reliable, and reproducible.
  • Manage and optimize compute infrastructure, including cloud and on-prem GPU/CPU clusters.
  • Implement observability and monitoring systems to track model performance, drift, and data integrity in production.
  • Ensure governance and compliance through model versioning, reproducibility, and auditability.

Requirements:

  • 3+ years of experience in ML Engineering, DevOps, or Infrastructure Engineering with a focus on ML workflows.
  • Proficiency with cloud platforms (AWS, Google Cloud Platform, Azure) and orchestration tools (Kubernetes, Airflow, etc.).
  • Experience with MLOps frameworks such as MLflow, Kubeflow, Metaflow, or SageMaker.
  • Strong coding skills in Python and experience with infrastructure-as-code tools (e.g., Terraform, Helm).
  • Solid understanding of CI/CD practices and monitoring tools (e.g., Prometheus, Grafana, Datadog).

Nice to Have:

  • Experience deploying real-time inference services and batch prediction pipelines.
  • Familiarity with model explainability, fairness, and responsible AI practices.
  • Exposure to feature stores (e.g., Feast, Tecton) and experiment tracking platforms.

Education

Any Graduate