Description

  • Build and maintain secure, scalable infrastructure for ML model training, testing, and deployment using open-source tools.
  • Create reusable deployment templates that standardize the path to production across teams.
  • Translate prototype models into resilient, monitored, and observable production systems.
  • Implement guardrails and controls that ensure compliance with internal standards and external frameworks (e.g., SR 11-7, ISO/IEC 42001).
  • Partner with data scientists to simplify onboarding to platform capabilities.
  • Establish CI/CD pipelines with hooks for testing, scanning, and validation of model code and artifacts.
  • Serve as a technical lead for cross-functional delivery efforts involving model onboarding and platform integration.

Required Qualifications

  • 6+ years of experience in software, data, or ML engineering roles.
  • Strong hands-on experience with ML lifecycle and orchestration tools such as MLflow, Metaflow, Airflow, or similar frameworks.
  • Production experience with Kubernetes, Docker, and Helm.
  • Deep understanding of Python and software engineering best practices.
  • Experience implementing CI/CD pipelines and infrastructure-as-code in a cloud or hybrid environment.

Preferred Qualifications

  • Experience working in regulated industries or environments with strong risk and compliance expectations.
  • Familiarity with open-source model monitoring, drift detection, or lineage tools (e.g., Evidently AI, Feast, lakeFS).
  • Hands-on experience serving models using KServe, Ray Serve, or Triton Inference Server.
  • Familiarity with enterprise security tools like Trivy, Aqua, or Snyk for code and container scanning.
  • Exposure to LLM/RAG architecture or GenAI platform integration.

Education

Any Graduate