Role Description
We are seeking an experienced MLOps Engineer with 4–5 years of hands-on experience deploying, monitoring, and maintaining machine learning models in production. You will work closely with data scientists, ML engineers, DevOps teams, and clients to ensure scalable, reliable, and efficient ML systems.
Key Responsibilities:
Collaborate with data scientists to operationalize ML models and build CI/CD pipelines for ML workflows.
Design, build, and maintain scalable model deployment architectures using Docker, Kubernetes, and infrastructure-as-code tools.
Monitor model performance and data drift using tools like Prometheus, Grafana, or custom solutions.
Automate data pipelines and model retraining workflows using orchestration tools (Airflow, Kubeflow, Prefect, etc.).
Implement and maintain version control for data, models, and code using DVC, MLflow, Git, etc.
Ensure reproducibility, scalability, and robustness of ML workflows.
Maintain security, compliance, and documentation standards across ML systems.
Provision and manage infrastructure using Terraform and AWS CloudFormation.
Conduct internal and external training sessions, build mockups, deliver presentations, and lead workshops related to MLOps workflows.
Communicate and coordinate directly with clients, especially during Pacific Standard Time (PST) business hours.
Required Skills:
Languages & Scripting: Python (mandatory), Bash/Shell scripting
Cloud Platforms: Experience with at least one of AWS, Azure, or GCP
Infrastructure as Code (IaC): Hands-on experience with Terraform and CloudFormation
Containerization & Orchestration: Docker, Kubernetes
CI/CD Tools: Jenkins, GitHub Actions, GitLab CI/CD, or similar
ML Lifecycle Tools: MLflow, DVC, Kubeflow, SageMaker, or similar
Workflow Orchestration: Apache Airflow, Prefect, etc.
Monitoring & Logging: Prometheus, Grafana, ELK stack, or similar
Collaboration: Proven experience working cross-functionally with data scientists and DevOps teams
Client Engagement: Experience in client communication and coordination, especially across time zones (PST preferred)
Presentation & Training: Ability to deliver technical content through workshops, demos, mockups, and training sessions
Preferred Qualifications:
Bachelor’s or Master’s degree in Computer Science, Data Science, Engineering, or a related field.
Experience with large-scale data systems and distributed computing.
Exposure to model governance, auditing, and responsible AI practices.