Job Description:

Responsibilities:

  • Design, implement, and maintain CI/CD pipelines for machine learning applications using AWS CodePipeline, CodeCommit, and CodeBuild.
  • Automate the deployment of ML models into production using Amazon SageMaker, Databricks, and MLflow for model versioning, tracking, and lifecycle management.
  • Develop, test, and deploy AWS Lambda functions for triggering model workflows, automating pre/post-processing, and integrating with other AWS services.
  • Maintain and monitor Databricks model serving endpoints, ensuring scalable and low-latency inference workloads.
  • Use Airflow (MWAA) or Databricks Workflows to orchestrate complex, multi-stage ML pipelines, including data ingestion, model training, evaluation, and deployment.
  • Collaborate with Data Scientists and ML Engineers to productionize models and convert notebooks into reproducible and version-controlled ML pipelines.
  • Integrate and automate model monitoring (drift detection, performance logging) and alerting mechanisms using tools like CloudWatch, Prometheus, or Datadog.
  • Optimize compute workloads by managing infrastructure-as-code (IaC) via CloudFormation or Terraform for reproducible, secure deployments across environments.
  • Ensure secure and compliant deployment pipelines using IAM roles, VPC, and secrets management with AWS Secrets Manager or SSM Parameter Store.
  • Champion DevOps best practices across the ML lifecycle, including canary deployments, rollback strategies, and audit logging for model changes.
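To illustrate the model-monitoring responsibility above, drift detection is often reduced to a statistic such as the Population Stability Index (PSI) compared against alert thresholds. The sketch below is a minimal, plain-Python illustration under common rule-of-thumb thresholds; the function name and binning scheme are assumptions for this example, not part of the role's actual stack:

```python
import math
from collections import Counter

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline sample and a live
    sample. Common rule-of-thumb thresholds: < 0.1 is stable, > 0.25
    indicates significant drift worth alerting on."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def proportions(xs):
        # Bucket each value against the baseline's range, clamping
        # out-of-range live values into the edge bins.
        idx = (max(0, min(int((x - lo) / width), bins - 1)) for x in xs)
        counts = Counter(idx)
        # +1 smoothing keeps empty bins from producing log(0)
        return [(counts.get(i, 0) + 1) / (len(xs) + bins) for i in range(bins)]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

In practice a check like this would run on a schedule (e.g. an Airflow task or Lambda) and push the score to CloudWatch, Prometheus, or Datadog, where the alerting threshold lives.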

Minimum Requirements:

  • Hands-on experience in MLOps, deploying ML applications in production at scale.
  • Proficient in AWS services: SageMaker, Lambda, CodePipeline, CodeCommit, ECR, ECS/Fargate, and CloudWatch.
  • Strong experience with Databricks workflows and Databricks Model Serving, including MLflow for model tracking, packaging, and deployment.
  • Proficient in Python and shell scripting with the ability to containerize applications using Docker.
  • Deep understanding of CI/CD principles for ML, including testing ML pipelines, data validation, and model quality gates.
  • Hands-on experience orchestrating ML workflows using Airflow (open-source or MWAA) or Databricks Workflows.
  • Familiarity with model monitoring and logging stacks (e.g., Prometheus, ELK, Datadog, or OpenTelemetry).
  • Experience deploying models as REST endpoints, batch jobs, and asynchronous workflows.
  • Version control expertise with Git/GitHub and experience in automated deployment reviews and rollback strategies.
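The Lambda-triggered workflow items above can be sketched as a small, testable handler: parse an S3 event and kick off a retraining pipeline. The pipeline name, parameter names, and the injected `start_pipeline` hook below are illustrative assumptions; in a real deployment the hook would wrap boto3's SageMaker `start_pipeline_execution` call:

```python
import json
import urllib.parse

def handler(event, context=None, start_pipeline=None):
    """S3-triggered Lambda entry point that starts a retraining pipeline.

    `start_pipeline` is injected so the handler can be unit-tested without
    AWS credentials; in production it would wrap
    boto3.client("sagemaker").start_pipeline_execution(**params).
    """
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    # S3 event keys are URL-encoded (spaces arrive as '+')
    key = urllib.parse.unquote_plus(record["object"]["key"])

    params = {
        "PipelineName": "ml-retrain-pipeline",  # placeholder name
        "PipelineParameters": [
            {"Name": "InputDataUri", "Value": f"s3://{bucket}/{key}"},
        ],
    }
    if start_pipeline is not None:
        start_pipeline(**params)
    return {"statusCode": 200, "body": json.dumps(params)}
```

The dependency-injection pattern here is a deliberate design choice: it keeps the handler testable in CI (one of the CI/CD-for-ML principles listed above) without mocking the whole AWS SDK.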

Nice to Have:

  • Experience with Feature Store (e.g., AWS SageMaker Feature Store, Feast).
  • Familiarity with Kubeflow, SageMaker Pipelines, or (in multi-cloud environments) Vertex AI.
  • Exposure to LLM-based models, vector databases, or retrieval-augmented generation (RAG) pipelines.
  • Knowledge of Terraform or AWS CDK for infrastructure automation.
  • Experience with A/B testing or shadow deployments for ML models.
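A shadow deployment, as mentioned in the last bullet, can be reduced to a small routing function: always serve from the current model, mirror a sampled fraction of traffic to the candidate, and log both outputs for offline comparison. The sketch below is an illustrative assumption (all names are hypothetical), not a prescribed implementation:

```python
import random

def route(request, primary, candidate, shadow_fraction=0.1,
          rng=random.random, log=None):
    """Serve every request from the primary model; for a sampled fraction,
    also run the candidate "in the shadow" and log both predictions.
    The candidate's output is never returned to the caller, so user-facing
    behaviour is unchanged while comparison data accumulates.
    """
    result = primary(request)
    if rng() < shadow_fraction:
        shadow_result = candidate(request)
        if log is not None:
            log({"request": request,
                 "primary": result,
                 "shadow": shadow_result})
    return result
```

Passing `rng` and `log` in explicitly keeps the sampling deterministic under test; the logged records are what an offline job would analyze before promoting the candidate.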

Education:

Any Graduate