Key Responsibilities:
Model Deployment & Management:
Deploy trained models into production environments (batch or real-time).
Manage versioning of both models and datasets, and set up environments for registering models (an illustrative sketch follows this list).
Monitor model performance, track data drift, and address errors promptly.
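For a flavor of this work, here is a minimal sketch of registering a versioned model with MLflow, one of the tools named under Pipelines & Automation below; the experiment name, model name, and sqlite backing store are hypothetical placeholders, not a prescribed setup.

```python
# Minimal sketch: train a model, log it, and register a new version with MLflow.
# The experiment/model names and the sqlite tracking store are hypothetical.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
model = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y)

# The model registry needs a database-backed store; sqlite works for local experiments.
mlflow.set_tracking_uri("sqlite:///mlflow.db")
mlflow.set_experiment("churn-classifier")  # hypothetical experiment name

with mlflow.start_run():
    mlflow.log_param("n_estimators", 50)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    # registered_model_name creates (or increments) a version in the model registry.
    mlflow.sklearn.log_model(
        model,
        artifact_path="model",
        registered_model_name="churn-classifier",
    )
```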
Automation & Pipelines:
Automate training pipelines and retraining workflows.
Set up and manage CI/CD pipelines specifically for ML systems.
Utilize orchestration tools like Airflow and Azure ML Pipelines for end-to-end automation.
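As an illustration of such automation, a minimal sketch of a scheduled retraining workflow expressed as an Airflow DAG; the DAG id and task callables are hypothetical stand-ins for project-specific steps, and the `schedule` argument assumes Airflow 2.4 or later.

```python
# Illustrative Airflow DAG for a weekly retraining workflow.
# extract_features, train_model, and evaluate_and_register are hypothetical steps.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_features():
    print("pulling training data from the feature store")


def train_model():
    print("fitting the model on the latest features")


def evaluate_and_register():
    print("evaluating the model and registering it if metrics pass")


with DAG(
    dag_id="weekly_model_retraining",
    start_date=datetime(2024, 1, 1),
    schedule="@weekly",   # assumes Airflow 2.4+; older versions use schedule_interval
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_features", python_callable=extract_features)
    train = PythonOperator(task_id="train_model", python_callable=train_model)
    register = PythonOperator(task_id="evaluate_and_register", python_callable=evaluate_and_register)

    extract >> train >> register
```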
Infrastructure Management:
Manage infrastructure using cloud platforms such as Azure, AWS, or GCP (experience with Azure ML, AWS SageMaker, Kubernetes, and Databricks is preferred).
Use containerization and orchestration tools (Docker, Kubernetes) and Infrastructure as Code tools (Terraform).
Monitoring & Logging:
Develop and maintain robust logging, error handling, and alerting systems (experience with Prometheus, Grafana, and custom dashboards; an illustrative sketch follows this list).
Ensure continuous monitoring and performance tracking of deployed models.
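For illustration, a small sketch of instrumenting a prediction function with the prometheus_client library so that Prometheus and Grafana can scrape serving metrics; the metric names, port, and dummy inference logic are hypothetical.

```python
# Sketch: expose prediction-count and latency metrics for Prometheus to scrape.
# Metric names and the sleep-based "inference" are hypothetical placeholders.
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

PREDICTIONS = Counter("model_predictions_total", "Total prediction requests served")
LATENCY = Histogram("model_prediction_latency_seconds", "Prediction latency in seconds")


@LATENCY.time()
def predict(features):
    PREDICTIONS.inc()
    time.sleep(random.uniform(0.01, 0.05))  # stand-in for real model inference
    return 0.5


if __name__ == "__main__":
    start_http_server(8000)  # metrics served at http://localhost:8000/metrics
    while True:
        predict([0.1, 0.2, 0.3])
```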
Required Skills and Expertise:
Infrastructure & DevOps:
Proficiency in Docker, Kubernetes, Terraform, and working with cloud platforms (Azure/AWS/GCP).
Model Deployment:
Experience with REST APIs (using frameworks such as FastAPI or Flask) and model packaging formats (ONNX, TorchScript, etc.).
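A minimal sketch of the kind of serving code implied here, assuming a FastAPI endpoint that loads an exported ONNX model with onnxruntime; the model path and input schema are hypothetical.

```python
# Sketch: serve an ONNX model behind a FastAPI REST endpoint.
# "model.onnx" and the request schema are hypothetical placeholders.
import numpy as np
import onnxruntime as ort
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
session = ort.InferenceSession("model.onnx")  # hypothetical path to an exported model


class PredictRequest(BaseModel):
    features: list[float]


@app.post("/predict")
def predict(req: PredictRequest):
    inputs = np.asarray([req.features], dtype=np.float32)
    input_name = session.get_inputs()[0].name
    outputs = session.run(None, {input_name: inputs})
    return {"prediction": outputs[0].tolist()}
```

In practice such an app would run behind an ASGI server such as uvicorn and be packaged into a Docker image for Kubernetes deployment.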
Pipelines & Automation:
Expertise with orchestration and automation tools such as Airflow, Azure ML Pipelines, MLflow, and Jenkins.
Monitoring:
Strong skills in setting up logging, alerting, and performance tracking systems.
Programming & Data Engineering:
Proficient in Python, with strong familiarity with ML libraries (scikit-learn, PyTorch, TensorFlow).
Experience with ETL processes, SQL, data validation, and data formats (Delta Lake/Parquet).
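As an example of a lightweight validation step over a Parquet input, a sketch using pandas; the column names, checks, and file path are hypothetical.

```python
# Sketch: read a Parquet dataset and apply simple validation rules before use.
# Column names, rules, and the input file are hypothetical placeholders.
import pandas as pd

REQUIRED_COLUMNS = {"customer_id", "signup_date", "monthly_spend"}


def validate(df: pd.DataFrame) -> None:
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    if df["customer_id"].duplicated().any():
        raise ValueError("duplicate customer_id values found")
    if df["monthly_spend"].lt(0).any():
        raise ValueError("negative monthly_spend values found")


df = pd.read_parquet("customers.parquet")  # hypothetical input file
validate(df)
print(f"validated {len(df)} rows")
```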
Version Control:
Competency with Git, DVC, and similar tools for model and dataset versioning.
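A small sketch of retrieving a specific dataset version tracked with DVC through its Python API; the file path, repository URL, and Git tag are hypothetical.

```python
# Sketch: read a DVC-tracked dataset pinned to a specific Git revision.
# The path, repo URL, and tag are hypothetical placeholders.
import io

import dvc.api
import pandas as pd

raw = dvc.api.read(
    "data/train.csv",                       # hypothetical DVC-tracked path
    repo="https://github.com/org/ml-repo",  # hypothetical repository
    rev="v1.2.0",                           # Git tag pinning the dataset version
)
df = pd.read_csv(io.StringIO(raw))
print(df.shape)
```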
Additional Preferred Experience:
Hands-on experience writing production-ready code, with a solid understanding of error handling and logging practices.
Experience with Azure Data Factory for managing data pipelines and orchestration.
Education: Any graduate (any discipline).