As a Senior Software Engineer, you will be responsible for developing and maintaining the infrastructure required to deploy, monitor, and manage machine learning models efficiently and effectively.
This role focuses on building MLOps solutions, but strong general software engineering skills are sufficient.
The work is critical in bridging the gap between research and engineering, ensuring that our AI solutions are scalable, reliable, and seamlessly integrated into our products.
This role requires you to thrive in a fast-paced environment, be passionate about AI/ML, and be constantly looking for ways to optimize and automate machine learning workflows.
Responsibilities
Pipeline Development: Implement, optimize, and maintain CI/CD pipelines for ML systems, including integrations with GitHub workflows and Jenkins.
Collaboration: Partner with data scientists, frontend engineers, and platform teams to deliver seamless integration of ML models into core evaluation platforms.
Environment Management: Administer ML development/production environments using cloud-native solutions; optimize for scalability, reliability, and cost.
Tooling and Automation: Evaluate, build, and deploy automation tools to streamline the end-to-end ML lifecycle.
Quality & Monitoring: Enhance and develop quality evaluation features and ensure robust monitoring via dashboards and automated alerts.
Documentation & Best Practices: Champion engineering best practices, promote code quality, and document workflows, tools, and processes for effective team adoption.
Qualifications
Proficiency in Python, TypeScript, and shell scripting
Experience with ML pipeline tools (Kubeflow, Airflow, MLflow)
Master's degree in Computer Science or a related STEM field
Minimum 5 years in software engineering; at least 2 years dedicated to DevOps/MLOps in cloud and production environments.
Industry experience building end-to-end software pipelines and infrastructure, with deep experience in Kubernetes, Infrastructure as Code (Terraform, CloudFormation), AWS, and GCP.
Expert proficiency in Python; working knowledge of ML frameworks and tooling (e.g., PyTorch, TensorFlow, MLflow)
Practical experience with cloud and NoSQL databases such as DynamoDB; experience with SQL databases is a plus.
Skilled with GitHub Actions, Jenkins, GitLab CI, Docker, and related automation platforms.
Exposure to Computer Vision and Generative AI (GANs, CLIP, diffusion models, MLLMs), and their practical deployment for evaluation systems.
Experience integrating ML workflows with user-facing features and backend pipelines.
Strong problem-solving skills, excellent written and verbal communication, and the ability to lead and collaborate effectively across teams.