Description

As an ML Ops Engineer, you will be part of an Agile team that builds healthcare applications and implements new features while following coding and development best practices. Initially, the role will involve substantial software engineering work, with the opportunity to take on more ML Ops responsibilities as the team evolves.

Responsibilities:

  • Design and write service code for various applications.
  • Work with Python & Java, creating services deployed on both Kubernetes and AWS Lambda.
  • Patch vulnerabilities to comply with security requirements.
  • Manage model deployment pipelines (e.g., updating Docker containers to accommodate model updates or dependencies).
  • Monitor and optimize infrastructure (e.g., investigating unusual latency spikes, setting up alerts for new deployments).
  • Use AWS services common in GenAI solutions, such as OpenSearch, Bedrock, S3, SQS, SNS, DynamoDB, and AWS Batch, along with related platforms such as Snowflake.
  • Collaborate with Machine Learning Scientists on new model and feature deployments.
  • Handle model version control & updates (e.g., managing model artifacts, coordinating rolling updates of models in production).
  • Work closely with DevOps, InfoSec, & other IT teams to ensure that AI infrastructure is scalable, aligned with overall system architecture, & meets current security requirements.

Mandatory Skills:

  • Familiarity with cloud platforms and their AI services (e.g., AWS Bedrock).
  • Experience with containerization technologies like Docker and version control systems such as Git (GitLab CI is a bonus).
  • Proficiency in programming languages such as Python and Java, plus experience with tools such as Boto3 (the AWS SDK for Python) and Llama models.
  • Proven experience successfully deploying AI models into production environments.
  • Proven experience with all phases of the AI/ML development lifecycle, from discovery through development/deployment and monitoring.
  • Self-motivated, with strong communication and teamwork skills.

Good to Have Skills:

  • Experience with CI/CD pipelines (e.g., GitLab CI) and infrastructure as code (IaC) tools (e.g., Terraform).
  • Experience with a variety of AI frameworks and libraries, such as TensorFlow, PyTorch, Keras, and OpenCV.
  • Proficiency in deploying and managing AI models across multiple domains (ML, DL, NLP, GenAI).
  • Python (expert).
  • Java (familiarity).
  • Docker (highly proficient).
  • AWS (highly proficient).
  • SQL (fundamental knowledge).

Education

Any Graduate