Job Description

We are seeking a skilled and proactive Software Engineer (Python/MLOps) to help bridge the gap between data science and production engineering. You’ll be responsible for building and maintaining the infrastructure, tooling, and workflows required to develop, test, deploy, and monitor machine learning models at scale.

Skillset: Programming (Python/AI), Apache Airflow, Site Reliability Engineering (SRE), CI/CD, Kubernetes, Logging & Monitoring (Kibana, Grafana, or Prometheus)

Requirements

  • Professional expertise in Python and MLOps, with hands-on coding skills to develop scalable applications from scratch.
  • Hands-on experience with RESTful web services, both developing and consuming APIs.
  • Hands-on experience coding with backend frameworks, Apache Kafka, and Apache Airflow.
  • Technical expertise in DevOps processes, CI/CD pipelines, and GitHub, with Kafka as the messaging service.
  • Basic understanding of observability and SRE concepts: dashboards, alerts, metrics, and logs.
  • Ability to work on site reliability and observability tasks: validating dashboards, running and monitoring migration scripts, and supporting the observability team's day-to-day operations.
  • Technical proficiency with the monitoring tools Grafana, Prometheus, and Kibana.
  • Strong knowledge of and experience in building products from scratch.

Nice to Have

  • Experience deploying real-time inference services and batch prediction pipelines
  • Familiarity with model explainability, fairness, and responsible AI practices
  • Exposure to feature stores (e.g., Feast, Tecton) and experiment tracking platforms

Education

Bachelor's degree in any discipline