Job Description

Key Responsibilities:

  • Design and implement scalable model serving platforms for both batch and real-time inference
  • Build model deployment pipelines with automated testing and validation
  • Develop monitoring, logging, and alerting systems for ML services
  • Create infrastructure for A/B testing and model experimentation
  • Implement model versioning and rollback capabilities
  • Design efficient scaling and load balancing strategies for ML workloads
  • Collaborate with data scientists to optimize model serving performance

Technical Requirements:

  • 7+ years of software engineering experience, with 3+ years in ML serving/infrastructure
  • Strong expertise in container orchestration (Kubernetes) and cloud platforms
  • Experience with model serving technologies (TensorFlow Serving, Triton, KServe)
  • Deep knowledge of distributed systems and microservices architecture
  • Proficiency in Python and experience with high-performance serving
  • Strong background in monitoring and observability tools
  • Experience with CI/CD pipelines and GitOps workflows

Nice to Have:

  • Experience with model serving frameworks:
    • TorchServe for PyTorch models
    • TensorFlow Serving for TensorFlow models
    • Triton Inference Server for multi-framework support
    • BentoML for unified model serving
  • Expertise in model runtime optimizations:
    • Model quantization (INT8, FP16)
    • Model pruning and compression
    • Kernel optimizations
    • Batching strategies
    • Hardware-specific optimizations (CPU/GPU)
  • Experience with model inference workflows:
    • Pre/post-processing pipeline optimization
    • Feature transformation at serving time
    • Caching strategies for inference
    • Multi-model inference orchestration
    • Dynamic batching and request routing
  • Experience with GPU infrastructure management
  • Knowledge of low-latency serving architectures
  • Familiarity with ML-specific security requirements
  • Background in performance profiling and optimization
  • Experience with model serving metrics collection and analysis

Education

Graduate in any discipline