Description

Key Responsibilities:

Design, develop, and maintain the next generation of a scalable AI platform for the world's best investment management technology platform.

Implement and manage Kubernetes clusters for deploying AI models.

Build platform abstractions to manage cloud-native infrastructure in AWS, GCP, or Azure environments.

Build and maintain automated pipelines for continuous training, testing, and deployment of machine learning models, with integrated enterprise concerns.

Ensure the security and compliance of the platform.

Troubleshoot and resolve issues related to platform performance and reliability.

Refine business and functional requirements and translate them into scalable technical designs.

Apply quality software engineering practices throughout the software development lifecycle.

Work with team members in a multi-office, multi-country environment.

Stay up to date with the latest trends and technologies in AI and cloud engineering.

 

Requirements:

B.S./M.S. degree in Computer Science, Engineering, or a related subject area.

10+ years of experience in software and platform engineering.

Proficiency in designing and building scalable APIs and microservices.

Strong proficiency in Kubernetes, including Helm charts, Kustomize, and custom resource definitions (CRDs).

Hands-on experience with cloud platforms such as AWS, GCP, or Azure.

Expertise in containerization technologies (Docker, containerd).

Experience with CI/CD tools (Jenkins, GitHub Actions, ArgoCD).

Knowledge of infrastructure as code (IaC) tools such as Terraform or CloudFormation.

Solid understanding of networking concepts, security policies, and API gateways in cloud environments.

Proficiency in production-grade programming languages such as Rust and C++.

Solid understanding of distributed systems and of cluster orchestration and management.

Good knowledge of data science tools (e.g., PyTorch, JAX, NumPy) and programming languages such as Python.

Experience with monitoring tools (Prometheus, Grafana).

Experience working in Agile development teams with excellent collaboration skills.

Grit in the face of technical obstacles.

 

Nice to have:

Building SDKs or client libraries to support API consumption.

Knowledge of distributed data processing frameworks (Spark, Dask).

Understanding of GPU orchestration and optimization in Kubernetes.

Familiarity with MLOps and ML model lifecycle pipelines.

Experience with AI model training and fine-tuning.

Familiarity with event-driven architecture and messaging frameworks like Kafka.

Experience with NoSQL datastores like Cassandra.

Education

Any Graduate