Lead AI Engineer
Location: Bangalore
Experience: 7+ years
Choosing Capgemini means joining a team where you’ll be empowered to build cutting-edge AI infrastructure, supported by a collaborative global community, and inspired to reimagine what’s possible. Join us in enabling scalable, fault-tolerant AI systems that power next-generation machine learning workloads.
Your Role
As an AI Runtime Engineer, you will design and optimize distributed AI runtimes that enable high-performance, multi-node, multi-GPU training at scale. You’ll work closely with AI infrastructure teams to build elastic, fault-tolerant systems and ensure seamless orchestration for advanced AI workloads.
In this role, you will:
- Architect and implement distributed AI runtime systems with elastic scaling and job recovery.
- Optimize low-level performance (CUDA, NCCL, PyTorch internals) for multi-GPU workloads.
- Develop custom runtime architectures for large-scale AI training pipelines.
- Integrate orchestration tools such as Kubernetes, Ray, TorchElastic, and Horovod for containerized AI workloads.
- Implement fault recovery mechanisms and observability hooks for runtime health monitoring.
- Collaborate with AI researchers and platform engineers to ensure efficient resource utilization and throughput optimization.
- Contribute to CI/CD pipelines for AI infrastructure and runtime deployments.
Your Profile
- Mandatory Skills:
  - Hands-on experience in distributed training systems and multi-node/multi-GPU orchestration.
  - Expertise in PyTorch internals, CUDA, NCCL, and performance profiling.
  - Strong knowledge of Kubernetes, containerization, and orchestration frameworks.
- Preferred Skills:
  - Experience with TorchElastic, Ray, or Horovod.
  - Open-source contributions to PyTorch or runtime libraries.
  - Background in HPC, compilers, or systems research.
- Education:
  - Bachelor’s/Master’s in Computer Science, Engineering, or a related field.
At Capgemini Engineering, the world leader in engineering services, we bring together a global team of engineers, scientists, and architects to help the world’s most innovative companies unleash their potential. From autonomous cars to life-saving robots, our digital and software technology experts think outside the box as they provide unique R&D and engineering services across all industries. Join us for a career full of opportunities. Where you can make a difference. Where no two days are the same.
Bangalore, IN