Lead AI Engineer

Location: Bangalore 
Experience: 7+ years

Choosing Capgemini means joining a team where you’ll be empowered to build cutting-edge AI infrastructure, supported by a collaborative global community, and inspired to reimagine what’s possible. Join us in enabling scalable, fault-tolerant AI systems that power next-generation machine learning workloads.


Your Role

As an AI Runtime Engineer, you will design and optimize distributed AI runtimes that enable high-performance, multi-node, multi-GPU training at scale. You’ll work closely with AI infrastructure teams to build elastic, fault-tolerant systems and ensure seamless orchestration for advanced AI workloads.


In this role, you will:

  • Architect and implement distributed AI runtime systems with elastic scaling and job recovery.
  • Optimize low-level performance (CUDA, NCCL, PyTorch internals) for multi-GPU workloads.
  • Develop custom runtime architectures for large-scale AI training pipelines.
  • Integrate orchestration tools such as Kubernetes, Ray, TorchElastic, and Horovod for containerized AI workloads.
  • Implement fault recovery mechanisms and observability hooks for runtime health monitoring.
  • Collaborate with AI researchers and platform engineers to ensure efficient resource utilization and throughput optimization.
  • Contribute to CI/CD pipelines for AI infrastructure and runtime deployments.

Your Profile

  • Mandatory Skills:
    • Hands-on experience with distributed training systems and multi-node/multi-GPU orchestration.
    • Expertise in PyTorch internals, CUDA, NCCL, and performance profiling.
    • Strong knowledge of Kubernetes, containerization, and orchestration frameworks.
  • Preferred Skills:
    • Experience with TorchElastic, Ray, or Horovod.
    • Open-source contributions to PyTorch or runtime libraries.
    • Background in HPC, compilers, or systems research.
  • Education:
    • Bachelor’s/Master’s in Computer Science, Engineering, or related field.

About Us

At Capgemini Engineering, the world leader in engineering services, we bring together a global team of engineers, scientists, and architects to help the world’s most innovative companies unleash their potential. From autonomous cars to life-saving robots, our digital and software technology experts think outside the box as they provide unique R&D and engineering services across all industries. Join us for a career full of opportunities. Where you can make a difference. Where no two days are the same.

 

Ref. code:  352835
Posted on:  8 Dec 2025
Experience Level:  Experienced Professionals
Contract Type:  Permanent
Location:  Bangalore, IN

Brand:  Capgemini Engineering
Professional Community:  Data & AI
