Data Science
Choosing Capgemini means choosing a company where you will be empowered to shape your career in the way you’d like, where you’ll be supported and inspired by a collaborative community of colleagues around the world, and where you’ll be able to reimagine what’s possible. Join us and help the world’s leading organizations unlock the value of technology and build a more sustainable, more inclusive world.
Your Role
We are looking for experienced Data Engineers to support the design, development and deployment of scalable data pipelines. The ideal candidate will have strong hands-on experience with Azure Data Services and Databricks. They will be expected to:
- Design and implement scalable data pipelines using Azure Data Factory, Data Lake, and Databricks, ensuring they are optimized for data processing and analytics.
- Develop and optimize PySpark and SQL workflows for data transformation and orchestration.
- Create and refine processes for data modelling, mining and production to support operational and analytical needs.
- Establish and enforce data quality checks and validation routines to maintain high standards of data accuracy and reliability.
- Collaborate with AI/ML teams to support RAG-based LLM applications.
- Ensure data security and governance using Azure Key Vault and related services.
Your Profile
- Over 6 years of hands-on experience with Azure Data Services including Data Lake, Data Factory, Key Vault and Cognitive Search.
- Proficiency in the Databricks ecosystem, including cluster optimization and performance tuning, with expertise in Delta Lake, PySpark and workflow orchestration.
- Experience with relational SQL databases (SQL Server/Oracle), NoSQL databases (MongoDB/DynamoDB) and Snowflake, along with strong expertise in Python/PySpark for large-scale data processing.
- Experience with real-time systems such as Event Hubs, Apache Kafka and Spark Streaming.
- Experience with Big Data frameworks such as Spark, Kafka, Hive and Hadoop.
- Strong programming skills in Python and SQL for data engineering and analytics.
- Basic understanding of GenAI concepts, including RAG and related AI/ML technologies; experience generating embeddings for both structured and unstructured data sources is preferred.
- Familiarity with DevOps practices, including CI/CD pipelines and automation strategies; experience with tools such as GitHub and Bitbucket is good to have.
- Working knowledge of BI tools (Tableau, Power BI) and data engineering platforms (Microsoft Fabric, Apache Storm, Apache NiFi) for reporting and pipeline setup will be beneficial.
What you will love about working here
- We recognize the importance of flexible work arrangements. Whether through remote work or flexible hours, you will have an environment that supports a healthy work-life balance.
- At the heart of our mission is your career growth. Our career growth programs and diverse professions are designed to help you explore a world of opportunities.
- Equip yourself with valuable certifications in the latest technologies such as Generative AI.
Capgemini is an AI-powered global business and technology transformation partner, delivering tangible business value. We imagine the future of organizations and make it real with AI, technology and people. With our strong heritage of nearly 60 years, we are a responsible and diverse group of 420,000 team members in more than 50 countries. We deliver end-to-end services and solutions with our deep industry expertise and strong partner ecosystem, leveraging our capabilities across strategy, technology, design, engineering and business operations. The Group reported 2024 global revenues of €22.1 billion.
Make it real | www.capgemini.com
Navi Mumbai, IN