Lead Data Scientist
HR: Lessly Mujica
Location: Any state in Mexico
General Description:
As a Lead Data Scientist, you will be at the forefront of our data science initiatives, leveraging advanced machine learning techniques and cloud technologies to deliver impactful solutions for our clients. You will lead the development and deployment of production-grade machine learning models using Azure's ecosystem, driving innovation and excellence in our data science practice.
In this role, you will be responsible for:
- Leading the end-to-end machine learning lifecycle from problem formulation to production deployment, ensuring high-quality, scalable, and maintainable solutions
- Architecting and implementing advanced ML pipelines using Azure ML, Databricks, and related services while adhering to best practices in MLOps
- Developing and optimizing tabular machine learning models (XGBoost, LightGBM, Random Forest, Logistic Regression) with a focus on explainability, performance tuning, fairness, and robustness
- Building and maintaining scalable data processing pipelines using PySpark and Spark SQL to support model training and inference
- Deploying models as REST APIs through Azure API Management and establishing monitoring frameworks for production models
- Creating comprehensive documentation including model cards, validation reports, runbooks, lineage documentation, and operational guidelines
- Leading cross-functional collaboration with platform teams, data engineers, governance specialists, and domain experts
- Mentoring junior data scientists and establishing best practices for the team
- Communicating complex technical concepts to stakeholders and influencing architectural decisions
Your profile
- Bachelor's or Master's degree in Computer Science, Engineering, Statistics, or related technical field
- 15+ years of overall IT experience, with 6+ years delivering production-grade ML models on enterprise cloud platforms, including at least 3 years in Azure ecosystems
- Demonstrated expertise in Azure ML (workspaces, pipelines, model registry, endpoints), Azure Databricks (PySpark, Spark SQL, MLflow), Azure Data Factory (data pipelines & orchestration), Azure Data Lake (ADLS Gen2), and Azure DevOps CI/CD
- Deep knowledge of tabular machine learning techniques including XGBoost, LightGBM, Random Forest, and Logistic Regression, with a focus on explainability, performance tuning, fairness checks, and robustness validation
- Strong Python development skills with experience building scalable data and ML pipelines using PySpark/Spark SQL
- Experience working with Medallion Architecture (Bronze/Silver/Gold) and semantic layers
- Proven ability to deploy models as REST APIs through API gateways or Azure API Management
- Experience producing comprehensive documentation including model cards, validation reports, runbooks, lineage documentation, and operational guidelines
- Familiarity with responsible AI practices, fairness testing, and SHAP explainability
- Experience working in agile environments with SIT, UAT, and phased releases
- Strong stakeholder management skills and ability to lead cross-functional collaboration
- Experience in regulated domains (pharma, healthcare, clinical operations, CRO) is highly desirable
- Knowledge of validation processes like GxP and CSV is a plus
What you'll love about working here
At Capgemini LATAM, we aim to attract the best talent and are committed to creating a diverse and inclusive work environment, so there is no discrimination based on race, sex, sexual orientation, gender identity or expression, or any other characteristic of a person. All applications are welcome and will be considered on merit against the job and/or experience requirements for the position.
About Capgemini
Capgemini is a global business and technology transformation partner, helping organizations accelerate their dual transition to a digital and sustainable world while creating tangible impact for enterprises and society. It is a responsible and diverse group of 340,000 team members in more than 50 countries. With a strong heritage of over 55 years, Capgemini is trusted by its clients to unlock the value of technology to address the entire breadth of their business needs. It delivers end-to-end services and solutions leveraging strengths from strategy and design to engineering, all fuelled by its market-leading capabilities in AI, generative AI, cloud and data, combined with its deep industry expertise and partner ecosystem.
Aguascalientes, MX