Senior AIOps SME
Your Role
- Implement and maintain AI-driven systems for real-time monitoring, alerting, anomaly detection, and root cause analysis
- Develop and train machine learning models using operational data (logs, metrics, events, traces)
- Automate incident detection and remediation workflows (e.g., self-healing scripts, intelligent runbooks)
- Collaborate with DevOps, SRE, and IT teams to integrate AIOps tools into CI/CD pipelines and cloud infrastructure
- Design data pipelines that ingest, clean, and analyze high-volume system telemetry data
- Evaluate and deploy AIOps platforms (e.g., Moogsoft, Dynatrace, Splunk, BigPanda, Datadog)
- Monitor model performance, manage retraining cycles, and ensure AI reliability
Your Profile
- Experience in DevOps, SRE, or infrastructure automation roles
- Proficiency in Python, Bash, or other scripting languages
- Familiarity with AI/ML frameworks (e.g., scikit-learn, TensorFlow, PyTorch)
- Strong understanding of observability stacks (e.g., Prometheus, Grafana, Splunk, OpenTelemetry)
- Experience with public cloud platforms (AWS, Azure, GCP) and infrastructure as code (e.g., Terraform, Ansible)
- Hands-on experience with AIOps platforms or building ML-based incident response systems
- Background in log analysis, time-series forecasting, or unsupervised anomaly detection
- Knowledge of Kubernetes, container orchestration, and service mesh architectures
- Exposure to ITSM/ITIL processes and how they integrate with AIOps
- Ability to communicate ML results to operational teams and implement iterative improvements
What you will love about working at Capgemini
- Work with modern technologies across enterprise‑grade digital solutions.
- Grow your technical expertise through continuous learning and structured career paths.
- Collaborate in high‑performing engineering teams delivering solutions at scale.
- Benefit from a flexible, professional work environment that supports productivity and innovation.
Noida, IN