
Site Reliability Engineer (Fully remote)

DESCRIPTION

At Capgemini Engineering, the world leader in engineering services, we bring together a global team of engineers, scientists, and architects to help the world’s most innovative companies unleash their potential. From autonomous cars to life-saving robots, our digital and software technology experts think outside the box as they provide unique R&D and engineering services across all industries. Join us for a career full of opportunities. Where you can make a difference. Where no two days are the same.

YOUR ROLE

  • Design and maintain Datadog dashboards for monitoring critical system metrics, including:

➜ Kubernetes metrics.

➜ Application performance metrics.

➜ CI/CD pipeline metrics.

➜ AWS infrastructure metrics.

  • Lead troubleshooting efforts for metric collection, visualization, and any issues in Datadog.
  • Analyze Application Performance Monitoring (APM) data to support both technical and business decision-making.
  • Collaborate cross-functionally with engineering, operations, and product teams to implement performance improvements and resolve reliability challenges.
  • Develop and maintain infrastructure as code (Terraform preferred) to automate and streamline cloud operations.

YOUR PROFILE

  • Experience in Site Reliability Engineering, DevOps, or similar roles, with a focus on cloud-native technologies and systems.
  • Deep expertise in Datadog, including dashboard creation, metric ingestion, and APM analysis.
  • Strong hands-on experience with Kubernetes, AWS services, and CI/CD pipelines.
  • Proficiency with monitoring and logging tools such as Fluent Bit, Loki, Prometheus, and Grafana.
  • Solid understanding of infrastructure as code (Terraform preferred).
  • Excellent troubleshooting skills in distributed systems, especially in cloud-native environments.
  • Strong communication skills and experience working with external vendors and stakeholders.
  • Ability to work effectively in a remote, international team environment.

Nice to have:

  • Experience with Tekton, Jenkins, Kafka, Redis, and PostgreSQL (Patroni).
  • Familiarity with authentication and authorization tools such as Keycloak or Tozny.
  • Knowledge of artifact and container management platforms such as Harbor, ECR, or MinIO.
  • Experience in security management, including authentication and authorization processes.

WHAT YOU’LL LOVE ABOUT WORKING HERE

  • Join a multicultural and inclusive team environment.
  • Enjoy a supportive atmosphere promoting work-life balance.
  • Engage in exciting national and international projects.
  • Hybrid work.
  • Your career growth is central to our mission. Our career growth programs and diverse professions are designed to help you explore a world of opportunities.
  • Training and certification programs.
  • Health and life insurance.
  • Referral program with bonuses for talent recommendations.
  • Great office locations.

ABOUT CAPGEMINI

Choosing Capgemini means choosing a company where you will be empowered to shape your career in the way you’d like, where you’ll be supported and inspired by a collaborative community of colleagues around the world, and where you’ll be able to reimagine what’s possible. Join us and help the world’s leading organizations unlock the value of technology and build a more sustainable, more inclusive world.

Apply now!

Ref. code:  338858
Posted on:  Oct 7, 2025
Experience Level:  Experienced Professionals
Contract Type:  Permanent
Location:  Lisboa, PT; Fundão, PT; Vila Nova de Gaia, PT; Evora, PT

Brand:  Capgemini Engineering
Professional Community:  Products & Systems Engineering
