SRE
Job Description
Your role
As a Site Reliability Engineer (SRE), you will bridge the gap between development and operations, ensuring our systems are reliable, scalable, and performing optimally. You'll work in a dynamic L4 support environment where your expertise in automation, monitoring, and incident response will be crucial for maintaining service excellence.
In this role, you will be responsible for:
- Designing, implementing, and maintaining infrastructure automation using Python/Bash scripting and infrastructure-as-code tools
- Managing and optimizing Kubernetes clusters and containerized applications in Linux environments
- Creating and enhancing monitoring systems to ensure high availability and performance of critical services
- Developing automated solutions for incident response, capacity planning, and system recovery
- Collaborating with development teams to improve application reliability and scalability
- Participating in on-call rotations to provide L4 support, including potential weekend coverage when required
Your profile
- 3+ years of experience in an SRE role
- OpenStack knowledge is a plus
- Strong experience with Linux systems administration and troubleshooting
- Proficiency in Python programming (or Bash scripting) with a focus on automation
- Hands-on experience with Kubernetes orchestration and container technologies
- Knowledge of infrastructure monitoring tools and observability practices
- Experience implementing CI/CD pipelines and DevOps methodologies
- Advanced English communication skills, both written and verbal
Buenos Aires, AR