Cloud Infrastructure (SRE)
Work from home
Your role:
You will be responsible for the performance, availability, and reliability of our cloud-based services and underlying infrastructure, acting as a critical technical subject matter expert.
- Manage, troubleshoot, and optimize containerized applications and infrastructure deployed on Kubernetes, Red Hat OpenShift, and OpenStack platforms.
- Serve as the Subject Matter Expert (SME) for core cloud infrastructure technologies, including advanced Linux (CentOS) system administration, Docker/Containers, and complex networking configurations.
- Lead the investigation and resolution of complex, high-severity customer issues, applying strong analytical knowledge to quickly diagnose problems across the entire cloud stack.
- Utilize your expertise to quickly identify root causes and implement effective, durable solutions for customer incidents.
- Prepare and conduct rigorous Root Cause Analysis (RCA) for critical incidents to identify systemic issues and prevent recurrence.
- Develop, test, and maintain robust automation scripts using Python and Ansible to streamline daily operational tasks and improve overall service efficiency.
- Identify and implement automation opportunities to reduce manual effort in maintenance and deployment activities.
- Provide end-to-end Escalation, Monitoring, and Emergency (EME) support, acting as a final escalation point to ensure service availability and meet SLAs.
- Liaise directly with customer teams and internal teams to understand requirements and deliver tailored technical solutions.
Your Profile:
- Strong knowledge and proven hands-on experience with Linux administration.
- Strong knowledge of core networking principles (TCP/IP, routing, load balancing, firewalls) in a cloud environment.
- Strong knowledge of Kubernetes orchestration, OpenStack platforms, and Docker/Containerization.
- Solid Python scripting skills for task automation and system management.
- Hands-on experience with Ansible for configuration management.
- Expertise in preparation and implementation of RCAs.
- Proven experience with EME (Escalation, Monitoring, and Emergency) management processes.
What We Offer
- Stable Employment: Permanent contract offering long-term job security.
- Learning & Development: Access to a wide range of online training platforms and professional development resources.
- Language Training: Weekly virtual English classes and conversation sessions with certified instructors, plus online courses for other languages.
- Health Coverage: Comprehensive prepaid medical and dental plans.
- Insurance Protection: Life and accident insurance for peace of mind.
- Wellness Perks: Discounts and benefits through fitness and technology partnerships.
- Special Occasion benefits.
Works in the area of Software Engineering, which encompasses the development, maintenance and optimization of software solutions/applications.
1. Applies scientific methods to analyse and solve software engineering problems.
2. He/she is responsible for the development and application of software engineering practice and knowledge, in research, design, development and maintenance.
3. His/her work requires the exercise of original thought and judgement and the ability to supervise the technical and administrative work of other software engineers.
4. The software engineer builds skills and expertise of his/her software engineering discipline to reach standard software engineer skills expectations for the applicable role, as defined in Professional Communities.
5. The software engineer collaborates and acts as team player with other software engineers and stakeholders.
Bogota, CO