SRE
Job Description
We are looking for a DevOps Engineer with strong Kubernetes (AKS) expertise to design, build, and operate scalable, secure, and reliable cloud‑native platforms. In this role, you will partner closely with development, data/ML, and security teams to deliver highly available services through modern CI/CD pipelines, container orchestration, and Infrastructure as Code (IaC).
You will be responsible for operating and hardening Azure Kubernetes Service (AKS) environments, automating infrastructure with Terraform, enforcing security best practices, and establishing robust observability and reliability standards.
Key Responsibilities
1. Containerization & Kubernetes Operations
- Design, build, and maintain Docker images using best practices (multi‑stage builds, minimal base images, vulnerability remediation).
- Operate core Kubernetes objects, including Namespaces, Deployments, ReplicaSets, Services, and Ingress, as well as cluster DNS.
- Implement liveness/readiness probes, health checks, and zero‑downtime deployment strategies.
- Configure and tune autoscaling (HPA, VPA) together with resource requests and limits, and ensure the metrics that drive scaling decisions are reliable.
- Manage Ingress Controllers and LoadBalancers, enforcing routing rules, TLS termination, and secure traffic flow.
- Configure and operate Kubernetes storage (Persistent Volumes, StorageClasses, CSI drivers) for stateful workloads.
2. CI/CD Engineering (Azure DevOps)
- Build and maintain YAML‑based CI/CD pipelines for microservices and data/ML workloads (e.g., .NET, Python, Java, Node.js).
- Implement multi‑stage pipelines (build → test → security scan → package → deploy) using Helm and kubectl.
- Apply advanced deployment strategies such as blue‑green and canary releases.
- Manage service connections, agent pools, secure variables, and secrets.
- Standardize and document reusable pipeline templates to improve delivery speed and reliability.
3. Infrastructure as Code & Cloud Platform
- Provision and manage Azure infrastructure using Terraform (preferred) or Bicep.
- Codify and manage AKS clusters, networking, Azure Key Vault, Azure Container Registry (ACR), Log Analytics, and supporting services.
- Automate operations using Azure CLI and PowerShell.
- Maintain environment parity (dev, test, stage, prod) with repeatable and idempotent deployments.
- Enforce Git‑based change control and immutable infrastructure practices.
4. Security, Compliance & Identity
- Enforce RBAC in Azure and Kubernetes using least‑privilege principles.
- Manage secrets with Azure Key Vault and Kubernetes Secrets.
- Integrate image scanning and vulnerability checks into CI/CD pipelines.
- Apply Pod Security Standards and implement cluster, image, and supply‑chain hardening.
- Collaborate with security teams on audits, compliance requirements, and vulnerability remediation.
5. Observability & Reliability
- Implement monitoring and logging using Azure Monitor, Log Analytics, Container Insights, Prometheus, Grafana, and Dynatrace.
- Define and track SLIs and SLOs, and create actionable dashboards and alerts.
- Support incident response, root‑cause analysis, post‑incident reviews, and long‑term remediation.
6. SRE & Troubleshooting
- Diagnose Kubernetes issues including pod crashes, image pull failures, DNS/service routing errors, and performance bottlenecks.
- Troubleshoot and stabilize CI/CD pipeline failures and flaky deployments.
- Optimize cost and performance through right‑sizing, autoscaling policies, and node pool strategies (including spot nodes).
7. Collaboration & Governance
- Collaborate with developers, architects, and data/ML teams to design operable and scalable solutions.
- Create runbooks, deployment guides, and platform standards.
- Deliver knowledge‑sharing sessions and technical enablement.
- Participate in Agile ceremonies, provide estimates, and proactively remove delivery blockers.
Required Qualifications & Skills
Technical Skills
- Azure DevOps Services: YAML pipelines, Git Repos, Boards, Artifacts, service connections, agent pools, multi‑stage releases.
- Kubernetes (AKS): Deployments, Services, Ingress, ConfigMaps, Secrets, RBAC, HPA/VPA, probes.
- Docker & Containerization: Advanced Dockerfile design, multi‑stage builds, image scanning, container debugging.
- Infrastructure as Code: Terraform (preferred), Bicep, Azure CLI, PowerShell.
- Security: Microsoft Entra ID (formerly Azure AD), Azure RBAC, Kubernetes RBAC, Key Vault, vulnerability management.
- Networking: LoadBalancers, Ingress, TLS, DNS fundamentals; Azure VNets and NSGs (nice to have).
- Storage: Persistent Volumes, StorageClasses, CSI drivers.
- Observability: Azure Monitor, Log Analytics, Prometheus, Grafana, Dynatrace.
Experience
- 7+ years of experience in DevOps / Cloud / SRE roles.
- Strong hands‑on experience operating AKS in production environments.
- Proven background in CI/CD automation, cloud‑native platforms, and infrastructure automation.
Location: Aguascalientes, MX