Service Engineer L3
Job Description
We are seeking a highly experienced Service Engineer L3 to join our service operations team. This role is intended for senior professionals with deep expertise in troubleshooting complex online and distributed systems, combined with strong leadership and operational excellence capabilities.
What will you do in the project?
As an L3 engineer, you will lead the resolution of critical incidents, drive root cause analysis at scale, and actively contribute to service reliability, automation, and strategic improvements. You will also play a key role in guiding L1/L2 engineers while remaining hands-on when required.
The position requires shift work and on-call duties as part of a continuous operations model.
Key Responsibilities
Lead the resolution of complex and high-impact incidents in distributed and online service environments
Perform deep-dive diagnosis and advanced debugging of critical issues
Act as the escalation point for L1 and L2 engineers, providing technical guidance and leadership
Drive and oversee root cause analysis (RCA) and post-incident reviews across teams
Identify systemic issues and implement long-term solutions to improve service reliability
Design and develop automation solutions to optimize operational efficiency and reduce manual intervention
Build dashboards to provide visibility into SLA performance, service health, and team workload
Develop reports that provide insight into technology performance and recommend improvements
Collaborate with engineering and product teams to influence service design and resilience
Communicate effectively with stakeholders, including senior leadership, customers, and partners
Ensure compliance with data protection regulations, including GDPR
Required Skills & Experience
Strong college hire or 1-2 years of experience in service operations
6+ years of experience diagnosing/debugging faults in complex online services
Demonstrated experience diagnosing/debugging faults in distributed systems
Proven ability to lead teams while performing hands-on individual contributor work
Working knowledge of enterprise network gear including routers, switches, and load balancers
Working knowledge of enterprise routing protocols and IP subnetting
Experience using diagnostic tools such as Netmon, WinDbg, and Wireshark
Advanced experience with scripting using PowerShell, SQL, and Python
Ability to identify problems suitable for automation and script solutions at scale, with a focus on efficiency and reliability
Ability to build dashboards for SLA tracking and operational visibility
Ability to build analytical reports to drive service and technology improvements
Knowledge of Azure and Microsoft 365 architectural concepts (Azure Portal, Storage Nodes, VMs, etc.)
Strong understanding of GDPR and data protection principles
Core Competencies
Expert-level troubleshooting and analytical skills in complex environments
Strong leadership and mentoring capabilities across operational teams
Ability to manage and resolve critical incidents under pressure
Strong communication skills in written and spoken English (fluent level required)
Ability to interact with external customers and partners on behalf of Microsoft
Strong focus on automation, scalability, and continuous improvement
Ability to execute with precision in high-impact, time-sensitive scenarios
Strategic thinking with a focus on long-term service reliability and optimization
High level of ownership, accountability, and decision-making
Working Model
12x5 service coverage (8:00 AM to 8:00 PM, five days a week) with rotating shifts
Participation in on-call (standby) rotations
Fully on-site role (Madrid, Málaga, or Asturias offices)
Madrid, ES