We are seeking a highly experienced Senior IT Site Reliability Manager to coordinate and continuously improve our IT operations while advancing Site Reliability Engineering (SRE) practices. This role is responsible for ensuring the high availability, reliability, and scalability of our internally developed front-end, back-end, and mobile application systems in a 24/7 environment.

Key Responsibilities
Lead and coordinate day-to-day IT operations and service delivery
Ensure maximum system availability, reliability, and performance across all platforms
Drive, evolve, and scale Site Reliability Engineering (SRE) practices, including monitoring, incident response, and automation
Own and improve 24/7 operational readiness, including on-call models and escalation processes
Collaborate closely with development teams in agile environments (e.g., SAFe) to enhance system resilience and scalability
Continuously identify and implement improvements based on incident analysis, KPIs, and operational insights