Infinite pl is a digital-led tech firm driven to become a digital logistics pioneer by harnessing the power of people, data, and platforms. We draw on in-house, external, network, and other investment capabilities to orchestrate and build innovative platforms that tackle complex problems in logistics and adjacent sectors.
Infinite pl’s mission is nothing short of a logistics revolution! We're here to enrich the experiences of governments, businesses, and residents around the world through cutting-edge digital solutions.
"We're not just players; we're game-changers."
Job Summary:
- We are seeking an experienced Operations Manager to lead the technical operations team responsible for running a mission-critical platform with a target availability of 99.99%. The role will ensure service stability, SLA achievement, ITSM process compliance, security and regulatory compliance, and continuous improvement across operations. The Operations Manager will own day-to-day service operations, major incident leadership, operational readiness, vendor coordination, and reporting to senior stakeholders.
Key Objectives:
- Oversee stable operations for the platform with 99.99% availability.
- Enforce and continuously improve ITSM processes (Incident, Problem, Change, Request, Release, Knowledge, CMDB).
- Ensure SLA / SLO compliance, operational readiness, and performance reporting.
- Maintain strong security posture and ensure adherence to applicable compliance requirements.
Key Responsibilities:
- 1) Service Operations Leadership
- Oversee the platform operations team (NOC/Operations Engineers/SRE-like functions as applicable) to ensure reliable, secure, and high-performing services.
- Maintain clear operating rhythms: daily ops reviews, weekly service health checks, monthly SLA reviews, and quarterly service improvement plans.
- Drive on-call readiness, shift coverage, escalation paths, and decision-making during critical events.
- 2) ITSM Process Ownership & Compliance
- Own and enforce ITSM processes end-to-end.
- Audit operational adherence and drive corrective actions for non-compliance.
- 3) SLA, Availability, and Reliability Management
- Ensure continuous tracking and achievement of SLA/SLO targets (availability, response time, resolution time, performance).
- Manage availability and resilience practices: redundancy validation, capacity planning, proactive monitoring, and performance tuning.
- Lead post-incident reviews and drive measurable improvements.
- 4) Security & Compliance
- Partner with security teams to ensure:
- Timely patching and remediation
- Secure configuration baselines
- Audit readiness and evidence collection
- Incident response alignment and reporting
- Enforce least privilege access and periodic access reviews.
- 5) Monitoring, Observability, and Operational Tooling
- Ensure comprehensive monitoring and alerting coverage for infrastructure, applications, APIs, databases, integrations, and security events.
- Ensure operational toolchain effectiveness (ITSM tool, monitoring, CI/CD visibility, CMDB, asset management).
- 6) Stakeholder & Vendor Management
- Act as the primary operations interface for internal stakeholders and external partners/vendors.
- Manage vendor SLAs and ensure effective collaboration for incident resolution, patching, upgrades, and service improvements.
- Provide clear operational communications during incidents and planned maintenance.
- 7) Reporting & Governance
- Produce weekly/monthly service reports including SLA performance, availability, incidents, trends, risks, and improvement actions.
- Maintain an operational risk register and ensure mitigation plans are executed.
- Present service health and improvement plans to leadership.
Required Qualifications:
- Bachelor’s degree in Computer Science, Information Systems, Engineering, or equivalent experience.
- 5+ years in IT operations / production support roles, with 2+ years leading teams for critical services.
- Strong hands-on understanding of operating high-availability platforms (24/7 environments).
- Proven experience implementing and running ITSM processes in production (ITIL-aligned).
Technical & Professional Skills:
- Deep understanding of incident/problem/change management, operational readiness, and service governance.
- Experience with cloud and modern platform operations (e.g., cloud infrastructure, APIs, containerized services) is preferred.
- Ability to define, track, and improve operational KPIs and reliability metrics.
- Strong stakeholder management, structured communication, and decision-making under pressure.
Preferred Certifications:
- ITIL Foundation / ITIL Managing Professional (or equivalent ITSM certification)
- ISO 27001 awareness/certification or security-related certifications
- Cloud certifications (GCP) are a plus
Working Model:
- Full-time; includes on-call leadership and participation in major incident bridges as required.
Infinite pl ♾️ - where innovation meets logistics, and the journey is Infinitely boundless! Let's disrupt logistics together and explore infinite opportunities!