Overview
DTCC is seeking a Lead, Observability within the Product organization to define, drive, and evolve enterprise observability capabilities across platforms and services. This role owns observability as a product, ensuring delivery of scalable, reusable, and standardized solutions that provide real‑time insight into DTCC’s critical systems and materially reduce the impact of production incidents on clients and operations.
The Lead, Observability partners closely with engineering, platform, operations, and application teams to deliver unified observability outcomes aligned to DTCC’s modernization initiatives and business priorities.
Primary Responsibilities
Product Strategy & Vision
Define and own the enterprise observability strategy and roadmap, aligned to DTCC’s technology modernization, resiliency, and reliability objectives.
Establish observability as a product capability, with clear value propositions, outcomes, and success metrics.
Drive standardization of observability patterns, architectures, and tooling across infrastructure, platforms, and applications.
Ensure observability solutions enable proactive detection, faster root‑cause analysis, and reduced mean time to detect and resolve incidents.
Product Delivery & Execution
Lead the design, development, and rollout of enterprise observability capabilities, including metrics, logs, traces, events, and alerting.
Deliver scalable, reusable observability solutions that support hybrid and cloud‑native environments.
Partner with engineering and platform teams to embed observability by design across the software development lifecycle.
Own delivery outcomes, including milestones, dependencies, risks, and execution timelines.
Ensure observability platforms meet enterprise requirements for resiliency, performance, security, and compliance.
Stakeholder Partnership & Enablement
Act as the primary product lead for observability across engineering, infrastructure, operations, SRE, and application teams.
Collaborate with operations and incident management teams to improve production visibility, diagnostics, and response effectiveness.
Influence adoption through enablement, documentation, onboarding, and best‑practice guidance.
Serve as a key point of escalation for observability‑related production and platform challenges.
Operational Excellence & Reliability
Drive improvements in production stability, client experience, and service reliability through actionable observability insights.
Define and track key observability and reliability metrics (e.g., SLIs, SLOs, error budgets, MTTR).
Ensure observability solutions support regulatory, audit, and operational risk requirements.
Continuously evaluate observability maturity and lead initiatives to close gaps.
Technology & Innovation
Evaluate and evolve observability tooling and platforms to meet current and future enterprise needs.
Champion modern observability practices, including distributed tracing, real‑time analytics, and intelligent alerting.
Partner with automation and AI initiatives to leverage observability data for predictive insights and operational intelligence.
Stay current on industry trends and emerging capabilities in observability and reliability engineering.
Required Qualifications
Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent experience).
8+ years of experience in technology roles spanning software engineering, platform engineering, SRE, or operations.
Demonstrated experience leading or owning observability, monitoring, or reliability initiatives at enterprise scale.
Strong understanding of modern distributed systems, cloud and hybrid architectures, and production operations.
Experience defining product roadmaps, standards, and reusable capabilities in a complex organization.
Proven ability to influence and align cross‑functional teams without direct authority.
Strong communication skills, with the ability to translate technical concepts into business outcomes.
Preferred Qualifications
Experience in financial services or other highly regulated, mission‑critical environments.
Hands‑on experience with observability platforms and tooling (metrics, logging, tracing, alerting).
Familiarity with SRE practices, incident management, and operational risk frameworks.
Exposure to automation or AI‑driven operational intelligence.
Prior experience operating in a product‑led technology organization.
Leadership & Behavioral Competencies
Product‑oriented mindset with a strong focus on outcomes and value realization.
Ability to operate at both strategic and execution levels.
Strong problem‑solving skills and comfort navigating ambiguity.
Collaborative leadership style with the ability to drive alignment across teams.
Commitment to continuous improvement, reliability, and operational excellence.