Senior Observability & SRE Leader
Marsh is seeking a visionary, transformational leader to reimagine and rebuild our Observability and Site Reliability Engineering function from the ground up. This is not a role for someone who wants to maintain the status quo. We need a leader who will fundamentally shift this function to a predictive, data-driven engineering discipline that prevents outages before they happen, embeds reliability into every system from design through production, and treats observability data as a strategic asset - not just an operational tool.
This is a career-defining opportunity to build a world-class observability and SRE organization at Fortune 500 scale.
Job Responsibilities
STRATEGIC VISION & PLATFORM TRANSFORMATION
Define and execute an observability and SRE strategy that shifts the organization from reactive operations to predictive reliability engineering.
Architect and deliver a unified, full-stack observability platform covering metrics, traces, logs, real-user monitoring (RUM), synthetic monitoring, and business-level KPIs - across on-prem, multi-cloud (AWS/Azure), containers, and SaaS integrations.
Rationalize and consolidate the current fragmented tooling landscape into a cohesive, cost-optimized platform. Eliminate redundant tools, reduce alert noise by 80%+, and establish a single pane of glass for system health.
Drive adoption of OpenTelemetry as the standard instrumentation framework, ensuring vendor-agnostic telemetry collection and future portability.
PREDICTIVE & PROACTIVE RELIABILITY
Build and operationalize AIOps and ML-driven capabilities to detect anomalies, predict failures, and surface emerging risks before they impact customers. Move beyond threshold-based alerting to intelligent, context-aware detection.
Establish automated correlation engines that link infrastructure signals, application traces, deployment events, and change records to dramatically reduce diagnostic time and identify root cause automatically.
Design and implement self-healing automation that detects, diagnoses, and remediates common failure patterns without human intervention - targeting 40%+ of recurring incidents for autonomous resolution.
Introduce chaos engineering and reliability testing programs (GameDays, fault injection, load testing) to proactively discover weaknesses before production incidents reveal them.
SITE RELIABILITY ENGINEERING CULTURE
Transform the existing operations-centric team into a modern SRE organization with embedded reliability engineers across product and platform squads, operating under a "you build it, you own it" model.
Define and implement SLO/SLI/Error Budget frameworks across critical services, creating a shared language between engineering, product, and business stakeholders for reliability decisions.
Drive the adoption of DevOps practices, CI/CD pipelines, and infrastructure as code, using tools such as Terraform or CloudFormation.
Champion reliability-first design principles - ensuring observability, graceful degradation, circuit breaking, and failure isolation are architected into every system from day one, not bolted on after launch.
INCIDENT PREVENTION & RAPID RECOVERY
Partner with Major Incident Management and Problem Management to build closed-loop feedback systems - every incident produces a reliability improvement, not just a postmortem document.
Drive MTTR toward minutes (not hours) through automated diagnostics, pre-built remediation playbooks, and intelligent correlation that tells responders what is wrong, not just that something is wrong.
Establish "Incidents Prevented" as a primary success metric alongside traditional MTTR/MTTD measures.
BUSINESS-ALIGNED OBSERVABILITY
Elevate observability from infrastructure metrics to business outcomes. Build real-time dashboards that connect system health to revenue impact, customer experience scores, and SLA compliance.
Integrate observability insights into ITSM (ServiceNow), data platforms, and executive reporting - making reliability data a first-class input to business and technology decision-making.
ENGINEERING & OPERATIONAL EXCELLENCE
Own the total cost of ownership of the observability platform. Optimize spend through data tiering, intelligent sampling, retention policies, and vendor negotiations. Deliver more insight per dollar.
Manage strategic vendor relationships (Datadog, Splunk, LogicMonitor, cloud-native tooling) with a focus on maximizing value extraction, not just license management.
Build a platform engineering mindset: observability capabilities are delivered as self-service products to engineering teams - instrumentation libraries, dashboard templates, alerting-as-code, SLO toolkits.
TEAM BUILDING & LEADERSHIP
Recruit, develop, and retain a world-class team of site reliability engineers, observability platform engineers, data and performance engineers, and reliability analysts.
Establish an Observability & SRE Centre of Excellence that drives standards, best practices, and enablement across the global enterprise.
Foster a learning culture through internal tech talks, blameless postmortems, chaos engineering programs, and industry engagement.
REQUIRED EXPERIENCE & EXPERTISE
15+ years in technology with 8+ years in progressively senior observability, SRE, or platform reliability leadership roles.
Demonstrated track record of transforming reactive monitoring organizations into proactive, engineering-driven SRE functions at enterprise scale (10,000+ employees, 1,000+ applications).
Deep expertise across the full observability stack: metrics (Prometheus, Datadog, CloudWatch), distributed tracing (Jaeger, OpenTelemetry, Datadog APM), log aggregation (Splunk, ELK, Datadog Logs), synthetic monitoring, and RUM.
Hands-on experience defining and operationalizing SLO/SLI/Error Budget frameworks that drive engineering prioritization and business alignment.
Proven experience building AIOps / ML-driven anomaly detection and automated remediation capabilities - not just evaluating vendor demos, but delivering production systems that prevent real incidents.
Strong background in chaos engineering, resilience testing, and reliability-by-design practices (circuit breakers, bulkheads, graceful degradation, retry/backoff patterns).
Experience operating across hybrid infrastructure: on-premises data centers, AWS, Azure, containerized workloads (Kubernetes), and SaaS platforms.
Demonstrated ability to drive cultural and organizational transformation across large, complex enterprises with multiple business units and hundreds of engineering squads.
Experience managing $5M+ observability platform budgets and optimizing total cost of ownership while expanding coverage and capability.
Executive communication skills - ability to present reliability strategy, risk posture, and investment cases to C-suite and board-level audiences.
Visionary thinker who can articulate a compelling future state and build the roadmap to get there - then execute relentlessly.
Marsh (NYSE: MRSH) is a global leader in risk, reinsurance and capital, people and investments, and management consulting, advising clients in 130 countries. With annual revenue of over $27 billion and more than 95,000 colleagues, Marsh helps build the confidence to thrive through the power of perspective. For more information, visit corporate.marsh.com, or follow us on LinkedIn and X.
Marsh is committed to embracing a diverse, inclusive and flexible work environment. We aim to attract and retain the best people and embrace diversity of age, background, disability, ethnic origin, family duties, gender orientation or expression, marital status, nationality, parental status, personal or social status, political affiliation, race, religion and beliefs, sex/gender, sexual orientation or expression, skin color, or any other characteristic protected by applicable law.
In accordance with the Accessibility for Ontarians with Disabilities Act, 2005, Marsh will provide a reasonable accommodation to employees and prospective employees to the point of undue hardship upon request and as required in respect of the individual’s particular restrictions and limitations. If you require a specific accommodation because of a disability or medical need, please use the contact email address listed on careers.marsh.com.
Marsh is committed to hybrid work, which includes the flexibility of working remotely and the collaboration, connections and professional development benefits of working together in the office. All Marsh colleagues are expected to be in their local office or working onsite with clients at least three days per week. Office-based teams will identify at least one “anchor day” per week on which their full team will be together in person.
This is a new position.
