Senior Manager, Site Reliability Engineering
$164.6k - $235.1k per year
Tubi - Canada
About the Role:
Site Reliability Engineering (SRE) at Tubi is not a traditional operations team. We are a software engineering organization that applies a developer's mindset and toolkit to the challenges of building and running large-scale, distributed systems. Our mission is to engineer resilience from the ground up, enabling our product teams to innovate rapidly while ensuring our users have a stellar experience. We own the availability, latency, performance, and capacity of our platform, and we achieve our goals through a culture of data-driven decision-making, blameless learning, and relentless automation.
We are seeking an experienced and visionary Senior SRE Manager to lead and grow our newly built Site Reliability Engineering team. You are more than a people manager or a tech lead; you are the strategic leader responsible for architecting our reliability roadmap. You will build and mentor a team of talented engineers, foster a culture of blameless learning and continuous improvement, and champion the engineering practices that allow us to balance rapid innovation with rock-solid stability. You will be a key influencer in our engineering leadership, partnering with peers across the organization to ensure reliability is a shared responsibility and a core tenet of our engineering culture.
What You'll Do:
- Team Leadership & Mentorship:
- Lead, mentor, and grow a team of Site Reliability Engineers. Foster a culture of innovation and technical excellence where engineers feel empowered to do their best work. Provide personalized coaching, create professional development plans, and guide the careers of senior and emerging talent within the team.
- Establish equitable, sustainable on-call practices (including global coverage where applicable) that protect focus time and avoid burnout.
- Define team rituals - runbook reviews, game days, and incident retros - that reinforce quality and learning.
- Strategic Planning & Vision: Define and drive the multi-year technical strategy and vision for Tubi’s observability and automation platforms. Partner with the infra lead to align Tubi’s infrastructure and SRE roadmaps. Partner with tech leaders to align the SRE roadmap with business objectives. Champion a data-driven approach to reliability, using Service Level Objectives (SLOs) and error budgets to facilitate productive conversations about risk and feature velocity.
- Operational Excellence & Incident Management:
- Own the end-to-end availability, performance, and efficiency of our critical user-facing services. Evolve our incident response practice to reduce Mean Time to Resolution (MTTR) and increase Mean Time Between Failures (MTBF). Champion a rigorous, blameless, and data-driven post-mortem culture to ensure we learn from both successes and failures, driving engineering teams toward systemic fixes and automation that prevent incidents from recurring.
- Streamline our existing processes and practices, and collaborate with other teams to raise our production release standards.
- Define and tune a 24×7 on-call rotation for low noise and fast response; act as executive escalation partner during major incidents.
- Own disaster-recovery strategy (playbooks, failover drills, recovery simulations) and track SLO gaps with time-bound remediations.
- Financial & Vendor Management: Own the SRE budget, tooling, and headcount. Manage relationships with key third-party vendors for our observability and SRE-related AI platforms. Work with the infra lead and finance team on contract negotiations, and ensure we derive maximum value from our investments.
- Cross-Functional Collaboration: Act as a key influencer and strategic partner to leaders in Software Engineering, Product Management, and Infra/Sec. Drive the adoption of SRE best practices and principles throughout the organization, ensuring new services are designed for reliability, scalability, and observability from day one.
- The AI Mandate: Building the Future of Observability with AI. You will not just manage a team that uses AI; you will lead the charge in building an AI-native SRE function. This is a strategic mandate that requires a forward-thinking leader who understands both the potential and the pitfalls of integrating intelligent systems into critical operations. This includes:
- AIOps Strategy Development: Developing and executing the strategy for integrating AIOps and machine learning into our observability stack. Your goal will be to move the team from a reactive monitoring posture to one of predictive maintenance and automated anomaly detection, fundamentally changing how we ensure reliability.
- Accelerating Automation with AI: Championing the effective and responsible use of AI-assisted coding tools (e.g., Claude Code, Cursor) within the SRE team. You will set the standards and practices to leverage these tools to accelerate the development of automation, operational tooling, and infrastructure code.
- Building the Business Case: Building the techno-economic case for new AI tooling, managing vendor relationships, and ensuring the cost-effective and secure implementation of these powerful systems. You must be able to articulate the ROI of these investments in terms of reduced downtime, improved operational efficiency, and faster incident resolution.
- Fostering Critical AI Literacy: Fostering a culture that can critically evaluate, debug, and learn from the outputs of AI systems. This involves extending our blameless post-mortem philosophy to AI-driven actions and recommendations, ensuring that the team remains in control and understands the "why" behind automated decisions.
Your Background:
- 8+ years of experience in a technical field, with at least a year in an engineering leadership position managing SRE, DevOps, or Production Engineering teams.
- A deep, principled understanding of SRE tenets, including Service Level Indicators (SLIs), SLOs, error budgets, toil reduction, and capacity planning.
- Exceptional communication, negotiation, and influencing skills, with the ability to articulate complex technical concepts and strategies to both technical and non-technical stakeholders at all levels of the organization.
- A strong technical background as a hands-on software engineer or site reliability engineer prior to moving into management. Deep knowledge of AWS services (especially networking, IAM, EKS, ALBs/NLBs, Route 53, CloudWatch). Proven experience with Kubernetes in production (EKS preferred), including service exposure, networking, and availability engineering.
- Hands-on familiarity with modern SRE tools and technologies, including Infrastructure as Code (e.g., Terraform, Ansible), container orchestration (Kubernetes), observability platforms (e.g., Prometheus, Grafana, Datadog, Splunk), incident tooling (e.g., PagerDuty, FireHydrant), deployment-safety tooling (e.g., Argo Rollouts, LaunchDarkly), and observability standards (e.g., OpenTelemetry).
#LI-BT1
#LI-Hybrid
Pursuant to local pay disclosure requirements, the annual pay range for this role is listed below; the final offer amount depends on education, skills, experience, and location.
This role is also eligible for an annual discretionary bonus, long-term incentive plan, and various benefits including medical/dental/vision, insurance, vacation/paid time off and other benefits in accordance with applicable plan documents.
Toronto, Canada
$164,600—$235,100 CAD
Tubi Media Group is a division of Fox Corporation, and the FOX Employee Benefits summarized here cover the majority of employee benefits. The distinctions below outline the differences between the Tubi and FOX benefits:
For all salaried employees, in lieu of the FOX Vacation policy, Tubi offers a Flexible Time Off Policy to manage all personal matters.
For all full-time, regular employees, in lieu of FOX Paid Parental Leave, Tubi offers a generous Parental Leave Program, which allows parents twelve (12) weeks of paid bonding leave (top up in Canada) within the first year of birth, adoption, surrogacy, or foster placement of a child in addition to applicable government leave program(s) and FOX’s short-term disability policy (if applicable). This time is 100% paid through a combination of any applicable government leaves and wage-replacement programs in addition to contributions made by Tubi.
For all full-time, regular employees, Tubi offers a monthly wellness reimbursement.
About Tubi:
Boldly built for every fandom, Tubi is a free streaming service that entertains over 100 million monthly active users. Tubi offers the world's largest collection of Hollywood movies and TV shows, thousands of creator-led stories and hundreds of Tubi Originals made for the most passionate fans. Headquartered in San Francisco and founded in 2014, Tubi is part of Tubi Media Group, a division of Fox Corporation.
We are an equal opportunity employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, gender identity, disability, protected veteran status, or any other characteristic protected by law. We will consider for employment qualified applicants with criminal histories consistent with applicable law.
