Site Reliability Engineer (SRE) - Azure AKS, Observability & Terraform

Temporary

Astra North Infoteck Inc.

Key Skills

  • Experience in observability, SRE, or DevOps roles, with expertise in infrastructure and application reliability
  • Dynatrace, ELK, Splunk, and PagerDuty
  • SLI/SLO frameworks
  • Azure Kubernetes Service (AKS), Terraform, Azure managed services

What you will do

  • Design and implement observability-as-code solutions using Terraform for monitoring pipelines, dashboards, and alerting across distributed systems
  • Drive observability improvements using Dynatrace, ELK, Splunk, and PagerDuty for real-time performance insights and system visibility
  • Instrument applications for end-to-end observability, including distributed tracing, metrics collection, and log aggregation, across Node.js and .NET microservices and event-driven architectures
  • Troubleshoot complex production incidents across service layers, databases, caches, and APIs, guided by SLI/SLO frameworks
  • Investigate and resolve Azure Kubernetes Service (AKS) infrastructure issues, ensuring the reliability and scalability of containerized workloads using Terraform and Azure services (SQL MI, Redis, Functions, Event Grid)
  • Translate business requirements into observable, resilient systems aligned to SLIs/SLOs
  • Automate operational tasks using Infrastructure-as-Code and CI/CD to reduce toil and improve resilience
  • Lead incident response and remediation for critical systems, including blameless postmortems and chaos engineering practices
  • Collaborate with development, platform, and business teams to improve availability, scalability, and operational excellence

What you need to succeed

Must-have

  • 8+ years of experience in SRE, DevOps, or observability roles focused on infrastructure and application reliability
  • Strong expertise in Dynatrace, ELK, Splunk, and PagerDuty, as well as observability principles (instrumentation, correlation IDs, SLIs/SLOs)
  • Advanced proficiency in Azure Kubernetes Service (AKS), Terraform, and Azure managed services (SQL MI, Redis, Functions, Event Grid)
  • Hands-on experience with observability instrumentation (distributed tracing, metrics, logs) across Node.js and .NET microservices and event-driven systems
  • Strong troubleshooting skills across distributed systems (services, databases, caches, APIs) in production environments
  • Incident management expertise using PagerDuty and ServiceNow, including high-severity incident resolution and root cause analysis (RCA)
  • Knowledge of incident, problem, and change management, SRE principles, blameless postmortems, and chaos engineering
  • Strong communication and leadership skills for cross-functional coordination and incident handling

Vacancy posted 17 hours ago