Site Reliability Engineer (SRE) - Azure AKS, Observability & Terraform
Temporary
Astra North Infoteck Inc.
Key Responsibilities
- Observability, SRE, or DevOps roles with expertise in infrastructure and application reliability
- Dynatrace, ELK, Splunk, PagerDuty
- SLI/SLO frameworks
- Azure Kubernetes Service (AKS), Terraform, Azure managed services
What will you do
- Design and implement observability-as-code solutions using Terraform for monitoring pipelines, dashboards, and alerting across distributed systems
- Drive observability improvements using Dynatrace, ELK, Splunk, and PagerDuty for real-time performance insights and system visibility
- Instrument applications for end-to-end observability including distributed tracing, metrics collection, and log aggregation across Node.js and .NET microservices and event-driven architectures
- Troubleshoot complex production incidents across service layers, databases, caches, and APIs using SLI/SLO frameworks
- Investigate and resolve Azure Kubernetes Service (AKS) infrastructure issues, ensuring reliability and scalability of containerized workloads using Terraform and Azure services (SQL MI, Redis, Functions, Event Grid)
- Translate business requirements into observable, resilient systems aligned to SLIs/SLOs
- Automate operational tasks using Infrastructure-as-Code and CI/CD to reduce toil and improve resilience
- Lead incident response and remediation for critical systems, including blameless postmortems and chaos engineering practices
- Collaborate with development, platform, and business teams to improve availability, scalability, and operational excellence
What do you need to succeed
Must-have
- 8+ years of experience in SRE, DevOps, or Observability roles focused on infrastructure and application reliability
- Strong expertise in Dynatrace, ELK, Splunk, and PagerDuty, and in observability principles (instrumentation, correlation IDs, SLIs/SLOs)
- Advanced proficiency in Azure Kubernetes Service (AKS), Terraform, and Azure managed services (SQL MI, Redis, Functions, Event Grid)
- Hands-on experience with observability instrumentation (distributed tracing, metrics, logs) across Node.js and .NET microservices and event-driven systems
- Strong troubleshooting skills across distributed systems (services, databases, caches, APIs) in production environments
- Incident management expertise using PagerDuty and ServiceNow, including high-severity incident resolution and RCA
- Knowledge of incident, problem, and change management, SRE principles, blameless postmortems, and chaos engineering
- Strong communication and leadership skills for cross-functional coordination and incident handling
Vacancy posted 17 hours ago
