Site Reliability Engineer (SRE) - Azure AKS, Observability & Terraform
Astra North Infoteck Inc.
Remote Role
Key Responsibilities
- Experience in Observability, SRE, or DevOps roles, with expertise in infrastructure and application reliability
- Dynatrace, ELK, Splunk, PagerDuty
- SLI/SLO frameworks
- Azure Kubernetes Service (AKS), Terraform, Azure managed services
What will you do
- Design and implement observability-as-code solutions using Terraform for monitoring pipelines, dashboards, and alerting across distributed systems
- Drive observability improvements using Dynatrace, ELK, Splunk, and PagerDuty for real-time performance insights and system visibility
- Instrument applications for end-to-end observability, including distributed tracing, metrics collection, and log aggregation, across Node.js and .NET microservices and event-driven architectures
- Troubleshoot complex production incidents across service layers, databases, caches, and APIs using SLI/SLO frameworks
- Investigate and resolve Azure Kubernetes Service (AKS) infrastructure issues to keep containerized workloads reliable and scalable, using Terraform and Azure services (SQL MI, Redis, Functions, Event Grid)
- Translate business requirements into observable, resilient systems aligned to SLIs/SLOs
- Automate operational tasks using Infrastructure-as-Code and CI/CD to reduce toil and improve resilience
- Lead incident response and remediation for critical systems, including blameless postmortems and chaos engineering practices
- Collaborate with development, platform, and business teams to improve availability, scalability, and operational excellence
What do you need to succeed
Must-have
- 8+ years of experience in SRE, DevOps, or Observability roles focused on infrastructure and application reliability
- Strong expertise in Dynatrace, ELK, Splunk, and PagerDuty, and in observability principles (instrumentation, correlation IDs, SLIs/SLOs)
- Advanced proficiency in Azure Kubernetes Service (AKS), Terraform, and Azure managed services (SQL MI, Redis, Functions, Event Grid)
- Hands-on experience with observability instrumentation (distributed tracing, metrics, logs) across Node.js and .NET microservices and event-driven systems
- Strong troubleshooting skills across distributed systems (services, databases, caches, APIs) in production environments
- Incident management expertise using PagerDuty and ServiceNow, including high-severity incident resolution and RCA
- Knowledge of incident, problem, and change management, SRE principles, blameless postmortems, and chaos engineering
- Strong communication and leadership skills for cross-functional coordination and incident handling


