Cloud Site Reliability Engineer
Astra North Infoteck Inc.
Skills: Dynatrace, Observability, Monitoring Engineering, SRE Practices
Experience: 6-8 years
Job Description
We are seeking a highly skilled Dynatrace Monitoring Engineer / Site Reliability Engineer (SRE) responsible for designing, implementing, and maintaining observability solutions across enterprise applications and infrastructure. This role focuses on proactive monitoring, performance visibility, incident prevention, and enforcing reliability standards through service-level objectives (SLOs). The ideal candidate brings deep Dynatrace expertise along with strong troubleshooting, communication, and architectural awareness.
Key Responsibilities
Dynatrace Engineering & Monitoring
Design, configure, and maintain Dynatrace dashboards, alerting rules, and synthetic monitoring for business-critical URLs.
Build customized dashboards for:
Application Performance (APM)
Infrastructure monitoring (hosts, processes, services)
Kubernetes & cloud workloads
Business metrics & SLA/SLO insights
Use DQL (Dynatrace Query Language) to create advanced tiles, analytic views, and metric visualizations.
Standardize dashboards to be reusable, scalable, and aligned with business KPIs.
Observability & SRE Practices
Define and manage Service Level Objectives (SLOs) to measure availability, reliability, and operational performance.
Exercise key SRE decision rights (e.g., rejecting operationally substandard software, advising developers on improvements).
Implement observability requirements ensuring systems meet expected service levels with proper operational characteristics.
Focus on reliability, scalability, and performance of production computing systems, including complex distributed systems.
Develop observability standards that ensure predictable system behavior and early detection of errors or failures.
Incident Management & Problem Resolution
Conduct root cause analysis (RCA) through post-mortem reviews, ensuring permanent remediation and preventing recurrence.
Provide strong troubleshooting for application, infrastructure, and integration-level monitoring issues.
Integrate Dynatrace and monitoring workflows with ITSM platforms.
Cross-Functional Collaboration
Work closely with infrastructure, application, cloud, and security teams to ensure seamless operational monitoring.
Lead or contribute to enterprise-wide initiatives as a subject matter expert.
Interact with governance, audit, compliance, and risk groups to provide observability insights and ensure adherence to standards.
Identify emerging technologies and propose innovative enhancements to monitoring and reliability engineering practices.
Essential Skills
Strong hands-on experience with Dynatrace SaaS/Managed, including dashboard creation, alert configuration, and synthetic monitoring.
Strong understanding of APM concepts, infrastructure monitoring, cloud monitoring, and (preferably) Kubernetes/microservices environments.
Familiarity with DQL, metrics, entity models, and relationships within Dynatrace.
Experience integrating Dynatrace or similar monitoring tools with ITSM systems.
Excellent troubleshooting and communication skills.
Strong foundation in networking, reliability engineering, scalability, and cloud operational characteristics.
Ability to drive SRE practices such as:
SLO creation
Release readiness assessments
Operational risk evaluation
Continuous improvement through automation and observability standards
