Azure Site Reliability Engineer

Company: Epsilon Solutions Ltd.

Posted: December 13, 2025

Job Role: Azure Site Reliability Engineer

Location: Toronto, ON, Canada (Hybrid)

Job Type: Contract

Job Description:

Monitoring and Alerting

Implement and maintain monitoring systems to proactively identify potential issues and alert engineers to problems before they impact users

Incident Response

Respond to incidents and outages, diagnose problems, and implement solutions to minimize downtime and restore service

Automation

Automate repetitive tasks and processes to improve efficiency and reduce manual effort

Performance Optimization

Identify and address performance bottlenecks to ensure systems run efficiently and effectively

Infrastructure Management

Manage and maintain the underlying infrastructure including servers, networks, and cloud resources

Capacity Planning

Plan for future capacity needs to ensure systems can handle anticipated workloads

Release Engineering

Develop and maintain processes for deploying software updates and releases

Collaboration

Work closely with developers, operations teams, and other stakeholders to ensure system reliability and availability

Documentation

Maintain clear and concise documentation of systems, processes, and procedures

Continuous Improvement

Identify areas for improvement and implement changes to enhance system reliability and performance

Skills and Qualifications

Cloud Platform: Microsoft Azure

Excellent knowledge of AKS

Monitoring tools: Dynatrace, Splunk, Grafana

Operating Systems: Windows, Linux

Scripting: Shell scripting, Python, PowerShell

Database: MySQL, Oracle, SQL database management

Container Services: Kubernetes, Docker, Helm

An understanding of Camunda is preferred