Site Reliability Engineer

Company: Altis Technology
Location: Toronto, ON, Canada
Posted: May 18, 2025

Description:

We are seeking a proactive Site Reliability Engineer (SRE) to drive reliability, performance, and efficiency across our systems and platforms. You'll work closely with Application Development, QA, Product, and Data Engineering teams to champion a DevOps/SRE culture rooted in automation, observability, and continuous improvement.

Key Responsibilities:

Collaborate cross-functionally to promote SRE and DevSecOps best practices across the organization.

Build and maintain reliable, scalable systems with a focus on availability, performance, and resiliency.

Establish and monitor SLOs/SLIs, and develop comprehensive dashboards to support decision-making from both technical and business perspectives.

Lead efforts to reduce toil through automation, self-healing systems, and advanced monitoring (e.g., synthetic monitoring and real user monitoring, or RUM).

Apply observability and reliability testing practices from architecture through operations, leveraging Agile and product-based models.

Drive the adoption of cutting-edge tools in observability, automation, platform engineering, AIOps, and MLOps.

Contribute to and lead Communities of Practice (CoP) and SRE Office Hours to foster knowledge sharing and continuous improvement.

Qualifications:

SRE & DevOps Expertise:

Strong experience in observability, toil reduction, incident response, and performance optimization.

Proficient with monitoring tools such as Dynatrace, CloudWatch, and Azure Monitor.

Skilled in Infrastructure as Code (IaC), Configuration as Code (CaC), JSON, and scripting with Python, Node.js, Ruby, PowerShell, and Shell.

Deep understanding of Dynatrace advanced features: DT Guardian, RUM, Synthetic Monitoring, and AI-based event correlation.

Cloud & Automation:

Expert in AWS Cloud services: CDK, Lambda, CloudWatch, EKS, EC2, ELB, S3, and SSM.

Experience with log ingestion pipelines (AWS Firehose, Dynatrace OpenPipeline) and operational dashboards.

Hands-on experience with Ansible Tower, AWS SSM, Bitbucket/GitHub, and CI/CD workflows.

Orchestration & Data:

Familiarity with orchestration tools like Step Functions, Apache Airflow, and container platforms.

Knowledge of data pipelines, data lakes, and databases (Redshift, RDS, Aurora, PostgreSQL, SQL Server, Oracle).

Leadership & Communication:

Strong problem-solving and knowledge management skills.

Effective communicator who bridges technical and business teams.

Collaborative, inclusive leader who builds high-performing teams and fosters a culture of growth and recognition.
