Site Reliability Engineer - Philippines (Night Shift)

Company:
Intrado Life & Safety, Inc.
Location:
Basak, 6015, Philippines
Posted:
April 21, 2026

Description:

About Us

Intrado is dedicated to saving lives and protecting communities, helping them prepare for, respond to, and recover from critical events. Our cutting-edge company strives to become the most trusted, data-centric emergency services partner by uniting fragmented communications into actionable intelligence for first responders. At Intrado, all of our work truly matters.

Responsibilities/Qualifications

In this Site Reliability Engineering (SRE) role, you’ll partner closely with development and business teams to create effective monitoring, alerting, and observability solutions that improve system performance and visibility. You’ll support production systems, troubleshoot complex issues, and help drive long-term stability through proactive incident management and automation. You’ll also design secure, cost-effective, and reliable cloud infrastructure.

This role works nights, with shifts starting at 9 or 10 pm and ending at 5 or 6 am SST.

Reliability Engineering & System Operations

Design, implement, and maintain scalable, reliable production systems.

Troubleshoot and resolve complex application and system issues.

Collaborate with development teams to build features with reliability, observability, and performance in mind.

Apply Site Reliability Engineering (SRE) best practices including Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs).

Monitoring & Observability

Develop and maintain monitoring, alerting, synthetic testing, and dashboards to ensure visibility into system health.

Configure agents for metrics/log collection and manage incident notification channels.

Analyze trends and recurring issues to drive proactive improvements.

Cloud Infrastructure Management

Manage and optimize AWS/Azure environments in staging and production.

Collaborate with architecture, development, and finance teams to design secure, cost-effective, and reliable cloud infrastructure.

Incident & Problem Management

Participate in 24/7 on-call rotations, respond quickly to production incidents, and identify root causes.

Lead post-mortems and implement long-term fixes.

Escalate and communicate issues as appropriate.

Automation & Tooling

Automate repetitive operational tasks and improve system efficiency.

Build and maintain deployment and configuration tools.

Work in CI/CD tools such as GitHub Actions.

Collaboration & Customer Focus

Partner with product and development teams to prioritize and resolve production-impacting issues.

Support internal teams with tools and insights for efficient self-service.

Ensure timely resolution of tickets and clear communication with stakeholders.

Architecture & Documentation

Review technical documentation (HLDs/FRDs) to identify potential issues early.

Maintain knowledge of product platforms and usage patterns.

What You Bring:

Education: Bachelor’s in Computer Science, MIS, or related field (or equivalent experience).

Experience: 4+ years in application support; experience in development, databases, or systems administration preferred.

Cloud: Hands-on expertise in AWS and/or Azure (GCP a plus).

Languages: Skilled in one or more languages (Python, Go, Java, Ruby, JavaScript); scripting with Bash or Python.

Monitoring Tools: Experience with tools like Datadog, Splunk, and New Relic, including dashboard creation and performance monitoring.

Systems & Networking: Strong Linux/Unix skills; SQL, VPN, TCP/IP, FTP/SMTP troubleshooting.

Containers & IaC: Production-level experience with Kubernetes and Terraform.

SRE Practices: Knowledge of SLIs/SLOs/SLAs, CI/CD, and automation strategies.

Soft Skills: Excellent problem-solving, communication, and collaboration skills.

Mindset: Continuous improvement focus with a proactive approach to reliability.
