Site Reliability Engineer, Consultant

Company: Blue Shield of CA
Location: Long Beach, CA
Posted: December 30, 2025

Description:

Your Role

We are seeking an experienced Site Reliability Engineer (SRE) to lead reliability, scalability, and performance initiatives across our production systems. In this role, you will blend software engineering, automation, and systems operations to keep our platforms resilient, efficient, and continuously improving.

You will be part of a cross-functional team responsible for designing, implementing, and maintaining reliable systems that support millions of requests daily. This position requires a deep understanding of distributed systems, cloud infrastructure, automation, and incident response.

Your Knowledge and Experience

Education & Experience

Requires a Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent practical experience); Master's degree a plus.

7+ years of experience in building, supporting, and improving production systems and infrastructure.

Cloud Platforms

Minimum 5 years of hands-on experience with Azure, AWS, or GCP.

Demonstrated expertise in virtual machines (VMs), containers, cloud networking, identity and access management (IAM), monitoring, storage, and serverless functions.

Comfortable deploying and managing cloud-native services and infrastructure.

Programming & Scripting

Proficiency in one or more languages such as Python, Go, Java, Bash, PowerShell, or similar.

Ability to write clean, maintainable code for automation and tooling.

Containerization & Orchestration

Experience working with Kubernetes, Docker, and tools like Helm or Red Hat OpenShift.

Familiarity with managing containerized applications in production environments.

Monitoring & Observability

Working knowledge of tools such as Prometheus, Grafana, Datadog, New Relic, ELK Stack, Dynatrace, Splunk, BigPanda, or SolarWinds.

Ability to set up dashboards, alerts, and metrics to ensure system health and performance.

CI/CD & Configuration Management

Experience with CI/CD pipelines using tools like Jenkins, GitHub Actions, GitLab CI, Argo CD, or Spinnaker.

Familiarity with configuration management tools such as Ansible, Chef, or Puppet.

Automation & Emerging Technologies

Understanding of agentic AI systems and automation frameworks for incident response and infrastructure optimization is a plus.

Interest in exploring intelligent automation to improve reliability and reduce manual toil.

Testing & Deployment Expertise

Experience with chaos engineering tools (e.g., Gremlin, Chaos Monkey) and methodologies.

Hands-on knowledge of Blue/Green and Canary deployment strategies in cloud-native environments.

