
SRE infrastructure

Company:
ClifyX
Location:
Rollingwood, TX, 78716
Posted:
May 14, 2025

Description:

We are seeking a Cloud/DevOps Site Reliability Engineer (SRE) with hands-on experience in Kubernetes, Docker, and containerized environments.

The ideal candidate will be responsible for managing production infrastructure services, ensuring high availability, reliability, and performance of our internal cloud systems.

The role requires expertise in automation, infrastructure management, and operational support, along with a good working knowledge of various database technologies (Oracle, SingleStore, ClickHouse, MongoDB). Kafka, Python, and shell scripting are a plus.

Cloud Infrastructure Management: Implement and maintain scalable cloud infrastructure across various environments. This will mostly be internal cloud and may not be 3PC.

Kubernetes & Docker Management: Deploy, manage, and scale applications using Kubernetes, Docker, and container orchestration tools.

Automation & CI/CD: Develop and implement automation scripts for infrastructure provisioning, deployment, and continuous integration/delivery (CI/CD) pipelines.

Production Support & Monitoring: Ensure high availability, monitoring, incident resolution, and performance tuning for production environments.

Collaboration: Work closely with development teams to optimize cloud-native applications and improve system efficiency.

Incident Response: Lead and coordinate incident management, troubleshooting, and post-mortem analysis for production systems.

Continuous Improvement: Advocate for and implement best practices for cloud-native infrastructure, automation, and security.
