Job Description
We are seeking a highly skilled Cloud / Site Reliability Engineer to support cloud automation and infrastructure performance initiatives for The Client, a leading organization in enterprise cloud services. This is a contract role suited to professionals experienced in Terraform, GKE, Kubernetes, and Infrastructure as Code (IaC) best practices.
Key Responsibilities:
Design, develop, and maintain reusable Terraform modules for provisioning GKE (Google Kubernetes Engine) resources.
Automate deployment, scaling, and management of GKE clusters, node pools, and associated network/storage components.
Implement and enforce IaC best practices such as Git-based version control, code reviews, and CI/CD pipelines.
Collaborate with DevOps, development, and cloud engineering teams to optimize infrastructure performance, cost-efficiency, and scalability.
Maintain accurate and comprehensive documentation covering GKE cluster usage, architecture standards, and lifecycle management.
Monitor, assess, and improve system reliability, availability, and security across cloud-hosted services.
Troubleshoot infrastructure issues, identify root causes, and deliver preventive solutions using automation and observability tools.
Must-Have Skills and Qualifications:
10–15 years of experience in cloud infrastructure, DevOps, or site reliability roles.
Expertise in Terraform for GKE provisioning and modular infrastructure design.
Strong hands-on experience operating GKE clusters, including automation, scaling, and reliability.
Proficiency in Kubernetes, including multi-cluster management, Helm, and networking.
Familiarity with CI/CD integration, GitOps workflows, and secure DevOps pipelines.
Strong commitment to cloud reliability, security standards, and operational excellence.
Excellent communication and collaboration skills within cross-functional teams.