Job Description
About the Role
CosmosGrid is a modern DevOps and Cloud Engineering consultancy delivering scalable infrastructure, automation, 24/7 support, and secure private AI solutions. Our engineers work across global time zones to support clients with precision, clarity, and technical excellence.
Key Responsibilities
Design, build, and maintain cloud-native infrastructure on AWS using Kubernetes, Terraform, and modern DevOps tooling.
Implement CI/CD pipelines using GitHub Actions, GitLab CI, or Jenkins, ensuring fast, reliable delivery.
Manage Kubernetes clusters, troubleshoot workloads, optimize scaling, and ensure platform security.
Deploy and configure observability stacks (Prometheus, Grafana, Loki, Alertmanager) to monitor system performance.
Support infrastructure automation, configuration management, and GitOps practices (ArgoCD/Flux).
Participate in on-call rotation as part of CosmosGrid’s global 24/7 DevOps support model.
Collaborate closely with client engineering teams to deliver solutions aligned with business and technical goals.
Identify and implement cloud cost optimizations using FinOps principles.
Contribute to documentation, internal tooling, and best practices across the organization.
Required Qualifications
3+ years of hands-on DevOps, Cloud Engineering, or SRE experience.
Strong experience with AWS (EC2, VPC, IAM, S3, EKS, CloudWatch, etc.).
Proficiency in Kubernetes administration and troubleshooting.
Solid experience with Terraform and Infrastructure as Code workflows.
Hands-on experience with CI/CD tools (GitHub Actions, GitLab CI, Jenkins).
Familiarity with observability tools: Prometheus, Grafana, Loki, ELK, Alertmanager.
Strong scripting ability in Python, Bash, or Go.
Understanding of networking concepts (DNS, load balancing, proxies, ingress).
Experience implementing DevOps best practices: automation, repeatability, scalability.
Comfortable communicating with clients and working in fast-paced, distributed teams.
Preferred Skills (Nice to Have)
Experience with Karpenter, Bottlerocket, or EKS cost/performance tuning.
Familiarity with GitOps tooling (ArgoCD, Flux).
Understanding of MLOps architectures or experience deploying AI/LLM workloads.
Experience with Vault, KMS, or other secrets management tools.
Exposure to multi-cloud environments (Azure, GCP).
What We Look For
Engineers who are curious, resourceful, and enjoy solving hard problems.
People who take ownership and deliver with reliability and professionalism.
Strong communicators who thrive in collaborative, client-facing environments.
People with a passion for cloud-native technologies and continuous learning.