Do you have experience building scalable, cloud-based systems from the ground up? Then this is the role for you!
As part of the Infrastructure Engineering team, you will design large-scale backend systems, implement scalable infrastructure to support Generative AI (GenAI) workloads, optimize systems for performance, and drive the development of GenAI infrastructure.
Required Qualifications:
Education Experience: Bachelor’s degree in Computer Science, Computer Engineering, or a relevant technical field, or equivalent practical experience.
Professional Experience: A proven track record as a technical leader in developing scalable, maintainable, and high-performance cloud-native infrastructure. Hands-on involvement in the architecture, development, and deployment of reliable infrastructure. 8+ years of experience leading, mentoring, or managing software engineers.
Core Technical Skills:
Practical expertise in one or more cloud platforms such as AWS, Azure, or GCP, including scaling with Docker and Kubernetes.
Deep understanding of network architecture, protocols, and security best practices.
Proficiency in Infrastructure as Code (IaC) tools such as Terraform and CloudFormation.
Experience with containerization technologies such as Docker and Kubernetes.
Familiarity with scripting languages such as Python and Bash.
Experience with service mesh (Linkerd, Istio) and API gateways (Kong, Traefik) to enhance microservices deployment and management.
Ability to tackle complex problems, make data-driven decisions, and apply best practices in secure software development.
Preferred Qualifications:
10+ years of hands-on experience building and maintaining high-performance, scalable infrastructure.
Experience developing tools, libraries, and infrastructure for data preprocessing, model training/finetuning, and deployment of LLMs in production environments.
Proficiency in orchestration frameworks such as Flyte, MLflow, or similar technologies for automating complex workflows.