
Staff Architect, AI Infrastructure

Company: Super Micro Computer
Location: San Jose, CA
Posted: May 17, 2025

Description:

Job Req ID: 26676

About Supermicro:

Supermicro® is a Top Tier provider of advanced server, storage, and networking solutions for Data Center, Cloud Computing, Enterprise IT, Hadoop/Big Data, Hyperscale, HPC, and IoT/Embedded customers worldwide. We are the #5 fastest growing company among the Silicon Valley Top 50 technology firms. Our unprecedented global expansion has provided us with the opportunity to offer a large number of new positions to the technology community. We seek talented, passionate, and committed engineers, technologists, and business leaders to join us.

Job Summary:

The Supermicro IT team is seeking a visionary Staff Architect, AI Infrastructure to lead the architecture and scaling of GPU-accelerated infrastructure optimized for AI and machine learning workloads. This role requires deep system-level expertise, strong automation skills, and hands-on experience designing infrastructure at scale. You will architect integrated compute, network, and cooling systems that support next-generation AI platforms while ensuring operational efficiency and future readiness.

Essential Duties and Responsibilities:

Hyperscaler-Grade Infrastructure Design

Design and scale high-performance infrastructure inspired by hyperscalers (e.g., NVIDIA DGX SuperPOD, Meta RSC, Azure NDv5, AWS Trainium clusters), with a focus on modularity, density, and operability.

System-Level Architecture

Lead the integration of compute, networking, storage, and power systems for high-density GPU workloads (NVIDIA, AMD, Intel Gaudi), ensuring system-wide performance optimization.

Automation & Orchestration

Build and standardize infrastructure provisioning, deployment, and monitoring via infrastructure-as-code tools (Terraform, Ansible, Python), ensuring repeatability and scale.

AI-Ready Network Design

Architect East-West GPU interconnects and North-South data ingress/egress paths using InfiniBand (HDR/NDR) and high-speed Ethernet (100G/400G), with support for VXLAN, BGP, and EVPN.

Liquid & Air Cooling Infrastructure

Design and oversee deployment of air- and liquid-cooled racks, PDUs, containment solutions, and backup power systems tailored for thermally intensive AI workloads.

Observability & Monitoring

Implement telemetry and health metrics to proactively manage system performance and lifecycle states.

Infrastructure Documentation & Standards

Create robust documentation for reference architectures, operational playbooks, and lifecycle workflows to support global deployments.

Cross-Functional Leadership

Collaborate with ML platform teams, data scientists, hardware architects, and facility engineers to align infrastructure capabilities with AI platform needs.

Technology & Market Evaluation

Analyze and influence roadmap decisions by staying current on industry trends from NVIDIA, AMD, Intel, and cloud hyperscalers.

Qualifications:

10+ years in data center infrastructure or hyperscaler-scale compute environments, ideally with AI or HPC workloads

Bachelor's degree or equivalent experience

Proven success architecting GPU infrastructure using NVIDIA, AMD, or Intel Gaudi platforms

Hands-on experience with large-scale data center deployments, including mechanical/electrical design and containment

Strong automation experience

Deep knowledge of RDMA, InfiniBand, Ethernet, and overlay networks

Experience with bare-metal orchestration for GPU environments

Experience with hyperscaler environments or colocation data centers supporting AI workloads

Experience supporting AI/ML workloads across hybrid cloud environments

Strong business acumen: able to balance performance, cost, and scalability in architecture decisions

Salary Range

$168,000 - $184,000

The salary offered will depend on several factors, including your location, level, education, training, specific skills, years of experience, and comparison to other employees already in this role. In addition to a comprehensive benefits package, candidates may be eligible for other forms of compensation, such as participation in bonus and equity award programs.

EEO Statement

Supermicro is an Equal Opportunity Employer and embraces diversity in our employee population. It is the policy of Supermicro to provide equal opportunity to all qualified applicants and employees without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, protected veteran status or special disabled veteran, marital status, pregnancy, genetic information, or any other legally protected status.
