Sr Network Engineer (AI Cloud)

Company:
Lavendo
Location:
Dallas, TX
Pay:
$115,000 - $145,000 USD per year
Posted:
June 28, 2025

Description:

About the Company

Our client is a leading technology company specializing in AI-centric cloud infrastructure, which is reshaping the landscape of artificial intelligence. The company operates one of the most powerful commercially available supercomputers and provides scalable AI cloud infrastructure optimized for AI/ML workloads, leveraging thousands of NVIDIA GPUs, high-performance InfiniBand networking, and managed Kubernetes or Slurm orchestration. Their platform supports AI developers and enterprises requiring large-scale GPU compute power and sustainable, energy-efficient data centers.

The Mission

To democratize access to high-performance AI infrastructure by delivering innovative, scalable, and sustainable cloud solutions that empower AI developers and enterprises worldwide to accelerate their AI and machine learning workloads with unparalleled efficiency and reliability.

The Opportunity

We are seeking a Senior Network Engineer to join our client’s infrastructure team. This role is critical in ensuring the smooth and stable operation of the company’s large-scale data center networks and global backbone. You will design, develop, and maintain advanced network architectures supporting thousands of server ports and GPU cluster interconnects, contributing directly to the company’s mission of delivering cutting-edge AI cloud services. This is a remote US-based role requiring frequent travel to data center sites, as well as the ability to work European time zone hours and collaborate with international teams.

What You'll Do

Ensure stable operation of data center infrastructure, points of presence, and global backbone networks

Design and develop large-scale data center networks, including InfiniBand-based GPU cluster interconnects

Develop and maintain monitoring and automation tools to improve network operations

Provide technical design and operational support, and collaborate cross-functionally with R&D, SRE, ITDC, and network development teams

Lead and support major network infrastructure upgrades and new region launches

Liaise with vendors for troubleshooting and network infrastructure testing

Maintain comprehensive network documentation and testing plans

Participate in on-call rotations and travel 2-3 times monthly to sites in New Jersey, Kansas City, and occasionally Amsterdam

Work European time zone hours and coordinate with global teams

What You Bring

At least 5 years of experience working in large, complex technology environments

Proven ability to manage and support critical infrastructure serving large user bases

Strong analytical and troubleshooting skills for resolving complex network issues

Hands-on, proactive approach to maintaining and improving network systems

Self-motivated and able to work independently while contributing to a high-performing team

Excellent communication skills with experience collaborating across diverse teams and cultures

Ability to travel domestically and internationally 2-3 times monthly

Legal authorization to work full-time in the U.S. without visa sponsorship

Preferred Technical Skills

Networking Certifications: CCNP, CCIE, JNCIE, or equivalent expert-level qualifications

Routing & Switching: BGP, IS-IS, Segment Routing MPLS (with IPv6), Ethernet switching, VXLAN, ECMP, L3 MPLS VPNs

Data Center & Cloud Networking: Troubleshooting TCP/IPv4/IPv6 in complex data center topologies (e.g., Clos networks), cloud overlay network technologies, software-defined networking (SDN)

Vendor Ecosystem: Hands-on experience with Juniper, Arista, and Mellanox network equipment

Cloud Platforms: AWS, Azure, Google Cloud

Bonus: Knowledge of GPU and InfiniBand networking

Bonus: Programming skills in Python or Go for network automation

Bonus: Experience working in Linux environments

Key Success Drivers

Passion for staying current with HPC and AI infrastructure domains

Commitment to maintaining the highest standards of network reliability and performance in environments where milliseconds matter

Comfort working with international teams and adapting to different operational requirements across global data center locations

Drive to expand technical expertise in cutting-edge areas like GPU networking, software-defined infrastructure, and cloud-native networking

Why Join?

Competitive base salary ranging from $115,000 to $145,000 per year plus quarterly performance bonuses

100% company-paid medical, dental, and vision insurance for employees and families

401(k) plan with up to 4% company match and immediate vesting

Generous parental leave (20 weeks primary, 12 weeks secondary caregivers)

Remote work reimbursement up to $85/month for mobile and internet

Travel allowance and support for frequent business travel

Company-paid short-term disability, long-term disability, and life insurance

Work with a team operating one of the world’s most powerful supercomputers

Contribute to sustainable AI infrastructure with energy-efficient data centers that reuse waste heat

Enjoy a culture that blends startup innovation with the resources of an established company

Interviewing Process:

HR Screening + Candidate Survey

Level 1: Interview with the Hiring Manager

Level 2: Internal Routing Skills Interview

Level 3: External Routing Skills Interview

Level 4: Automation Skills Interview

Reference and Background Checks: Conducted after the successful completion of all interview stages

Job Offer: Extended to the selected candidate following successful checks

We are proud to be an equal opportunity workplace and are committed to equal employment opportunity regardless of race, color, religion, national origin, age, sex, marital status, ancestry, physical or mental disability, genetic information, veteran status, gender identity or expression, sexual orientation, or any other characteristic protected by applicable federal, state, or local law.

Compensation Range: $115K - $145K

Full-time

Fully remote
