
AI Performance Software Engineer

Company:
Signify Technology
Location:
San Francisco, CA
Posted:
June 29, 2025

Description:

AI Inference Software Engineer – Stealth AI Systems Startup
Base Salary Range: $200,000-$300,000
Location: San Francisco (Onsite)

A stealth-stage AI systems company is redefining the performance boundaries of inference at scale.

As generative AI models become larger and more complex, inference is emerging as the core bottleneck in production environments.

This team is building a vertically integrated stack—from low-level GPU kernels to developer-friendly APIs—that dramatically improves inference speed, efficiency, and scalability.

Spun out of cutting-edge academic research and backed by deep industry experience across distributed systems, machine learning infrastructure, and hardware design, they are focused on enabling production-grade AI with minimal latency and maximal throughput.

Their platform integrates seamlessly with modern ML frameworks and tooling such as PyTorch and LangChain, allowing teams to deploy and monitor workloads in seconds.

They are looking for a Software Engineer focused on AI inference performance to help build and optimize the core runtime infrastructure powering these systems.

This role sits at the intersection of deep learning, systems engineering, and GPU performance.

What You’ll Do
- Implement and evaluate advanced inference optimization techniques, including quantization, KV caching, and FlashAttention
- Design and build systems for distributing inference workloads efficiently across multiple GPUs and nodes
- Profile and benchmark large-scale models to identify bottlenecks across the software and hardware stack
- Optimize CUDA kernels and GPU memory usage to improve performance across a wide variety of AI models
- Collaborate closely with research and systems engineers to push the limits of model serving infrastructure

What They’re Looking For
- Proficiency with CUDA and experience writing or optimizing GPU kernels
- Strong background in Python and C++ development
- Hands-on experience with PyTorch, TensorFlow, or similar deep learning frameworks
- Knowledge of distributed systems or model-serving platforms at scale
- Familiarity with performance tuning, benchmarking tools, and profiling techniques

Nice to Have
- Graduate degree in computer science, engineering, or a related field
- Experience with compiler frameworks such as MLIR or Triton
- Exposure to vLLM, ONNX, or custom model runtimes

This is a rare opportunity to work on core infrastructure for AI systems with a team solving some of the hardest performance challenges in the field.
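For candidates unfamiliar with KV caching, one of the optimization techniques named above: during autoregressive decoding, the keys and values of already-processed tokens are stored so each new token only computes attention against the cache rather than re-encoding the whole prefix. A minimal single-head, NumPy-only sketch (class and method names here are illustrative, not any specific runtime's API):

```python
import numpy as np

class KVCache:
    """Toy per-layer key/value cache for autoregressive decoding.

    Each decode step appends the new token's key/value pair; attention
    for the new query then runs against all cached entries instead of
    recomputing K/V for the entire prefix.
    """

    def __init__(self, head_dim: int):
        self.keys = np.empty((0, head_dim))
        self.values = np.empty((0, head_dim))

    def append(self, k: np.ndarray, v: np.ndarray) -> None:
        # Grow the cache by one token (real runtimes preallocate instead).
        self.keys = np.vstack([self.keys, k[None, :]])
        self.values = np.vstack([self.values, v[None, :]])

    def attend(self, q: np.ndarray) -> np.ndarray:
        # Scaled dot-product attention of the new query over cached keys.
        scores = self.keys @ q / np.sqrt(q.shape[-1])
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        return weights @ self.values  # shape: (head_dim,)

cache = KVCache(head_dim=4)
rng = np.random.default_rng(0)
for _ in range(3):  # three decode steps, one appended K/V pair each
    cache.append(rng.normal(size=4), rng.normal(size=4))
out = cache.attend(rng.normal(size=4))
print(out.shape)  # (4,)
```

Production systems (e.g. vLLM's paged attention) manage this cache in preallocated GPU memory blocks, which is exactly the kind of memory-usage optimization this role involves.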
