About the Team
The mission of our Applied Machine Learning (AML) team is to build next-generation AI infrastructure and recommendation platforms for ads ranking, search ranking, live streaming, and e-commerce.
We drive substantial impact across ByteDance's core businesses by building world-class ML platforms and systems.
We are seeking a Tech Lead, AML Inference to oversee the development and execution of ByteDance's inference infrastructure.
This role will lead and mentor a team of Machine Learning Engineers focused on inference, ensuring reliability, scalability, and performance across large-scale distributed systems.
The Inference Lead will collaborate closely with research, product, and platform teams to design and deliver cutting-edge solutions that power critical ranking and recommendation services.
Responsibilities
- Lead and mentor a team of inference-focused Machine Learning Engineers, setting technical direction and ensuring best practices.
- Drive the design and evolution of distributed inference infrastructure to support feeds, ads, search, and other core ranking models.
- Oversee the development of monitoring, observability, and management tools to ensure reliability and scalability of online inference services.
- Identify and resolve system inefficiencies, performance bottlenecks, and reliability issues, ensuring optimized end-to-end performance.
- Partner with research and product teams to translate requirements into robust and efficient inference solutions.
- Stay at the forefront of advancements in inference frameworks, ML hardware acceleration, and distributed systems, incorporating innovations where impactful.
Minimum Qualifications
- Bachelor's degree or above in Computer Science, Electrical Engineering, or a related field.
- 5+ years of experience developing and deploying large-scale distributed systems, with at least 5 years in a leadership or technical lead role.
- Strong programming skills in languages such as C++, Python, or Go.
- Deep understanding of inference frameworks and ML system deployment (e.g., TensorFlow, PyTorch, TensorRT, JAX, MXNet).
- Proven experience optimizing performance for large-scale machine learning systems, including hardware-software co-design, GPU/RDMA acceleration, or HPC techniques.
- Excellent communication and collaboration skills; ability to work across research, engineering, and product teams.
Preferred Qualifications
- Experience leading teams working on high-throughput, low-latency ML serving systems.
- Contributions to open-source ML or systems projects.
- Familiarity with container orchestration, service mesh, or cloud-native ML infrastructure.
- Experience collaborating with and leading global, cross-functional teams across different time zones.