Postdoctoral Researcher, Large Behavior Models
At Toyota Research Institute (TRI), we're on a mission to improve the quality of human life. We're developing new tools and capabilities to amplify the human experience. To lead this transformative shift in mobility, we've built a world-class team in Automated Driving, Energy & Materials, Human-Centered AI, Human Interactive Driving, Large Behavior Models, and Robotics.
The Learning From Videos (LFV) team in the Robotics division focuses on developing foundation models that leverage large-scale multi-modal data (RGB, depth, flow, semantics, bounding boxes, tactile, audio, etc.) from multiple domains (driving, robotics, indoors, outdoors, etc.) to improve performance on downstream tasks. This paradigm targets training scalability, since data from all modalities can be leveraged jointly to learn useful data-driven priors (3D geometry, physics, dynamics, etc.) for world understanding. Our topics of interest include, but are not limited to, Video Generation, World Models, 4D Reconstruction, Multi-Modal Models, Multi-View Geometry, Data Augmentation, and Video-Language-Action models, with a primary focus on embodied applications. We aim to make progress on some of the hardest scientific challenges in spatio-temporal reasoning, and on how such reasoning can enable the deployment of autonomous agents in unstructured real-world environments.
This year-long postdoctoral research position will be highly integrated into our team, with hands-on involvement in both ongoing and new research and development threads in the areas of:
4D World Models
Physical and Embodied Intelligence
Multi-Modal Learning
The researcher will have the opportunity to work collaboratively with our team at TRI on high-risk, high-reward projects, pushing forward our understanding of spatio-temporal reasoning and zero-shot generalization. This is a research-focused position, targeting the development of methods and techniques that solve real-world problems. We welcome you to join a positive, friendly, and enthusiastic team of researchers, where you will contribute to helping people gain and maintain independence, access, and mobility. We work closely with other Toyota affiliates and actively collaborate on research publications and the productization of the technologies we develop.
Responsibilities
Develop, integrate, and deploy algorithms for multi-modal and 4D reasoning targeting physical applications.
Handle the ingestion of large-scale datasets for training, including streaming, online, and continual learning.
Invent and deploy innovative solutions at the intersection of machine learning, computer vision, and robotics that improve real-world performance on useful tasks.
Work closely with robotics and machine learning researchers and engineers to understand theoretical and practical needs.
Follow best practices to produce maintainable code, both for internal use and for open-sourcing to the scientific community.
Qualifications
Ph.D. in a relevant technical field.
A strong background in computer vision and its applications to robotics and embodied systems.
A standout colleague with strong communication skills and an ability to learn from others and contribute back to the scientific community through publications or open-source code.
Passionate about assisting and amplifying the abilities of older adults and those in need through dexterous manipulation, human-robot collaboration, and innovation in physical assistance.
Bonus Qualifications
Spatio-temporal (4D) computer vision, including multi-view geometry, 3D/4D reconstruction, video generation, self-supervised learning, occlusion reasoning, etc.
Large-scale training of multi-modal deep learning methods, in terms of both dataset size and model complexity, including context-length extension, efficient attention, and distributed computing.
Application of machine learning and computer vision to embodied applications.