Yudong Liu
+1-412-***-**** | **************@*****.*** | https://github.com/YudongL2000
Education
Carnegie Mellon University 2018-2023
Master of Science in Computer Science (Thesis based) GPA: 4.03/4.30
Bachelor of Science in Computer Science, Minor in Mathematical Science GPA: 3.76/4.00
•Selected Coursework
– Algorithms & Complexity: Algorithms for Big Data (15-859), Algorithm Design and Analysis (15-451)
– Systems: Advanced Operating Systems and Distributed Systems (15-712), Distributed Systems (15-440)
– Machine Learning: Multimodal Machine Learning (11-777), Convex Optimization (10-725), Advanced NLP (11-711)
– Maths: Methods of Optimization (21-690), Numerical Linear Algebra (21-344), Principles of Real Analysis (21-356)
Experience
•Research Internship in Multicomp Lab, Language Technologies Institute, CMU May 2020 - Aug 2020
Domain: Multimodal Machine Learning
Mentor: Dr. Louis-Philippe Morency
– Designed and implemented an autoencoder architecture for multimodal feature extraction from video, audio, and text for downstream tasks
•Research Assistant in Mohimani Lab, Computational Biology Department, CMU Aug 2021 - Aug 2023
Domain: Applied Algorithms, Machine Learning, Bioinformatics
Mentor: Dr. Hosein Mohimani
– Designed and implemented time-efficient algorithms and machine learning models for high-throughput bioinformatic analysis, including molecular networking and prediction of molecular binding affinities between protein sequences
•Teaching Assistant, CMU
– 15-213/513 Intro to Computer Systems (Summer 2023)
– 15-712 Advanced Operating Systems and Distributed Systems (Fall 2023)
– 10-701 Intro to Machine Learning (Fall 2023)
Selected Projects
•Efficient clustering and spectral library search at large data scale Algorithm Design, Data Mining
– Designed and implemented MASST+ and Networking+, two algorithms for spectral library clustering, search, and analysis that are three orders of magnitude faster than the state of the art, solving a fundamental open problem in computational biology. (Paper accepted by Nature Biotechnology; co-first author)
•Expansion Language Models for Conditional Adaptation (In Progress) Machine Learning, NLP
– Adapting the vocabulary embeddings of pretrained small language models to large language models with few-shot training, and providing a pipeline for multimodal video captioning and QA tasks that adapts easily to pretrained LMs
•High-Modality Multimodal Transformer Machine Learning
– Constructed a universal HighMMT model capable of handling over 8 modalities and 15 tasks from multiple research areas through fast modality transfer. Improved the tradeoff between performance and efficiency over existing models.
•DynPartition Distributed ML, Reinforcement Learning
– Proposed and implemented a novel reinforcement learning-based scheduler that dynamically partitions computation across multiple heterogeneous GPUs for dynamic neural network inference tasks.
•Distributed Bitcoin Miner Distributed Systems
– Implemented a distributed bitcoin miner simulator based on Remote Procedure Calls. The miner runs on LSP (Live Sequence Protocol), handles computationally intensive tasks, and recovers from sudden failures.
Technical Skills
• Languages: C/C++, Python, Go, SML, Rust
• Libraries: PyTorch, Python Libraries, C++ STL, SQL
• Expertise: Machine Learning, Applied Algorithms, Distributed Systems
Publications
[1] Mihir Mongia*, Tyler M. Yasaka*, Yudong Liu*, Mustafa Guler, Liang Lu, Aditya Bhagwat, Bahar Behsaz, Mingxun Wang, Pieter C. Dorrestein, Hosein Mohimani. Fast Mass Spectrometry Searches of Untargeted Metabolomics Data using MASST+. Nature Biotechnology. (Accepted)
[2] Paul Pu Liang, Chun Kai Ling, Yun Cheng, Alexander Obolenskiy, Yudong Liu, Rohan Pandey, Alex Wilf, Louis-Philippe Morency, Russ Salakhutdinov. Multimodal Learning Without Labeled Multimodal Data: Guarantees and Applications. NeurIPS, 2023. (In review)
[3] Mihir Mongia, Romel Baral, Abhinav Adduri, Donghui Yan, Yudong Liu, Yuying Bian, Paul Kim, Bahar Behsaz, Hosein Mohimani. AdenPredictor: accurate prediction of the adenylation domain specificity of nonribosomal peptide biosynthetic gene clusters in microbial genomes. Bioinformatics, 2023, 39, i40-i46. DOI:10.1093/bioinformatics/btad235
[4] Paul Pu Liang, Yiwei Lyu, Xiang Fan, Jeffrey Tsaw, Yudong Liu, Shentong Mo, Dani Yogatama, Louis-Philippe Morency, Ruslan Salakhutdinov. High-Modality Multimodal Transformer: Quantifying Modality & Interaction Heterogeneity for High-Modality Representation Learning. Transactions on Machine Learning Research (05/2023). DOI:10.48550/arXiv.2203.01311