Yahan Zheng *****.*****.**@*********.*** +1-603-***-****
EDUCATION
Dartmouth College, Hanover, NH Sep 2024 - May 2026 (expected) M.S. Computer Science
Coursework: Network Science and Complex Systems; Machine Learning and Statistical Data Analysis; Computer Graphics
Sichuan University, Sichuan, China Sep 2020 - July 2024 B.Eng. Internet of Things Engineering
Cumulative GPA: 3.74/4.0
Awards: Semester Scholarship ('21, '22, '23), Computer Design Competition (Provincial 2nd, '23), Mathematical Contest in Modeling (Provincial 1st, '22)
Coursework: Fundamentals of Image Recognition & Applications; Massive Data Processing & Intelligent Decision Making; Principle & Application of Big Data Technology; Algorithmic Design; Machine Learning; Discrete Mathematics
EXPERIENCE
Software Engineer Intern ByteDance Beijing 12/2024 – Present
● Designed and implemented a modular system architecture encompassing four key modules: Marketing Navigation, Inventory Navigation, Client Tracking Navigation, and Product Tracking Navigation, optimizing system organization and functionality.
● Developed API communication protocols to enable seamless data exchange between modules, reducing system latency and improving data consistency by 30%.
● Engineered CRUD functionalities for critical components, including procurement and sales management, enhancing operational efficiency and user experience.
● Deployed Redis caching for high-frequency queries and implemented modular designs for scalability, reducing database load and improving system response time by 40%.
● Conducted performance tuning and scalability tests to support high-volume operations, ensuring system stability under peak loads.
● Developed interfaces for real-time updates on procurement and shipping status, enhancing operational transparency.
Intern Minds, Machines, and Society Group Dartmouth College 10/2024 – Present
● Collaborated with cross-functional teams to design and develop scalable machine learning models that reduce race, religion, and gender stereotypes and biases in existing Large Language Models.
● Preprocessed 20GB datasets for re-training the BERT model.
● Compared various techniques for reducing model bias, such as CDA, Self-Debias, SentenceDebias, and Iterative Nullspace Projection.
Intern Irving Institute for Energy and Society Dartmouth College 10/2024 – Present
● Developed a web crawler that systematically gathered over 10,000 documents related to clean energy policies, preparing the data for training a large language model.
● Engineered a Retrieval Augmented Generation (RAG) system for question answering on energy policy documents.
Processing documents by converting them to parsable plain text and applying semantic segmentation for efficient chunking.
Utilizing multi-agent LLMs and extractive QA models (ELECTRA, RoBERTa, ALBERT) for metadata/label tagging, enhancing similarity calculations.
Employing SentenceTransformer for text vectorization and FAISS for efficient embedding storage and retrieval.
Combining retrieved content with generative models (e.g., Llama 3.1) for accurate answer generation.
Project Researcher Carnegie Mellon University 04/2023 - 10/2023
Multi-modal solution for Deepfake Detection and Source Identification
● Used BERT, R-Net, and Wav2Vec 2.0 for feature extraction.
● Designed a multi-modal analysis framework in PyTorch and implemented layer freezing to speed up training.
● Leveraged the MMMU-BA model and customized its dual-modal attention framework, improving Deepfake detection accuracy by 5.89%.
Project Researcher Sichuan University 09/2021 - 09/2023
Multi-feature Fusion Speech Deepfake Detection
● Developed an advanced voice forgery detection system using the ASVspoof 2017 dataset by extracting and processing audio features for classification.
● Used spectrograms to extract MFCC, LFCC, CQCC, and other audio features.
● Extracted higher-level features from the resulting audio feature images using the PixelHop algorithm, and used a GRU for classification to capture temporal dependencies.
● Applied oversampling and threshold adjustment to handle a 9:1 real-to-forged audio ratio, improving on the baseline EER by 3.53% and achieving an ACC of 91.2.
Contraband Detection
● Integrated various datasets, including PIDray, CLEAR, and SIXray, to enrich the data diversity and enhance model robustness.
● Built the detector on a Transformer model, leveraging its attention mechanisms to focus on global context within images and improving both object detection and image recognition efficiency.
● Outperformed previous models in detecting guns, wrenches, knives, etc. and in overall average precision: the mAP was 92.47, compared to YOLOv5's mAP of 87.2, a 5.27-point increase.
Wind Power Forecasting Method
● Using the dataset provided by the School of Electrical Engineering at Sichuan University, converted 15-minute interval data into 24-hour intervals to enhance periodicity.
● Employed the Prophet algorithm to handle seasonal and extreme-value characteristics, improving the RMSE of the baseline Prophet prediction model by 13%.
● The paper "Wind Power Forecasting based on the Prophet Model" was presented at the IEEE I&CPS Asia 2022 conference, and the work received a patent.
PUBLICATIONS, PATENTS & COPYRIGHTS
● "Multi-modal Solution: Deepfake Detection and Source Identification," ICCEIC 2023 (Publication)
● "Wind Power Forecasting based on Prophet Model," IEEE I&CPS Asia 2022 (Publication)
● "A Fake Speech Detection Method Based on PixelHop Feature Dimensionality Reduction" (Patent)
● "Transformer-based Contraband Detection Platform" (Copyright)
SKILLS
● Programming Languages: Python, Java, C/C++, JavaScript, HTML/CSS, SQL
● Technical Skills: PyTorch, TensorFlow, Apache, React, Git, Node.js, Express.js, MongoDB, Unix/Linux, Docker, REST, Kubernetes