M.S. in Data Science at New York University

Location:

Queens, NY, 11106

Posted:

May 28, 2025


Resume:

TIEN-LENG WU

******@***.*** 347-***-**** New York, NY linkedin.com/in/tien-leng-wu

EDUCATION

New York University Sep 2024 - May 2026

Master of Science in Data Science (Cumulative GPA: 3.833/4)

Coursework: Introduction to Data Science, Computational Cognitive Modeling, Learning from Small Data, Machine Learning, Big Data, Natural Language Understanding

National Taiwan University Sep 2020 - Jun 2023

Bachelor of Arts in Economics & Data Science and Social Inquiry Certificate (Overall GPA: 3.98/4.3 (3.85/4))

SKILLS

Programming Languages & Tools: Python, SQL, Julia, C++, R, Stata, Tableau, Excel, Git, Hadoop, Spark, Jupyter, LaTeX

Data Science Technologies & Packages: Data visualization, Data cleansing, Data wrangling, Machine learning, Deep learning, Neural networks, Reinforcement learning, Bayesian machine learning, NumPy, Dask, Large Language Models, Hypothesis testing, Regression models, Classification models, Clustering methods, Dimensionality reduction, Scikit-learn, High performance computing, Natural language processing, PyTorch, Pandas, Matplotlib, Statistics, Econometrics

EXPERIENCES

Course (Capstone) Project: Big Data Apr 2025 - May 2025

Large-scale Movie Recommender System and Customer Segmentation System with Spark on High Performance Computing (HPC) Cluster

• Identified the top 100 pairs of users with the most similar movie-watching styles using a MinHash and Locality-Sensitive Hashing (LSH) pipeline; validated the results using average Pearson correlation against 100 randomly picked pairs of users

• Developed a bias-adjusted popularity baseline model and an Alternating Least Squares (ALS) model for the movie recommender system; used MAP@k and NDCG@k as evaluation metrics and tuned hyperparameters to improve the ALS model

Course Project: Natural Language Understanding Jan 2025 - May 2025

Enhancing Performance of Large Language Models on Distinct Question-answering (QA) Tasks with Fine-tuning and Budget Forcing

• Fine-tuned on the s1K dataset to investigate whether mathematical reasoning ability improves model performance on QA tasks

• Applied Budget Forcing, a relatively new test-time scaling technique (Jan 2025), to evaluate its impact on QA accuracy

• Conducted experiments on three distinct QA datasets, mCSQA (commonsense), REPLIQA (context retrieval), and Knights & Knaves (logical deduction), across three settings: task-specific fine-tuning, s1K fine-tuning, and budget forcing

Course (Capstone) Project: Introduction to Data Science Nov 2024 - Dec 2024

Assessing Professor Effectiveness with Statistical Inference and Machine Learning Methods

• Cleaned and wrangled selected columns; visualized selected variables with Matplotlib and Seaborn

• Conducted significance testing using parametric and non-parametric methods to test different hypotheses

• Built regression models with regularization and classification models, using metrics like AUROC to assess predictive performance

Research Assistant, Behavioral and Data Science Research Center, National Taiwan University Apr 2023 - Dec 2023

Studies of Presidential Elections and Referendums in Taiwan

• Managed data preprocessing; filled in missing values, merged multiple datasets, and handled invalid or unreadable strings

• Investigated root causes of missing, inaccurate, and unbalanced data; developed corresponding solutions

• Visualized data and applied econometric models to the processed data with Python and Stata

Course Project: Machine Learning (Kaggle Competition) Apr 2023 - Jun 2023

Track Ranking on Spotify and YouTube Datasets: An Ordinal Ranking Problem

• Preprocessed data with techniques including one-hot encoding and tokenization; implemented three approaches to rank tracks (boosting methods combined with residual learning, a feedforward neural network, and ranking regression based on binary classification)

• Compared the three approaches on efficiency, scalability, and interpretability to evaluate their performance

Course Project: Database Management Nov 2022 - Dec 2022

Dinner Recommendations Using SQL on Dcard Datasets

• Queried the database of Dcard (a social media platform in Taiwan) to answer questions with business value; conducted exploratory data analysis (statistical graphs and maps) on the query results to interpret the findings

Course Project: Statistical Learning and Deep Learning Oct 2022 - Dec 2022

MVP Predictions in the NBA (National Basketball Association)

• Preprocessed data and conducted exploratory data analysis with Tableau to produce sophisticated graphs of NBA player statistics

• Implemented various models, including Lasso regression, Random Forest, GBDT, AdaBoost, and XGBoost, to predict MVP probability scores for each player; recommended the best model and generated an MVP list with interpretation
