Assistant Data

Location:

Lawrence Township, NJ, 08648

Posted:

January 03, 2021

Contact this candidate

Resume:

YI XU

**** **** ***** *****, *************, NJ, *8648 626-***-**** ********@*****.***

EDUCATION

New York University Expected May 2021

M.S. in Data Science GPA 3.9/4.0

Core Courses: Time Series Analysis, Machine Learning, Big Data, Natural Language Processing, Deep Learning

University of California, Berkeley May 2019

B.A. in Applied Mathematics GPA 3.5/4.0

TECHNICAL SKILLS

Programming & Tools: Python (PyTorch, scikit-learn, SciPy, NumPy, Matplotlib, pandas), SQL, R (caret, dplyr, ggplot2), C++, MATLAB, Spark, Hadoop, Git, AWS, Excel (VLOOKUP)

Techniques: Regression, Classification, Clustering, Reinforcement Learning, Survival Analysis, Visualization

WORKING EXPERIENCE

NYU Center for Data Science & NYU Langone Medical Center May 2020 - Oct 2020

Research Assistant Advisor: Prof. Carlos Fernandez-Granda

Applied K-means clustering to categorize platelet activity and examined the mortality curve within each cluster.

Designed 3 semi-supervised regression methods that jointly use 97 labeled and 7k unlabeled observations to predict a platelet activation score for each unlabeled record, leading to a clearer separation in mortality curves.

Performed PCA on an in vitro dataset containing 36 platelet activity indexes for each drug at various concentrations, and proposed using the first PC score to describe platelet activation.

ScoreOne Technology July - Aug 2019

Risk Analyst Intern

Maintained the customer service message dataset by joining two databases on shared values using SQL.

Split sentences into individual words and compared word frequencies by drawing word clouds in R.

Reduced cash loan fraud loss by 4% by reviewing users’ text messages for key words.

Duke University & Statistical and Applied Mathematical Sciences Institute May - June 2018

Data Consultant Intern Director: Prof. David L. Banks

Coordinated a consulting project for IUPAC to forecast its membership subscriptions by country.

Explored and pre-processed data; trained a regression model with dummy variables to improve model generality; achieved a 13% better return than the old model.

PROJECTS

Default Detection on Mortgage Data (Python)

Constructed 6 new features through domain-knowledge-based feature engineering and selected 16 essential features from 23 total attributes using the correlation coefficient and a customized embedded method.

Tuned hyperparameters using grid search and trained various classification models, including SVM, Random Forest, Gradient Boosting, and Neural Network, to identify customers who struggle to make payments.

Identified F1 score and AUPRC as the evaluation metrics, chosen specifically for the imbalanced dataset, which contains only 30% positive labels.

Identified Gradient Boosting as the optimal model, improving F1 score by 15% over the baseline model.

Recommendation System for User-book Interaction Data (Spark, Hadoop)

Built a book recommendation system on over 223M interactions using the Alternating Least Squares (ALS) algorithm in Spark.

Preprocessed the dataset by converting it from CSV to Parquet for efficiency.

Visualized the latent factor representations of items and users with PCA, t-SNE, and UMAP.
