Assistant Data

Location:
New York City, NY
Salary:
60,000
Posted:
October 04, 2020


Resume:

Yunbai Zhang

530-***-**** adgnhu@r.postjobfree.com LinkedIn: yunbai-zhang-1ab326138

EDUCATION

Columbia University NYC, NY

M.S. in Data Science 02/2020

Related courses: Deep Learning, Natural Language Processing (NLP), Algorithms for Data Science, Machine Learning, Bayesian Model for Machine Learning, Data Visualization and Analysis, Advanced Statistical Inference, Database

University of California, Davis Davis, CA

B.S. in Statistics, B.A. in Mathematics 06/2018

Related courses: Time Series Analysis, Generalized Linear Models, ANOVA Analysis, Multivariate Data Analysis, Linear Programming, Stochastic Process, Partial Differential Equations, Numerical Analysis

Honors: High Honor at the College of Letters and Science (top 7%), Outstanding Performance Citation in the Statistics Department (top 5%), Departmental Citation in the Mathematics Department (top 5%), Dean’s Honors List

TECHNICAL SKILLS

Python, SQL, R, Java, Tableau, MATLAB, C
Keras, TensorFlow, PyTorch
Spark, Hadoop, GCP, AWS, Linux
LaTeX

PROFESSIONAL EXPERIENCE

Research Assistant (NLP project) – Columbia Business School, Columbia University 02/2020 - Present
• Cleaned text data using RegEx in Python for a project that combines ingredients to generate creative and tasty recipes.
• Designed and built a hierarchy tree model for discriminating ingredient labels, using the document coverage and length of subwords as metrics to guarantee the accuracy and uniqueness of each ingredient label.
• Reduced the number of duplicated labels from 731 to 71 across 8,000 ingredients.

Research Assistant – Statistics Department Research Group, UC Davis 03/2017 - 08/2018
• Implemented a new R package for functional data models, including Modes of Variation visualization and Functional Principal Component Analysis (FPCA).

• Applied the R package to time series analysis of median house prices for over 2,000 U.S. counties from 1996 to 2017.
• Constructed a functional regression model to predict the median house price for each county in 2018.
• Related methods: crawled the house data with BeautifulSoup, visualized FPC scores projected onto a U.S. map, and filled in missing house price data using the predictive model.

Intern at Data Analytics Department – TeamSun Technology Co., Ltd, Beijing, CN 06/2017 - 08/2017
• Applied NLTK packages to calculate sentiment scores and analyze over 20,000 online customer reviews for 200 electric products.

• Employed D3 to visualize the correlation between the number of reviews and average sentiment scores.
• Constructed an R Shiny app for interactive visualization serving over 2M users. By allowing users to choose preferred products, we identified how consumers’ feedback affected the product buying process and improved product quality.

Intern at Data Analytics Department – A+P Group, Beijing, CN 06/2016 - 08/2016
• Built a time series ARIMA model to predict the median house price in 2015 based on monthly median house prices in Jiangsu from 2010 to 2014, retrieved by SQL from a Chinese government database.
• Scraped the web with BeautifulSoup to obtain economic factors, such as the number of schools, Michelin-starred restaurants, and companies; labeled the factors in different color levels and projected them onto a choropleth map to visualize the impact of various economic factors on housing prices.

MACHINE LEARNING PROJECTS NYC, NY

Capstone project with Capital One – Time Series Anomaly Detection, Columbia University 08/2019 - 12/2019
• Implemented a hidden Markov model and a mean-reverting Gaussian process to generate time series data and labels simulating credit card delinquency behavior, with the goal of helping banks prevent financial loss.
• Experimented with machine learning models, including random forest, XGBoost, logistic regression with SMOTE sampling, RUSBoost, and neural-network-based models (ANN, LSTM), on the generated data to detect anomalous behavior.
• Chose the best detection model (LSTM with attention), which achieved 90% recall and 82% precision.

Group leader – Deep Learning Project for human face synthesis, Columbia University 06/2019 - 08/2019
• Utilized a pre-trained face segmentation model to identify locations of face attributes for face synthesis.
• Modified the Pix2Pix model by replacing Batch Normalization with Spatially-Adaptive Normalization; improved the testing accuracy from 89% to 93.7% and mean Intersection over Union from 0.79 to 0.85.
• Extended face image synthesis to video synthesis by adding a temporal consistency loss to the generator model, which reduced flicker in the video.

Individual project – Recommendation System, Columbia University 02/2019 - 08/2019
• Derived and implemented an Alternating Least Squares matrix factorization model for a user-based recommendation system; constructed a content-based recommendation system by combining item features and user purchase history.
• Recommended the k closest items using items’ cosine similarity and user historical data; evaluated the recommendation engine with Precision and Recall at k and normalized discounted cumulative gain (NDCG) metrics.


