Weixing Yang
Data scientist with a double major in mathematics and computer science and demonstrated experience in big data analytics and intensive programming. Seeking a position within a creative and dynamic work environment.
****.******@*****.***
New York, United States
nycdatascience.com/blog/author/weixyang/
linkedin.com/in/weixing-yang-381485173
github.com/weixyang90
SKILLS
R (dplyr, ggplot2, shiny)
Python (NumPy, pandas, scikit-learn)
SQL (MySQL)
Web Scraping (Scrapy)
KNN
Clustering
Linear Models
Tree-based models
Naive Bayes
Java
HTML/CSS
Data Structures
Git/GitHub
LANGUAGES
Mandarin
PROFESSIONAL EXPERIENCE
Part-time Data Scientist
ABPHINA
10/2020 - Present
Propose business ideas to help the company make strategic decisions. Provide plans for gathering and organizing data from multiple sources. Ensure the quality and consistency of data, and apply a range of analytics to turn raw data into fact-based conclusions.
Develop and deliver reports on tested hypotheses that provide useful information for the organization.
Data Science Intern
ABPHINA
06/2019 - 09/2019
Used supervised and unsupervised learning with Python packages (NumPy, pandas, scikit-learn, SciPy, etc.) to predict malaria trends and outbreaks.
Grouped countries using clustering models, including K-means and LCA. Handled missing values with KNN, removed collinear features, and re-sampled with bootstrap. Predicted disease trends with regression models, including penalized models (Lasso, Ridge, Elastic Net, etc.), polynomial regression with feature interactions, and tree models (Random Forest, Stochastic Gradient Boosting, and XGBoost).
Evaluated and improved model performance through cross-validation and hyperparameter tuning. Translated model results into business recommendations with key contributing factors and variable weights.
Data Science Fellow
NYC Data Science Academy
01/2019 - Present
Worked with a senior data scientist from UnitedHealthcare to build a model that predicts hospital readmission rates for diabetic patients, helping hospitals and insurance companies target high-risk patients. Implemented advanced feature engineering and used several classification models, including logistic regression, random forest, gradient boosting, and extreme gradient boosting, on a large, complex dataset.
Predicted house sale prices using a high-dimensional dataset. Processed missing data for numerical and categorical features. Trained individual machine learning models, including Lasso regression, Ridge regression, ElasticNet, Random Forest, Gradient Boosting, and Extreme Gradient Boosting, to predict the target price. Stacked all of the above models except Random Forest to improve prediction of the target.
EDUCATION
Bachelor of Science
Stony Brook University
02/2014 - 12/2017
Double Major in Computer Science and Applied Mathematics & Statistics
Courses: Data Structures, Analysis of Algorithms, Principles of Database Systems, Scripting Languages, Principles of Programming Languages, Software Engineering