RUI CAO DATA ANALYST
***.*****@*****.*** 475-***-****
rui-cao-b76415186/ ruicao87
New Haven, CT (Open to Relocation) Canadian Citizen
SUMMARY
Data analyst with a background in using Python and SQL across the entire data analysis workflow, including data wrangling, exploratory data analysis, data visualization, data storytelling, machine learning, and predictive modeling. I am also a production engineering technician with over 5 years of experience in the optics industry, skilled in building and setting up models and equipment, documenting test results, performing analysis, troubleshooting, and managing inventory for the manufacturing workflow.
SKILLS
LANGUAGES: Python (Pandas, NumPy, Scikit-Learn, NLTK), SQL, Unix command line
DATA VISUALIZATION: Matplotlib, Seaborn, Bokeh
DATA COLLECTION: JSON, CSV, API, Web Scraping
STATISTICS: Inference (Frequentist, Bootstrap, Bayesian), Hypothesis Testing, Modeling, A/B Testing
MACHINE LEARNING: Linear Regression, Logistic Regression, kNN, Random Forest, Naive Bayes, Natural Language Processing, K-Means Clustering, Neural Networks, Keras, Decision Tree, Gradient Boosting
PROJECTS
Sentiment Analysis of Women’s E-commerce Clothing Reviews 2019
Python tools: NumPy, pandas, seaborn, NLTK, scikit-learn, SHAP, LIME
Used natural language processing (NLP) techniques to find the most frequent words in an anonymized women’s clothing e-commerce review dataset and identify which features matter most to customers (color, price, size, etc.).
Built a predictive model of customer sentiment from review text.
Trained several classifiers (naive Bayes, logistic regression, random forest) and chose logistic regression for its best performance: over 90% precision and recall on recommended items and 74% recall on not-recommended items.
Interpreted text predictions using SHAP and LIME.
Forecasting Bike Rental Demand in SF 2019 - Current
Python tools: NumPy, pandas, seaborn, Matplotlib, scikit-learn
Performed exploratory data analysis on a bike share dataset to examine how each factor relates to total trips per day and affects demand.
Trained several regression models (random forest, decision tree, linear regression, gradient boosting, AdaBoost), chose random forest as the best-performing model, and used it to predict daily trips at a selected station in SF.
EMPLOYMENT
Springboard - Data Scientist/Data Analyst Fellow Aug. 2019 - Current
550+ hours of hands-on curriculum with 1:1 industry expert mentor oversight and completion of 2 in-depth capstone projects.
Mastering skills in Python, SQL, data analysis, data visualization, hypothesis testing, and machine learning.
Sanmina - Engineering Technician - Ottawa, CA Apr. 2015 - June 2017
Worked with production engineers to build electro-optical devices, conduct experiments, collect data, and calculate results.
Wrote production procedures and trained production technicians.
Oz Optics Ltd. - Fiber Optics Group Lead - Ottawa, CA Nov. 2012 - Apr. 2015
Supervised the Candela project (laser hair removal equipment); involved in building and setting up equipment, documenting test results, troubleshooting, inventory audits, and instrument calibration.
Designed and developed high-power optical products and wrote procedures.
EDUCATION
Springboard Aug. 2019 - Current
Data Science Career Track
6-month mentor-led intensive course in data science, machine learning, Python, and SQL
University of Calgary Sept. 2009 - June 2012
Master's in Biomedical Engineering
Nanjing Normal University Sept. 2005 - June 2009
Bachelor's in Electrical Engineering and Automation