Hua Shi
Queens, NY, ***** 347-***-**** ***************@*****.*** LinkedIn Blog GitHub
DATA SCIENTIST
Experienced in data acquisition, data modeling, statistical analysis, machine learning, deep learning, and NLP. With a background in Economics and Data Science, I bring strong Python skills and a passion for delivering valuable insights through data analysis and data retrieval. Fluent in English, Chinese, and Korean.
TECHNICAL SKILLS
Python, OOP, SQL, Scikit-learn, NumPy, Pandas, Keras, Tableau, Google Analytics, MySQL, Hadoop, Apache Spark, R, MATLAB, Excel, Machine Learning, Deep Learning, Statistical Analysis, TensorFlow, NLP, Web Scraping, APIs
TECHNICAL PROJECTS - MACHINE LEARNING
Positive or Negative? NLP with Amazon Sports & Outdoors Data - GitHub
Predicts whether a review is positive or negative to measure customer satisfaction
● Explored the Amazon review data using Matplotlib, Seaborn, word clouds, and Tableau
● Applied model stacking, combining Random Forest, Neural Network, Naive Bayes, Logistic Regression, and XGBoost models to predict review sentiment with 91.4% accuracy
● Presented insights from complex data sets and reported detailed findings using Tableau
Prediction of PM2.5 Levels in Beijing: Time-Series Panel Data Analysis - GitHub
Predicts PM2.5 levels across twelve locations in Beijing using time-series panel data
● Cleaned the data and filled missing values in the time-series panel data using backward fill
● Visualized the data with Matplotlib, Seaborn, and Plotly
● Built four predictive models (Fixed Effects Estimator, Random Effects Estimator, Pooled OLS, and LSTM), compared the first three with two hypothesis tests (Hausman test and Lagrange Multiplier test), and compared the Random Effects Estimator and LSTM by RMSE
Functional? Non-Functional? Or Functional but Needs Repair? Ternary Classification Analysis - GitHub
Predicts whether water pumps in Tanzania are working properly, need repair, or are broken
● Cleaned and explored the data with Pandas, Seaborn, and Matplotlib
● Tuned Random Forest and Decision Tree hyperparameters with grid search and also fit a Logistic Regression model, reaching an 80% F1 score in model evaluation
● Interpreted patterns in the complex data sets and presented detailed findings
LICENSES & CERTIFICATIONS
Google Analytics Individual Qualification, Completion ID: 32758005
Certificate of Data Science, Credential ID: UC-8E9Q9QTK
Tableau - Data Visualization, Credential ID: UC-D2VM8PS0
EMPLOYMENT HISTORY
Dior Beauty Advisor - Macy’s Inc., Flushing, NY Aug 2017 – Jan 2018
● Collaborated with counter managers to organize products and maintain accurate inventory
● Ranked #1 in sales out of eight employees based on employee satisfaction and sales records
● Promoted products via WeChat, KakaoTalk, and Facebook and answered customers’ questions to provide the best product experience
Assistant Store Manager - The Yeon Beauty, Flushing, NY Jun 2016 – Apr 2017
● Hired employees, trained them in professional sales skills, and maintained accurate daily inventory counts
● Maintained statistical and financial records and proposed new holiday-event strategies with the store manager
● Advertised store promotions via WeChat, KakaoTalk, and Facebook and ran a 24-hour online Q&A group chat for customers; peak sales reached 10x the regular daily sales amount
EDUCATION
Flatiron School, New York, NY Jan 2020 – Apr 2020
Immersive Data Science Bootcamp Program
Stony Brook University, Stony Brook, NY Aug 2015 – Dec 2016
Master of Science, Economics
Yanbian University, Jilin Province, China Aug 2009 – Dec 2013
Bachelor of Science, Economics
Published paper: Study on the Relationship between Energy Consumption, Economic Growth, and Carbon Dioxide Emission