Data Analyst Python

Location:
San Diego, CA
Posted:
April 02, 2020

Resume:

SIYU LAI

adclj9@r.postjobfree.com • www.linkedin.com/in/siyu-lai/ • (217) 974 - 5014

SUMMARY

● Highly analytical Data Analyst with 2 years' research experience manipulating large datasets in R; gained business insight and communication skills from hands-on dataset work during an internship in the HR consulting industry.

● Proficient in Statistical Analysis (PCA, regression, decision trees, clustering, Monte Carlo, Bootstrap), Predictive Modeling, Optimization, Machine Learning, Web Analytics, Recommender Systems, Text Mining and Data Mining

● Technical Skills: R (4 years), Python (Pandas, NumPy, Seaborn), SQL, SAS, Java, HTML, Excel, LaTeX, Tableau, Git, Spark, Hadoop, AWS

● Detail-oriented self-starter with fast-learning, decision-making, problem-solving, storytelling, communication and client-facing skills

EDUCATION

University of California, San Diego June 2020

Master of Science in Business Analytics

Coursework: Business Analytics, Machine Learning, Large Data, Web Analytics, Text Mining, Customer Analytics, Big Data

University of Illinois, Urbana-Champaign May 2019

Bachelor of Science in Statistics, Minor in Business GPA: 3.86/4.0

PROFESSIONAL EXPERIENCE

Data Analyst Research Assistant – Gies College of Business, UIUC, Urbana, IL 03/2019 - 08/2019

● Conducted web analytics with Python, scraping the Amazon website to track product information and prices for over 1,000 products within designated categories over 3 months. Used batch processing in Bash to resolve memory problems.

● Handled missing data in R with imputation (LOCF) and normalization.

● Analyzed the relationship between competition and price volatility in the Amazon dataset with R; synthesized and visualized the results of analyzing the standard deviation, coefficient of variation and MAPE for each time cycle.

Data Analyst, Health & Retention – Aon Hewitt, Shanghai, China 05/2018 - 08/2018

● Manipulated big datasets in Excel and provided normalized data to a professional mentor, who used it to build flexible customer solutions for a pharmaceutical company's employee benefits plan.

● Organized data to build industry benchmarks and visualized the potential benefits of Aon's business solutions in Tableau and PowerPoint; created well-organized presentations to help customers easily understand consulting results and make business decisions.

PROJECT EXPERIENCE

Customer Prediction Analytics 01/2020 - 02/2020

Methods: RFM, Logistic Regression, Neural Network (Keras & Sklearn & XGBoost), Bootstrap

● Predicted the email name list for a second-wave advertisement based on existing results; used an ensemble method, comparing 5 different models and choosing the model with the highest net profit to predict the name list.

● Evaluated the models' predictive performance based on profit, ROME, Lift, Gain and AUC. Extensively tuned model parameters to improve predictive performance.

● Identified the business insights of different models via parameter analysis and data visualization in Python and R.

● Captured 62.25% of buyers by targeting 25% of customers with the logistic regression and NN models.

Statistical Computing Methods (Black Friday Sales Analysis) 08/2018 - 12/2018

Methods: Monte Carlo, Bootstrap, Permutation Test, Random Number Generation

● Trained and tested statistical models on the dataset to investigate the best target demographics for Black Friday sales, aiming at hyper-targeted advertising by predicting the age range with the most purchasing power.

● Compared and analyzed the strengths and conditions of the models in a permutation test to verify our hypothesis; implemented a bootstrap in R to estimate the distribution of each age group in the non-bestseller population.

● Implemented Random Number Generation to simulate potential customer behavior and visualized the simulations.

Statistical Learning (Flight Delay Analysis) 08/2018 - 12/2018

Methods: SVM, Random Forest, PCA, QDA, LDA, Lasso regression, Ridge regression

● Predicted flight arrival delay through feature selection, extensive data visualization and thorough analysis of the relationships among 30 explanatory features.

● Categorized the delay time for flight arrivals and performed multiclass classification with SVM and Random Forest, achieving a classification accuracy of 88%.

Web Data Analysis 07/2019 - 09/2019

● Crawled websites and extracted useful data from Yelp's review section with Python to analyze the marketing characteristics of certain products, and summarized the findings in an insight report

● Extracted Twitter content using Stream Listener API, and analyzed text sentiment with Microsoft Azure
