
Data Analyst Python

Location:
Irvine, CA
Posted:
October 14, 2020


Resume:

LIAO, JING

Tel: +949-***-**** Email: adgy0n@r.postjobfree.com Irvine, CA

EDUCATION

University of California, Irvine (UCI), Donald Bren School of Information & Computer Sciences
Master of Science in Statistics, September 2018-June 2020
GPA: 3.87/4.0

TECHNICAL SKILLS

Programming:

• R (3.5 years of experience): applied R for data cleaning, feature engineering, and data visualization; built statistical and machine learning models; programmed optimization and sampling algorithms; created data analysis reports with R Markdown and LaTeX.
• Python (1.5 years of experience): applied Python for data cleaning, feature extraction, and data visualization, using NumPy, pandas, scikit-learn, TensorFlow, PyTorch, etc.
• SQL (1 year of experience): applied SQL to import and clean data; add, delete, and modify data; and run complex queries, window functions, etc.
• MATLAB (2 years of experience): able to design, code, test, debug, modify, document, and maintain programs, and to deliver a quality product on deadline.

Coursework: Data Analysis, Statistical Modeling, Regression, Probability, Statistical Consulting, Simulation, Bayesian Analysis, Longitudinal Effects and Machine Learning, Survival Analysis.

Software:

• Excel: familiar with VLOOKUP, SUMIF, and pivot tables.
• Tableau/Power BI: able to transform complex data into concise views according to business needs and create interactive visual reports.
• SPSS: used SPSS to conduct statistical analysis of survey data from over 8,000 respondents.

PROFESSIONAL PROJECTS

E-commerce Data Analysis (SQL + Tableau), Irvine, CA, March 2020-June 2020
• Used MySQL to clean behavior data for 100,000 users, broke down the causes of user churn, and visualized them across multiple dimensions, including platform activity, conversion rate, and the recommendation system.
• Built the AARRR funnel model to analyze user churn. Comparative analysis found that the purchase conversion rate after favoriting an item or adding it to the shopping cart was 8.5% higher than the conversion rate after browsing alone.
• Applied the RFM model to classify customer value, found that most users are development users and retention users, and helped optimize follow-up maintenance for each user group.

Effects of E-cigarette on Stem Cell Function (Statistical Consulting), Irvine, CA, January 2020-March 2020
• Collaborated with the UCI cancer research center and PhD researchers at multiple levels to analyze the effect of e-cigarette vapor on stem cell function. Performed exploratory data analysis for the pilot study.
• Applied t-tests and longitudinal effects models to analyze the data. Performed power analyses to improve the efficiency of the follow-up study.

Pre-hospital Diagnosis in Stroke (Data Analyst Individual Project), Irvine, CA, July 2019-March 2020
• Developed a GLM in R to analyze data collected at a local hospital, including clinical and EEG variables.
• Applied LASSO to control model complexity and select predictors. The accuracy of the new prediction model is 21.3% higher than that of the original.
• Applied Bayesian analysis and a Gaussian process classification model to predict stroke; the accuracy of the posterior predictive model reached 85% in 5-fold cross-validation.

AI in Cardiotocography (AI in Biology and Medicine Project), Irvine, CA, October 2019-December 2019
• Built SVM, random forest, and neural network models with scikit-learn in Google Colab to predict fetal health status, using a data set of 2,126 instances from the University of Porto, Portugal. Applied GridSearchCV with 10-fold cross-validation to find the optimal hyperparameters.
• Used Cohen's kappa to provide a robust evaluation of the models.

Digit Recognizer (Machine Learning Project), Irvine, CA, October 2019-December 2019
• Applied principal component analysis to reduce the dimensionality of MNIST (60,000 training images and 10,000 test images).
• Applied KNN and SVM in Python with hyperparameter tuning; KNN achieved an accuracy of 0.976 with a running time of 18.749 s, and SVM an accuracy of 0.983 with a running time of 25.594 s. Built a convolutional neural network (CNN) using Keras with a TensorFlow backend that reached an accuracy of 0.990 with a running time of 27.508 s.

Public Health Problems of Malaria (Data Analysis), Irvine, CA, May 2019-July 2019
• Built a GEE model in R to quantify how the probability of having malaria changes over time and whether there are food intervention effects.
• Applied a GLMM to capture subject-specific effects and to predict the probability of malaria over time for a 7-year-old boy under the control intervention.

Effects of Beta-Carotene Supplementation on Serum Levels (Data Analysis), Irvine, CA, March 2019-May 2019
• Performed descriptive data analysis with box plots and spaghetti plots.
• Built linear mixed-effects models in R to analyze how the dose level of beta-carotene supplementation affects the trajectories of serum beta-carotene and vitamin E in a phase I pharmacokinetics study. Conducted likelihood ratio tests to answer the study's research questions.

Statistical Computation (Statistical Computation Project), Irvine, CA, October 2019-December 2019
• Wrote R programs to compare the performance of Hamiltonian Monte Carlo (HMC), Stochastic Gradient Hamiltonian Monte Carlo (SGHMC), and Riemann Manifold Hamiltonian Monte Carlo (RMHMC) in terms of convergence speed (speed of reaching the stationary distribution) and extent of autocorrelation; found that RMHMC performed better overall while being more time-consuming.

Las Vegas Hotel Evaluation (Statistical Machine Learning Individual Project), Irvine, CA, May 2019-July 2019
• Built SVM, neural network, and random forest models in Python to predict review scores of Las Vegas hotels on TripAdvisor.
• Developed multinomial and ordinal logistic regression models in R to account for the ordered nature of review scores. Found that free internet is an important feature in users' reviews, and that the number of reviews and the hotel star rating are critical features.

AI Application in Othello (Artificial Intelligence Project), Irvine, CA, October 2018-December 2018
• Applied the minimax and alpha-beta algorithms in Python to improve the win rate of the game.
• Built an evaluation function based on board position and mobility, achieving a win rate of 62%.

DATA ANALYSIS INTERNSHIPS

Data Analyst Intern, GAN., Irvine, California, Aug 2020-Present
• Responsible for extracting and selecting data from archived Excel files, mapping data sets to a new Chart of Accounts, reconciling the totals for the mapped data to existing ledger amounts, and transferring data into the ERP system and SQL database. Wrote SQL scripts to collect multi-dimensional data from the company database according to business needs.
• Used Excel pivot tables, VLOOKUP, etc. for data cleaning, mapping, and analysis. Analyzed user profiles and behaviors for different brands along business segmentation dimensions (basic information, social attributes, etc.).
• Applied Python and R for data manipulation, merging and aggregating multiple tables simultaneously; presented visual data reports efficiently and reduced the department's workload by 30%.
• Conducted feature engineering and assisted with creating dashboards in Power BI.

COMPETITIONS

• Kaggle “Heart disease classification” (top 15%), March 2020
• Kaggle “Rainfall prediction with satellite image information” (top 9%), December 2019

LEADERSHIP & ACTIVITIES

Statistics Reader, UCI Department of Statistics, Jun 2019-June 2020
Assisted the professor in teaching Statistics 7, Statistics 120A, and Statistics 120C at UCI.

Overseas Volunteer in Indonesia, Jan 2015-March 2015


