
Data Analyst

Location:
Sunnyvale, CA
Posted:
February 15, 2020


Resume:

Chenxin (Jessica) Guo

607-***-**** | E-mail: *****@*******.*** | Address: 1495 Lakeside Drive, Sunnyvale, CA 94085 | https://www.linkedin.com/in/chenxin-guo/

EDUCATION

Cornell University, College of Engineering, Ithaca, NY May 2020
Master of Engineering in Data Analytics, GPA: 3.65

University of Liverpool, Liverpool, England June 2018
Bachelor of Science in Mathematics with Finance, GPA: 3.91, First Class academic scholarship

Relevant Courses: Data Structure in Java, Machine Learning for Intelligent System, Machine Learning for Data Mining, Statistical Data Mining, Big Data Technologies, NLP, Simulation Modelling & Analysis, Optimization2

SKILLS

Techniques: Python, R, SQL, Java, Excel, VBA, Tableau, TensorFlow, SAS, MATLAB, Linux, Git, PHP, Neo4j

INTERNSHIP EXPERIENCE

John Hancock Life Insurance, Advanced Analyst Co-op, Boston Jul 2019 - Dec 2019

• Published a KPI dashboard tracker in Tableau to track the performance of digital campaigns; performed A/B testing on conversion rate and other metrics, and wrote analysis reports for 200 senior managers.

• Established data automation pipelines/workflows for databases; created a brokerage distribution mapping database in SQL, enhancing sales efficiency and impacting 200,000 producers worldwide.

• Investigated drivers of positive and negative insurance sales performance through time series modelling; reduced the time needed to prepare the report by 25%.

• Built a text analytics/topic modeling capability for customer feedback data; generated word clouds and topic models for each data set, using NLP models in Python, to help analyze large text data sets.

• Built data automation pipelines/workflows for brokerage hierarchy mapping; saved 30% of the time.

Equifax, Data Analyst Intern (team of 5), Ithaca Nov 2018 - May 2019

• Used SQL to extract and transform data; constructed customer clusters using a K-means model with the Dynamic Time Warping (DTW) algorithm, leading an initiative that impacted the dynamic clustering field.

• Conducted feature selection to identify the 3 most relevant variables among 1,000 features in a 15-million-customer dataset; applied a wavelet transform to turn multiple time-series variables into static ones.

• Adopted machine learning models to separate good payment behavior from bad; improved accuracy by 10% and provided Equifax with a more accurate credit evaluation model to better safeguard funds from investors.

• Held weekly meetings with directors to discuss progress; gave a final presentation in front of all managers.

Yancheng Bureau of Statistics, Summer Data Analyst, China June 2017 - Sep 2017

• Integrated and cleaned 10 years of agricultural commercial data according to different standards; visualized the data with Excel PivotTables and Tableau.

• Participated in analyzing dynamic data and composite indicators; improved agricultural economics profit forecasting through machine learning methods (Logistic Regression, Multiple Linear Regression and Decision Tree), raising the accuracy rate to 95.4%.

RELEVANT PROJECTS

Datathon competition and Kaggle machine learning competitions (R & Python) Fall 2018 - Spring 2019

• Classified the device with which President Trump wrote each tweet (text analysis)
  1. Data preprocessing: cleaned and tokenized text data; removed stop words; stemmed and lemmatized words, keeping nouns only; built the bag of words; created a word cloud visualization.
  2. Word embedding: encoded the tweet contents with TF-IDF; encoded tweets with Doc2Vec.
  3. NLP modeling: used TF-IDF combined with an RNN model, Doc2Vec combined with a random forest model, and an LDA model; achieved a highest prediction accuracy of 86%.

• Google Analytics customer revenue prediction (200 million data points) (team of 4)
  1. Data preprocessing: oversampled the data to balance it; cleaned the data by replacing NA values, deleting outliers and converting categorical data into dummy variables.
  2. Models: fitted a paired model that first identified total customers using kernel SVM and XGBoost classification models, then used boosted trees and a neural network to predict the profitable customers.
  3. Selected features with L1 regularization and forward selection; finally tuned the model with cross-validation.


