
Data Analyst

Location:
Chicago, IL
Posted:
March 02, 2018


Yue (Joanna) Wang

ac4oeu@r.postjobfree.com | **** S Lake Shore Dr, Chicago, Illinois 60615 | 773-***-****

TECHNICAL SKILLS

• Tools: Python (Pandas, NumPy, SciPy, Scikit-learn, TensorFlow, Caffe, Tkinter), RStudio, SAS, Oracle, Tableau, VBA, SQL, Hadoop, Hive, Pig, Spark, AWS, UNIX/Linux.

• Database Design: SQL (MySQL/NoSQL), ETL Process, Data Warehouse, OLTP/OLAP, Data Visualization.

• Statistical Analysis: Linear Regression, GLM, Binomial and Poisson Regression, ANOVA, Nonlinear Models.

• Machine Learning/Deep Learning: PCA, Shrinkage, Clustering, Cross-Validation, Bagging and Boosting, Market Basket Analysis, Decision Trees, Random Forests, Support Vector Machines, Neural Networks (ANN, CNN, RNN, LSTM), Recommendation Systems, Collaborative Filtering, Natural Language Processing.

EDUCATION

University of Chicago, Chicago, Illinois Sep 2016 - Dec 2017
Master of Science in Analytics (Applied Statistics Concentration), GPA: 3.941/4.0

• Coursework: Statistical Analysis, Database Design and Implementation, Programming for Analytics, Time Series, Data Mining in R and SAS, Machine Learning, Advanced Machine Learning, Python, Financial Analytics, Linear and Nonlinear Models, Deep Learning and Image Processing, Big Data and Text Analytics.

• Member: Women in Analytics Club, Trading Club

Nanjing University, Nanjing, China Sep 2011 - Jul 2015
Bachelor of International Economy and Trade (Economics Concentration), GPA: 3.5/4.0

• Coursework: Macroeconomics, Microeconomics, Econometrics, Intermediate Accounting, Monetary Finance and Banking, Financial Statement Analysis, Business Law, International Economics, Marketing.

• Honors/Awards: People's Scholarship (2013, 2014)

WORK EXPERIENCE

Sun Trading LLC, Chicago, IL Oct 2017 – Dec 2017

Support and Data Analyst Intern, Developer Team

• Helped develop the accounting and financial system. Performed daily operational support: monitoring, maintaining, and executing the morning trading procedures. Troubleshot operational errors in the Linux environment and the PostgreSQL database using SQL. Composed reports using Tableau and Panopticon.

CME Group, Chicago, IL Jun 2017 – Sep 2017

Data Science Intern, Market Regulation Department

• Analyzed large sets of transactional data and developed data preparation routines in R, SQL, and Python. Built, assessed, and validated models such as Regression, Logistic Regression, and Clustering Analysis. Designed and implemented creative approaches to detect illegal trading orders and to tackle predictive modeling problems (see the illustrative sketch below).

• Queried data through APIs, designed user-interface programs, utilized the Oracle database, and wrote SQL scripts to analyze the data. Frequently used Tableau for data visualization.
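A minimal sketch of the kind of detection-modeling workflow described in the two bullets above, assuming a hypothetical trades.csv extract with invented column names (order_size, cancel_ratio, msg_rate, flagged); the actual CME data, features, and detection logic are not reproduced here.

```python
# Hypothetical illustration only: the file name and columns are invented.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

orders = pd.read_csv("trades.csv")                  # assumed transactional extract
features = orders[["order_size", "cancel_ratio", "msg_rate"]]
labels = orders["flagged"]                          # 1 = order previously flagged

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.3, random_state=42, stratify=labels)

# Logistic regression as a first-pass classifier for suspicious orders.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))

# Clustering to surface unusual groups of orders without using labels.
orders["cluster"] = KMeans(n_clusters=5, n_init=10, random_state=42).fit_predict(features)
```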

B2W Digital, Chicago, IL Jan 2017 – May 2017

Data Science Capstone Intern, Retailing Department

• Designed the Buybox Algorithm to recommend the best purchase option to customers. Created a ranking score that orders products according to consumers’ preferences.

• Utilized parametric models such as the Nested Logit model, the Mixed Multinomial Logit model, and Neural Networks, and non-parametric models such as K-means Clustering and Latent Class Analysis for predictive modeling in R (simplified sketch below).
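Since the original choice models were fitted in R, the following is only a simplified Python analogue of the ranking-score idea: a plain logistic model stands in for the nested/mixed logit, and the offer features and data are invented for illustration.

```python
# Toy example: offer features and the "chosen" outcome are made up.
import pandas as pd
from sklearn.linear_model import LogisticRegression

offers = pd.DataFrame({
    "price":         [99.9, 102.5, 97.0, 110.0, 98.5, 105.0],
    "delivery_days": [2, 5, 3, 1, 7, 4],
    "seller_rating": [4.8, 4.2, 4.6, 4.9, 3.9, 4.4],
    "chosen":        [1, 0, 0, 1, 0, 0],   # 1 = offer the customer actually picked
})

X = offers[["price", "delivery_days", "seller_rating"]]
y = offers["chosen"]

# Predicted purchase probability is used as the Buybox ranking score.
model = LogisticRegression(max_iter=1000).fit(X, y)
offers["rank_score"] = model.predict_proba(X)[:, 1]
print(offers.sort_values("rank_score", ascending=False))
```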

PROJECT EXPERIENCE

NBA Database Design and Player Analysis Sep 2016 - Dec 2016

• Applied Python to source NBA data from an API and normalized the data from 1NF to 3NF. Used Google Cloud SQL to create the OLTP and OLAP databases with DDL and DML scripts (schema sketch below).

• Utilized Tableau, running on top of a MySQL database, to produce dashboards showing each Bulls player’s strengths and weaknesses in shooting, passing, rebounding, and defense.
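A minimal sketch of what the normalized (3NF) player/game schema could look like. It uses Python's built-in sqlite3 module in place of the Google Cloud SQL (MySQL) instance from the project, and the table and column names are illustrative, not the original schema.

```python
# Illustrative 3NF schema; sqlite3 stands in for the MySQL OLTP database.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE team (
    team_id   INTEGER PRIMARY KEY,
    team_name TEXT NOT NULL
);
CREATE TABLE player (
    player_id INTEGER PRIMARY KEY,
    team_id   INTEGER NOT NULL REFERENCES team(team_id),
    name      TEXT NOT NULL,
    position  TEXT
);
CREATE TABLE game_stat (
    player_id INTEGER NOT NULL REFERENCES player(player_id),
    game_date TEXT NOT NULL,
    points    INTEGER,
    rebounds  INTEGER,
    assists   INTEGER,
    PRIMARY KEY (player_id, game_date)
);
""")

conn.execute("INSERT INTO team VALUES (1, 'Chicago Bulls')")
conn.execute("INSERT INTO player VALUES (1, 1, 'Sample Player', 'SG')")
conn.execute("INSERT INTO game_stat VALUES (1, '2016-11-01', 21, 4, 6)")
print(conn.execute(
    "SELECT name, points FROM player JOIN game_stat USING (player_id)").fetchone())
```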

Human Resources Data Mining and Predictive Analytics Jan 2017 - Mar 2017

• Analyzed why the best employees leave the company prematurely and predicted which employees would leave next.

• Developed classification models using Logistic Regression, LDA/QDA, K-Nearest Neighbors, Decision Tree, Random Forest, and Support Vector Machines (SVM), reaching predictive accuracy of 86% in training and 83% in test (comparison sketch below).
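A hypothetical re-creation of that model comparison: the real HR dataset is not available here, so randomly generated features and a toy attrition label stand in for it, and the scores it prints will not match the figures above.

```python
# Toy comparison of the classifiers listed above on synthetic "HR" data.
import numpy as np
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "satisfaction": rng.uniform(0, 1, 500),
    "avg_hours":    rng.normal(200, 40, 500),
    "tenure_years": rng.integers(1, 10, 500),
})
y = (X["satisfaction"] < 0.4).astype(int)          # toy "left the company" label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "LDA":                 LinearDiscriminantAnalysis(),
    "KNN":                 KNeighborsClassifier(),
    "Decision Tree":       DecisionTreeClassifier(random_state=1),
    "Random Forest":       RandomForestClassifier(random_state=1),
    "SVM":                 SVC(),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(f"{name}: train={model.score(X_tr, y_tr):.2f}, test={model.score(X_te, y_te):.2f}")
```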

Twitter Analytics (Natural Language Processing) Mar 2017 - Jun 2017

• Utilized Spark to query data from the Hadoop system. Conducted text analytics such as frequency distribution, N-grams, and lexical diversity with the NLTK package. Found the most popular topics using TF-IDF and LDA methods.

• Developed sentiment analysis using the Watson NLU API, TextBlob, urllib, etc. Classified the tweets as positive, negative, or neutral, and visualized the results in SPSS Modeler.
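A small, self-contained sketch of the frequency, TF-IDF, and sentiment steps described above, run on a few made-up example tweets rather than the original Hadoop/Spark data; it assumes the nltk, scikit-learn, and textblob packages are installed.

```python
# Illustrative text analytics on invented tweets (not the project data).
from nltk import FreqDist
from sklearn.feature_extraction.text import TfidfVectorizer
from textblob import TextBlob

tweets = [
    "love the new phone camera",
    "battery life is terrible",
    "camera quality is great but battery drains fast",
]

# Word frequency distribution over simple whitespace tokens.
freq = FreqDist(word for t in tweets for word in t.split())
print(freq.most_common(5))

# TF-IDF weights to find distinguishing terms in each tweet.
vec = TfidfVectorizer()
tfidf = vec.fit_transform(tweets)
print(dict(zip(vec.get_feature_names_out(), tfidf.toarray()[0].round(2))))

# Polarity-based sentiment: positive / negative / neutral.
for t in tweets:
    polarity = TextBlob(t).sentiment.polarity
    label = "positive" if polarity > 0 else "negative" if polarity < 0 else "neutral"
    print(f"{label:8s} {t}")
```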


