Engineering Corporation Data Analyst

Location:

Dallas, TX

Posted:

July 31, 2021


Resume:

Yikun Wu (Queenie)

******@***.*** • 214-***-****

EDUCATION:

MS of Applied Statistics and Data Analytics, Southern Methodist University, Dallas, TX May 2020

• GPA: 3.7

MS of Applied Economics and Predictive Analytics, Southern Methodist University, Dallas, TX December 2018

• GPA: 3.5

Bachelor of Economics and Forestry Management, Sichuan Agricultural University, Chengdu, China May 2017

TECHNICAL SKILLS:

Stata, SQL, R, Python, SAS, Tableau, TradingView, data science

Data Modeling Skills:

• Regression models

• Data clustering and data cleaning

• Data visualization, including high dimensional data

• Time series models, including data imputation and tests

• Penalized models (Lasso, Ridge, and Elastic Net), including model selection

• ANOVA models and tests

• Machine Learning and classification models

• Principal component analysis (PCA)

• Longitudinal model

WORK EXPERIENCE:

Consultant, CITT Services, Dallas, TX July 2020 – Present

• Deliver the latest signals and forecasts for different financial instruments to our top ten clients to aid their business decisions; explain the results and details in business/non-technical terms.

• Research clients’ buyer personas and understand their underlying business needs; advise on future marketing and buyer targeting work.

• Collect buyer persona information from the CRM database using SQL; build a buyer persona data bank.

• Report price analysis results and models in detailed dashboards built with Tableau or Python.

• Clean data with R or Python, detecting missing values and outliers.

• Work with classification models including logistic regression, K-nearest neighbors, support vector machines, and random forests; write a report for each project.

• Use historical stock price data (usually one year of daily prices) to build LSTM and time series models for future price forecasts, including data cleaning and data manipulation.

• Analyze HubSpot CRM data and social media performance through data visualization in Tableau.

Data Analyst, China State Construction Engineering Corporation, Beijing, China May 2019 – August 2019

• Prepared datasets for modeling so that the modeling work could continue without interruption; ensured the models and reports were ready for clients on time.

• Prepared data for the modeling work and made sure it was usable; helped the group present results.

• Cleaned house price data from 2016 to 2018 using KNN, removal of near-zero-variance variables, and other strategies.

• Facilitated data preparation, visualization, and testing for 20,000+ observations.

Data Analyst, China State Construction Engineering Corporation, Beijing, China May 2018 – August 2018

• Prepared for the modeling work and kept materials organized so that group members could continue easily.

• Built visual analytics in Tableau to be used across 3 functional teams and presented them to supervisor

• Cleaned 55,300 data points in R using the tidyverse package.

PROJECT EXPERIENCE:

Energy Price in Time Series (R)

• Forecast energy prices using time series models with clear data visualization.

• Data Transformation and Visualization: Formatted data; cleaned missing values and outliers; decomposed the data to see trend and seasonality; used the ADF test to check stationarity; plotted the ACF and PACF for the original sequence.

• Model Development: Generated ARIMA, SARIMA, ARCH, and multiplicative seasonal models, each with ACF and PACF plots

• Model Selection: Used the Ljung-Box test, residual test, Portmanteau Q test, and AIC to compare different models; found that the SARIMA model performed best

Clustering and Modeling Wine Quality from Physicochemical Properties (Python)

• Researched the main influences on wine quality.

• Data Preparation and Exploratory Analysis: Conducted data visualization and missing-value elimination

• Regression: Created Lasso and Ridge regression models to predict wine quality on a 10-level scale

• Classification: Built random forest and logistic regression models to classify and predict wine quality, achieving 61% accuracy

Kaggle Cereal Price (R)

• Researched the main factors affecting cereal price.

• Data Cleaning: Removed near-zero-variance and highly correlated variables to limit overfitting and used KNN to fill in missing values

• Cross-validation: Generated k-fold datasets to address high dimensionality (77 observations and 18 variables)

• Feature Engineering: Created a new variable by summing the variables “potass” and “fiber”

• Modeling: Used 3 models: Lasso for variable selection and regularization, Ridge regression to shrink coefficients and reduce prediction error, and Elastic Net regression as a regularized regression method

Alibaba (BABA) Stock Price Forecasting (Python)

• Forecast the future price of Alibaba (BABA) stock.

• Data Transformation and Visualization: Transformed the dataset into a time series and plotted it

• Model Development: Built an LSTM model with two LSTM layers with 50 neurons and two dense layers, one with 25 neurons and the other with 1 neuron
