Yikun Wu (Queenie)
adnzlu@r.postjobfree.com • 214-***-****
EDUCATION:
MS in Applied Statistics and Data Analytics, Southern Methodist University, Dallas, TX May 2020
• GPA: 3.7
MS in Applied Economics and Predictive Analytics, Southern Methodist University, Dallas, TX December 2018
• GPA: 3.5
Bachelor of Economics and Forestry Management, Sichuan Agricultural University, Chengdu, China May 2017
TECHNICAL SKILLS:
Stata, SQL, R, Python, SAS, Tableau, TradingView, data science
Data Modeling Skills:
• Regression models
• Data clustering and data cleaning
• Data visualization, including high dimensional data
• Time series models, including data imputation and statistical tests
• Penalized models (Lasso, Ridge, and elastic net), including model selection
• ANOVA models and tests
• Machine learning and classification models
• Principal component analysis (PCA)
• Longitudinal models
WORK EXPERIENCE:
Consultant, CITT Services, Dallas, TX July 2020 – Present
• Deliver the latest signals and forecasts for various financial instruments to our top ten clients to support their business decisions; explain results and details in non-technical business terms.
• Research clients’ buyer personas to understand their underlying business needs; advise on future marketing and buyer-targeting work.
• Collect buyer-persona information from the CRM database using SQL; build a buyer-persona data bank.
• Report price analysis results and models through detailed dashboards built in Tableau or Python.
• Clean data in R or Python, detecting missing values and outliers.
• Work with classification models including logistic regression, k-nearest neighbors, support vector machines, and random forests; write a report for each project.
• Use historical stock price data (usually one year of daily prices) to build LSTM and time series models for price forecasting, including data cleaning and manipulation.
• Analyze HubSpot CRM data and social media performance through data visualization in Tableau.
Data Analyst, China State Construction Engineering Corporation, Beijing, China May 2019 – August 2019
• Prepared datasets for modeling so the modeling work could continue without delays; ensured models and reports were ready for clients on time.
• Checked that the data was suitable for modeling; helped the group present results.
• Cleaned house price data from 2016 to 2018 using KNN imputation, removal of near-zero-variance variables, and other strategies
• Facilitated data preparation, visualization, and testing for 20,000+ observations
Data Analyst, China State Construction Engineering Corporation, Beijing, China May 2018 – August 2018
• Prepared data for modeling and organized it so group members could easily continue the work.
• Built visual analytics in Tableau used across 3 functional teams and presented them to my supervisor
• Cleaned 55,300 data points in R using the tidyverse package
PROJECT EXPERIENCE:
Energy Price in Time Series (R)
• Forecast energy prices with time series models, supported by clear data visualization.
• Data Transformation and Visualization: Formatted the data; cleaned missing values and outliers; decomposed the series to examine trend and seasonality; used the augmented Dickey-Fuller (ADF) test to check stationarity; plotted the ACF and PACF of the original series.
• Model Development: Built ARIMA, SARIMA, ARCH, and product seasonal models, each with ACF and PACF plots
• Model Selection: Used the Ljung-Box test, residual tests, the portmanteau Q test, and AIC to compare models; found that the SARIMA model performed best
Clustering and Modeling Wine Quality from Physicochemical Properties (Python)
• Investigated the main factors influencing wine quality.
• Data Preparation and Exploratory Analysis: Conducted data visualization and removed missing values
• Regression: Built Lasso and Ridge regression models to predict wine quality on a 10-level scale
• Classification: Built random forest and logistic regression models to classify and predict wine quality, achieving 61% accuracy
Kaggle Cereal Price (R)
• Investigated the main effects on cereal price.
• Data Cleaning: Removed near-zero-variance and highly correlated variables to limit overfitting; used KNN to impute missing values
• Cross-validation: Generated k-fold datasets to obtain reliable error estimates from a small dataset with relatively many variables (77 observations and 18 variables)
• Feature Engineering: Created a new variable by summing “potass” and “fiber”
• Modeling: Used 3 models: Lasso for variable selection and regularization, Ridge regression to shrink coefficients and reduce prediction error, and elastic net as a regularized regression method combining the two
Alibaba (BABA) Stock Price Forecasting (Python)
• Forecast the future price of Alibaba (BABA) stock.
• Data Transformation and Visualization: Converted the dataset into a time series and plotted it
• Model Development: Built an LSTM model with two LSTM layers of 50 units each, followed by two dense layers of 25 units and 1 unit