Data Analyst

Location:
Arlington, VA
Posted:
February 21, 2021

Resume:

SCARLETT WANG

202-***-**** **** S Eads St, Arlington, VA 22202 adkdbg@r.postjobfree.com www.linkedin.com/in/scarlett1204

EDUCATION

George Washington University Washington, DC

Master of Science, Business Analytics GPA: 3.62/4.0 Aug 2019 - Jan 2021

• Relevant Courses: Data Mining, Machine Learning, Big Data Analytics, Optimization, Time Series Forecasting, Statistics for Analytics, Consulting for Analytics, Decision and Risk Analytics, Data Management for Analytics

Hubei University Wuhan, China

Bachelor of Economics, Finance GPA: 3.5/4.0 Sept 2015 - Jun 2019

• Relevant Courses: Financial Engineering, Corporate Finance, Financial Mathematics, Probability and Statistics

Bachelor of Engineering, Electronic Science and Technology

• Relevant Courses: Linear Algebra, Micro-Computer Application, Embedded System Design, C Programming

SKILLSETS

• Explanatory Analysis: Tableau, R Shiny, Plotly & Dash in Python, Excel (Vlookup, Pivot Table, VBA, Solver)

• Coding Languages: MySQL, SAS, Python (NumPy, Pandas, math, Matplotlib, Seaborn, Scikit-learn, Gurobi), Linux, R

EXPERIENCE

Data Analyst Practicum Project at IBM Aug 2020 – Dec 2020 Washington, DC

• Web-scraped financial metrics and news on financial institutions using Selenium in Python. Worked with colleagues on missing-data imputation and variable standardization to improve data quality. Developed a risk-level classification model for 4,000+ third-party vendors using Random Forest, achieving an AUC of up to 0.81.

• Explained the model and the importance of each feature using the InterpretML dashboard. Collaborated with other interns to draft analytics reports and presented model deliverables to clients biweekly.

Data Analyst Intern at Pinduoduo Inc. (NASDAQ: PDD) Apr 2019 – Jul 2019 Shanghai, China

• Used SQL (JOIN, window functions, aggregate functions, etc.) to query daily sales data from the ERP system. Performed ad hoc data analysis to monitor data fluctuations and anomalies using Dash and Plotly in Python; presented analysis results and business insights to the manager weekly.

• Implemented variable clustering, reducing data dimensionality from 27 features to 13. Built a LASSO regression model for sales prediction, achieving a Root Mean Square Error as low as 0.3371. Guided by the sales prediction model, aged inventory decreased by 10.2% compared with the same period in 2018.

Analyst Intern at Changjiang Securities (SZSE: 000783) May 2018 – Aug 2018 Wuhan, China

• Assisted a senior analyst in writing special-topic reports on pension target date funds (TDF), including industry analysis and glide path strategy research. Retrieved historical data for the selected portfolio from the Wind Economic Database and performed backtesting using SAS (%MACRO, STAT, GRAPH, etc.).

• Built a competitive product analysis and a dynamically updated evaluation system with indicators such as business income structure, profit and profit growth, and wealth management fundraising.

RELEVANT PROJECTS

Iowa Housing Price Feature Engineering and Forecast Jun 2020 Washington, DC

• Using the H2O machine learning framework, performed feature engineering on historical Iowa house sale price data with 79 explanatory variables. Predicted house prices using a GBM (Gradient Boosting Machine), achieving a Root Mean Squared Log Error as low as 72.5.

• Built an animated dashboard tracking house locations alongside important price indicators such as neighborhood, total living area, and number of parking spaces. The project placed in the top 20% of the Kaggle competition. For project details, see: https://github.com/scarlett1204/Kaggle-housing-price.git

US Crude Oil Price Time Series Forecast Apr 2020 Washington, DC

• Using US crude oil price data and its associated variables, fitted time series models of crude oil prices with the SAS time series analysis system, including univariate models (such as SARIMA and seasonal + linear models) and a multivariate model (such as a TF model).

• The square root of the variance of the best model is 5.06, and its MAE is 3.98.

Air Pollutant Concentration Analysis in the US Nov 2019 Washington, DC

• Analyzed air pollution trends in 3 states (CA, TX, VA) from 2008 to 2017. Included a stroke mortality rate dataset and a GDP-by-industry dataset for these 3 states to examine the correlation between air pollution and manufacturing performance or stroke mortality rate.

• Used Tableau to draw a geolocation map comparing pollutant concentrations across the 3 states, and created a line chart showing pollutant concentration trends over the years for these states. For project details, see: https://public.tableau.com/profile/scarlett.wang8370#!/vizhome/tableau_project_15756948621130/airpollutantconcentration


