Data Python

College Park, Maryland, United States
June 10, 2019

765-***-******** Creighton Dr, College Park, MD 20740

EDUCATION

Robert H. Smith School of Business, University of Maryland College Park, MD

Master of Science in Quantitative Finance Aug 2017-May 2019

Krannert School of Management, Purdue University West Lafayette, IN

Bachelor of Science in Financial Counseling & Planning, Minor: Management Aug 2013-May 2017

PROFESSIONAL EXPERIENCE (Model Validation and Empirical Cumulative Default Rate Distribution Simulation)

Experience Consulting Project (Freddie Mac) College Park, MD

Empirical Cumulative Default Rate Distribution Simulation Oct 2018-Dec 2018

• Used SAS to fit two stochastic models (ARMA and GARCH) to Moody’s historical interest rate and HPA data

• Ran a 500-path simulation on 50,000 loan samples over 40 quarterly periods to generate the unconditional default rate distribution; incorporated the simulation results into behavioral equations for further model adjustment

• Concluded that the 7.2% cumulative default rate at the 90th percentile is close to the historical cumulative default rate at the same percentile

Freddie Mac McLean, VA

Quantitative Analytic Graduate Intern June 2018-Aug 2018

• Constructed chapters of the Model Validation PlayBook (Segmentation, Variable Reduction, Model Fit, Model Performance, and Machine Learning Models) that gave base teams clear guidelines for conducting model validations

• Conducted peer reviews of model validation reports against the criteria of Internal Auditing; flagged findings that were inconsistent

• Collected the findings and communicated with each base team director to request justifications and suggestions for improvement; helped the Senior Vice President understand the work of each base team from a senior management perspective

DATA COMPETITION (Data Visualization)

Smith Datathon 2019 College Park, MD

Capital Bikeshare and DC Taxi Market Analysis Mar 2019-Mar 2019

• Mapped the distribution of both bikeshare and taxi trips by geographic coordinates and zip codes; averaged time, mileage, and profit within each area; created Tableau dashboards for the bikeshare and DC taxi distributions to identify start and end location patterns

• Found that the highest-frequency taxi area averaged 94 minutes in duration but only $11 in profit and 13 miles in distance, and concluded that the market is not efficient; recommended that the DC taxi company regain market share by targeting areas with fewer bike stations for round trips

DC Data Challenge 2019 College Park, MD

Maryland Small Business Development Center (SBDC) Individual Consulting Services Feb 2019-Mar 2019

• Analyzed the consulting data from SBDC and graphed the relationships between consulting impact and company financial situation in Tableau; concluded that consulting had a more significant impact on small companies that had started up within one year

• Discovered that, within each county, the larger an industry's share in the county, the more likely companies in that industry are to succeed

• Created a histogram separating companies by whether they succeeded in starting the business; found that the average consulting time for both types of business is within 5 hours and that, beyond 5 hours, the more consulting time a small business spends with SBDC, the higher its success rate

ACADEMIC EXPERIENCE (Model Prediction)

Big Data Project College Park, MD

Yelp Review Nov 2018-Dec 2018

• Joined the Yelp review and Yelp business datasets using Apache Pig and an AWS Elastic MapReduce cluster to select only the dentist information

• Conducted text mining on the dentist review contents by creating a document-term frequency matrix and labeling each document based on sentiment analysis

• Built a logistic regression model in PySpark that achieved 90% accuracy; created a random forest challenger model for model validation and model comparison

• Concluded that customers tend to find negative reviews more useful than positive reviews and that customers do not strongly favor any particular review attitude

Data Mining Project College Park, MD

Prediction Contest Mar 2018-May 2018

• Led a 4-member team to clean the Airbnb data set with 70 features of different raw variables; leveraged regression models to determine the correlation between high-booking-rate and the explanatory variables; built the machine learning predictive model with the highest accuracy in predicting high booking rate

• Based on our champion logistic regression model, concluded that lower price, lower cleaning fee, and lower minimum_nights have a significant positive correlation with high-booking-rate, and suggested that hosts use our XGBoost model to determine whether their new business is likely to succeed

ADDITIONAL INFORMATION

• Machine Learning in R (2 years) and Python (1 year): Decision and Regression Tree, K-means, Ridge, LASSO, Neural Networks, Bayesian Networks

• Packages and Modules: data.table (R), dplyr (R), lubridate (R), numpy (Python), pandas (Python), sklearn (Python)

• Time Series Analysis in SAS (1 year): EWMA models, ARCH models, SAS MACRO, SAS SQL

• Relational Database: Microsoft SQL Server (1 year), BigQuery (1 year), Relations, Functional Dependency, Normalization, Business Rules, Referential Integrity

• Data Visualization: Tableau (1 year), Lucidchart (1 year), ggplot (R), matplotlib (Python), seaborn (Python)

• Big Data Analysis in Hadoop (1 year): Hue, YARN, Sqoop, Pig, Hive, Impala, Spark; AWS: EMR, S3, EC2, CLI, SSH
