Resume

Data Analyst Python

Location: San Francisco, CA

Salary: 80000

Posted: August 03, 2020


Resume:

Jijun “David” Du

San Francisco, CA 628-***-**** ade19e@r.postjobfree.com linkedin.com/in/urdavid/ github.com/dujijundavid

EDUCATION

University of California, Davis San Francisco, CA

Master of Science, Business Analytics (3.5/4.0) June 2020

Highlighted Coursework: Machine Learning, Advanced Statistics, Experimental Design, Big Data, Data Visualization

Rutgers University New Brunswick, NJ

Bachelor of Science, Business Analytics, Minor: Computer Science (3.53/4.0), Dean’s List May 2019

Highlighted Coursework: Database Theory, Risk Modeling, Data Structures and Algorithms, Artificial Intelligence (AI)

ANALYTICAL EXPERIENCE

Reviewbox San Francisco, CA

Data Scientist, Practicum Project Oct 2019 – June 2020

As part of the UC Davis MSBA practicum, worked with this early-stage technology startup on data science projects including e-commerce data extraction and predictive modeling. Focused on detecting fraudulent reviews for Reviewbox clients.

• Project Design: designed 20+ fraud indicators from Amazon user profile pages, review text, and review history.

• Data Collection: used Python to scrape 26K Amazon profile pages and 1.3K+ product diagnostics from Reviewmeta.

• Data Processing: built a data pipeline to clean, aggregate, and transform quantitative, qualitative, temporal, and text data. Used NLP vectorizers to transform text into a similarity matrix. Applied stratified sampling to handle the imbalanced dataset.

• Predictive Modeling: created 15 fraud-related features based on reviewer behavior analysis and NLP (TF-IDF, Bag of Words) to develop a fraudulent-review classifier using Random Forest with 92% precision. The launched product improved customer retention by 8%.

Creditease Beijing, China

Data Analyst Intern June 2019 – Aug 2019

Identified 2 alternative loan strategies at this top wealth management company in China. Reported to Creditease private credit fund partners and data modelers.

• Optimized portfolio performance using R and SQL across various investment scales. The LTV-based (risk-driven) alternative loan selection strategy decreased the prepayment rate by 16.5% and the bad debt rate by 17%, and improved the adjusted return rate by 2%.

• Reconstructed the Orchard index via simulations by exploring the marketplace lending index and the small business lending index; backtesting results matched the original index with 98% accuracy.

• Collaborated with colleagues to investigate 65 US fintech credit loan startups for fund investment decisions. Edited and crafted visualizations for quarterly investor relations reports.

• Built a VBA dashboard for the data team to preprocess loan data and automate loan selection in Excel (.xlsx) format.

PROJECTS

Lending Club Credit Risk Modeling April 2020 – June 2020

• Built an Expected Loss model by constructing three different models (Probability of Default, Loss Given Default, and Exposure at Default) using Logistic Regression, Linear Regression, and Beta Regression in Python.

• Developed a scorecard based on the logistic model so that non-technical users can evaluate borrowers' credit scores.

• Evaluated model performance using AUC and a confusion matrix, achieving 73% accuracy.

Quantitative Analysis - Discover Investment Portfolios Apr 2018 – May 2018

• Retrieved Yahoo Finance data with Quantmod, including BS/IS/CF data from the historical dataset, from March 1980 to Dec 2017.

• Iteratively determined the investment portfolio each quarter by filtering stocks through three layers of financial-ratio combinations.

• Backtested portfolio performance and improved annual returns by 3.1% compared to the S&P 500 benchmark.

DATA SCIENCE COMPETITION

Home Default Risk – Featured Competition on Kaggle June 2018 – Sep 2018

• Competed solo, ranking 1042/7198 (top 14.5%).

• Implemented credit scoring algorithms to evaluate the likelihood of repayment based on personal, credit, and bureau data.

• Performed exploratory data analysis with Orange/Tableau; deployed a missing-data imputation strategy with an RF regressor.

• Ensembled LightGBM and XGBoost models; used Bayesian optimization for hyperparameter tuning.

SKILLS

Programming: R, Python (pandas, NumPy, scikit-learn, PySpark, TensorFlow), SQL, JavaScript, Java, Excel VBA, MATLAB

Data Science: machine learning, ETL, metrics design, natural language processing (NLP), predictive modeling

Database: MySQL, PostgreSQL, MongoDB, Neo4j

Tools: Spark, AWS (Practitioner certified), Tableau (Practitioner certified), Git, R Shiny, GCP, API
