Jijun “David” Du
San Francisco, CA 628-***-**** ade19e@r.postjobfree.com linkedin.com/in/urdavid/ github.com/dujijundavid
EDUCATION
University of California, Davis San Francisco, CA
Master of Science, Business Analytics (3.5/4.0) June 2020
Highlighted Coursework: Machine Learning, Advanced Statistics, Experimental Design, Big Data, Data Visualization
Rutgers University New Brunswick, NJ
Bachelor of Science, Business Analytics, Minor: Computer Science (3.53/4.0), Dean’s List May 2019
Highlighted Coursework: Database Theory, Risk Modeling, Data Structures and Algorithms, Artificial Intelligence (AI)
ANALYTICAL EXPERIENCE
Reviewbox San Francisco, CA
Data Scientist, Practicum Project Oct 2019 – June 2020
As part of the UC Davis MSBA, engaged with this early-stage technology startup on data science projects including e-commerce data extraction and predictive modeling. Worked on detecting fraudulent reviews for Reviewbox clients.
• Project Design: designed 20+ fraud indicators from Amazon user profile pages, review text, and review history.
• Data Collection: used Python to scrape 26K Amazon profile pages and 1.3K+ product diagnostics from Reviewmeta.
• Data Processing: built a data pipeline to clean, aggregate, and transform quantitative, qualitative, temporal, and text data. Used NLP vectorizers to transform text into a similarity matrix. Applied stratified sampling to handle the imbalanced dataset.
• Predictive Modeling: created 15 fraud-related features from reviewer behavior analysis and NLP (TF-IDF, Bag of Words) to develop a Random Forest fraudulent-review classifier with 92% precision. The launched product improved customer retention by 8%.
Creditease Beijing, China
Data Analyst Intern June 2019 – Aug 2019
Identified 2 alternative loan strategies at this leading wealth management company in China. Reported to Creditease private credit fund partners and data modelers.
• Optimized portfolio performance using R and SQL across various investment scales. The LTV-based (risk-driven) alternative loan selection strategy decreased the prepayment rate by 16.5% and the bad debt rate by 17%, and improved the adjusted return rate by 2%.
• Reconstructed the Orchard index via simulations by exploring the marketplace lending index and the small business lending index; backtesting achieved 98% accuracy against the original index.
• Collaborated with colleagues to investigate 65 US fintech credit loan startups for fund investment decisions. Edited and created visualizations for quarterly investor relations reports.
• Built an Excel VBA dashboard (.xlsx) for the data team to preprocess loan data and automate loan selection.
PROJECTS
Lending Club Credit Risk Modeling April 2020 – June 2020
• Built an Expected Loss model from three component models (Probability of Default, Loss Given Default, and Exposure at Default) using Logistic Regression, Linear Regression, and Beta Regression in Python.
• Developed a scorecard based on the logistic model so that non-technical users can evaluate borrowers' credit scores.
• Evaluated model performance using AUC and a confusion matrix, achieving 73% accuracy.
Quantitative Analysis - Discover Investment Portfolios Apr 2018 – May 2018
• Retrieved Yahoo Finance balance sheet, income statement, and cash flow (BS/IS/CF) data with Quantmod, covering March 1980 to Dec 2017.
• Iteratively selected the investment portfolio each quarter by filtering stocks through three layers of combined financial-ratio screens.
• Backtested portfolio performance, improving annual returns by 3.1% compared to the S&P 500 benchmark.
DATA SCIENCE COMPETITION
Home Credit Default Risk – Featured Competition on Kaggle June 2018 – Sep 2018
• Solo, ranked 1042/7198 (top 14.5%).
• Implemented credit scoring algorithms to evaluate the likelihood of repayment based on personal, credit, and bureau data.
• Performed exploratory data analysis with Orange and Tableau; imputed missing data using a Random Forest regressor.
• Ensembled LightGBM and XGBoost models, using Bayesian optimization for hyperparameter tuning.
SKILLS
Programming: R, Python (pandas, NumPy, scikit-learn, PySpark, TensorFlow), SQL, JavaScript, Java, Excel VBA, MATLAB
Data Science: machine learning, ETL, metrics design, natural language processing (NLP), predictive modeling
Database: MySQL, PostgreSQL, MongoDB, Neo4j
Tools: Spark, AWS (Practitioner certified), Tableau (Practitioner certified), Git, R Shiny, GCP, API