
Data Scientist, Data Analyst, Data Specialist, Data Visualization

Queens, NY, 11375
March 19, 2020



Liyi (Lily) Kuo

Forest Hills, New York, ***** Email: Phone: (917) 334-0629


EDUCATION

NYC Data Science Academy, New York, New York January 2020

• Data Science certificate program involving over 420 hours of coursework

• Related Coursework: Machine Learning, Algorithms for Data Science, Foundations of Statistics and Probability, Exploratory Data Analysis & Visualization

Touro College, New York, New York June 2013

• Master of Science, Secondary Education in Mathematics

New York University, New York, New York June 2010

• Master of Science, Biomedical Engineering

• Bachelor of Science, Chemical and Biological Engineering

PROJECTS

Credit Card Fraud Detection

• Distinguished fraudulent credit card transactions from genuine client transactions using clustering and classification methods, achieving 95% accuracy and a ROC AUC score of 0.97

Lending Club: Predicting Loans with Positive Return on Investment

• Reverse-engineered Lending Club's loan approval process through clustering and machine learning models, identifying the 7 most important approval criteria and distinguishing accepted from rejected loans with 96% accuracy

• Increased the mean ROI for investors and stakeholders from 8% to 15% by selecting loans that were fully paid off at maturity; results validated with modern portfolio theory and machine learning models

Ames Housing Price Prediction

• Implemented hedonic pricing models and machine learning algorithms to predict house prices using 79 features

• Identified the 5 most important attributes for predicting housing prices across neighborhoods in Ames, Iowa; the final stacked model achieved an R² of 0.892

Leading Causes of Death in NYC Dashboard

• Explored the leading causes of death in New York City from 2007 to 2014 in R, using a 1,100-entry dataset provided by the Department of Health and Mental Hygiene (DOHMH)

• Identified the top 5 causes of death (heart disease, cancer, accidents, chronic lower respiratory diseases, and stroke), offering useful insights for health care providers

Hotwire Web Scraper

• Developed a Selenium-based web scraper for the Hotwire website that searches for the 5 lowest flight prices within 3 days of the designated travel date


SKILLS

Languages and Frameworks: R, Python, SQL, MS Office

Machine Learning: PCA, GLM, Linear Regression, Logistic Regression, Random Forest, Decision Trees, AdaBoost, XGBoost, SVM, Gradient Boosting, A/B Testing, Time Series Analysis

Tools and DBMS: RStudio, Jupyter, Git, MySQL, Tableau, Hadoop, PySpark, Matplotlib, Scikit-Learn, Pandas, NumPy, SciPy

Others: Probability Theory, Statistics, Financial Mathematics, Monte Carlo Simulation

EXPERIENCE

Department of Education/Classroom Mathematics Teacher, New York, New York September 2018

• Analyzed department-wide student data in MS Excel and monitored student requirements and performance, securing a 92-100% student passing rate on New York State standardized math exams

• Developed a predictive classification model to forecast student performance from features such as grade records, and provided early intervention for students with the highest probability of dropping out

• Orchestrated more than 4 concurrent projects, strategically prioritizing each step to meet deadlines

NYU Langone Medical Center / MIS Junior Research Scientist, New York, New York April 2011

• Designed and revised projects according to budget and schedule, and implemented the Friedman and Anderson-Darling tests in R at a 0.95 confidence level to validate experimental procedures

• Coordinated clinical research involving 600+ cases to evaluate implant stability in total knee arthroplasty six months post-surgery using statistical data analysis

• Derived mean resistive forces of 5.6 ± 1.2 N and 4.9 ± 1.2 N at the LFC and MFC sites using a lab-constructed indentation device, with results analyzed in R and MS Excel
