YUXUAN (SHERRY) ZENG
** ****** ****** *** ****, Jersey City, NJ 07302 646-***-**** *************@*****.***
PROFESSIONAL EXPERIENCES
Lighthouse Guild, HealthCare Data Analytics New York, NY
Data Scientist Jun 2018 - Jan 2019
§ Clustered vision disease sub-types and stages for clinics using K-Means clustering in R
§ Migrated patient data from various sources (flat files, SQL Server) to HDFS and mapped it to Hive tables to build data lakes, improving processing speed and storage capacity within the Hadoop big data ecosystem
§ Automated weekly data wrangling processes for clinical history of 30K+ patients by constructing Alteryx workflow macros, boosting efficiency by saving 2 hours per day
§ Created demographic heat map reports in Tableau to deliver business analytics results internally and to fulfill compliance requirements from the State Department of Health
Diligence Vault FinTech New York, NY
Data Scientist Intern Jan 2018 - Apr 2018
§ Initiated a sentence similarity detector to improve the due-diligence question platform between investors and asset managers, using natural language processing (NLP) in Python and TensorFlow
§ Improved forecasting accuracy by embedding sentences with a deep averaging network encoder, compared with methods such as Word2Vec and GloVe embeddings in Gensim
§ Optimized the digital Q&A process and enhanced efficiency by eliminating 72% of question redundancy
Crushh App Messaging New York, NY
Data Scientist Intern Sep 2017 - Mar 2018
§ Built 20+ message classifiers for various user features (relationship, gender, age, education, income, FICO score) by constructing machine learning pipelines that stack SVM, Naïve Bayes, and Logistic Regression models
§ Analyzed 400M+ text messages from 100K+ users with SQL for NLP purposes and vectorized them with a bag-of-words model and TF-IDF transformation, using the TextBlob, scikit-learn, pandas, NumPy, and matplotlib libraries in Python
§ Created web dashboards to deliver intermediate analysis results and performed API testing with Postman
§ Deployed data migration and added new features to the live app with Git version control in Bash Shell
RESEARCH EXPERIENCES
Quantitative P2P Lending Investment Jun 2017 - Sep 2017
§ Developed a loan underwriting strategy using two years of observations from the Lending Club website, achieving a 3% higher Return on Investment (ROI) than a passive average basket of loans
§ Constructed a loan default rate model by analyzing 100+ loan features, prioritizing 20 variables, and leveraging XGBoost and Ridge & Lasso regularization to avoid overfitting, and visualized results with ggplot in R
Sector-Based Sentiment Trading Mar 2017 - Apr 2017
§ Built a long-short portfolio based on a risk-parity model and determined weights with Principal Component Regression, achieving a 15% annual return
§ Collected historical corporate news, stock prices and fundamental financials of strong stocks in leading sectors from Bloomberg over two weeks for quantitative analysis and text mining
§ Used a bi-gram model to parse relevant news into numeric representations with NLTK in Python, applied sentiment analysis, and predicted the impact of news on stock movement using a Random Forest model
EDUCATION
Columbia University New York, NY
MS in Operations Research, Data Analytics Concentration Aug 2016 - Feb 2018
Nanjing University Nanjing, China
BE in Financial Engineering, 2015 Outstanding Students of Nanjing University Sep 2012 - Jun 2016
1st Class Award of 2015 Mathematical Contest in Modeling (Top 9% of 7636 teams worldwide) Feb 2015
SKILLS & INTERESTS
Programming: Python, SQL, Java, C; Statistical Modeling: R, Matlab, TensorFlow; Business Analytics: Tableau, Alteryx; Big Data: Hadoop, Spark & Hive; Finance: CFA Level II Candidate
Violin: Principal violinist in Nanjing University Symphony Orchestra, 17 years’ experience