Yang (Young) Yue
** **** *** ****** #****, New York, NY 10023 814-***-**** ******@********.***
PROFESSIONAL EXPERIENCE
Argus Research Group, New York, NY Sep. 2018 – Dec. 2018 Data Scientist
● Built a rule-based anomaly detection algorithm in Python to monitor stock transaction activity (abnormal pricing, inactive tickers, invalid transaction dates, etc.); ran ETL on data in SQL Server with SSIS to handle missing data and outliers and to prepare machine-learning-ready tables.
● Developed machine learning and deep learning models using SVM, XGBoost, vanilla RNNs, and LSTMs to predict each ticker's future closing price from the original features; reached 0.002 MSE on the test set by cross-validating and stacking ensemble models.
● Explored new features by applying NLP techniques with NLTK and spaCy to news data and by calculating RSI, MA, volume change, price delta, day-of-week, and percent change; applied modern portfolio theory to visualize the efficient frontier, compute EWMA and volatility, and locate the best Sharpe ratio.
● Optimized and regularized models by applying feature importance analysis, a correlation matrix, and GridSearchCV from scikit-learn to auto-tune parameters; reduced MSE to 0.0009, resulting in 38% higher returns than the market.
● Collaborated with the VP to construct a new API gateway and data pipeline on AWS Chalice and Redshift to stream and synchronize data from clients such as Yahoo Finance and Fidelity.
R&R Consulting/Credit Spectrum, New York, NY Jul. 2018 - Aug. 2018 Credit Risk Analyst
● Built cash flow models in Excel for 10+ ABS and RMBS deals (e.g., Ford, J.P. Morgan) to analyze prepayments received, default balance, ending pool balance, and available funds in each period.
● Cleaned and validated delinquency and payment history data for loans from the past 10 years with Waterfall Editor to construct datasets using VBA/Excel and R, in collaboration with supervisor and data providers.
● Ran Monte Carlo simulations in VBA/Excel to forecast different default rates for actual delta IRRs, and optimized key simulation formulas to handle extreme scenarios (e.g., high default rates).
Eviion LLC, State College, PA Jun. 2017 – Jul. 2018 Founder/CEO, Matcha Schedule
● Led a 7-person start-up team to design and develop a scheduling iOS app exclusively for college students.
● Collected and analyzed data from around 600 users in Excel, drawn from questionnaires, Google Analytics, and in-app activity, and improved app functionality around users' hobbies and preferences based on A/B testing.
EDUCATION
Columbia University, New York, NY Sep. 2017 - Dec. 2018
● Master of Arts, Statistics (Data Science track)
● Courses: Advanced Machine Learning, Image Machine Learning, Applied Data Science, Advanced Data Analysis.
Pennsylvania State University, State College, PA Sep. 2013 - May 2017
● Bachelor of Science, Risk Management
● Courses: Financial Mathematics, Time Series Analysis, Applied Regression Analysis, Finance.
PROJECTS
Content-based Recommendation Engine Python PySpark Project
● Applied natural language processing techniques with NLTK and Gensim to load and tokenize a dataset of 20K books.
● Stemmed the tokenized corpora with PorterStemmer and created a bag-of-words TF-IDF model to extract the most common and significant words for each of the 20K books.
● Built a similarity matrix based on cosine similarity and a Word2Vec model to calculate the distance between books, and visualized their relationships with a hierarchical dendrogram; recommended new books with high similarity scores to users.
Optimization for Quant Trading Strategy Columbia Finance Python Project
● Selected 5 well-performing trading strategies (e.g., Volatility Contango, Long-Short Multifactor) on Quantopian.
● Applied Markowitz theory to determine the mean-variance-optimized portfolio and calculated its Sharpe ratio and VaR.
● Wrote and presented an ad hoc analysis report concluding that using a factor model to estimate returns and the covariance matrix yields the best Sharpe ratio (2.38) and VaR compared with other combinations.
Image Recognition (CNN TensorFlow) Columbia ML Python Project
● Constructed a 25-layer convolutional neural network on GPU to classify the CIFAR-10 dataset; accuracy reached 90% with data augmentation that rotated images to increase the number of training samples.
● Implemented ResNet and DenseNet with SGD as the optimizer, which improved accuracy to 93% and 95%, respectively.
TECHNICAL SKILLS
Programming: Python (NumPy, Pandas, Matplotlib, Scikit-learn, NLTK, spaCy, Gensim, TensorFlow), R, MATLAB
Big Data: Hadoop (HDFS), Hive, Pig, Spark, AWS, MySQL, SQL Server, Teradata, PostgreSQL, Databricks, Cloudera