Andrew Benjamin Young
U.S. Citizen 415-***-**** ********@****.*****.*** San Francisco, CA
Education
University of San Francisco, San Francisco, CA 07/2019 – 06/2020
M.S. in Data Science
Scholarship recipient.
Relevant Courses: Machine Learning, Deep Learning, Relational Databases, Data Acquisition, Distributed Data Systems, Experiment Design, Linear Regression Analysis, and Time Series Analysis.
National Sun Yat-sen University, Kaohsiung, Taiwan 09/2014 – 01/2019
B.B.A. in Information Management
Double Major in Political Economy.
Relevant Courses: Operating Systems, Computer Networks, Data Structures, and Algorithms.
Work Experience
Valimail, San Francisco, CA 10/2019 – 07/2020
Data Science Intern
Built an automated classification pipeline for a dataset of 100k+ records that detects untrusted domains with 95% recall and cuts human labeling cycle time in half.
Used Gaussian Mixture Models (GMMs) to cluster phishing domains (a sketch follows this section).
Applied multiprocessing to parallelize feature scraping, reducing its runtime by 70%.
Focused mainly on Natural Language Processing (NLP) and model building, including extractive text summarization and Gradient Boosting Machines (GBMs), using both scikit-learn and H2O AutoML.
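A minimal sketch of the GMM clustering step referenced above, using scikit-learn; the feature matrix is a random placeholder standing in for scraped domain features, and the component count is an assumption, not Valimail's actual configuration.

```python
# Sketch: cluster domains with a Gaussian Mixture Model (scikit-learn).
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))  # placeholder: 4 numeric features per domain

X_scaled = StandardScaler().fit_transform(X)   # GMMs are scale-sensitive
gmm = GaussianMixture(n_components=5, covariance_type="full", random_state=0)
labels = gmm.fit_predict(X_scaled)             # cluster assignment per domain
log_lik = gmm.score_samples(X_scaled)          # low log-likelihood flags outliers
```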
Competitions
Business Model and Big Data Analysis Contest 12/2018
Predicted ratings from the Women’s E-Commerce Clothing Reviews dataset and won third place.
Used XGBoost, a Multilayer Perceptron (MLP), and the Synthetic Minority Over-sampling Technique (SMOTE); a sketch follows below.
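A minimal sketch of combining SMOTE with XGBoost in one pipeline, as in the contest entry above; the data and hyperparameters below are placeholders, not the competition settings.

```python
# Sketch: oversample minority rating classes, then fit a gradient-boosted classifier.
import numpy as np
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))          # placeholder review features
y = rng.integers(0, 5, size=500)        # placeholder 5-class rating labels

pipe = Pipeline([
    ("smote", SMOTE(random_state=0)),   # resampling happens only on training folds
    ("clf", XGBClassifier(n_estimators=200, max_depth=4)),
])
pipe.fit(X, y)
print(pipe.predict(X[:5]))
```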
T-Brain Machine Learning Competition 05/2018
Predicted the prices of Taiwan ETFs (exchange-traded funds) and ranked 12th out of 487 teams.
Built a Keras LSTM model and trained it on Google Colab (see the sketch below).
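A minimal sketch of a Keras LSTM price model of the kind described above; the 30-step window, layer size, and random training data are assumptions, not the competition configuration.

```python
# Sketch: predict the next price from a sliding window of past prices.
import numpy as np
from tensorflow import keras

window = 30
X = np.random.rand(256, window, 1)   # placeholder windows of past prices
y = np.random.rand(256, 1)           # placeholder next-step price targets

model = keras.Sequential([
    keras.layers.LSTM(64, input_shape=(window, 1)),
    keras.layers.Dense(1),           # single regression output
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
```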
Projects
“Impulses” (entrepreneur project) 03/2020 – 04/2020
Built a minimum viable product (MVP) that sends SMS notifications encouraging users to forgo spending on nonessentials and save the money instead.
Worked on the platform team and built the website using Bootstrap, Flask, and the Plaid API (a Flask sketch follows this project).
Hosted the product using Elastic Beanstalk (EB) and used Sphinx to create documentation.
The project was selected for presentation to a panel of venture capitalists.
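A minimal sketch of the Flask side of such an MVP; the /notify route, its payload fields, and the stubbed SMS call are hypothetical, and the Plaid integration is omitted.

```python
# Sketch: a Flask endpoint that formats a savings nudge for SMS delivery.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/notify", methods=["POST"])
def notify():
    data = request.get_json()
    phone = data["phone"]        # hypothetical payload field
    amount = data["amount"]      # hypothetical payload field
    message = f"You spent ${amount} on nonessentials. Consider saving it instead."
    # send_sms(phone, message)   # stub: real SMS delivery would go here
    return jsonify({"sent_to": phone, "message": message})

if __name__ == "__main__":
    app.run(debug=True)
```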
Implementation of Image Compression 02/2020
Compressed images using a self-built k-means++ implementation written in Python (see the sketch after this project).
Used t-Distributed Stochastic Neighbor Embedding (t-SNE) to reduce 300-dimensional word embeddings and plot the relationships between words, using k-means++ to label the clusters.
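A minimal sketch of k-means++ seeding as used in the image-compression project, following the standard D² sampling scheme; the pixel array and palette size are placeholders, not the project's actual code.

```python
# Sketch: k-means++ initialization over an image's RGB pixels.
import numpy as np

def kmeans_pp_init(pixels, k, rng):
    """Pick k initial centers: the first uniformly at random, each later one
    with probability proportional to its squared distance from the nearest
    center chosen so far (the D^2 weighting that makes k-means++ work)."""
    centers = [pixels[rng.integers(len(pixels))]]
    for _ in range(k - 1):
        d2 = np.min(
            ((pixels[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(-1),
            axis=1,
        )
        centers.append(pixels[rng.choice(len(pixels), p=d2 / d2.sum())])
    return np.array(centers)

rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(5000, 3)).astype(float)  # placeholder flattened RGB image
centers = kmeans_pp_init(pixels, k=16, rng=rng)              # seeds for a 16-color palette
```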
Health Insurance Marketplace Prediction Using Census and Rent Cost 01/2020
Used Amazon Elastic MapReduce (EMR) for distributed computing and Amazon Simple Storage Service (S3) to store the data.
Predicted insurance rates using a Gradient Boosting Machine (GBM) and a Random Forest.
Used Spark to perform data manipulation and feature engineering (see the sketch below).
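A minimal sketch of the Spark feature-engineering and GBM step; the column names (rent, income, rate) and toy rows are hypothetical placeholders, not the census dataset's schema.

```python
# Sketch: assemble features and fit a gradient-boosted-tree regressor in Spark ML.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import GBTRegressor

spark = SparkSession.builder.appName("insurance-rate").getOrCreate()

df = spark.createDataFrame(
    [(1200.0, 52000.0, 310.0), (800.0, 34000.0, 255.0), (1500.0, 61000.0, 340.0)],
    ["rent", "income", "rate"],  # hypothetical columns
)

assembler = VectorAssembler(inputCols=["rent", "income"], outputCol="features")
train = assembler.transform(df)

model = GBTRegressor(featuresCol="features", labelCol="rate").fit(train)
model.transform(train).select("rate", "prediction").show()
```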
Technical Skills
Languages and Packages: Python, SQL, PySpark, R, C, scikit-learn, statsmodels, pandas, SQLAlchemy, NLTK, spaCy, NumPy, Plotly, Beautiful Soup, Requests, Flask.
Tools: PostgreSQL, MySQL, PyTorch, Keras, H2O, Amazon Elastic MapReduce (EMR), Amazon Simple Storage Service (S3), Elastic Beanstalk, CodePipeline, Sphinx, Git, Tableau, JIRA.