Andrew Benjamin Young
U.S. Citizen 415-***-**** ********@****.*****.*** San Francisco, CA
Education
University of San Francisco, San Francisco, CA 07/2019 – 06/2020
M.S. in Data Science
Scholarship recipient.
Relevant Courses: Machine Learning, Deep Learning, Relational Databases, Data Acquisition, Distributed Data Systems, Experiment Design, Linear Regression Analysis, and Time Series Analysis.
National Sun Yat-sen University, Kaohsiung, Taiwan 09/2014 – 01/2019
B.B.A. in Information Management
Double Major in Political Economy.
Relevant Courses: Operating Systems, Computer Networks, Data Structures, and Algorithms.
Work Experience
Valimail, San Francisco, CA 10/2019 – 07/2020
Data Science Intern
Built an automated classification pipeline for a dataset of 100k+ records that detects untrusted domains with 95% recall and cuts human labeling cycle time in half.
Used Gaussian Mixture Models (GMMs) to cluster phishing domains (a sketch follows this section).
Applied multiprocessing to parallelize feature scraping, reducing its runtime by 70%.
Focused mainly on Natural Language Processing (NLP) and model building, including extractive text summarization and Gradient Boosting Machines (GBMs), using both scikit-learn and H2O AutoML.
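A minimal sketch of the GMM clustering step referenced above, using scikit-learn; the feature matrix is a random placeholder standing in for scraped domain features, and the component count is an assumption, not Valimail's actual configuration.

```python
# Sketch: cluster domains with a Gaussian Mixture Model (scikit-learn).
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))  # placeholder: 4 numeric features per domain

X_scaled = StandardScaler().fit_transform(X)   # GMMs are scale-sensitive
gmm = GaussianMixture(n_components=5, covariance_type="full", random_state=0)
labels = gmm.fit_predict(X_scaled)             # cluster assignment per domain
log_lik = gmm.score_samples(X_scaled)          # low log-likelihood flags outliers
```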
Competitions
Business Model and Big Data Analysis Contest 12/2018
Predicted ratings from the Women’s E-Commerce Clothing Reviews dataset and won third place.
Used XGBoost, a Multilayer Perceptron (MLP), and the Synthetic Minority Over-sampling Technique (SMOTE); a sketch follows below.
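A minimal sketch of combining SMOTE with XGBoost in one pipeline, as in the contest entry above; the data and hyperparameters below are placeholders, not the competition settings.

```python
# Sketch: oversample minority rating classes, then fit a gradient-boosted classifier.
import numpy as np
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))          # placeholder review features
y = rng.integers(0, 5, size=500)        # placeholder 5-class rating labels

pipe = Pipeline([
    ("smote", SMOTE(random_state=0)),   # resampling happens only on training folds
    ("clf", XGBClassifier(n_estimators=200, max_depth=4)),
])
pipe.fit(X, y)
print(pipe.predict(X[:5]))
```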
T-Brain Machine Learning Competition 05/2018
Predicted the prices of Taiwan ETFs (exchange-traded funds) and ranked 12th out of 487 teams.
Built a Keras LSTM model and trained it on Google Colab (see the sketch below).
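A minimal sketch of a Keras LSTM price model of the kind described above; the 30-step window, layer size, and random training data are assumptions, not the competition configuration.

```python
# Sketch: predict the next price from a sliding window of past prices.
import numpy as np
from tensorflow import keras

window = 30
X = np.random.rand(256, window, 1)   # placeholder windows of past prices
y = np.random.rand(256, 1)           # placeholder next-step price targets

model = keras.Sequential([
    keras.layers.LSTM(64, input_shape=(window, 1)),
    keras.layers.Dense(1),           # single regression output
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
```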
Projects
“Impulses” (entrepreneur project) 03/2020 – 04/2020
Built a minimum viable product (MVP) that sends SMS notifications encouraging users to forgo spending on nonessentials and save the money instead.
Worked on the platform team and built the website using Bootstrap, Flask, and the Plaid API (a Flask sketch follows this project).
Hosted the product using Elastic Beanstalk (EB) and used Sphinx to create documentation.
The project was selected for presentation to a panel of venture capitalists.
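A minimal sketch of the Flask side of such an MVP; the /notify route, its payload fields, and the stubbed SMS call are hypothetical, and the Plaid integration is omitted.

```python
# Sketch: a Flask endpoint that formats a savings nudge for SMS delivery.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/notify", methods=["POST"])
def notify():
    data = request.get_json()
    phone = data["phone"]        # hypothetical payload field
    amount = data["amount"]      # hypothetical payload field
    message = f"You spent ${amount} on nonessentials. Consider saving it instead."
    # send_sms(phone, message)   # stub: real SMS delivery would go here
    return jsonify({"sent_to": phone, "message": message})

if __name__ == "__main__":
    app.run(debug=True)
```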
Implementation of Image Compression 02/2020
Compressed images using a self-built k-means++ implementation written in Python (see the sketch after this project).
Used t-Distributed Stochastic Neighbor Embedding (t-SNE) to reduce 300-dimensional word embeddings and plot the relationships between words, using k-means++ to label the clusters.
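A minimal sketch of k-means++ seeding as used in the image-compression project, following the standard D² sampling scheme; the pixel array and palette size are placeholders, not the project's actual code.

```python
# Sketch: k-means++ initialization over an image's RGB pixels.
import numpy as np

def kmeans_pp_init(pixels, k, rng):
    """Pick k initial centers: the first uniformly at random, each later one
    with probability proportional to its squared distance from the nearest
    center chosen so far (the D^2 weighting that makes k-means++ work)."""
    centers = [pixels[rng.integers(len(pixels))]]
    for _ in range(k - 1):
        d2 = np.min(
            ((pixels[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(-1),
            axis=1,
        )
        centers.append(pixels[rng.choice(len(pixels), p=d2 / d2.sum())])
    return np.array(centers)

rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(5000, 3)).astype(float)  # placeholder flattened RGB image
centers = kmeans_pp_init(pixels, k=16, rng=rng)              # seeds for a 16-color palette
```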
Health Insurance Marketplace Prediction Using Census and Rent Cost 01/2020
Used Amazon Elastic MapReduce (EMR) for distributed computing and Amazon Simple Storage Service (S3) to store the data.
Predicted insurance rates using a Gradient Boosting Machine (GBM) and a Random Forest.
Used Spark to perform data manipulation and feature engineering (see the sketch below).
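A minimal sketch of the Spark feature-engineering and GBM step; the column names (rent, income, rate) and toy rows are hypothetical placeholders, not the census dataset's schema.

```python
# Sketch: assemble features and fit a gradient-boosted-tree regressor in Spark ML.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import GBTRegressor

spark = SparkSession.builder.appName("insurance-rate").getOrCreate()

df = spark.createDataFrame(
    [(1200.0, 52000.0, 310.0), (800.0, 34000.0, 255.0), (1500.0, 61000.0, 340.0)],
    ["rent", "income", "rate"],  # hypothetical columns
)

assembler = VectorAssembler(inputCols=["rent", "income"], outputCol="features")
train = assembler.transform(df)

model = GBTRegressor(featuresCol="features", labelCol="rate").fit(train)
model.transform(train).select("rate", "prediction").show()
```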
Technical Skills
Languages and Packages: Python, SQL, PySpark, R, C, scikit-learn, statsmodels, pandas, SQLAlchemy, NLTK, spaCy, NumPy, Plotly, Beautiful Soup, Requests, Flask.
Tools: PostgreSQL, MySQL, PyTorch, Keras, H2O, Amazon Elastic MapReduce (EMR), Amazon Simple Storage Service (S3), Elastic Beanstalk, CodePipeline, Sphinx, Git, Tableau, JIRA.