Data Scientist, Machine Learning Engineer

Location:

Boston, MA

Posted:

October 06, 2020


Resume:

Tsungyen (Gordon) Yeh

DATA SCIENTIST · MACHINE LEARNING ENGINEER

* ******** **, ********, ** 02135, USA

+1-857-***-**** ************@*****.*** www.linkedin.com/in/gordon-yeh

Education

Northeastern University Boston, MA

M.S. IN DATA ANALYTICS ENGINEERING Sep. 2018 - Dec. 2020 GPA: 3.88

Shanghai Jiao Tong University Shanghai, China

B.S. IN MECHANICAL ENGINEERING Sep. 2013 - Aug. 2017

Work Experience

McKinsey & Company, Inc. Waltham, MA

PREDICTIVE ANALYTICS & DATA MODELING CO-OP Jan. 2020 – July 2020

• Collaborated with the Security Operations Team, responsible for developing machine learning algorithms for cybersecurity incidents

• Experimented with machine learning models for malicious domain/URL detection and used them to replace the baseline heuristic

• Engineered a data pipeline that processes 50+ million data points and extracts 1,300+ features per data point to build the ML dataset

• Experimented with parallel CNN and LSTM models for the domain classifier, achieving 98.8% accuracy and a 0.7% false positive rate (FPR)

• Modified an LSTM (Long Short-Term Memory) architecture for URL classification, achieving 99.1% accuracy and a 0.8% FPR

• Built a full-stack web app in Python (Flask) that supports label verification and model retraining and served as an internal dashboard for the ML algorithms

Source Data Corporation – Algorithm Department Shanghai, China

DATA SCIENTIST INTERN - NATURAL LANGUAGE PROCESSING May 2019 – Aug. 2019

• Experimented with CNN and LSTM architectures using word2vec embeddings to classify news vs. non-news text, achieving a 0.98 F1-score

• Implemented a FastText bigram neural network to distinguish novel text with a 0.95 F1-score; experimented with a self-learning training regime that further improved the F1-score by 0.02

• Optimized graph queries to retrieve identical events from a Neo4j database in under 0.05 seconds per transaction

• Recalled 20%+ more data in the preprocessing pipeline by deploying SVM models on spam news with structural problems

• Vectorized news titles with TF-IDF for entity merging and clustered 30,000 news articles into 10 super-clusters using K-means with 90%+ accuracy

Super-Air Compressed-air Technology Company Kaohsiung, Taiwan

DATA ANALYST INTERN Jan. 2018 – Aug. 2018

• Led technical seminars for three dealers in Southeast Asia, boosting regional sales by 20%

• Conducted regression analysis and ANOVA on compressor time-series data, helping customers secure subsidies covering up to 40% of cost

Project Experience

Gender Classifier by Face Project

NORTHEASTERN UNI. STATISTICAL ENGINEERING COURSE Sep. 2019 - Dec. 2019

• Visualized and preprocessed data with Pandas and Matplotlib; performed feature selection with scikit-learn

• Built a fully connected neural network prototype from scratch in Python/NumPy, achieving 80.7% accuracy

• Validated models with cross-validation to tune hyperparameters and avoid overfitting

Fortune-Teller Full-Stack Project

NORTHEASTERN UNI. DATABASE MANAGEMENT COURSE Sep. 2019 - Dec. 2019

• Led a team of four engineers building a fortune-teller system consisting of a database, back end, and front end

• Contributed to the design of an NLP Q&A algorithm based on cosine similarity, with accuracy up to 90%

• Built the MySQL database, including tables and functions, and developed the corresponding data-flow and entity-relationship diagrams

Knowledge & Skills

Software

• Python (Proficient) • TensorFlow/Keras • Flask/HTML • R • MATLAB

• RESTful APIs • Linux • Git • SQL/Neo4j • Bash • AWS/SageMaker/S3

Knowledge

• Machine Learning • Deep Learning • Artificial Intelligence • Natural Language Processing • Distributed Systems

• Cloud Computing • Algorithms/Data Structures • Data Mining • Database Management • Engineering Statistics

OCTOBER 6, 2020 GORDON YEH · RÉSUMÉ 1
