Tsungyen (Gordon) Yeh
DATA SCIENTIST · MACHINE LEARNING ENGINEER
* ******** **, ********, ** 02135, USA
+1-857-***-**** adgo5g@r.postjobfree.com www.linkedin.com/in/gordon-yeh
Education
Northeastern University Boston, MA
M.S. IN DATA ANALYTICS ENGINEERING Sep. 2018 - Dec. 2020 GPA: 3.88
Shanghai Jiao Tong University Shanghai, China
B.S. IN MECHANICAL ENGINEERING Sep. 2013 - Aug. 2017
Work Experience
McKinsey & Company, Inc. Waltham, MA
PREDICTIVE ANALYTICS & DATA MODELING CO-OP Jan. 2020 – Jul. 2020
• Collaborated with the Security Operations team to develop machine learning algorithms for cybersecurity incident detection
• Experimented with machine learning models for malicious domain/URL detection and used them to replace the baseline heuristic
• Engineered a data pipeline that processes 50+ million data points and extracts 1,300+ features per data point to build the ML dataset
• Experimented with parallel CNN and LSTM models for the domain classifier, achieving 98.8% accuracy and a 0.7% false positive rate (FPR)
• Modified the LSTM (Long Short-Term Memory) architecture for URL classification, achieving 99.1% accuracy and a 0.8% FPR
• Built a full-stack web app in Python (Flask) that supports label verification and model retraining and served as an internal dashboard for the ML algorithms
Source Data Corporation – Algorithm Department Shanghai, China
DATA SCIENTIST INTERN - NATURAL LANGUAGE PROCESSING May 2019 – Aug. 2019
• Experimented with CNN and LSTM architectures on word2vec embeddings to classify news vs. non-news text, reaching a 0.98 F1-score
• Implemented a FastText bigram neural network to distinguish novel text with a 0.95 F1-score; experimented with a self-learning training regime that further improved the F1-score by 0.02
• Optimized graph queries that retrieve identical events from a Neo4j database to under 0.05 seconds per transaction
• Recalled 20%+ more data in the preprocessing pipeline by applying SVM models to spam news with structural problems
• Vectorized news titles with TF-IDF for entity merging and clustered 30,000 news articles into 10 super-clusters using K-means with 90%+ accuracy
Super-Air Compressed-air Technology Company Kaohsiung, Taiwan
DATA ANALYST INTERN Jan. 2018 – Aug. 2018
• Led technical seminars for three dealers in Southeast Asia, boosting sales in the region by 20%
• Conducted regression analysis and ANOVA on compressor time-series data, helping customers secure subsidies covering up to 40% of costs
Project Experience
Gender Classifier by Face Project
NORTHEASTERN UNI. STATISTICAL ENGINEERING COURSE Sep. 2019 - Dec. 2019
• Visualized and preprocessed data with pandas and Matplotlib; performed feature selection with scikit-learn
• Built a fully connected neural network from scratch in Python/NumPy as a prototype, achieving 80.7% accuracy
• Used cross-validation to tune hyperparameters and avoid overfitting
Fortune-Teller Full-Stack Project
NORTHEASTERN UNI. DATABASE MANAGEMENT COURSE Sep. 2019 - Dec. 2019
• Led a team of four engineers building a fortune-teller system consisting of a database, a back end, and a front end
• Contributed to designing an NLP Q&A algorithm based on cosine similarity, with up to 90% accuracy
• Built a MySQL database with tables and functions; developed the corresponding data-flow and entity-relationship diagrams
Knowledge & Skills
Software
• Python (Proficient) • TensorFlow/Keras • Flask/HTML • R • MATLAB
• RESTful API • Linux • Git • SQL/Neo4j • Bash • AWS/SageMaker/S3
Knowledge
• Machine Learning • Deep Learning • Artificial Intelligence • Natural Language Processing • Distributed System
• Cloud Computing • Algorithms/Data Structures • Data Mining • Database Management • Engineering Statistics
OCTOBER 6, 2020 GORDON YEH · RÉSUMÉ 1