Data Scientist

Santa Clara, CA
October 10, 2018


Edward Lim 443-***-**** *** Giannani Dr, Santa Clara, CA 95051


Data Scientist with three years of experience in statistics and Machine Learning, with a focus on Natural Language Processing and Deep Learning.


Master of Science in Analytics, June 2016 - May 2017

Institute for Advanced Analytics, North Carolina State University, Raleigh, NC

Master of Science, Bachelor of Science in Applied Mathematics and Statistics, December 2015

Johns Hopkins University, Baltimore, MD


Data Scientist

Leoforce Inc – Raleigh, NC June 2017 - Present

• Developed a Resume Parser in Python, achieving over 80% precision and 75% recall on twelve different entities. Used a Convolutional Neural Network and Conditional Random Field (CNN-CRF) architecture with FastText embedding vectors for the Named Entity Recognition task.

• Deployed the Resume Parser on Amazon Web Services (AWS) with Docker as a Python Flask application. Reindexed 100 million resumes in Elasticsearch using the Resume Parser.

• Developed a date normalizer in TensorFlow with 98% accuracy using a Long Short-Term Memory (LSTM) encoder-decoder sequence-to-sequence architecture (seq2seq).

Data Analyst

Red Hat – Raleigh, NC September 2016 - April 2017

• Improved the targeting strategy for at-risk customers with customer segmentation and retention models. Created a Logistic Regression model in Scikit-Learn to predict the probability of a customer’s subscription not being renewed within a given time period.

• Developed an unsupervised document clustering algorithm with Latent Dirichlet Allocation (LDA) topic modeling. Illustrated differences in topic popularity to assist marketing efforts.

Research Assistant

Human Language Technology Center of Excellence – Baltimore, MD September 2014 – May 2015

• Analyzed 32 million tweets on depression, PTSD, and bipolar disorder to detect differences in language usage for subjects across different age groups and geographic regions

• Achieved 72% precision on document classification, separating control subjects from users diagnosed with mental health conditions, using XGBoost with Scikit-Learn on character n-gram text features

• Streamed tweets, stored them in MongoDB, and preprocessed them using regular expressions

Sergeant in Artillery

Korean Military Service – Guri, South Korea June 2012 – April 2014

• Planned firing drills with Captains and Majors by simulating possible scenarios, surveying terrain, ensuring safety of firing, and planning necessary equipment and task force

Research Assistant in Graph Theory

Johns Hopkins University – Baltimore, MD September 2010 – May 2012

• Disproved Tsallis’ 30-year-old conjecture on graph connectivity, using four theorems

• Used Monte Carlo simulations in MATLAB to empirically support theoretical results

• Published results in Electronic Journal of Combinatorics and Congressus Numerantium

SKILLS

• Programming: Python, R, SQL

• Scientific Computing: PyTorch, TensorFlow, Keras, Scikit-Learn, Pandas, NumPy, spaCy

• Languages: Native in Korean and English