Junzhi Sheng
**********@*****.***; 812-***-****; https://github.com/jsheng0901
SUMMARY
Highly self-motivated Data Scientist with two years of data science experience focused on machine learning, NLP, and computer vision problem solving. Strong command of Python data manipulation driven by business requirements. Immigration status: Green Card. Fluent in both English and Chinese.
TECHNICAL SKILLS
Advanced: Python (NumPy, pandas, SciPy, spaCy, NLTK, Gensim, scikit-learn, TensorFlow, Keras), R (e1071, caret)
Machine learning: Linear, Ridge, Lasso, Logistic Regression, Random Forests, SVM, XGB, KNN, PCA
Deep learning: DNN, CNN, RNN, BERT
Data visualization: Tableau, matplotlib (Python), ggplot2 (R)
Intermediate: GCP, AWS, SQL, Spark, Tableau, Git, Excel
EDUCATION
Columbia University New York City, NY
Master of Arts in Statistics (GPA: 3.5) Sep 2018 – Dec 2019
Relevant Coursework: Statistical Machine Learning, Deep Learning, Applied Data Science, NLP
Indiana University Bloomington Bloomington, IN
Bachelor of Science in Math, Minor in Finance and Computer Science Aug 2014 – Aug 2018
Major GPA: 3.9/4.0; Overall GPA: 3.7/4.0; Honors: Awarded for academic excellence (top 5%)
WORK EXPERIENCE
UTOFUN New York, USA
Data Scientist Intern Jun 2019 – Aug 2019
Daily responsibilities: Ensured data integrity with Excel and Python, extracted data with SQL, and produced data visualization reports in Tableau.
Project: Customer data analysis. Based on customer search behavior on the company website, analyzed which features most influence a customer's decision to contact an agent.
Result: The analysis led engineers to change the web design style and several functions, which increased monthly website search volume by 10% and also increased agent-contact volume.
Method: Extracted and merged data in SQL, then performed data cleaning and EDA. Built Logistic Regression, Random Forest, and XGB models to analyze feature importance.
Francis Peltast Partners New York, USA
Data Analytics Intern Oct 2018 – Dec 2018
Used gender, full- or part-time status, and other features to predict undergraduate enrollment trends in the USA.
Method: Applied stepwise selection with a t-test as the stopping rule, then built a Linear Regression model.
PROJECT
Teaching Machine Reading and Comprehension Oct 2019 – Dec 2019
Used the open CNN news dataset to train a model to answer queries based on context. Achieved over 60% accuracy.
Method: Built two models, DeepLSTM and Attentive Reader, combined with pre-trained Stanford NLP word embeddings.
Movie Review Sentiment Analysis Aug 2019 – Sep 2019
Applied RNN models to movie review text and obtained 70% accuracy across five sentiment categories.
Method: Built three models (LSTM, GRU, CNN) on a trained embedding matrix and ensembled their outputs for the final result.
Identifying Toxicity in Online Conversations Aug 2019 – Sep 2019
Classified online comments into six toxicity categories by applying an NLP model.
Result: Obtained over 95% on detecting whether comments are toxic and over 85% on the six toxicity subgroups.
Method: Combined two pre-trained word embedding matrices with an LSTM model.
CNN in Humpback Whale Identification Jun 2019 – Jul 2019
Identified whales across 5,005 categories from over 25,000 images. Achieved 72% multiclass accuracy.
Method: Built a CNN with data augmentation and regularization methods, and also applied transfer learning with ResNet50.
Machine Learning in Helping Navigate Robots May 2019 – Jun 2019
Used sensor data to predict which of nine floor types the robot is on. Achieved above 82% multiclass classification accuracy.
Method: Aggregated sensor data to build Random Forest, LBG, and DNN models.
INTERESTS
Urheen (traditional Chinese instrument) performer for ten years; Brazilian Jiu-Jitsu (Blue Belt); Fishing