Data Scientist

Location:

Long Island City, NY

Posted:

March 30, 2020


Resume:

Junzhi Sheng

**********@*****.***; 812-***-****; https://github.com/jsheng0901

SUMMARY

Highly self-motivated Data Scientist with two years of data science experience focusing on machine learning, NLP, and computer vision problem solving. Strong command of Python data manipulation driven by business requirements. Immigration status: Green Card. Fluent in both English and Chinese.

TECHNICAL SKILLS

Advanced: Python (NumPy, Pandas, SciPy, spaCy, NLTK, Gensim, Scikit-learn, TensorFlow, Keras), R (e1071, caret)

Machine learning: Linear, Ridge, Lasso, Logistic Regression, Random Forests, SVM, XGB, KNN, PCA

Deep learning: DNN, CNN, RNN, BERT

Data visualization: Tableau, matplotlib (Python), ggplot2 (R)

Intermediate: GCP, AWS, SQL, Spark, Tableau, Git, Excel

EDUCATION

Columbia University New York City, NY

Master of Arts in Statistics (GPA: 3.5) Sep 2018 – Dec 2019

Relevant Coursework: Statistical Machine Learning, Deep Learning, Applied Data Science, NLP

Indiana University Bloomington Bloomington, IN

Bachelor of Science in Math, Minor in Finance and Computer Science Aug 2014 – Aug 2018

Major GPA: 3.9/4.0; Overall GPA: 3.7/4.0; Honors: Awarded for academic excellence (top 5%)

WORK EXPERIENCE

UTOFUN New York, USA

Data Scientist Intern Jun 2019 – Aug 2019

Daily responsibilities: data integrity checks in Excel or Python, data extraction with SQL, and data visualization reports in Tableau.

Project: Customer data analysis. Based on customers' search behavior on the company website, analyzed which features most influence a customer's decision to contact an agent.

Result: Based on the analysis, engineers changed the website's design and several functions, which increased monthly search volume on the company website by 10% and also increased agent contact volume.

Method: Extracted and merged data in SQL, performed basic cleaning and EDA, and built Logistic Regression, Random Forest, and XGB models to analyze feature importance.

Francis Peltast Partners New York, USA

Data Analytics Intern Oct 2018 – Dec 2018

Used gender, full- or part-time status, and other features to predict the enrollment trend of undergraduate students in the USA.

Method: Performed stepwise model selection with a t-test as the stopping rule, then built a Linear Regression model.

PROJECT

Teaching Machine Reading and Comprehension Oct 2019 – Dec 2019

Used open CNN news data to teach a machine to answer queries based on context. Achieved over 60% accuracy.

Method: Built two models, DeepLSTM and Attentive Reader, combined with pre-trained Stanford NLP word embeddings.

Movie Review Sentiment Analysis Aug 2019 – Sep 2019

Applied RNN models to movie review text and obtained 70% accuracy across 5 sentiment categories.

Method: Built 3 models (LSTM, GRU, and CNN) on a trained embedding matrix and ensembled their results.

Identify Toxicity in Online Conversations Aug 2019 – Sep 2019

Applied an NLP model to online comments to classify them into six toxicity categories.

Result: Obtained over 95% on whether comments are toxic and over 85% on the six toxicity subgroups.

Method: Combined two pre-trained word embedding matrices with an LSTM model.

CNN in Humpback Whale Identification Jun 2019 – Jul 2019

Identified whales across 5005 categories from over 25000 images. Achieved 72% multiclass accuracy.

Method: Built a CNN with data augmentation and regularization methods, and also applied transfer learning with ResNet50.

Machine Learning in Helping Navigate Robots May 2019 – Jun 2019

Used sensor data to predict which of nine floor types the robot is on. Achieved above 82% multiclass classification accuracy.

Method: Aggregated sensor data to build Random Forest, LBG, and DNN models.

INTERESTS

Urheen (traditional Chinese instrument) performer for ten years; Brazilian Jiu-Jitsu (Blue Belt); Fishing
