Social Media Data

Location:

Hoboken, NJ

Posted:

August 01, 2018

Resume:

YingJun Pan

+1-201-***-**** • *****.***@*****.*** • ******@*******.***

EDUCATION

• Stevens Institute of Technology Hoboken, NJ

Master of Science in Business Intelligence and Analytics May 2018

• Tianjin University of Technology Tianjin, China

Combined Bachelor of Science in Logistics and Statistics Aug. 2013

Relevant Coursework: Database Systems, Process Optimization and Analytics, Multivariate Data Analytics, Machine Learning and Statistical Learning, Social Network Analytics, Text Mining, Marketing Analytics

Certifications: SQL, Marketing Analytics, Project Management

SKILLS

• Programming Languages/Packages: Python, Pandas, NumPy, BeautifulSoup, scikit-learn, Natural Language Toolkit (NLTK), Latent Dirichlet Allocation (LDA), R, SQL, Git

• Data Visualization Tools: Tableau, Gephi, VOSviewer

• Big Data Tools: Hadoop

WORK EXPERIENCE

• Hangzhou Joyport Technology Co., Ltd. Hangzhou, China

Business Analyst Dec. 2013 - Dec. 2015

– Planned project promotion by targeting specific segments and scheduling several promotion strategies on social media

– Created and edited promotional material for video and web games, posted on social media and the official website, increasing the number of new gamers joining by 1%

– Examined and analyzed return on investment (ROI), customer lifetime value (CLV) and user to summarize the project performance

– Developed promotional strategies and estimated the investment amount for the next period, increasing ROI by 1.9% in the best case

PROJECTS

• Biomedical Materials Topic Analysis Research - Wiley Publishing

Text Mining and Machine Learning Jan. - May 2018

– Scraped 42,529 biomedical materials paper titles with 72 attributes from Web of Science using BeautifulSoup.

– Applied latent Dirichlet allocation (LDA), an unsupervised learning method, to generate the 24 most likely biomedical materials topics.

– Preprocessed the data set to create feature vectors, validated features with Analysis of Variance (ANOVA), and selected high-quality features with Random Forest.

– Built and trained Gaussian Naive Bayes, KNN, Logistic Regression, and Support Vector Machine models and combined them in a majority-vote ensemble, with a best accuracy score of 0.72 on the validation set.
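
The majority-vote step above can be illustrated with a minimal sketch. This is not the project's code: the function name `majority_vote` and the toy predictions are hypothetical, and the sketch assumes hard voting, where each model casts one vote per sample and ties go to the earliest model in the list.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample (hard voting).

    `predictions` is a list with one entry per model; each entry is the list
    of labels that model predicted, in the same sample order.
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")

    combined = []
    for votes in zip(*predictions):
        # most_common(1) picks the label with the highest count; equal counts
        # keep first-seen order, so ties go to the earliest model.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical predictions from three models on three samples.
model_a = [0, 1, 1]
model_b = [0, 0, 1]
model_c = [1, 0, 1]
ensemble = majority_vote([model_a, model_b, model_c])
```

With scikit-learn, `VotingClassifier(voting="hard")` offers the same behavior for fitted estimators; whether the project used it is not stated.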

• Credit Card Fraud Detection

Machine Learning - Imbalanced Classes Nov. - Dec. 2017

– Explored the numerical data through visualization in Python (matplotlib), normalized the data, and found that the data set is imbalanced.

– Applied Random Forest to analyze feature importance and selected features with importance greater than 1%

– Resolved the class imbalance with the Synthetic Minority Oversampling Technique (SMOTE)

– Built and trained Logistic Regression models and plotted the confusion matrix and Receiver Operating Characteristic (ROC) curve

– Evaluated using cross-validation and grid search to select the best parameter and penalty, tried different thresholds, and plotted the Precision-Recall curve to pick the threshold with the highest Area Under Curve (AUC) value
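
The idea behind SMOTE, used above to balance the classes, can be shown in a small standard-library sketch. It is an illustration only, not the project's implementation (which most likely relied on an existing library): the function name `smote_sketch`, its parameters, and the toy points are all hypothetical. Each synthetic sample is placed at a random position on the line segment between a minority-class point and one of its nearest minority-class neighbors.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate `n_new` synthetic minority samples by neighbor interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbor = rng.choice(others[:k])
        # Interpolate: gap = 0 reproduces base, gap near 1 approaches neighbor.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbor)])
    return synthetic


# Hypothetical two-feature minority class with three samples.
minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote_sketch(minority_points, n_new=5)
```

Oversampling is normally applied to the training folds only, so that synthetic points never leak into the data used for evaluation.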

• Find Restaurant Features

Natural Language Processing Jun. - Jul. 2017

– Scraped Yelp review text for three cuisines, 100,000 reviews per cuisine, using BeautifulSoup

– Manually extracted keywords from 5% of the data to build a keyword pool and labeled the rest of the data based on it

– Classified keywords into key features using Latent Dirichlet Allocation (LDA) and assigned the features back

– Annotated an extra 1,000 reviews for each cuisine as test data and achieved 98% accuracy
