YingJun Pan
+1-201-***-**** • *****.***@*****.*** • ******@*******.***
EDUCATION
• Stevens Institute of Technology Hoboken, NJ
Master of Science in Business Intelligence and Analytics May 2018
• Tianjin University of Technology Tianjin, China
Combined Bachelor of Science in Logistics and Statistics Aug. 2013
Relevant Coursework: Database Systems, Process Optimization and Analytics, Multivariate Data Analytics, Machine Learning and Statistical Learning, Social Network Analytics, Text Mining, Marketing Analytics
Certifications: SQL, Marketing Analytics, Project Management
SKILLS
• Programming Languages/Packages: Python, Pandas, NumPy, BeautifulSoup, scikit-learn, Natural Language Toolkit
(NLTK), Latent Dirichlet Allocation (LDA), R, SQL, Git
• Data Visualization Tools: Tableau, Gephi, VOSviewer
• Big Data Tools: Hadoop
WORK EXPERIENCE
• Hangzhou Joyport Technology Co., Ltd. Hangzhou, China
Business Analyst Dec. 2013 - Dec. 2015
– Planned project promotions by targeting audience segments and scheduling several promotion strategies on social media
– Created and edited promotional material for video and web games, posted on social media and the official website, increasing the number of new gamers by 1%
– Analyzed return on investment (ROI), customer lifetime value (CLV), and user data to summarize project performance
– Developed promotional strategies and estimated the investment amount for the next period, increasing ROI by 1.9% in the best case
PROJECTS
• Biomedical Materials Topic Analysis Research - Wiley Publishing
Text Mining and Machine Learning Jan. - May 2018
– Scraped the titles of 42,529 biomedical materials papers, with 72 attributes each, from Web of Science using BeautifulSoup
– Applied an unsupervised learning method, Latent Dirichlet Allocation (LDA), to generate the 24 most likely topics in biomedical materials
– Manipulated and preprocessed the data set to create feature vectors, validated features with Analysis of Variance (ANOVA), and selected high-quality features with Random Forest
– Built and trained Gaussian Naive Bayes, KNN, Logistic Regression, and Support Vector Machine models and combined them in a majority-vote ensemble, with a best accuracy of 0.72 on the validation set
• Credit Card Fraud Detection
Machine Learning - Imbalanced Classes Nov. - Dec. 2017
– Explored the numerical data through visualization in Python (matplotlib), normalized the data, and found that the data set is imbalanced
– Applied Random Forest to analyze feature importance and selected features whose importance is greater than 1%
– Rebalanced the data set using the Synthetic Minority Oversampling Technique (SMOTE)
– Built and trained Logistic Regression models and plotted the confusion matrix and Receiver Operating Characteristic (ROC) curve
– Used cross-validation and grid search to select the best parameters and penalty, then tried different thresholds and plotted Precision-Recall curves to pick the threshold with the highest Area Under the Curve (AUC)
• Find Restaurant Features
Natural Language Processing Jun. - Jul. 2017
– Scraped review text for three cuisines from Yelp, 100,000 reviews per cuisine, using BeautifulSoup
– Manually extracted keywords from 5% of the data to build a keyword pool and labeled the rest of the data based on it
– Classified keywords into key features using Latent Dirichlet Allocation (LDA) and assigned the features back
– Annotated an additional 1,000 reviews for each cuisine as test data and achieved 98% accuracy