YANG MENG
917-***-**** *** West **nd St, **Q, New York, NY 10036 ******@********.*** LinkedIn.com/in/yang96
EDUCATION
Columbia University, New York, NY 09/2018 - 12/2019
Master of Arts in Statistics GPA: 4.13/4.00 The 2019 Chair’s List Award
Courses: Bayesian Statistics, Statistical Machine Learning, Applied Data Science, Advanced Data Analysis
Zhejiang University, Hangzhou, Zhejiang, China 09/2014 - 07/2018
Bachelor of Natural Sciences in Statistics GPA: 3.75/4.00 Top 10%
Electives: Financial Mathematics, Economics, Management, Data Structure, Logics, Database Technology
TECHNICAL SKILLS
• Programming: SQL, Python (Pandas, scikit-learn, TensorFlow, NLTK), R (dplyr, glmnet), Excel VBA
• Business Intelligence: Tableau, shiny, ggplot2, matplotlib, Pivot table
• Machine Learning: PCA, Clustering, LDA, Random Forest, Decision Trees, Boosting, SVM, Neural Network
PROFESSIONAL EXPERIENCE
Columbia University, New York, NY 01/2020 - Present Associate, Statistical Machine Learning
• Implemented machine learning algorithms from recent research papers in Python and R, including survival analysis-based neural network hidden layer selection, kNN pattern-based feature subspace learning, and knockoff filter-based directional control of regression coefficients
• Organized ~3.7M social media data points to extract key variables, performed social network analysis using data science algorithms including minimum spanning tree and edge counting approaches, and presented the research outcomes weekly to the statistics Ph.D. club
• Developed a sample testing method derived from weighted log-rank statistics, compared it with other well-known testing methods, and summarized their behavior across different mixture distribution cases
Kagemusha International Fund, LLC, New York, NY 12/2018 - 01/2019 Data Analyst Intern, Statistical Modeling
• Extracted 150+ records of casino rules and facilities from the top 50 US casinos, combined them with the existing global data, and standardized 200+ records by cleaning and converting redundant columns into categorical variables
• Reduced cost by 10% and increased income by 0.3+% through qualifying gaming principles and classifying Blackjack games using Decision Trees in Python
• Evaluated 4 different shuffling methods, designed poker games for casinos, visualized the results in pivot tables, and presented weekly reports to a lay audience
Bank of Communications, Nanjing, Jiangsu, China 05/2018 - 07/2018 Business Analyst Intern, Marketing Insights
• Reduced the system response time from ~60s to ~30s by redesigning the ER Diagram of the employee assessment system, optimizing key query statements in SQL, and integrating matching workforce information
• Tracked 10+ different KPIs related to customer conversion, identified financial product metrics, and analyzed operational data to drive new acquisition, then recommended and presented innovative personalized marketing strategies to the IT department
PROJECTS
Data Visualization: NYC Crime Activities Analysis 10/2019 - 11/2019
• Extracted and cleaned ~6.8M records to identify linear correlations between crime rate and 7+ factors such as month, location, and weather conditions, and summarized outdoor safety guidelines by district
• Visualized the monthly distribution of crime types using heat maps and bar plots, created a Tableau storybook, developed an R shiny app offering safety guidance for New York residents, and introduced the application to students and local residents
Data Science: Facial Emotion Recognition and Classification 10/2019 - 11/2019
• Constructed facial emotion features extracted from 2,000+ training images with 20+ different emotions, and reduced the dimension from 6,006 to 50 using PCA and other creative feature selection methods
• Achieved 70+% accuracy and a runtime of <10s on a 500-image dataset by constructing an innovative combined LDA and SVM classifier to improve the classification of special emotions
Natural Language Processing: Lyrics Text Analysis in Music 09/2019 - 10/2019
• Extracted stemmed words from 15,000+ lyrics, applied an N-gram text model to different genres, and visualized the related elements in word cloud plots
• Explored the reasons behind the anti-trend of word frequency, interpreted the preference of lyric topics in different states over time in R shiny, and presented the history of music development to a non-technical class