Xuanxuan (Summer) Xue
******@******.*** 614-***-**** LinkedIn GitHub
Work Authorization
Green Card holder, authorized to work in the U.S.
EDUCATION
Georgia Institute of Technology, Master of Science in Analytics Present
Relevant Courses: Intro to Analytics Modeling, Regression, Data Analytics & Business, Machine Learning for Trade, Probability and Statistics, Computational Data Analytics, Data and Visual Analytics
The Ohio State University, Bachelor of Science in Computer Science and Engineering December 2016
Relevant Courses: Data Structures, Algorithms, Systems, Database Administration, Principles of Programming Languages
SKILLS
• Programming languages: Python, R, SQL, Java, HTML, JavaScript, CSS, D3, Scala
• Tools: Anaconda, pandas, SQLite, Tableau, Agile, Git, AWS, Spark, Azure, Docker, GCP, Databricks, Excel VBA, OpenRefine, NumPy
• Statistical tests and metrics: correlation test, goodness-of-fit test, VIF test, F-test, t-test, mean squared error, confidence interval, p-value, R-squared, Cook’s distance, chi-squared, ANOVA, AIC, BIC
PROJECTS
Co-actor Graph (Python Node-edge visualization) August 2020
• Visualized a co-actor graph in which each node represents an actor or actress and an edge between two nodes indicates that the two acted in a movie together
• Obtained live data through The Movie Database API. Initialized a Graph object with a single node representing Meryl Streep, then added as new nodes the top 3 co-actors from each of her movie credits with a vote average greater than 8.0; for each newly added node, added that actor's top 3 co-actors as new nodes, and iterated this process 3 times
• Users can view the co-actor network graph of the 10 actors/actresses with the most edges: visualization
Covid-19 Risk Prediction (Supervised Learning Models with Python) July 2020
• Predicted Covid-19 risk for counties in the state of Georgia from 36 county-level demographic features
• Extracted public datasets from The New York Times and the U.S. Census Bureau; defined counties with more cases than the Georgia county median as high risk; fitted logistic, Ridge, and Lasso regression, decision tree, random forest, GMM, KNN, K-means, Naïve Bayes, neural network, and SVM models
• Performed model selection; the neural network achieved the highest accuracy, 81.25%
Boston Housing Price Prediction (Regression Model with R) April 2020
• Used R to build regression models that predict house prices from 14 features of houses in the Boston area
• Fitted a linear regression model as the benchmark; performed feature engineering using ANOVA, Mallows’ Cp, stepwise regression, and Ridge, Lasso, and Elastic Net regression
• Decreased MSE from 24.51 to 22.81; Lasso and Elastic Net were the best models with the smallest MSE, with 11 of the 13 independent variables selected
Online Book Store Database Design August 2015
• Developed an online bookstore database design using SQLite as a group project
• Designed the ER model and relational schema for the online bookstore
• Users can query book information in this database, such as all books written by a certain author, all books purchased by a certain customer with their purchase dates, the number of copies of a book left in stock, etc.
WORK EXPERIENCE
Coordinator at Mentoring for Christ-Centered Home (MCCH) February 2017 - March 2018
• Collected requests and issues from the MCCH owner and implemented them on the official website
• Designed, developed, and maintained the MCCH official site: http://mentoring4christ.net/
• Taught seniors basic networking and IT knowledge in non-technical terms
Student Assistant in the Safety Department at Green River College February 2013 - June 2014
• Answered phone calls from students and coordinated with drivers to give students rides