Python, R, SQL, Scala, Anaconda, pandas, SQLite, Tableau, Agile, Git

Location:
Greenville, SC
Posted:
November 28, 2020

Resume:

Xuanxuan (Summer) Xue

******@******.*** 614-***-**** LinkedIn GitHub

Work Authorization

Green Card holder, authorized to work in the U.S.

EDUCATION

Georgia Institute of Technology, Master of Science in Analytics Present

Relevant Courses: Intro to Analytics Modeling, Regression, Data Analytics & Business, Machine Learning for Trade, Probability and Statistics, Computational Data Analytics, Data and Visual Analytics

The Ohio State University, Bachelor of Science in Computer Science and Engineering December 2016

Relevant Courses: Data Structures, Algorithms, Systems, Database Administration, Principles of Programming Languages

SKILLS

• Programming languages: Python, R, SQL, Java, HTML, JavaScript, CSS, D3, Scala

• Tools: Anaconda, pandas, SQLite, Tableau, Agile, Git, AWS, Spark, Azure, Docker, GCP, Databricks, Excel VBA, OpenRefine, NumPy

• Statistical tests and measures: correlation test, goodness-of-fit test, VIF test, F-test, t-test, mean squared error, confidence interval, p-value, R-squared, Cook’s distance, chi-squared, ANOVA, AIC, BIC

PROJECTS

Co-actor Graph (Python Node-edge visualization) August 2020

• Visualized a co-actor graph in which each node represents an actor/actress and an edge between two nodes indicates that the two actors/actresses appeared in a movie together

• Obtained real-time data through The Movie Database API. Initialized a Graph object with a single node representing Meryl Streep; from each of her movie credits with a vote average greater than 8.0, added the top 3 co-actors as new nodes; then, for each newly added node, added its top 3 co-actors as new nodes, iterating this process 3 times

• Users can see the co-actor network graph of the top 10 actors/actresses with the most edges: visualization

Covid-19 Risk Prediction (Supervised Learning Models with Python) July 2020

• Predicted Covid-19 risk from 36 county-level demographic features for the state of Georgia

• Extracted public datasets from The New York Times and the U.S. Census Bureau; defined counties with more cases than the Georgia median as high risk; fitted Logistic, Ridge, and Lasso regression, Decision Tree, Random Forest, GMM, KNN, K-means, Naïve Bayes, Neural Network, and SVM models

• Performed model selection; the Neural Network had the highest accuracy, at 81.25%

Boston Housing Price Prediction (Regression model with R) April 2020

• Used R to build regression models based on 14 features of houses in the Boston area to predict house prices

• Fitted a linear regression model as the benchmark, then performed feature engineering with ANOVA, Mallows’ Cp, and stepwise regression, and fitted Ridge, Lasso, and Elastic Net regression models

• Decreased MSE from 24.51 to 22.81; Lasso and Elastic Net were the best models, with the smallest MSE; 11 of the 13 independent variables were selected

Online Book Store Database Design August 2015

• Developed an online bookstore database design using SQLite as a group project

• Designed online bookstore ER model and relational schema

• Users can query book information in this database, such as all books written by a certain author, all books purchased by a certain customer along with their purchase dates, the number of copies of a book left in storage, etc.

WORK EXPERIENCE

Coordinator at Mentoring for Christ-Centered Home (MCCH) February 2017 - March 2018

• Collected requests and issues from the MCCH owner and implemented them on the official website

• Designed, developed, and maintained the MCCH official site: http://mentoring4christ.net/

• Taught seniors basic networking and IT knowledge in non-technical terms

Student Assistant in the Safety Department at Green River College February 2013 - June 2014

• Answered phone calls from students and cooperated with drivers to give students rides


