Data Analyst Python

Location:
Arlington, VA
Posted:
December 09, 2020

Resume:

NUO (TINA) TIAN

Address: Arlington, VA *****
Cell: 626-***-****
Email: adii50@r.postjobfree.com
LinkedIn: linkedin.com/in/nuo_tian
GitHub: github.com/nuot

SUMMARY

Aspiring data scientist and data analyst with 3+ years of hands-on quantitative analytics experience. Georgetown University M.S. candidate with a solid academic record and a strong sense of teamwork. Served as team lead several times, with a proven record of delivering high-quality analytics. Looking for a data scientist or data analyst opportunity. A fast learner, self-starter, great communicator and tech lover.

• Programming: Java, Python (pandas, NumPy, scikit-learn, SciPy, matplotlib, plotly), R (dplyr, ggplot2, Shiny), SQL, SAS, MATLAB

• Tools: Git, AWS (S3, EC2), Hadoop, Apache Spark, Microsoft Azure, Tableau, Power BI

• WebDev: HTML, CSS, JavaScript, Node.js, React.js, Bootstrap, Materialize, PHP

EDUCATION

Georgetown University, Washington, DC
Master of Science: Data Science and Analytics, expected 05/2021, GPA: 3.833

• Coursework: Data Analytics in Python, Probabilistic Modeling and Statistical Computing, Natural Language Processing, Machine Learning, Inferential Statistics, Data Mining, Massive Data Fundamentals (distributed systems), Neural Networks and Deep Learning, Relational Database Systems (SQL)

University of California, Irvine, Irvine, CA
Bachelor of Science: Mechanical Engineering 09/2014 - 06/2018

Certification: Full Stack Web Development 02/2019 - 07/2019

WORK EXPERIENCE

Database Analyst Internship Optime Realty – Arlington, VA 05/2020 to Present

• Conduct dynamic data analysis and create visualizations to summarize the marketing team’s monthly performance using Excel (VLOOKUP, pivot tables), VBA, SQL and Google Data Studio

• Use SQL and Python to extract more than 100K customer records from a MySQL database. Run API requests (e.g. Brokermint) and web scraping to collect supplementary data for agent performance analysis. Support CRM users with their daily requests

• Build several deep learning models (ANNs) with Keras on TensorFlow for home value appraisal, reaching an accuracy of around 95%

• Establish an automated email system based on Google Forms survey responses, using Google Apps Script, to reduce realtors’ repeated manual work (sending follow-ups, open-house invitations, and confirmations to target groups) by 50%

Web Design and Data Associate GU Center of Education and the Workforce – Washington, DC 09/2019 to Present

• Remodel CEW’s webpage in WordPress, improve its content using HTML, CSS and JavaScript, and monitor website traffic using Google Analytics. Increase overall monthly website user traffic by 18% on average

• Create and maintain interactive data visualizations for CEW's ongoing research reports on the CEW website (e.g. Tracking COVID-19 Unemployment and Job Losses, Upskilling and Downsizing in American Manufacturing) using R Shiny, Highcharts, Tableau and Infogram

Research Associate Intern Neighborhood Rescue of America – Washington, DC 05/2020 to 08/2020

• Develop data visualizations and socioeconomic analyses (crime rate and financial literacy) for the target neighborhood and present them to the organization’s consulting division, leveraging visualization and analytics tools such as Tableau, Flourish and R

• Gather demographic, economic, and educational raw data through web scraping and clean it using Excel and SQL to support the organization’s research on choosing the next target neighborhood for its turnaround of the most at-risk communities for children

PROJECTS

What Makes a Great Theater (Group) 09/2019 to 12/2020

• Use the Yelp Fusion API, Geolocation API and web scraping to collect more than 15 thousand records about theaters (rating, location, comments, etc.)

• Apply clustering algorithms, including hierarchical clustering, KMeans and DBSCAN, and the Apriori association rules algorithm to identify both internal and external factors that are correlated with theater ratings

• Build classifiers using Decision Tree, Support Vector Machine (SVM), Naïve Bayes and Random Forest to predict ratings. Use K-fold cross-validation to test the robustness of the models, and check the ROC curve and confusion matrix to deliver analytical findings

• Perform sentiment analysis on theater reviews using NLP (tokenization, VADER, Naïve Bayes) to provide additional information for the analysis

• Build advanced visualizations using Plotly, Matplotlib and Tableau to uncover insights that support users' investment decisions on theaters and the operation of their current business

Monarch Butterflies Study for Smithsonian Museum (Collaboration with a Biologist) 09/2019 to 01/2020

• Perform data cleaning and exploratory data analysis on the Monarch Butterflies datasets (6 different kinds, more than 10 thousand raw records provided by the biologist)

• Assess the health conditions of Monarch Butterflies as they overwinter by applying weighted factors to each aspect based on correlation analysis results

• Apply the equal-frequency binning method to smooth the data, define key metrics and build prediction models using Naïve Bayes, SVM and Random Forest to predict butterflies' health conditions from wing condition and fat. SVM performed best, achieving 79% accuracy

AWS Yahoo News Analysis (Team Lead) 02/2020 to 05/2020

• Architect prediction models to predict news category based on the content of the articles

• Retrieve 20 gigabytes of data from S3 and perform exploratory analysis, feature vector transformation (TF-IDF) and prediction (Random Forest Classifier) using Hadoop, PySpark, Spark SQL and Spark NLP


