Data analyst with nearly two years of full-time work experience in the U.S. and China and proficiency in Python, R, SQL & Tableau-based data visualization. Capable of analyzing business performance & products, making optimal use of organizational data, and engaging both internal & external clients with data-driven presentations that translate analytics into actionable solutions.
SKILLS
Programming Languages: Python (Pandas, NumPy, Matplotlib, Scikit-learn, h2o), R, SQL (Microsoft SQL Server, MySQL)
Tools: Hadoop, Spark, MongoDB, Cassandra, Tableau, ER/Studio, MS Excel (VLOOKUP, Pivot Table), Google Cloud Platform
Analytics: Machine Learning, Experiment Design (A/B Testing, etc.), Data Modeling, Data Visualization, Qualitative Analysis (Content Analysis, Hypothesis Testing, etc.), Quantitative Analysis (Regression, Descriptive Statistics, Inferential Statistics, etc.)
WORK EXPERIENCE
Data Analyst (Full-Time), Aura Source, Inc., Phoenix, AZ, Jan 2018 - Aug 2019
Aura Source is a technology-driven startup focusing on developing the newest mineral processing technologies.
• Worked with the C-suite and analytics team on metal price research and analysis, using Python to predict metal price trends, improving analytics efficiency by 15% and saving 12% of trading costs.
• Collected 20 years of precious-metal price data from a database using SQL, and used Python Pandas to clean and structure the dataset in preparation for making predictions.
• Built machine learning models using Python h2o.ai to predict the price trends of precious metals (Gold, Silver, Platinum, etc.), improving prediction accuracy by 20%.
• Visualized findings through Tableau & delivered presentations directly to the CEO & Partners, promoting a $6 million USD deal.
Data Analyst (Full-Time), China Everbright Bank Co., Ltd., Shenzhen, China, May 2017 - Aug 2017
China Everbright Bank is a Chinese state-owned commercial bank & a Fortune Global 500 company.
• Analyzed a user dataset of over 1 million rows via Python & designed a recommendation system with the engineering team to support sales, increasing revenue from the bank's financial products by 1.25 million USD.
• Extracted the relevant user dataset (500k+) by writing SQL queries and cleaned it with Python Pandas.
• Used affinity analysis to explore co-occurrence relationships between different financial products and optimized the prediction model using Python Scikit-learn to identify potential users, leading to a 50% reduction in sales research time.
• Built the sales strategy and visualized the results using Tableau, increasing revenue growth by 20%.
PROJECTS
Columbia University Capstone Project: Optimize Geographic Expansion for Vertoe (Python, SQL, Tableau), Summer 2020
Vertoe is America's first and leading short-term storage startup. It raised $1.8M from Techstars in 2018.
• Supported Vertoe's employees in designing a product expansion plan & increasing revenue growth by 10%; wrote a Python script to collect geographic information via the Google API, reducing the person-hours needed by 98%, and visualized the results in Tableau.
• Used SQL scripts to extract data (over 130K rows) from the database and built prediction models using Python h2o.ai.
Columbia University Course Project: Managing Data (SQL, Cassandra, MongoDB), Spring 2020
• Implemented an online library system database and normalized all MySQL tables to 3NF, which eliminated several types of anomalies, then pipelined the normalized data to the next data-processing step.
• Wrote 200+ MongoDB and Cassandra queries to replace the SQL database and interact with the new NoSQL databases.
Columbia University Kaggle Competition: Predicting Airbnb Rental Price (R, Tableau), Ranked 3/450, Fall 2019
• Constructed models (reducing RMSE from 101 to 55.12) to predict the prices of Airbnb rentals, using a supplied dataset with 90 variables for over 75k Airbnb rentals in New York.
• Used R (dplyr, XGBoost) to build prediction models and improved model accuracy by 20% through feature engineering.
• Led the team; visualized data via Tableau & presented model results and business strategy, earning a grade of 100/100 (A+).
EDUCATION
Columbia University, New York, NY
M.S. in Applied Analytics (STEM), GPA: 4.0/4.0, Dec 2020
Coursework: Machine Learning (Python), Statistics (R), Managing Data (Spark, Hadoop, Cassandra, MongoDB, SQL), Storytelling with Data (Tableau), Experiment Design (Hypothesis Testing), Data Modeling (ER/Studio)
Arizona State University, Tempe, AZ
B.S. in Business Data Analytics (STEM), B.S. in Management, B.A. in Business Sustainability, GPA: 3.8/4.0, May 2018
Honors: New American University Scholarship, 2014-2018 (total $40,000); Dean's List, 2014-2018 (Top 10%)
Coursework: Big Data (Spark, Amazon Cloud), Data Modeling & Mining (Python), Info. System, Enterprise Analytics