
Data Analyst

Location:
Sunnyvale, CA
Posted:
April 02, 2020


Resume:

Pengyu Huang

*****, *********, ** 617-***-**** pengyu****@gmail.com https://www.linkedin.com/in/pengyu-huang https://github.com/PengyuHuang

EDUCATION

Northeastern University, Boston, MA Jan. 2018-Dec. 2019

M.S. in Data Analytics Engineering GPA: 4.0/4.0

Courses: Statistics, Data Visualization, Data Mining, Database, Data Management, Neural Network and Deep Learning.

Activity: Member of NUCSSA; Volunteer of Global Leaders of Boston.

Guangdong University of Technology, Guangzhou, China Sep. 2009-Jul. 2013

B.E. in Vehicle Engineering GPA: 3.4/4.0

Honors: Outstanding Student and the First Class Scholarship (4 out of 110); Outstanding Student Union Cadre.

Activity: Leader of Student Union; Volunteer in the National University Game.

SKILLS

• Programming: SQL, Python (pandas, NumPy, scikit-learn, Matplotlib, etc.), R (ggplot2, dplyr)

• Tools: MySQL, MongoDB, Databricks (Spark), Tableau, Gephi, AWS, Microsoft Tools

• Analysis Techniques: Classification (Decision Tree, Random Forest), Regression, Regularization, Gradient Descent, Neural Networks (ANN, CNN, RNN), ETL Processes, Feature Engineering, Hypothesis Testing, A/B Testing

WORK EXPERIENCE

Teaching Assistant, Data Mining course, Northeastern University Jul. 2019-Dec. 2019

• Answered students' questions after class and helped them debug.

• Graded assignments and helped the professor prepare lecture materials.

GAC Mitsubishi Motors Co., Ltd Jul. 2013-Feb. 2016

Position: Purchasing & Price Analyst

• Collected and cleaned historical price data in the database to ensure data integrity and consistency.

• Applied statistical modeling (linear regression) in R to predict transmission prices.

• Compared and analyzed the cost-benefit balance in R to build a price-cut plan, achieving a 4 percent decrease in cost.

• Defined metrics and created storytelling visualization dashboards in Tableau for reporting and negotiating with suppliers.

• Translated Mitsubishi specifications (SPEC). Evaluated different suppliers in person and built dashboards to report the results.

ACADEMIC PROJECTS

Movie Recommendation System in Apache Spark Feb. 2020

• Used the Alternating Least Squares (ALS) algorithm to make movie recommendations based on users' preferences or previous ratings.

• Explored and processed the data into an efficient format using DataFrames and Spark SQL tables for big data OLAP and visualization.

• Trained the ALS model in PySpark ML on a MapReduce-style distributed architecture. Optimized the model using cross-validation.

• Evaluated the model with RMSE (0.8) and recommended the movies with the highest predicted ratings to users.

San Francisco Crime Analysis and Visualization Dec. 2019

• Used Spark to analyze the crime distribution pattern in San Francisco and provide potential insights.

• Built an ETL data processing pipeline to transform the data into an efficient format.

• Visualized and explored how the spatial distribution of crime varies over time using SQL and Python packages.

• Applied the K-means clustering algorithm to cluster the data and identify when and where the crime rate is high.

Demand Forecasting with Python Oct. 2019

• Built two machine learning models to help a pharmacy store predict future demand for different products (existing and new).

• Explored correlations among categorical features by calculating Cramér's V; joined tables and split variables for data preprocessing.

• Developed a Random Forest model (tuned with grid search) to predict future in-store product demand, with an MAE of 0.91.

• Selected similar products by cosine similarity and used existing products to build a second model for predicting demand for new products.

Amazon Prime Video Popularity in Python Jul. 2019

• Developed machine learning models to predict the popularity of Amazon Prime Video to help decide the advertising fee.

• Explored and preprocessed the raw data through cleaning, transformation, and feature engineering to ensure data quality.

• Trained supervised machine learning models, including linear regression, polynomial regression, and random forest, and applied regularization (lasso and ridge) with optimal parameters to reduce overfitting.

• Found the best model by comparing MSE. Analyzed feature importance to identify the top factors and decide the fee for different positions.

Game Website with MySQL Database (group project) Oct. 2018

YouTube: https://www.youtube.com/watch?v=v5s_PGiWjv8

• Built a game website where users can share game-related news, communicate with each other, and search for popular games.

• Led a team of 4 to design the structure and functions, drawing a UML diagram as the blueprint for the whole project.

• Created a schema and database in MySQL; connected the back end to the database and inserted data into the database online. Implemented search functions by writing SQL statements and window functions.

• Tested the website by creating users to create, read, update, and delete comments, join and quit events, and search for games at random.


