Resume


Data Analyst

Location:
Toronto, Ontario, Canada
Posted:
March 13, 2019


Resume:

Raven Sun

ac8rxy@r.postjobfree.com | 416-***-**** | 280 Wellesley St East

TECHNICAL SKILLS

Programming: Python, R, SAS, HTML, CSS, JavaScript, STATA

Machine Learning: TensorFlow, scikit-learn, NumPy, PIL

Database: SQL, MySQL, MongoDB, Firebase

Big Data: Spark, Hadoop

Data Visualization: ggplot, seaborn, D3.js, Matplotlib, Tableau

Cloud: AWS, Azure, GCP

EDUCATION

University of Toronto Toronto, ON

Bachelor of Science in Computer Science and Statistical Science Nov 2017

Relevant Coursework: Data Analysis, Regression Analysis, Machine Learning, Machine Learning and Data Mining in Statistical Aspects, Database Design, Programming on the Web

EXPERIENCE

YoLife: Food Delivery App Toronto, ON

App Developer & Data Scientist May 2018 - Present

Database: Transformed legacy Firebase JSON files into PySpark DataFrames and merged them into a MySQL database.
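The Firebase-to-relational step can be sketched roughly as below. This is a hedged, stdlib-only illustration rather than the original pipeline: it swaps PySpark for plain Python and MySQL for an in-memory `sqlite3` database, and it assumes a Firebase Realtime Database export shaped as `{node: {push_id: {field: value}}}`. The function names and the `orders` example are hypothetical.

```python
import json
import sqlite3

def firebase_node_to_rows(tree, node):
    """Flatten {push_id: {field: value}} under `node` into dict rows keyed by 'id'."""
    rows = []
    for push_id, record in tree.get(node, {}).items():
        row = {"id": push_id}
        for field, value in record.items():
            # Nested objects are stored as JSON text instead of being split into child tables.
            row[field] = json.dumps(value) if isinstance(value, (dict, list)) else value
        rows.append(row)
    return rows

def load_rows(conn, table, rows):
    """Create a typeless SQLite table from the union of row keys and insert all rows.
    Table and column names are interpolated, so they must come from trusted input."""
    columns = sorted({col for row in rows for col in row})
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})",
        # Missing fields become NULL.
        [tuple(row.get(col) for col in columns) for row in rows],
    )
```

For example, loading `{"orders": {"-La1": {"user_id": "u1", "total": 12.5}, "-La2": {"user_id": "u2", "total": 8.0, "items": {"i1": 2}}}}` into an `orders` table yields two rows whose `total` column sums to 20.5; the first row's `items` column is NULL.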

Data Visualization: Loaded data with SQL queries and built D3.js charts on an HTML admin page; plotted charts with ggplot in R.

Data Mining & Analysis: Extracted and loaded data in R and ran regression analyses and predictive models on multiple tasks, such as user ordering behaviour. Built association rules to boost in-app promotion sales. Combined the analysis and prediction results into R Markdown reports with ggplot visualizations.
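Association rules of the kind mentioned above are scored by support, confidence and lift. The sketch below is a hedged, hand-rolled illustration limited to one-item-to-one-item rules; it is not the Apriori implementation an R package would provide, and the thresholds, function name and basket data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pairwise_rules(baskets, min_support=0.2, min_confidence=0.5):
    """Return (lhs, rhs, support, confidence, lift) for single-item rules lhs -> rhs."""
    n = len(baskets)
    item_counts, pair_counts = Counter(), Counter()
    for basket in baskets:
        items = sorted(set(basket))          # count each item once per order
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n                  # share of orders containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]        # P(rhs | lhs)
            if confidence >= min_confidence:
                lift = confidence / (item_counts[rhs] / n)  # > 1 means positive association
                rules.append((lhs, rhs, support, confidence, lift))
    return sorted(rules, key=lambda r: r[4], reverse=True)
```

A rule such as "bubble tea -> noodles" with lift above 1 would be a candidate for a bundled in-app promotion.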

Machine Learning and Cloud Computing: Built a KNN classifier on user locations (longitude and latitude) to define delivery areas and delivery fees more accurately, using AWS ML with AWS S3/RDS. Built a TensorFlow CNN that classifies food images uploaded by restaurants and assigns a classification tag to each dish.

Celer.ca: Education Resource Guide Platform Toronto, ON

Start-up Business & Machine Learning Implementation Feb 2017 - Apr 2018

Data Mining & Analysis: Extracted when and for how long users stayed on each page of the platform, and used this to estimate when and how long users studied the material. The results showed that most users started studying right before their exams; few studied consistently.
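One way to turn page-visit timings into the "study right before the exam" finding is to bucket each user's study minutes by how close they fall to the exam. This is a hedged sketch: it assumes sessions are already available as `(user, start, end)` datetime tuples and uses a hypothetical 3-day pre-exam window; the original analysis's definitions are not stated.

```python
from datetime import datetime, timedelta

def study_minutes_by_window(sessions, exam_time, window=timedelta(days=3)):
    """Sum each user's minutes into 'last_minute' (session starts within `window`
    before the exam) or 'earlier' (before that window)."""
    totals = {}
    for user, start, end in sessions:
        if start >= exam_time:
            continue  # ignore activity after the exam
        minutes = (end - start).total_seconds() / 60
        bucket = "last_minute" if start >= exam_time - window else "earlier"
        user_totals = totals.setdefault(user, {"last_minute": 0.0, "earlier": 0.0})
        user_totals[bucket] += minutes
    return totals

def last_minute_share(totals):
    """Fraction of each user's study time that falls inside the pre-exam window."""
    return {
        user: t["last_minute"] / (t["last_minute"] + t["earlier"])
        for user, t in totals.items()
        if t["last_minute"] + t["earlier"] > 0
    }
```

Users whose share is close to 1 are the last-minute studiers; a share spread across the term indicates consistent study.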

Machine Learning: Trained a CNN to classify the topic tag of a question from its screenshot, then predicted topics for untagged questions and routed each question to the corresponding tutor.

PROJECTS

Statistics R Project: ANOVA Analysis of Factors Affecting GPA

Ran a two-sample t-test comparing the GPA of video game players and non-video-game players

Ran one-way and two-way ANOVA to investigate differences in GPA across groups of "video game player" and groups of "expected GPA"

Analyzed the output (p-value > 0.05), which showed no significant difference among those groups
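The test statistics behind these comparisons can be computed directly. The sketch below is a hedged stdlib illustration, not the project's R code: it computes the pooled (equal-variance) two-sample t statistic and the one-way ANOVA F statistic only. Turning them into p-values needs the t or F distribution's CDF, which R's `t.test` and `aov` report and the Python standard library does not provide. Function names and data are hypothetical toy values.

```python
from statistics import mean, variance

def pooled_t_statistic(a, b):
    """Two-sample t statistic assuming equal variances (Student's t-test)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (pooled_var * (1 / na + 1 / nb)) ** 0.5

def one_way_anova_f(groups):
    """F = MS_between / MS_within for a one-way ANOVA over a list of groups."""
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((v - mean(g)) ** 2 for v in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

With exactly two groups the one-way ANOVA and the pooled t-test agree: F equals t squared, so both give the same p-value.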

Face Recognition and Gender Classification: Linear Regression Classification in Python

Wrote a Python script (os, urllib) to download face images automatically

Resized and recolored all images with a Python (SciPy) script

Built a linear regression classifier for each person

Minimized a quadratic loss function by gradient descent

Compared several learning rates by plotting the cost function against the number of iterations with Matplotlib, chose 0.001 as the learning rate, and reached 83.8333% accuracy
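A linear regression classifier trained by gradient descent on a quadratic loss can be sketched as follows. This is a hedged, stdlib-only toy: it uses ±1 targets, a prepended bias feature, and two hand-made features instead of image pixels. The learning rate 0.001 matches the value above, but the iteration count, function names and data are hypothetical.

```python
def train_linear_classifier(X, y, alpha=0.001, iters=5000):
    """Gradient descent on J(theta) = sum_i (theta . x_i - y_i)^2 with a bias term."""
    Xb = [[1.0] + list(row) for row in X]      # prepend a constant 1 for the bias
    theta = [0.0] * len(Xb[0])
    for _ in range(iters):
        residuals = [sum(t * v for t, v in zip(theta, row)) - target
                     for row, target in zip(Xb, y)]
        # dJ/dtheta_j = 2 * sum_i residual_i * x_ij
        grad = [2 * sum(r * row[j] for r, row in zip(residuals, Xb))
                for j in range(len(theta))]
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta

def predict(theta, x):
    """Classify by the sign of the linear score: +1 or -1."""
    score = theta[0] + sum(t * v for t, v in zip(theta[1:], x))
    return 1 if score > 0 else -1
```

For one-vs-rest face identification, one such classifier per person is trained with that person's images labelled +1, and an image is assigned to the person whose classifier gives the highest score.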

Deep Neural Networks for Handwritten Digits and Faces: Convolutional Neural Networks in Python

Trained a neural network on MNIST digits

Used a linear combination of the input vector X as the activation function of the output layer

Plotted the learning curves and displayed the weights going into each output unit.

Used back-propagation to compute the gradient on the MNIST dataset, then applied the same procedure to face images: selected two actors and visualized the weights of the hidden units.
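The back-propagation step for a linear output layer can be checked against finite differences. This hedged sketch assumes the linear outputs feed a softmax with cross-entropy loss (the project does not say which loss was used) and covers a single layer with toy weights; all names and values are hypothetical.

```python
import math

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def output_layer(W, b, x):
    # o_k = sum_j W[k][j] * x[j] + b[k]: a linear combination of the inputs
    return [sum(w * xj for w, xj in zip(row, x)) + bk for row, bk in zip(W, b)]

def cross_entropy(W, b, x, label):
    return -math.log(softmax(output_layer(W, b, x))[label])

def backprop(W, b, x, label):
    """Gradients of the cross-entropy loss w.r.t. W and b for one example."""
    p = softmax(output_layer(W, b, x))
    delta = [pk - (1.0 if k == label else 0.0) for k, pk in enumerate(p)]  # dL/do_k
    dW = [[d * xj for xj in x] for d in delta]                             # dL/dW[k][j]
    return dW, delta                                                       # dL/db = delta

def numerical_grad_W(W, b, x, label, k, j, h=1e-6):
    """Central finite difference for dL/dW[k][j], used to verify backprop."""
    W_plus = [row[:] for row in W]
    W_minus = [row[:] for row in W]
    W_plus[k][j] += h
    W_minus[k][j] -= h
    return (cross_entropy(W_plus, b, x, label) - cross_entropy(W_minus, b, x, label)) / (2 * h)
```

Agreement between `backprop` and `numerical_grad_W` on a few entries is a cheap sanity check before training on MNIST-sized inputs.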

Supervised and Unsupervised Learning for Movie Reviews: Naive Bayes and Logistic Regression for Sentiment Analysis (TensorFlow)

Implemented the Naive Bayes algorithm to predict whether a review is positive or negative, reaching 90% accuracy on the training set and 85% on the test set

Trained a logistic regression model on the same dataset, which reached accuracy similar to that of the Naive Bayes model

Plotted the learning curves (performance vs. iteration) of the logistic regression and Naive Bayes models with Matplotlib
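A Naive Bayes sentiment classifier can be sketched in a few lines. This hedged example assumes a multinomial model with Laplace (add-one) smoothing over tokenized reviews; the project does not say which Naive Bayes variant or smoothing was used, and the function names and toy reviews are hypothetical.

```python
import math
from collections import Counter

def train_naive_bayes(docs, labels):
    """Collect log priors and per-class word counts from tokenized documents."""
    vocab = {w for d in docs for w in d}
    priors, word_counts, totals = {}, {}, {}
    for c in set(labels):
        class_docs = [d for d, lab in zip(docs, labels) if lab == c]
        priors[c] = math.log(len(class_docs) / len(docs))
        word_counts[c] = Counter(w for d in class_docs for w in d)
        totals[c] = sum(word_counts[c].values())
    return vocab, priors, word_counts, totals

def predict_naive_bayes(model, doc, alpha=1.0):
    """Pick the class with the highest log posterior (Laplace-smoothed likelihoods)."""
    vocab, priors, word_counts, totals = model
    scores = {}
    for c in priors:
        score = priors[c]
        for w in doc:
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[c][w] + alpha) / (totals[c] + alpha * len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)
```

Working in log space avoids underflow when a review contains many words, which matters for full-length movie reviews.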

Reinforcement Learning with Policy Gradients: Virtual Machine, OpenAI Gym and TensorFlow

Implemented a policy-gradient reinforcement learning agent for the CartPole environment
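The core of a policy-gradient (REINFORCE) update is computing discounted returns and nudging the policy toward actions that preceded high returns. This hedged sketch replaces the TensorFlow network with a hypothetical logistic policy over CartPole's 4-dimensional observation and 2 actions, and takes an already-collected episode instead of calling the Gym API. The names, learning rate and discount factor are assumptions.

```python
import math

def discounted_returns(rewards, gamma=0.99):
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def reinforce_update(theta, episode, gamma=0.99, lr=0.01):
    """One REINFORCE step for pi(a=1|s) = sigmoid(theta . s); episode = [(state, action, reward)]."""
    states, actions, rewards = zip(*episode)
    returns = discounted_returns(list(rewards), gamma)
    grad = [0.0] * len(theta)
    for s, a, g in zip(states, actions, returns):
        p = sigmoid(sum(t * x for t, x in zip(theta, s)))
        # grad of log pi(a|s) for a Bernoulli(sigmoid) policy is (a - p) * s
        for j, x in enumerate(s):
            grad[j] += g * (a - p) * x
    return [t + lr * gj for t, gj in zip(theta, grad)]  # gradient ascent on expected return
```

In CartPole every surviving step earns a reward of 1, so longer episodes produce larger returns and reinforce the actions that kept the pole balanced.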


