
Data Analyst

Location:
Worcester, MA
Posted:
November 20, 2020



JING LI

Address: Worcester, MA ***** Cell: 508-***-**** adh0ix@r.postjobfree.com

LinkedIn: https://www.linkedin.com/in/jing-li-b64a16129/

SUMMARY

Master's graduate seeking data analyst opportunities, with strong knowledge of statistics and data analysis and solid programming skills in Python and SQL. Excellent communication and teamwork skills developed over the past 5 years of teaching and micro-tutorial video production.

EDUCATION

University of California, Riverside Mar. 2017

Master of Science in Applied Mathematics GPA: 3.6

Millersville University of Pennsylvania, Millersville May 2015

Bachelor of Science in Mathematics GPA: 3.5

SKILLS AND CERTIFICATION

Programming Language & Tools:

● Languages: Python (scikit-learn, pandas, NumPy), SQL, MATLAB.

● Tools: Apache Spark & PySpark, Google Colab, Tableau, Microsoft Office.

Statistical Background:

● Topics: Regularization, Statistical testing, Feature selection, Time Series, Data visualization.

● Models: Decision Tree, Random Forest, Logistic Regression, Gradient Boosting, Alternating Least Squares.

Mathematical Background: Numerical Methods, Combinatorics, Classical Analysis (Real, Complex, Functional), ODE/PDE, Mathematical Statistics, Graph Theory, Number Theory.

Financial Background: E-commerce, ACCA-F1 course

Certification: Oracle Database SQL Certificate

PROJECTS

YouTube Comments Analysis in Apache Spark

● Analyzed YouTube comments to identify potential cat or dog owners.

Extracted, transformed and loaded 5.8M rows of data from an online open-source dataset.

Implemented logistic regression, random forest and gradient-boosted tree models with k-fold cross-validation and hyperparameter tuning, achieving a best AUC of 0.9568.

Classified the dataset with the gradient-boosted tree model and estimated the proportion of cat or dog owners in the comment section at 19%.

● Applied an LDA (Latent Dirichlet Allocation) topic model to identify topics that cat or dog owners are likely to be interested in.

● Discovered video channels whose main audiences are cat or dog owners.

Movie Recommendation Engine Development in Apache Spark

● Built a movie recommendation engine based on Alternating Least Squares.

Extracted, transformed and loaded a GroupLens dataset covering 9,742 movies and 610 users.

Explored the dataset with OLAP (Online Analytical Processing) queries in Spark SQL.

Implemented the ALS (Alternating Least Squares) model and tuned its parameters with ParamGridBuilder, achieving an RMSE (Root Mean Square Error) of 0.88 on the test data.

Generated personalized movie recommendations with the model selected through k-fold cross-validation, and used cosine similarity to identify similar movies.

Bank Customer Churn Prediction

● Identified bank customers who are likely to churn.

Extracted, transformed and loaded a 10K-row dataset from Kaggle.

Performed exploratory data analysis on the dataset using pandas.

Implemented logistic regression, random forest, SVM (Support Vector Machine) and KNN (k-nearest neighbors) models with k-fold cross-validation.

Applied grid search to find optimal hyperparameters for the LR, RF and KNN models.

Evaluated model performance with confusion matrices (accuracy, precision and recall) and ROC curves, achieving AUC scores of 0.773, 0.855 and 0.789 for the LR, RF and KNN models, respectively.

Analyzed feature importance to identify the top factors influencing churn: age, geography, gender and balance.

San Francisco Crime Analysis in Apache Spark

● Analyzed San Francisco crime data from 2003 to 2018 to identify the safest times and locations to visit the city.

Conducted spatial and temporal analysis of a 15-year dataset of 2 million reported incidents from the SFPD website.

Built a data processing pipeline with Spark DataFrames and Spark SQL for OLAP (Online Analytical Processing).

Analyzed crime counts across the city's districts and visualized the results with seaborn.

Computed the number of crimes on Sundays in downtown San Francisco, using a custom spatial boundary because the downtown area is not well-defined.

Analyzed the number of crimes per month from 2015 to 2018, and by hour of day on specific days.

Analyzed crime events by category and hour in the three most dangerous districts, and illustrated the resolution rate for different crime categories.

WORK EXPERIENCE

Data candidate - LaiOffer, Online June 2019 - July 2020

Adjunct Faculty

● Mt San Jacinto College - San Jacinto, CA Aug. 2017 - June 2019

● Riverside City College - Riverside, CA Jan. 2018 - June 2019

Duties: Developed course syllabi and delivered lectures from prepared materials. Communicated with students and faculty outside of class to enhance the learning experience. Designed and graded exams and assigned grades.

Topics taught: Probability, Statistical testing (Hypothesis Testing, A/B testing), Time series, Regression, Central Limit Theorem, Multivariable Calculus, Algebra.

Teaching Assistant / Micro-Tutorial Video Production

● University of California, Riverside - Riverside, CA Sept. 2015 - Dec. 2018

Video Production: Wrote story-format plots on STEM topics with other graduate students. Rehearsed and filmed with other team members in the studio. Collaborated with the production manager on the final videos.


