Data Scientist Machine Learning

Location:
Raleigh, NC
Posted:
November 20, 2023


Resume:

Chengyu Zhou, Ph.D.

Phone: 919-***-**** Email: ad1bky@r.postjobfree.com LinkedIn: /in/chengyu-zhou/ GitHub: /czhou9

SUMMARY

Data scientist with 4+ years of experience applying Machine Learning, Deep Learning, and Statistics across fields, specializing in Image/Text Processing, Recommendation Systems, Anomaly Detection, and Privacy Protection.

SKILLS

Python, SQL, R, MATLAB, scikit-learn, NumPy, Pandas, Matplotlib, Spark, HPC, AWS, PyTorch, TensorFlow, Keras, Git, Linux, Transformer, BERT, Hugging Face, NLP, LLM, Fine-tuning, Time Series Forecasting, Tableau.

EXPERIENCE

NC State University Raleigh, NC

Data Scientist/Teaching Assistant 08/2019 - 08/2023

Equipment Failure Time Predictor using Incomplete Image Data

To help engineers (e.g., in factories and hospitals) predict equipment failure time from incomplete infrared image streams and reduce the cost of unexpected failures, proposed a supervised failure time prediction model.

Trained two traditional methods as benchmarks: (1) a Deep Learning-based model using CNN and LSTM; (2) an unsupervised Statistical Learning-based model using Tensor Decomposition/Regression techniques.

Applied a Projection Operator to the feature extraction loss function so that it handles missing data, and designed a term based on the LLS Distribution and Maximum Likelihood Estimation to supervise the feature extraction.

Implemented Block Updating Optimization to solve the model, derived Analytical Solutions, and used High Performance Computing (HPC) to improve computation speed.

Achieved a 10-fold reduction in prediction Mean Absolute Percentage Error (MAPE) vs. traditional methods across multiple missing rates, expected to save over $8 million per year.

Paper, Code, Slides, Award

Privacy Protection for Equipment Failure Time Prediction

To help multiple users (e.g., stores, hospitals, factories) collaboratively predict equipment failure time from infrared image streams without sharing their own image data, proposed a Federated Learning-based failure time predictor built on Differential Privacy, Incremental Algorithms, and Gradient Descent techniques.

Protected data privacy without reducing prediction accuracy vs. the traditional method, expected to generate over $5 million in revenue per year.

Paper, Code, Slides

JOBLOGIC-X Remote, US

Data Scientist Intern 05/2022 - 02/2023

Built a Location Recommendation System to help Meetfresh open new stores and increase revenue.

Collaborated with 3 employees to collect a dataset, conduct Exploratory Data Analysis (EDA), and apply models such as Ridge/Lasso, XGBoost, and Neural Networks to construct the recommendation system; selected the best model based on RMSE.

Generated recommendation scores for all zip codes across the US and checked that the solution was reasonable.

Deployed a web service using Flask and AWS, expected to help Meetfresh open over 300 new stores and generate more than $10 million in revenue per year.

Report, Code

Constructed a Movie Recommendation System with the IMDB dataset.

Developed a Retrieval Model based on Two Tower Neural Networks, Genre, and Keywords.

Built a Ranking Model using content-based filtering, neural collaborative filtering (NCF), LightFM hybrid filtering, DeepFM, and Deep & Cross Network (DCN).

Increased MAP@K by over 80% and the AUC score of CTR by 30% vs. traditional methods.

Provided the top 10 personalized movies for each user and deployed a web application using Flask and AWS.

Code

EDUCATION

North Carolina State University Raleigh, NC

08/2019 - 08/2023

Ph.D. in Industrial Engineering (Track: System Analysis & Optimization) (GPA: 3.93/4.0)

M.S. in Statistics (GPA: 3.97/4.0)

Selected Honors: Best Paper Award, Outstanding Student Award 08/2020 - 12/2022


