
Data Scientist Machine Learning

Location:
Chapel Hill, NC
Salary:
100000
Posted:
June 21, 2025


Resume:

Jie Yang

Phone: 919-***-**** Email: *******@*****.*** LinkedIn: /in/jieyang112/

SUMMARY

Data Scientist with 5+ years of experience in the Retail industry. Proficient in applying Statistical Analysis and Machine Learning modeling to Marketing and Inventory Management. Specialized in Time Series Analysis and Sales Forecasting to support data-driven decision-making and optimized resource allocation. Experienced in working with cross-functional teams and communicating with stakeholders.

SKILLS

Python, NumPy, SciPy, Pandas, Matplotlib, Seaborn, Scikit-Learn, SQL, PCA, ARIMA, SARIMA, SVM, SHAP, GitHub, Flask, Beautiful Soup, Selenium, TensorFlow, PyTorch, CNN, RNN, LSTM, NLTK, Natural Language Processing (NLP), Large Language Models (LLMs), Generative AI, AWS, Snowflake, Databricks, Hadoop, Spark, Tableau, Power BI, A/B Testing

EXPERIENCE

Techlent Inc. Remote

Data Scientist Fellow 06/2023 - Present

Drug Store Sales Predictor

To help drug store managers optimize budget allocation based on sales demand, developed a Sales Forecast Model using time series forecasting techniques.

Collected drugstore sales data and store data, and carried out EDA and feature engineering using domain knowledge. Examined the stationarity and seasonality of the sales data. Trained and evaluated multiple time series models, including ARIMA, SARIMAX, and Prophet, and identified SARIMAX as the best model, with the lowest RMSE of 825.32. Deployed the SARIMAX model as a Flask API on GCP to streamline access and use. The model lowered the forecast error by 30%, and as a result, the store's quarterly revenue increased by 30%.
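The model-selection step in this project (keeping the candidate with the lowest RMSE) can be illustrated with a short plain-Python sketch. The model names match those listed above, but the hold-out values and forecasts are hypothetical numbers for illustration, not the project's data; fitting the models themselves is left out.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def select_best_model(actual, forecasts):
    """Return (name, score) for the forecast with the lowest RMSE.

    `forecasts` maps each model name to its predictions for the same
    hold-out period as `actual`.
    """
    scores = {name: rmse(actual, pred) for name, pred in forecasts.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


# Hypothetical hold-out sales and forecasts, for illustration only.
actual = [100.0, 120.0, 130.0, 110.0]
forecasts = {
    "ARIMA": [90.0, 125.0, 120.0, 115.0],
    "SARIMAX": [98.0, 118.0, 133.0, 109.0],
    "Prophet": [105.0, 110.0, 140.0, 100.0],
}
print(select_best_model(actual, forecasts))
```

Scoring every candidate on the same hold-out period keeps the RMSE values directly comparable, and the scores are expressed in the same units as the sales figures.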

Old Friend Bakery Chapel Hill, NC

Data Scientist and Consultant 01/2023 - Present

Bakery Sale Predictor

To predict weekly bakery sales effectively, built a Sales Forecasting Engine using time series forecasting techniques.

Collaborated with stakeholders to gather sales and product data. After EDA and feature engineering, trained and evaluated time series models (ARIMA, SARIMAX) and other models such as Random Forest and XGBoost to predict sales trends. XGBoost emerged as the top-performing model, achieving a test accuracy score of 0.87 and surpassing the random-selection benchmark by 20%.

This enabled tailored production schedules for high-demand days and resulted in a 40% yearly cost decrease and revenue increase.
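Tree-based models such as Random Forest and XGBoost do not model time order directly, so one common way to apply them to sales forecasting is to turn the series into a supervised table of lagged values. The sketch below assumes that approach; the helper name `make_lag_features` and the lag choices are hypothetical illustrations, not the feature set used in this project.

```python
def make_lag_features(sales, lags=(1, 2)):
    """Build (features, targets) rows from a sales series for a tree-based model.

    Hypothetical helper: each row holds the sales value `lag` periods earlier
    for every lag in `lags`, and the target is the sales value at that period.
    Periods without enough history are skipped.
    """
    if not lags or min(lags) < 1:
        raise ValueError("lags must be positive integers")
    max_lag = max(lags)
    features, targets = [], []
    for t in range(max_lag, len(sales)):
        features.append([sales[t - lag] for lag in lags])
        targets.append(sales[t])
    return features, targets


# Hypothetical weekly sales figures, for illustration only.
weekly_sales = [120, 135, 128, 150, 160, 155]
X, y = make_lag_features(weekly_sales, lags=(1, 2))
for row, target in zip(X, y):
    print(row, "->", target)
```

The resulting `X` and `y` could then be passed to a regressor's `fit` method; splitting training and test periods in time order keeps the model from seeing future weeks during training.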

South China University of Technology Guangzhou, China

Data Scientist and Professor 06/2004 - 03/2018

Spam Detector on Mobile Terminal

To identify spam and scam text messages received by mobile users, devised an end-to-end spam detection model.

After EDA and feature engineering on the text data acquired from the business team, built and trained a regression model that ranks text content by the presence of suspicious keywords to assess its risk level. Compared to the random-check benchmark, this model reduced spam and scam messages by 40%.

Bio Products and Service Customer Detector

To enhance sales efficiency and revenue through targeted email marketing, developed a classification model to detect valuable customers.

Collected customer data and historical sales data from sales teams, along with biological/medical publications from journals and conferences; engineered domain-specific features and trained binary classification models to identify valuable customers. The new model achieved a 90% improvement over traditional methods, doubling annual revenue with the existing sales investment.

EDUCATION

Wuhan University Wuhan, China

Ph.D. in Computer Science 09/2001 - 06/2004

Master in Computer Science 07/1999 - 06/2001

Bachelor in Computer Science 09/1994 - 07/1998


