Jie Yang
Phone: 919-***-**** Email: *******@*****.*** LinkedIn: /in/jieyang112/
SUMMARY
Data Scientist with 5+ years of experience in the retail industry. Proficient in applying statistical analysis and machine learning modeling to marketing and inventory management. Specialized in time series analysis and sales forecasting to support data-driven decision-making and optimized resource allocation. Experienced in working with cross-functional teams and communicating with stakeholders.
SKILLS
Python, NumPy, SciPy, Pandas, Matplotlib, Seaborn, Scikit-Learn, SQL, PCA, ARIMA, SARIMA, SVM, SHAP, GitHub, Flask, Beautiful Soup, Selenium, TensorFlow, PyTorch, CNN, RNN, LSTM, NLTK, Natural Language Processing (NLP), Large Language Models (LLMs), Generative AI, AWS, Snowflake, Databricks, Hadoop, Spark, Tableau, Power BI, A/B Testing
EXPERIENCE
Techlent Inc. Remote
Data Scientist Fellow 06/2023 - Present
Drug Store Sales Predictor
To help drugstore managers optimize budget allocation based on sales demand, developed a Sales Forecast Model using time series forecasting techniques.
Collected drugstore sales data and store data, and carried out EDA and feature engineering informed by domain knowledge. Assessed the stationarity and identified the seasonality of the sales data. Trained and evaluated multiple time series models, including ARIMA, SARIMAX, and Prophet, and identified SARIMAX as the best model with the lowest RMSE (825.32). Deployed the SARIMAX model as a Flask API on GCP to streamline access and use. The model reduced forecast error by 30%, and quarterly store revenue increased by 30% as a result.
Old Friend Bakery Chapel Hill, NC
Data Scientist and Consultant 01/2023 - Present
Bakery Sale Predictor
To predict weekly bakery sales effectively, built a Sales Forecasting Engine using time series forecasting techniques.
Collaborated with stakeholders to gather sales and product data. After EDA and feature engineering, trained and evaluated time series models (ARIMA, SARIMAX) as well as other models such as Random Forest and XGBoost to predict sales trends. XGBoost emerged as the top-performing model, achieving a test accuracy score of 0.87 and surpassing the random-selection benchmark by 20%.
This enabled tailored production schedules for high-demand days, resulting in a 40% yearly cost decrease and revenue increase.
South China University of Technology Guangzhou, China
Data Scientist and Professor 06/2004 - 03/2018
Spam Detector on Mobile Terminal
To identify spam and scam text messages received by mobile users, developed an end-to-end spam detection model.
After EDA and feature engineering on text data acquired from the business team, built and trained a regression model that ranks text messages by the presence of suspicious keywords to assess their risk level. Compared to the random-check benchmark, the model reduced spam and scam messages by 40%.
Bio Products and Service Customer Detector
To enhance sales efficiency and revenue through targeted email marketing, developed a classification model to detect valuable customers.
Collected customer data and historical sales data from sales teams, along with biological/medical publications from journals and conferences; engineered domain-specific features and trained binary classification models to identify valuable customers. The new model achieved a 90% improvement over traditional methods, doubling annual revenue with the existing sales investment.
EDUCATION
Wuhan University Wuhan, China
Ph.D. in Computer Science 09/2001 - 06/2004
Master's in Computer Science 07/1999 - 06/2001
Bachelor's in Computer Science 09/1994 - 07/1998