Jie Yang
Phone: 919-***-**** Email: *******@*****.*** LinkedIn: /in/jieyang112/
SUMMARY
Data Scientist with 5+ years of experience in the Retail industry. Proficient in applying Statistical Analysis and Machine Learning modeling to Marketing and Inventory Management. Specialized in Time Series Analysis and Sales Forecasting to support data-driven decision-making and optimized resource allocation. Experienced in working with cross-functional teams and communicating with stakeholders.
SKILLS
Python, NumPy, SciPy, Pandas, Matplotlib, Seaborn, Scikit-Learn, SQL, PCA, ARIMA, SARIMA, SVM, SHAP, GitHub, Flask, Beautiful Soup, Selenium, TensorFlow, Keras, PyTorch, CNN, RNN, LSTM, NLTK, Natural Language Processing (NLP), Large Language Models (LLMs), Generative AI, AWS, Microsoft Azure, Google Cloud Platform (GCP), Snowflake, Databricks, Hadoop, Spark, Tableau, Power BI, A/B Testing, Dataiku, MATLAB, VBA
EXPERIENCE
Techlent Inc. remote
Data Scientist Fellow 06/2023 - Present
Drug Store Sales Predictor
To assist drug store managers in optimizing budget allocation based on sales demand, developed a Sales Forecast Model using time series models such as ARIMA, SARIMAX, and Prophet, and regression models such as Random Forest and XGBoost.
Collaborated with stakeholders to obtain and analyze the sales and store data sets. Identified SARIMAX as the top-performing model for predicting daily sales, with the lowest RMSE. Reduced staff costs by 30% and boosted revenue by the same margin.
Deployed the SARIMAX model as a Flask API on GCP, streamlining accessibility and utilization.
Old Friend Bakery Chapel Hill, NC
Data Scientist and Consultant 01/2023 - Present
Bakery Sales Predictor
To predict weekly bakery sales more effectively, built a Sales Forecasting Engine using time series forecasting models and classification and regression models.
Collaborated with sales and bakery teams to gather and analyze sales and product data. Used Linear Regression, Random Forest, XGBoost, ARIMA, and SARIMAX models to predict sales trends, accounting for stationarity and seasonality.
XGBoost emerged as the top-performing model, achieving a test accuracy score of 0.87 and surpassing the random-selection benchmark by 20%. This enabled a tailored production schedule for high-demand days and resulted in a 40% cost decrease and revenue increase.
South China University of Technology Guangzhou, China
Data Scientist and Professor 06/2004 - 03/2018
Spam Detector on Mobile Terminal
To address the prevalent issue of spam and scam text messages received by mobile users, devised an end-to-end spam detection model.
Collaborated with user managers and text database managers to build a regression model that ranks text content by the presence of suspicious keywords to assess its risk level. The model generated high-risk message alerts, enabling swift detection and mitigation of harmful content. Decreased spam and scam messages by 40% compared to the random-check benchmark.
Bio Products and Service Customer Detector
To enhance sales efficiency and revenue through targeted email marketing, developed a Customer Detector for Bio-Cytogen Co.
Collaborated with sales teams to gather historical sales data and biological/medical publications from journals and conferences. Engineered domain-specific features and trained binary classification models to identify valuable customers.
Achieved a 90% improvement over traditional methods, doubling revenue with existing sales investments.
EDUCATION
Wuhan University Wuhan, China
Ph.D. in Computer Science 09/2001 - 06/2004
Master in Computer Science 07/1999 - 06/2001
Bachelor in Computer Science 09/1994 - 07/1998