Data Analyst Science

Location:

Madison, WI

Posted:

May 18, 2025

Resume:

Xiangyi Li

+1-608-***-**** *************@*******.*** https://github.com/Luxflamy

SUMMARY

Technical: Python, R, SQL, Julia, Linux, Emacs, SPSS, Java, SAS, HTML, CSS, JSON, C++, MATLAB, Shell, Git, PyTorch

EDUCATION

Southern University of Science and Technology, Shenzhen B.S. Statistics 09/2020 – 07/2023

University of Wisconsin Madison, WI M.S. Statistics & Data Science 09/2023 – 05/2025

WORK EXPERIENCES

Ping An Bank Business Data Analysis Assistant Shenzhen, Guangdong 01/2022 – 04/2022

• Automated client file management using Python and SQL database optimization. Reduced data retrieval time by 5%.

• Developed forecasting models in R to analyze 12 thousand transaction records. Improved sales prediction accuracy by 8%.

• Developed Power BI standard functions to track real-time sales KPIs. Reduced report generation time by 20%.

• Conducted k-means analysis on 80,423 client profiles. Increased customer retention by 12% in high-value segments.

Yu Cai Co. Sales Data Analyst Shenzhen, Guangdong 06/2024 – 11/2024

• Analyzed 20 thousand sales transactions using R. Performed factor analysis to identify top 3 revenue drivers and k-means clustering to segment 15 thousand customers into 5 high-value groups. Optimized ad spend strategy for these five groups.

• Designed A/B testing frameworks for marketing campaigns. Increased product launch success, achieving a 4.7/5 score.

• Built time-series models to forecast product demand. Improved inventory accuracy and reduced stockouts during peak seasons. Automated sales reports using SQL. Saved my team 10 hours per month and enabled performance tracking.

MAJOR PROJECTS

Credit Risk Prediction Model for Small Business Loans 01/2023

• Processed 22,160 loan records using Cook's distance analysis in R.

• Selected 17 key variables using LASSO regression in R. The model's mean squared error is 0.0072. Fit a Random Forest regression in R with an R² of 96.4%. Improved on the accuracy of the original model and reduced the error by 47%.

• Identified 367 high-risk cases among 2,500 new loans through repayment risk prediction. Avoided high-risk cases with confidence.

Wordle Game Analytics & Difficulty Prediction, MCM Honorable Mention 01/2023 – 02/2023

• Led a team to analyze online player data for the popular game Wordle. Predicted trends in the game's future player numbers. Predicted daily players using ARIMA in R. Forecasted 20,599 active players for March and compared the forecast with other methods.

• Predicted attempt distributions with a test set absolute error of 6.63 using XGBoost in R. Reduced error by 47% compared to linear models. Classified word difficulty with MSE = 31.84 using MLP in R. Predicted "EERIE" difficulty at 4.05 steps.

• Identified Shannon entropy and word frequency as core factors in K-means clustering. High-entropy words required 30% more guesses. ANOVA revealed that rare second letters reduced Hard Mode success rates by more than 15%.

• Models identified high-difficulty word patterns, improved player experience, and drove a 12% increase in daily active users.

Weather Data Analysis & Forecasting System 01/2024 – 06/2024

• Downloaded 6 years of hourly weather data for Madison through automated requests to the NCEI API. Developed a suite of Bash and R scripts to filter the weather data for Madison. Generated cleaned datasets and used the Center for High-Throughput Computing to handle large-scale datasets. Reduced processing time by 80% with these scripts.

• Conducted exploratory analysis and implemented SARIMA models, achieving 93% accuracy in temperature and precipitation forecasting. Identified key climate drivers using multiple linear regression. Achieved a robust model fit with R² = 0.87.

• Developed a Shiny website to visualize key climate factors and forecast trends. Improved accessibility for non-technical stakeholders. Contrasted Madison and Los Angeles climate trends, revealing 18% higher precipitation variability in Madison.

Diffusion-Normalizing-Flow MCMC Algorithm 02/2024 – 06/2024

• Developed a Diffusion-Normalizing-Flow MCMC algorithm combining diffusion models and normalizing flows. Improved sampling efficiency by 80% in high-dimensional spaces compared to traditional Metropolis-Hastings. Saved computing power.

• Achieved an 80% acceptance rate in multimodal Gaussian distributions by training diffusion models on multiple MCMC samples. The method also reduced errors by 47% compared to pure normalizing flow approaches and sampled significantly faster.

• Reduced noise interference by 25% and stabilized diffusion model training by using K-means clustering to classify initial samples. Reduced computational overhead by 40% via adaptive Gibbs-like sampling for high-dimensional parameter spaces.

Airline Performance Analytics & Prediction System 02/2024 – 06/2024

• Led end-to-end development of a flight performance prediction system. Analyzed 2 million flights from 2021 to 2024 using Pandas, Scikit-learn, and NumPy in Python. Reduced data preprocessing time through automated requests to the NCEI API.

• Developed a Random Forest classifier to predict flight cancellations. The model achieved a ROC AUC score of 0.89. Implemented a ResNet deep learning model for departure delay prediction. Achieved a mean squared error of 1.2 hours.

• Designed an interactive 3D visualization dashboard using Three.js, Flask, and HTML, enabling real-time risk analysis. Reduced airline operations team decision-making time via dynamic delay probability maps. Integrated 307 climate stations with airport data through lat/long matching (GeoJSON).
