Tanfu Shi
+1-447-***-**** *******@********.***
LinkedIn: https://www.linkedin.com/in/tanfu-shi-02245b28a/
EDUCATION
University of Illinois Urbana-Champaign    Champaign, Illinois
Master of Statistics, Minor: Data Science    GPA: 3.7/4.0    08/2023-05/2025 (expected)
Nanjing University    Jiangsu, China
B.S. in Physics, Minor: Financial Engineering    GPA: 4.4/5.0 (top 30%)    09/2019-06/2023
SKILLS
· Data Analysis & Visualization: PowerBI, Tableau, Google Analytics, Microsoft Office
· Data Processing & Tools: AWS, Hadoop, Spark, Apache Airflow
· Programming Languages: Proficient in Python, R, SQL; basic familiarity with JavaScript, HTML, CSS, C
· Languages: English (Native), Chinese (Native), Japanese (Reading/Writing)
EXPERIENCE
Gies College of Business, University of Illinois    Champaign, Illinois
Data Science Research Assistant    05/2024-Present
· Cluster Analysis: Engineered K-means and Gaussian Mixture Models on a 100+ MB US–Japan social dataset to identify 4 distinct behavioral clusters, informing cross-cultural policy research.
· Latent Profile Analysis: Extracted 7 latent variables using AIC/BIC-optimized models, reducing analysis time by 30% through automated R scripting.
· Dashboard Development: Developed an interactive PowerBI dashboard integrating SQL queries and R scripts to monitor longitudinal changes in 16 social groups over 5 years; the tool is actively used by 10+ researchers to support policy evaluations.
Beijing Institute of Big Data Research    Beijing, China
Data Analyst Intern    06/2022-10/2022
· Industry Trend Analysis: Analyzed 12 years of industrial data across 5 sectors using Python and Pandas; identified 3 high-growth industries (e.g., AI, green energy) that influenced over $5M in R&D investment decisions.
· Process Automation: Automated data pipelines with Python and SQL, reducing manual data cleaning efforts by 50%.
· Predictive Modeling: Developed a TensorFlow-based NLP model (with Scikit-learn/NLTK support) to evaluate the potential of 500+ companies from patent data and social sentiment, achieving a 90% confidence level in high-potential identifications.
· Executive Reporting: Designed a dynamic dashboard (Python/SQL-powered) for real-time KPI tracking across 5 key business metrics, directly supporting strategic executive decisions.
School of Atmospheric Science, Nanjing University    Nanjing, China
Data Science Research Assistant    09/2020-04/2022
· Large-Scale Data Processing: Processed 50 years of high-resolution (5km) precipitation data using Python and NCL with Ensemble Empirical Mode Decomposition (EEMD), reducing spatial complexity by 40% and categorizing data into 3 precipitation scales.
· Statistical Analysis: Applied Empirical Orthogonal Function (EOF) analysis to decompose climate variability, contributing to data-driven flood prevention strategies adopted by 3 provincial governments.
· Data Visualization: Created detailed visualizations with Matplotlib and Seaborn for seasonal precipitation patterns; outputs were incorporated into national climate risk assessment reports, aiding agricultural planning in flood-prone regions.
LEADERSHIP & PROJECTS
Investigation on the Resumption of Work and Production of SMEs    Group Leader    06/2022-07/2022
· Conducted interviews with 10 enterprises and analyzed over 10,000 SME records to identify 3 critical survival factors during the COVID-19 recovery phase.
· Authored an economic report that directly influenced the allocation of over $1.5M in government relief funds.
Illinois Obesity Rate Prediction    Group Leader    09/2023-01/2024
· Analyzed CDC and state health datasets comprising 10,000+ records to forecast obesity trends across 15 counties.
· Developed a Random Forest model with 92% accuracy, identifying 3 high-risk demographics (e.g., low-income urban youth) and providing actionable insights adopted by local NGOs, with a projected 8% reduction in obesity rates in pilot regions.
Dynamic Restaurant Analytics Dashboard    Group Leader    02/2024-05/2025
· Designed an interactive dashboard using R Shiny and the Google Maps API to analyze data from over 2,000 restaurants in Illinois.
· Implemented user-driven filters (e.g., price range, cuisine type, proximity) and visualized key metrics via Leaflet; the data-driven restaurant recommendations boosted user engagement by 35%.
YouTube Comment Sentiment Analysis    Group Leader    09/2024-12/2025
· Led a project team to scrape over 1,000,000 YouTube comments via API; built a BERT-based sentiment classifier using TensorFlow achieving 89% accuracy.
· Analyzed content patterns to provide recommendations that boosted a channel’s engagement by 25% within 3 months.
User Behavior Analysis: Workforce Well-being Classification    09/2024-03/2025
· Developed a comprehensive machine learning pipeline in Python to analyze 500,000+ survey responses, engineering 20+ features from text and Likert-scale data.
· Employed XGBoost with grid search optimization to achieve 95% classification accuracy; identified 3 key drivers of employee well-being and integrated findings into a real-time PowerBI dashboard, aiding HR teams in reducing low-motivation rates by 15% among over 10,000 employees.