Data Scientist, Machine Learning, Data Modeling, Data Visualization

Location:
Mount Pleasant, SC
Posted:
February 18, 2025

Resume:

Yiwen Yuan

***********@*****.*** +1-872-***-**** Authorized to work in the U.S.

Summary

• Ph.D. in Statistics with over six years of hands-on experience in data science, machine learning, and analytics projects, including internships in business sectors and academic research.

• Strong background in data modeling and machine learning (ML), with experience developing and deploying models using Natural Language Processing (NLP), Large Language Models (LLMs), neural networks, and AI/ML frameworks such as PyTorch and TensorFlow.

• Skilled in generating business insights, deploying ML/DL models in production, and designing A/B tests while addressing experimentation bias.

• Proficient with cloud ML platforms (AWS, Azure, GCP), GitHub, and MLOps practices.

• Experienced with SQL and Exploratory Data Analysis on large datasets.

• Strong ability to collaborate with technical teams to optimize and automate model development and deployment processes.

Education

Ph.D. in Statistics - Bowling Green State University, Bowling Green, OH Aug 2020 - Aug 2024

Dissertation Topic: Lasso Method with SCAD Penalty for Estimation and Variable Selection in Sequential Models

• Enhanced predictive accuracy and reduced risk by 76% on large datasets.

• Applications in finance and healthcare, supporting forecasting and risk management.

Award: J. Robert & G. Overman Award

M.A. in Applied Statistics - Bowling Green State University, Bowling Green, OH Aug 2018 - May 2020

Experiences

Business Insights Intern (R Studio, R Shiny, SQL, Snowflake) May 2022 – Aug 2022

Welltower Inc. Toledo, OH

• Built and deployed end-to-end predictive pricing models using UK demographic data for senior housing costs.

• Collected and analyzed demographic data from diverse sources using R and SQL to extract business insights.

• Developed and refined data models to accurately reflect the organization’s data structures and business needs.

• Created interactive dashboards and reports using R Shiny for real-time decision support.

• Optimized data visualization and model selection processes, reducing the time required by 20%.

• Designed and deployed a heatmap in R Studio to visualize model prediction results and integrated it into an R Shiny dashboard, effectively communicating findings to stakeholders.

Business Insights Intern (R Studio, R Shiny, Snowflake, SQL, Python) May 2021 – Aug 2021

Welltower Inc. Toledo, OH

• Independently learned Python and developed a web scraper to automate data collection, reducing costs by 30%.

• Improved data integration efficiency by 15% by replacing manual SQL processes with R Studio workflows.

• Processed and analyzed demographic data to uncover insights supporting senior housing strategies.

• Trained models, fine-tuned parameters, and generated forecasts to align financial projections with business needs.

• Created heatmaps in R Shiny to visualize model predictions, improving stakeholder engagement.

Graduate Teaching Associate Aug 2020 – Feb 2023

Bowling Green State University Bowling Green, OH

• Independently taught undergraduate Statistics courses, fostering a collaborative and engaging learning environment.

• Graded students' classwork, assignments, and tests and provided constructive feedback to support their academic growth.

• Designed, evaluated, and revised curricula, course materials, and teaching methods to enhance student understanding and engagement.

Projects

Image Classification with Neural Networks (Python)

• Developed and trained a neural network for image classification using TensorFlow and Keras.

• Implemented forward/backward propagation and gradient descent to optimize the model.

• Achieved 92% test accuracy by fine-tuning hyperparameters and analyzing performance with Matplotlib to detect overfitting issues.

Multivariate Statistics Design (R Studio)

• Analyzed the quality of white wine using R, leveraging multiple variables across thousands of observations.

• Applied multivariate normality tests, cluster analysis, classification, and principal component analysis.

• Selected and implemented the most suitable models and algorithms for accurate predictions.

Spam Identification Using Machine Learning (R Studio)

• Built supervised ML models in R Studio for spam detection.

• Applied discriminant analysis (LDA, QDA), tree-based methods, support vector machines (SVM), and MARS.

• Used cross-validation, bootstrapping, and random forests for model evaluation and selection.


