Yaxin Su
*********@*****.*** 617-***-**** https://www.linkedin.com/in/yaxin-su-05811323a/
EDUCATION
Yale University Expected May 2026
Master of Public Health in Biostatistics
● Courses: Advanced Regression Models, Advanced Stat Programming with SAS & R, Probability Theory
Boston University Jun 2024
Bachelor of Arts in Mathematics and Computer Science, Bachelor of Arts in Economics
TECHNICAL SKILLS
● Programming: Python (Pandas, Scikit-Learn, PyTorch), SQL, R, Java, C, HTML/CSS, JavaScript, OCaml
● Data Tools: Tableau, Excel, Databricks, PySpark, Power BI, Git, LaTeX, SAS, GCP, AWS, Snowflake
● Data Science Methods: A/B Testing, Data Wrangling, Database Management, Data Visualization, Time Series Forecasting, Machine Learning, Deep Learning, Statistics
PROFESSIONAL EXPERIENCE
Data Scientist Intern Jan - May 2024
Elite Talent Human Resources Co., Ltd. Chengdu, CN
● Gathered and cleaned candidate data in SQL; built and tested Logistic Regression and Random Forest models in Python to predict the likelihood of a candidate's success in a given job role, improving overall hiring efficiency
● Conducted exploratory data analysis with Python visualization tools (Matplotlib, Seaborn) to identify patterns in candidate group profiles and the correlation between employee types and placements
● Collaborated closely with the cross-functional business development team to extract meaningful business insights and create targeted marketing strategies based on RFM analysis
Research Assistant Jun - Aug 2023
Lichtman Lab at Harvard University Boston, MA
● Built SQL-based data collection, storage, and processing infrastructure, integrating multiple complex databases (UniProt, ChEMBL, and Allen Brain databases) to create a unified platform for streamlined biomedical research data retrieval and analysis.
● Designed and implemented ETL solutions on datasets exceeding 200GB, establishing robust data pipelines using CDK-KNIME to support the seamless transformation of molecular data into actionable insights. Developed innovative methods for molecular quality evaluation, focusing on adaptability within diverse protein pockets for biomedical applications.
● Designed, tested, and deployed Random Forest and XGBoost models to predict cyclic molecule formation, which significantly advanced research in molecular chemistry and helped define project scope through exploratory data analysis
Data Scientist Intern Jun - Aug 2022
Gaia Solution Management Consulting Hangzhou, CN
● Conducted comprehensive data collection and categorization of oral medication literature for chronic rare diseases, utilizing Python and SQL to enhance research database quality at Gaia Solutions
● Developed a Random Forest algorithm to classify and predict patient responses to oral medications, applying machine learning techniques to support personalized treatment approaches in rare disease management
● Leveraged NLP for sentiment analysis on patient feedback from online forums to gauge perceptions about treatments, incorporating patient-centric insights into research
Data Analyst Intern Jun 2020 - Jan 2021
Xiamen Meiya Pico Information Xiamen, CN
● Conducted and analyzed A/B testing experiments to evaluate different web features, enhancing product marketing and customer relationship management and increasing customer interaction by 30%
● Constructed and tested data pipeline and fully automated & interactive Tableau dashboards to track key SEO metrics, like search engine rankings, organic traffic, bounce rates, and page load times
● Performed SQL queries to build data collection, storage, and processing infrastructure. Analyzed email campaign performance, increasing user interaction with product updates and promotions by 40%
PROJECTS
E-commerce Customer Churn Prediction and Early Warning System Nov 2022 - Feb 2023
● Cleaned historical order data and selected features based on correlation insights using variance thresholds and mutual information, creating new statistical, cross, and aggregate features to enhance model performance
● Applied Decision Trees, Random Forests, and XGBoost for churn prediction, optimizing with Random Search and cross-validation. Achieved an F1 score above 0.8, increasing monthly retention by 12 percentage points