Hailey Xiaodong Xue
*********@*.****.*** https://www.linkedin.com/in/hailey-xue/ 424-***-**** Los Angeles, CA
EDUCATION
University of California, Los Angeles (UCLA), Master of Data Science in Health Expected June 2025
• GPA: 4.0/4.0.
National University of Singapore, BS in Business Analytics (with Honors) 2023
SKILLS
Programming languages: Python (NumPy, Pandas, Matplotlib, Seaborn, scikit-learn, TensorFlow, PyTorch, PySpark), R (dplyr, ggplot2), SQL, HTML, CSS.
Tools and Software: MySQL, Jupyter, Tableau, R Shiny, AWS SageMaker, Databricks, Git, Docker.
Skills: Machine Learning, NLP, Data Visualization, Neural Networks, Database Management.
Healthcare Standards: ICD-10-CM, SNOMED, OMOP.
PROFESSIONAL EXPERIENCE
UCLA Health Westwood, CA
Data Scientist 06/2024 - Present
• Fine-tuned BERT-based models for Named Entity Recognition (NER) on Spanish clinical notes using PyTorch and Hugging Face Transformers in AWS SageMaker, achieving a mean F1 score of 0.79 across 88 clinical phenotypes.
• Implemented weak supervision to enrich training data and applied weighted loss functions to distinguish between strong and weak labels, improving model performance in imbalanced classification tasks.
• Optimized GPU utilization through batch size tuning and mixed precision training to accelerate model training and reduce memory overhead during experimentation.
Epidemiology & Biostatistics Department @ UCI Irvine, CA
Data Analyst 07/2024 - Present
• Extracted and processed EHR data in OMOP format using SQL and PySpark to define patient cohorts and clinical covariates for survival analysis via Cox regression.
• Supported drug efficacy research by transforming and validating large-scale observational data and contributing to grant-related data deliverables.
Himalaya Wellness Company Singapore
Data Analytics Intern 12/2021 - 05/2022
• Designed and integrated a centralized Tableau dashboard system for operational performance metrics, improving data retrieval speed by 5x and significantly reducing manual reporting errors through automation.
• Built and deployed 10 interactive Tableau dashboards to track sales, costs, and KPIs across business units, supporting data-driven decision-making that contributed to a 5.1% revenue increase and 8.3% cost reduction over one fiscal year.
PROJECT EXPERIENCE
RetiMark Singapore
Lead Data Analyst and Full-Stack Developer 08/2022 - 11/2022
• Consolidated and cleaned three years of Korean national health survey data, integrating demographics, clinical biomarkers, comorbidities, and lifestyle variables to construct a robust dataset for predictive modeling.
• Developed and evaluated multiple machine learning models, including Random Forest and Gradient Boosting, to predict diabetes risk; achieved an optimized F1 score of 0.877, supporting early screening strategies.
• Built a full-stack diabetes risk monitoring application using HTML, CSS, Flask, and Firebase, translating machine learning predictions into a user-friendly web interface for public health outreach and education.
Sentiment-based Stock Price Prediction Singapore
Lead Data Scientist 08/2022 - 10/2022
• Developed a sentiment classification model for over 5,000 English-language financial news headlines from OMX Helsinki, significantly improving signal extraction accuracy for downstream forecasting tasks.
• Integrated sentiment features into a stock price prediction pipeline using binary classification and a Gradient Boosting Regressor, boosting prediction accuracy from 0.552 to 0.88.