Data scientist

Location:
Los Angeles, CA
Posted:
May 19, 2025

Resume:

Hailey Xiaodong Xue

*********@*.****.*** https://www.linkedin.com/in/hailey-xue/ 424-***-**** Los Angeles, CA

EDUCATION

University of California, Los Angeles (UCLA), Master of Data Science in Health, Expected June 2025

• GPA: 4.0/4.0.

National University of Singapore, BS in Business Analytics (with Honors), 2023

SKILLS

Programming Languages: Python (NumPy, Pandas, Matplotlib, Seaborn, scikit-learn, TensorFlow, PyTorch, PySpark), R (dplyr, ggplot2), SQL, HTML, CSS.

Tools and Software: MySQL, Jupyter, Tableau, R Shiny, AWS SageMaker, Databricks, Git, Docker.

Skills: Machine Learning, NLP, Data Visualization, Neural Networks, Database Management.

Healthcare Standards: ICD-10-CM, SNOMED, OMOP.

PROFESSIONAL EXPERIENCE

UCLA Health Westwood, CA

Data Scientist 06/2024 - Present

• Fine-tuned BERT-based models for Named Entity Recognition (NER) on Spanish clinical notes using PyTorch and Hugging Face Transformers in AWS SageMaker, achieving a mean F1 score of 0.79 across 88 clinical phenotypes.

• Implemented weak supervision to enrich training data and applied weighted loss functions to distinguish between strong and weak labels, improving model performance in imbalanced classification tasks.

• Optimized GPU utilization through batch size tuning and mixed precision training to accelerate model training and reduce memory overhead during experimentation.

Epidemiology & Biostatistics Department @ UCI Irvine, CA

Data Analyst 07/2024 - Present

• Extracted and processed EHR data in OMOP format using SQL and PySpark to define patient cohorts and clinical covariates for survival analysis via Cox regression.

• Supported drug efficacy research by transforming and validating large-scale observational data and contributing to grant-related data deliverables.

Himalaya Wellness Company Singapore

Data Analytics Intern 12/2021 - 05/2022

• Designed and integrated a centralized Tableau dashboard system for operational performance metrics, improving data retrieval speed by 5x and significantly reducing manual reporting errors through automation.

• Built and deployed 10 interactive Tableau dashboards to track sales, costs, and KPIs across business units, supporting data-driven decision-making that contributed to a 5.1% revenue increase and 8.3% cost reduction over one fiscal year.

PROJECT EXPERIENCE

RetiMark Singapore

Lead Data Analyst and Full-Stack Developer 08/2022 - 11/2022

• Consolidated and cleaned three years of Korean national health survey data, integrating demographics, clinical biomarkers, comorbidities, and lifestyle variables to construct a robust dataset for predictive modeling.

• Developed and evaluated multiple machine learning models, including Random Forest and Gradient Boosting, to predict diabetes risk; achieved an optimized F1 score of 0.877, supporting early screening strategies.

• Built a full-stack diabetes risk monitoring application using HTML, CSS, Flask, and Firebase, translating machine learning predictions into a user-friendly web interface for public health outreach and education.

Sentiment-based Stock Price Prediction Singapore

Lead Data Scientist 08/2022 - 10/2022

• Developed a sentiment classification model for over 5,000 English-language financial news headlines from OMX Helsinki, significantly improving signal extraction accuracy for downstream forecasting tasks.

• Integrated sentiment features into a stock price prediction pipeline using binary classification and a Gradient Boosting Regressor, raising prediction accuracy from 0.552 to 0.88.


