
Machine Learning Engineer

Location:
Brookline, MA
Posted:
June 10, 2025


Resume:

Nadia Zhao

838-***-**** ***********@*****.*** www.linkedin.com/in/xiaohz11 Boston, MA

EDUCATION

Harvard University Cambridge, MA

Master of Science in Computational Biology and Quantitative Genetics Aug. 2023 - Jun. 2025

Relevant Coursework: Computing for Big Data, Regression Analysis, Genomic Data, Longitudinal & Survival Analysis, Epidemiological Methods, Quantitative Methods for Pharmaceutical Regulatory Science

University of California, Irvine (GPA: 3.72) Irvine, CA

Bachelor of Science in Applied Mathematics Sep. 2019 - Jun. 2023

Relevant Coursework: Numerical Analysis, Probability, Linear Algebra

WORK EXPERIENCE

Machine Learning Engineer NextTier Sacramento, CA

Predictive Clinical Trial Qualification Web App May 2024 - Sep. 2024

● Designed the workflow and deployed a machine learning model (XGBoost) for real-time patient qualification predictions, built with Streamlit, SQL, and Python; reduced manual screening time by 33%.

● Conducted model selection (Random Forest, XGBoost) and fine-tuning using PyTorch for models assessing patients' eligibility for obesity clinical trials from medical and genetic data; improved enrollment rates by 24%.

● Built a Databricks dashboard for recruitment analytics, enabling trial administrators to monitor recruitment performance and optimize strategies; reduced manual data-pulling time by 80%.

● Integrated explanatory analytics in R to identify key factors (BMI, age, ...) affecting patient engagement and retention, improving patient self-assessment efficiency and informing long-term business development strategies.

● Technologies Used: Python, R, SQL, Databricks, MS Copilot, ChatGPT, GitHub Copilot, OpenAI.

RESEARCH EXPERIENCE

Research Assistant Harvard T.H. Chan School of Public Health Cambridge, MA

Drug Repurposing for Eye Diseases Using Mendelian Randomization (MR) May 2024 - May 2025

Dr. Liming Liang’s Group in Biostatistics Department

● Investigated causal relationships between genetic variants, protein biomarkers, and eye diseases (AMD, Glaucoma, Cataracts) through data engineering and modeling in R; identified 3 potential drug repurposing opportunities.

● Developed an R package for cleaning and integrating data from different upstream sources (drug target genes, protein expressions, and outcome GWAS data); improved analysis efficiency by 85%.

● Applied clumping and linkage disequilibrium analysis to ensure genetic variant independence, reducing confounding effects; performed MR analyses using TwoSampleMR; validated causal effects and significance through visualizations (Volcano Plot, Forest Plot).

● Tools Used: R, TwoSampleMR, biomartR, UniProt, GWAS summary statistics.

Medical Oncology Project Harvard T.H. Chan School of Public Health Cambridge, MA

Breast Cancer Prediction Research Assistant Sep. 2023 - Dec. 2023

Dr. Erin Lake’s Applied Regression Analysis Course

● Developed machine learning models in Python to predict breast cancer likelihood from digitized images of a Fine Needle Aspirate of a breast mass; provided information complementary to traditional diagnostic methods that rely solely on medical professionals; facilitated decision-making and improved time-to-diagnosis by 35%.

● Improved training data quality through missing-value imputation and feature selection; conducted Exploratory Data Analysis (e.g., heatmaps and boxplots in Tableau) and computed summary statistics to uncover patterns in cancer diagnostics.

● Implemented Linear and Generalized Linear Models (GLMs) for feature analysis and Survival Models for prognostic evaluation in R, increasing model accuracy.

● Tools Used: Python, R, Tableau, statistical modeling techniques.

SKILLS

Programming Languages: Python (4 yrs), R (2 yrs), SQL (1 yr), MATLAB (2 yrs)

Machine Learning Frameworks: Scikit-learn, TensorFlow, PyTorch

Data Visualization: Tableau, Power BI, ggplot2, Matplotlib

Bioinformatics Tools: TwoSampleMR, biomartR, UniProt, GWAS and EWAS

Collaboration Tools: GitHub, Jira, Slack


