Data Science Analyst

Location:
Boston, MA
Posted:
November 07, 2023

Isabella Liu 425-***-**** ad0w2o@r.postjobfree.com Boston, MA

EDUCATION

Harvard T.H. Chan School of Public Health, Boston August 2022 - December 2023 (expected)

Master of Science in Health Data Science

Massachusetts Institute of Technology, Cambridge

Cross-registered graduate student

Relevant Coursework: Clinical Data Learning, Data Science, Big Data, Social and Biological Networks, Statistical Inference, Database Analytics, Deep Learning

The Pennsylvania State University, University Park August 2018 - December 2021

Bachelor of Science in Statistics, Data Science

Bachelor of Science in Mathematics, System Analysis

Relevant Coursework: Machine Learning, Graph Theory, Web Design in HTML, Computational Statistics, Survey Sampling, Data Privacy, Data Management, Calculus, Regression Analysis, Linear Algebra

TECHNICAL SKILLS

Analytical Techniques: Data Wrangling, Machine Learning, Deep Learning, Statistics, Data Visualization, Database Management

Analytical Tools: Python, R, SQL, HTML, Linux, Mathematica, GitHub, SAS, D3.js, AWS, Tableau, Excel

RELEVANT WORK EXPERIENCE

Clinical Data Analyst – Penn State College of Medicine May 2023 - Present

• Collaborated as a clinical data analyst with a multidisciplinary team of clinicians, professors, and researchers from prestigious institutions.

• Conducted comprehensive data analysis including constructing a baseline dataset for patients, utilizing ANOVA and providing critical descriptive statistics.

• Managed data quality by imputing missing data and addressing follow-up losses across multiple hospital sites.

• Executed advanced statistical techniques, including non-inferiority analysis and subgroup analyses, using linear mixed-effects models to assess the comparative effectiveness of treatment groups.

• Contributed significantly to identifying predictive factors for treatment responsiveness and evaluating clinical outcomes for potential publication through PCORI and The New England Journal of Medicine.

• Offered expert data consultation to medical professionals, facilitating data-driven decision-making within the healthcare domain.

Research Data Scientist – Penn State University November 2019 - January 2021

• Scraped web data and used pandas to transform the extracted information into structured formats for subsequent statistical analysis.

• Conducted analysis of genome-wide sequencing data, employing established pipelines and creating customized data visualizations to glean insights from complex genetic datasets.

• Orchestrated and executed computational research projects, overseeing the planning and implementation of intricate research plans and method development.

• Maintained meticulous records of research findings and analysis results, facilitating efficient communication with fellow team members.

• Designed and implemented an SIR model tailored for investigating virus evolution dynamics.

RELEVANT PROJECTS

Network Analysis and Simulation of Pathogen Spread Python December 2022

• Analyzed network data from 75 rural Indian villages, visualizing networks and assessing degree distributions.

• Conducted SIR spreading simulations, estimated reproduction numbers, and executed randomized vaccine trials to assess efficacy.

Reinforcement Learning for Gastrointestinal Bleeding SQL & Python December 2022

• Extracted patient demographics, vitals, and treatment data from MIMIC-IV for real-time adaptation.

• Applied reinforcement learning to optimize blood product ratios, running policy iteration on a Markov decision process whose states were defined by clustering the dataset.

NLP with ClinicalBERT Embeddings Python November 2022

• Implemented sentence completion with ClinicalBERT and examined model biases.

• Trained logistic regression models on embeddings to predict hypertension.

• Leveraged UMAP and LDA for data visualization and evaluating the suitability of ClinicalBERT embeddings for medical NLP label extraction.

Pneumonia Chest X-ray Classification Python November 2022

• Utilized the Chest X-ray images dataset to develop a classification model distinguishing pneumonia vs. normal.

• Conducted data preprocessing, including resizing images, one-hot encoding labels, and implementing data augmentation techniques for enhanced model generalization.

• Developed a CNN using TensorFlow and Keras, employing transfer learning with the VGG16 pre-trained model. Fine-tuned the model architecture, achieving a training accuracy of 92.589% and a loss of 0.386.

Covid ICU Beds Searching Website Python November 2021

• Developed a user-friendly website for visualizing and forecasting ICU bed capacity and classification by disease.

• Implemented healthcare data anonymization in Python, ensuring data privacy compliance.

• Built an interactive dashboard using D3.js and Matplotlib for daily refreshed Covid cases data analysis.

• Created and interlinked ICU beds databases using AWS and MySQL for efficient data management.

AirBnB New User Bookings Prediction Python December 2020

• Applied the XGBoost algorithm in Python to 2M+ AirBnB transaction records and predicted the country in which a new user would book, from among 34,000+ cities across 190+ countries, with 85.9% accuracy.

Homeless Youth Risk and Resilience Study R October 2020

• Implemented multi-class classification with an LGBM model to predict risk and protective factors associated with youth homelessness in seven cities across the United States, selecting the model with the lowest 10-fold cross-validation loss.

YouTube Thumbnails Classification Python August 2020

• Analyzed 75k+ YouTube video thumbnails and topics, each labeled as clickbait or not, treated missing data, and constructed a binary classifier to distinguish clickbait thumbnails from legitimate ones.

• Applied a DNN classifier using OpenCV with 10-fold cross-validation, tuned hyperparameters, and achieved an F1 score of 0.89.

Reddit Topics R July 2020

• Extracted 1M+ Reddit posts, cleansed and processed the data, removed stop-words, stemmed, and tokenized the keywords.

• Applied t-SNE to the tokenized keywords for dimensionality reduction and built an XGBoost classifier on comment keywords to predict the subreddit to which each post belongs, with an accuracy of 93.7%.

TMDB Box Office Prediction Python April 2020

• Processed metadata for 7,000+ films from the Movie Database, imputed missing data and conducted exploratory data analysis.

• Predicted overall worldwide box office movie revenue with linear regression; optimized the model using stepwise regression and achieved an R-squared value of 0.94.

LEADERSHIP & INVOLVEMENT

• Penn State Math Club, Treasurer January 2020 - December 2021

• Conference Services & Commons Desk Operations, Student Scheduler September 2019 - February 2020

• THON Committee, Manager 2019

• Teaching Assistant in Statistics September 2019 - December 2019

• Penn State Food Service, Student Manager September 2018 - April 2019

• World in Conversation, Dialogue Facilitator September 2018 - December 2018

HONORS & AWARDS

• American Statistical Association 2021

• National Statistics Honor Society 2021

• The National Society of Collegiate Scholars 2020

• Eberly College of Science Scholarship 2020

• Dean's List 2018-2021


