Yaqi Wu
Data Scientist, Green card holder
Boston, MA • 608-***-**** • *********@*****.*** • /in/yaqi-wu55 • github.com/wuyaq
SUMMARY
Data scientist with 5+ years of experience in machine learning and statistical inference. IBM Data Science Certificate. Expert at extracting insights from data and translating them into actionable solutions.
SKILLS
Python, R, SQL, Pandas, SciPy, Matplotlib, Plotly, scikit-learn, Dask, PyTorch, Regression, Classification, Clustering, Transformer, MLflow, ClickHouse, GCP, BigQuery, GitHub, Kubernetes, Tableau
EXPERIENCE
Data Scientist, INNIO Group 2023/05 – present
● Built a predictor to support proactive maintenance scheduling and extend asset longevity
o Manage the collection and processing of large, complex, unstructured sequence data from multiple sources, such as Prefect flows and SQL databases.
o Conduct rigorous data preparation, including statistical analysis, resampling, feature engineering (over 2,000 features), and imputation, to ensure data quality and relevance for modeling.
o Build a transformer model to predict the Remaining Useful Life (RUL) of engine parts, and deploy it as an interactive user interface built with Dash.
o The model achieved a 10% reduction in loss compared to traditional methods.
Research Specialist Data Scientist, University of Illinois at Chicago 2020/11 – 2023/04
● Built a treatment outcome predictor to advise on cancer treatment plans (pilot study)
o Collected and extracted patients' medical records (~100 features), including age, cancer stage, blood indexes, biomarkers (cancer-related genes), and treatment outcome (90% good, 10% bad). Maintained the data in a REDCap database.
o Oversampled the imbalanced data; performed feature selection, feature engineering, encoding, and imputation; built pipelines combining numerical and categorical data to prepare the modeling dataset.
o Built ML models (Logistic Regression, Random Forest, XGBoost) to predict the treatment outcomes.
o The selected model (0.79 accuracy) can help physicians save approximately 20% of treatment design time compared with traditional manual assessment methods.
Data Analyst Study Analyst, Deibel Laboratories, Inc. 2018/05 – 2020/10
● Applied statistical inference to validate new food contamination test kits for biotech clients
o Collected experimental data on food samples: contamination level, plus test results (contaminated or not) from both the new kits and traditional test methods.
o Built separate logistic regression models for the new test kit and the traditional method. Conducted statistical testing: Probability of Detection (POD) with 95% confidence intervals.
o Validated that the new test kit performed as well as the traditional method while reducing testing time by 50%.
o Wrote client reports and provided business and technical recommendations.
Research Assistant Strategy Modeler, University of Wisconsin – Madison 2015/09 – 2017/12
● Developed a cost-saving feeding strategy for dairy farmers
o Collected and organized dairy cow data (age, milk yield, feed intake, etc.) from the DairyCOMP 305 management database. Imputed missing values using KNN.
o Grouped the cows into 3 subgroups using k-means clustering. Under the new strategy, each group was fed according to that group's average nutritional requirements.
o Reduced overall feed cost by 10% compared to the traditional management strategy.
EDUCATION
University of Wisconsin – Madison Madison, WI
M.S. in Dairy Science Mathematical Modeling 2015/09 – 2017/12