Data Scientist Research

Location:

Posted:

October 14, 2022

Resume:

Han (Michelle) Bao, PhD

E-mail: **.***********@*****.*** Tel: 702-***-**** LinkedIn: /in/hanbao10 U.S. permanent resident

Data scientist with 5+ years’ experience in quantitative research design, end-to-end machine learning model development, business insights discovery, and solutions delivery; excellent communication and data storytelling skills.

SKILLS

Python (NumPy, Pandas, SciPy, Scikit-learn, Imblearn, Matplotlib, Seaborn), Jupyter Notebook, Flask, SQL, C++, Spark, Google Cloud Platform (GCP), Git, GitHub, Linux/Unix, Machine Learning (regression, classification, clustering), Anomaly Detection, Time Series, A/B Testing, Feature Engineering, Data Processing and Visualization, SHAP

WORK EXPERIENCE

Techlent Inc., San Mateo, CA 02/2022 – present

Data Scientist Fellow

• Developed a coupon redemption insight engine to understand customer needs and improve marketing strategy

o Established connections between large-scale relational datasets using SQL and applied several balancing algorithms to deal with the highly imbalanced dataset.

o Set up a pipeline to iteratively improve model performance, spanning data querying, data wrangling, feature engineering, model building, and evaluation.

o Used binary classification models (Logistic Regression, Support Vector Machine, Random Forest, LightGBM, XGBoost) to predict target customers and evaluated model performance by F-score and Precision-Recall AUC. Selected the XGBoost model and analyzed feature importance (e.g., with SHAP) to gain business insight into how to increase the coupon redemption rate.

o Wrapped the optimized model as an API using Flask and deployed it to Google Cloud Platform.

Michigan State University, MI 03/2016 – 10/2021

Research Scientist/Data Scientist

• Built a regression model to predict plant carbon fixation to increase crop yield

o Collected protein and metabolite concentration data related to the photorespiration pathway, which causes significant loss of carbon in plants. Used data augmentation to increase the number of instances for learning.

o Employed a tree-based pipeline optimization tool to select several regression models to predict the rate of carbon loss using protein concentration data as input features.

o Evaluated model performance by RMSE. The machine learning approach outperformed the classical kinetic model (RMSE = 8 vs 10). Analyzed feature importance to help target enzymes for bioengineering to minimize photorespiratory carbon loss. Increased biomass production by ~25% in transgenic plants.

• Developed a polymer quality optimizer to improve the sensitivity of chemical sensors for food safety

o Collaborated with a startup company to collect production data and selected features from processing parameters and polymer compositions.

o Applied classification models (Logistic Regression, Support Vector Machine, Random Forest) to classify the binding capacity of polymer films produced under different conditions. Selected the Random Forest model, which had the highest precision score.

o Presented insights gained from feature importance to collaborators and worked with their research and development team to develop implementation plans to optimize the production process.

o Reduced process development time by ~60% and saved ~$150K in development cost.

EDUCATION

Ph.D. Physical Chemistry, Chinese Academy of Sciences, Beijing, China, 2009

B.S. Chemistry, China Agricultural University, Beijing, China, 2003
