Data Science Machine Learning

Location:

Evanston, IL

Posted:

November 04, 2023

Resume:

ZIYAN LIU

+1-510-***-**** ********@********.*** LinkedIn GitHub

EDUCATION

Northwestern University, M.S. in Machine Learning and Data Science Sept 2022 - Dec 2023 (Expected)

University of California - Berkeley, B.A. in Applied Math and Data Science, double majors Aug 2018 - May 2022

SKILLS

Programming Languages: Python, R, Java

Databases: SQL, Advanced Microsoft Excel, MongoDB, Spark, Hadoop, Extract, Transform, Load (ETL, ELT)

Data Science: Regression, Classification, Applied Statistics, Experimental Design, Hypothesis Testing, Neural Networks, Deep Learning, Data Visualization

Machine Learning: Natural Language Processing (Transformers, LLMs), ML libraries (scikit-learn, TensorFlow)

PROFESSIONAL EXPERIENCE

FocusKPI, Data Science Intern Oct 2023 - Present

Remote, US

• Design, implement, and maintain database schemas optimized for NLP tasks, ensuring data integrity and security. Participate in design, implementation, and evaluation of models.

• Stay current with the latest NLP research and techniques to contribute to the team's models and algorithms. Contribute to the planning, execution, and delivery of NLP projects.

• Collaborate closely with other team members to ensure that project requirements are met. Document methodologies and results in a clear and structured manner.

Wells Fargo, Advanced Data Analyst Intern June 2023 - Aug 2023

New York City, NY

• Applied a Cox proportional hazards (PH) model to identify the key drivers of opt-out rates across 20k+ email campaigns, and used Python/pandas to analyze available data, identify trends, and build modeling skills.

• Conducted feature selection across multiple data sources to produce analyses with actionable business insights using causal inference methods (A/B testing), and discovered 10+ significant features affecting email opt-out.

• Examined inconsistencies across diverse customer transaction data sources from Teradata, pinpointing a 40% overlap in data points. Identified, stored, and organized data from data warehouses using Python and SQL to enable analytically sound decisions to be made quickly and at scale.

TikTok, Product Operations Intern - Data Analytics Feb 2021 - July 2021

Shanghai, China

• Utilized PostgreSQL and Tableau to gather, analyze, and transform large datasets from Apache Hive, resulting in an average 5k+ increase in fanbase across 50+ KOLs and a 10% reduction in time spent on running SQL queries.

• Monitored campaign performance using Python, and conducted predictive analysis with cross-functional teams to maximize value and optimize first-party customer data sets, resulting in a noteworthy 5% increase in live streaming Daily Active Users (DAU) and an 8% improvement in user retention.

PROJECTS

Cloud Engineering Pipeline for Credit Fraud Detection Model (XGBoost, Machine Learning Pipeline)

• Implemented an end-to-end machine learning architecture on AWS Cloud using Docker, S3, ECS, EventBridge, and SQS Queue.

• Developed a pipeline for data acquisition, EDA, feature engineering, training, hyperparameter tuning, and model evaluation.

• Developed a Streamlit application on ECS for model inference and enabled model updates from new data inputs.

Data Consulting Project with ABC Supply Co. Inc. (Clustering, Explanatory Models)

• Applied dimensionality reduction using PCA and UMAP; deployed 8 clustering algorithms (Birch, K-means, Agglomerative, etc.) with hyperparameter tuning and domain knowledge, drawing on over 70 features from internal and external datasets.

• Implemented a cluster ensemble to select the most robust clustering output, categorized 600+ branches into 15 distinct clusters, and analyzed the key factors using Random Forest models and SHAP plots.
