Data Scientist

Location:

San Francisco, CA, 94102

Salary:

70000

Posted:

April 16, 2025


Resume:

YIFEI ZHANG

510-***-**** ****************@*****.*** LinkedIn: https://www.linkedin.com/in/yifei-zhang-648096327/ Address CA

EDUCATION

New York University Tandon School of Engineering Brooklyn, NY

Master of Science in Electrical Engineering GPA: 3.53/4.0 May 2024

Relevant Coursework: Machine Learning, Probability and Stochastic, Applied Matrix Theory, Introduction to System Engineering, Physics of Quantum Computing

University of California, Santa Cruz Santa Cruz, CA

Bachelor of Arts in Network & Digital Technology Jun. 2022

Dean’s Honors for Fall 2020 - Spring 2021

Relevant Coursework: Computer Systems and C Programming, Applied Discrete Mathematics

SKILLS

Programming Languages & Tools: Python (Pandas, NumPy), SQL, Tableau, Excel

Data Processing & Statistical Techniques: Data Analysis, Data Cleaning, Exploratory Data Analysis (EDA), Data Visualization, Statistical Analysis

Machine Learning: Logistic Regression, Linear Regression, Random Forest, SVC, Decision Tree, KNN, XGBoost, K-means Clustering, ARIMA, LSTM

PROFESSIONAL EXPERIENCE

Quantumera AI Remote

Data Scientist Intern Nov. 2024 - Jan. 2025

● Optimized the data management of the metro system by developing a scalable SQL solution, which improved query response times by 15%, lowered operational costs, and increased the efficiency of data storage and retrieval.

● Utilized Python (Pandas, NumPy) to analyze large-scale transportation data, identifying peak congestion periods and recommending optimized routes and schedules, resulting in a 10% reduction in commute delays.

● Developed predictive machine learning models (regression, clustering) to forecast peak traffic times and refine transportation routes, leading to enhanced operational efficiency and reduced delays.

● Created and deployed interactive Tableau dashboards, providing actionable insights that aligned with business objectives, boosted client satisfaction, and informed strategic decision-making.

New York Human Resources SaaS Software Company Remote

Data Scientist Intern Nov. 2023 - Dec. 2023

● Developed and optimized a user interest recommendation model using natural language processing (NLP), Python, and annotation techniques, resulting in more accurate and personalized recommendations.

● Conducted predictive analytics to analyze customer data, segment customers using classification models (Logistic Regression, Random Forest), and assist in formulating targeted marketing strategies based on user preferences.

● Applied time series forecasting techniques using ARIMA to predict product sales trends, contributing to more effective business planning.

● Translated complex data analysis findings into actionable business insights, using Tableau for data visualization, and Excel for presenting model performance to support decision-making.

● Collaborated with cross-functional teams to implement data-driven solutions, improving recommendation system accuracy, sales forecasting, and overall customer engagement.

PROJECT EXPERIENCE

Credit Card Fraud Detection via ML Method Feb. 2024 - May 2024

● Used Python to develop a fraud detection model on a highly imbalanced credit card dataset of over 40,000 rows, achieving a precision-recall (P-R) score of 0.75 despite the extreme class imbalance.

● Conducted comprehensive exploratory data analysis (EDA) and data preprocessing, uncovering critical patterns in fraudulent transactions and addressing significant sample imbalance issues.

● Applied Synthetic Minority Over-sampling Technique (SMOTE) for oversampling, which outperformed down-sampling.

● Conducted cross-validation across multiple machine learning models, including Logistic Regression, Support Vector Classifier (SVC), Decision Tree, and K-Nearest Neighbors (KNN), ultimately selecting Logistic Regression for its high AUC score.

YouTube Advertising Strategy For E-commerce Sales Jan. 2023 - Apr. 2023

● Conducted in-depth data analysis on YouTube trending data to optimize e-commerce sales, leading to a data-driven advertising strategy that resulted in a 25% increase in category-specific views.

● Applied cross-validation on multiple regression models, including Linear Regression, Random Forest, and XGBoost, for effective outlier removal on a transformed dataset, ensuring cleaner and more reliable insights.

● Performed time series forecasting using ARIMA and LSTM to predict category-specific view trends, achieving an R score of 0.9 for the selected category.

● Conducted K-Means clustering for strategic segmentation, further enhanced by word embedding techniques with large language models (LLMs) like BERT to extract insights from textual data.
