Xiao Jie (Nick) Pan Zhao
*******@********.*** | +1-510-***-**** | www.linkedin.com/in/xjpz | Berkeley, CA | https://github.com/nickpan2002
EDUCATION
University of California, Berkeley Berkeley, CA
Bachelor of Arts - B.A. in Data Science & B.A. in Economics | GPA: 3.7/4.0 | May 2025 (Expected)
Leadership and Extracurricular: External Vice President (CPU), UC Berkeley Student Organization
SKILLS
Programming Languages: Python, SQL, R, Java, Scheme
Big Data & Machine Learning: Python (Scikit-learn, NumPy, Pandas, PyTorch, Keras, Selenium), Tableau, MongoDB, Spark
Data Science & Analytics: ETL, Data Pipelines (Cleansing, Wrangling, Modelling, Visualization, Interpretation), Statistics, BI Dashboards, A/B Testing, Regression Models, Random Forest, K-Means, KNN, Deep Learning
PROFESSIONAL EXPERIENCE
Geisinger Danville, PA (Remote)
Business Intelligence Analyst Intern July 2024 - Present
• Delivered data-driven business intelligence for customers and company-wide functions, including a business claims visualization dashboard for associated insurance companies that received positive customer feedback.
• Collaborated with multiple departments to update and enhance the ticket tracking platform, including implementing a default year range, adding key performance indicators (KPIs), refining filters, and improving change management tracking and post-implementation review metrics.
• Used SQL in Azure DevOps and Microsoft SQL Server Management Studio to manage and manipulate datasets, preparing data for business analysis and visualization.
• Designed and developed an interactive Tableau dashboard with live data integration, using SQL to extract and manage data stored in Microsoft SQL Server, enabling real-time monitoring of flu shot compliance for the Employee Health Department.
Rimble San Francisco, CA
Data Scientist / Machine Learning Engineer Intern May 2024 - July 2024
• Developed comprehensive prediction pipelines for video game and sports betting markets, predicting win rates and player statistics for titles such as Rocket League and Valorant to drive user engagement and partner revenue.
• Engineered end-to-end data pipelines covering data retrieval from AWS DynamoDB, transformation of raw records into structured data frames through data mapping and anomaly removal, and integration of Elo rating systems, improving data quality.
• Performed feature selection and feature engineering and built regression models to optimize predictions, achieving a 0.74 ROC AUC in predicting match outcomes and reinforcing Rimble’s market leadership.
PROJECT EXPERIENCE
Customer Churn Analysis and Prediction in the Banking Industry https://github.com/nickpan2002/Bank-customer-churn-prediction
• Analyzed customer profile data in the banking industry and developed predictive models in Python to estimate each customer's likelihood of churning, using a labeled dataset of 10,000 customers.
• Conducted comprehensive data preprocessing, including data exploration, cleaning, visualization, categorical feature encoding, and standardization, to ensure data quality and prepare the data for advanced analysis.
• Trained supervised machine learning models (Logistic Regression, K-Nearest Neighbors, and Random Forest), tuning hyperparameters such as regularization strength with GridSearchCV cross-validation.
• Identified Random Forest as the top-performing model, achieving an AUC score of 0.8598.
• Conducted a thorough exploratory analysis to identify the key customer characteristics and factors driving churn, surfacing prevalent patterns and trends in the data.
Natural Language Processing and Topic Modelling on Watch Customer Product Reviews https://bit.ly/NLP-TopicModelling
• Clustered customer reviews of watches into topic groups for a watch merchant using NLP and topic modelling.
• Preprocessed review text through tokenization, stemming, and stop-word removal, and extracted features using Term Frequency-Inverse Document Frequency (TF-IDF).
• Trained unsupervised K-Means and Latent Dirichlet Allocation (LDA) models to identify latent topics and extract business-relevant keywords for each cluster.