NGUYEN LE VY
Data Analyst
036******* | *************@*****.*** | LinkedIn
EDUCATION
University of Information Technology - VNU Ho Chi Minh City
Bachelor of Data Science - Faculty of Information Science and Engineering
Graduation: 6/2025 | GPA: 3.5/4
English: Full Professional Proficiency
SKILLS
Programming & Data Analysis: Python (PyTorch, Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn, PySpark), SQL, Machine Learning, A/B Testing (hypothesis testing).
Business Intelligence: Power BI (advanced DAX, dashboard development), SQL Server (advanced).
ETL/ELT Processes: Familiar with Databricks and dbt (data build tool).
Data Modeling: Star Schema, Snowflake Schema.
Soft Skills: Communication, Critical Thinking, Problem-Solving, Adaptability, Learning Mindset.
Microsoft Office: Excel, Word, PowerPoint.
CERTIFICATES
• Business Intelligence Course - Full Stack Data Science
• Advanced SQL on HackerRank
• The Power of Statistics
• Go Beyond the Numbers: Translate Data into Insights
PROJECTS
Classification of Cardiac Arrhythmia Using GA Stacking in Ensemble Learning | September 2024
Tech Stack: Python (Pandas, Scikit-learn, PyTorch, Matplotlib, Seaborn), Machine Learning | link
Improved performance on the cardiac arrhythmia classification task by building an ensemble of base models: SVM, Decision Tree, XGBoost, Logistic Regression, and CNN.
Combined the base models into a stacking ensemble, using a genetic algorithm (GA) and grid search to find the optimal model combination; this improved F1-score by 10% over the baseline model.
E-Commerce Customer Churn Prediction - Personal Project | October 2024
Tech Stack: Python (Pandas, Scikit-learn, Matplotlib), Power BI, Machine Learning | link
Developed a predictive model to identify at-risk customers, achieving 96% accuracy, 89.6% F1-score, and 86% recall to support proactive retention strategies in a simulated e-commerce environment.
Executed ETL processes using Python and Pandas to clean and engineer features from a 50,000-row dataset, ensuring data integrity for robust modeling.
Built an interactive Power BI dashboard with advanced DAX calculations to visualize churn drivers (e.g., purchase frequency, customer lifetime value), empowering stakeholders with actionable insights.
Presented findings to hypothetical business clients, simulating real-world data-driven decision-making for retention campaigns.
Fraud Detection in Reviews Using Graph Neural Networks - Big Data Course Project | June 2024
Tech Stack: Python (PyTorch, Pandas), Spark, PySpark | link
Preprocessed the YelpChi dataset by normalizing sparse matrices, streamlining adjacency lists, classifying nodes, and applying undersampling to balance the classes.
Trained and compared GraphSAGE and CARE-GNN on the dataset to select the best model for detecting fraudulent reviews, achieving 85% accuracy.
Loaded and deployed the trained model to make predictions on each batch of data from Spark Streaming.
ACTIVITY
Member, Data Science UIT Club | Ho Chi Minh City | 6/2023-6/2024
Data Science, Faculty of Information Science and Engineering
COMPETITION
Top 13/50 (Group B), UIT Data Science Challenge 2023 | Ho Chi Minh City | 8/2023-11/2023
Data Science, Faculty of Information Science and Engineering