LY VINH THUAN
Data Scientist / AI Engineer
Email: **************@*****.*** Phone: +84-855******
GitHub: ThuanLy-0092 LinkedIn: Vinh Thuan Ly
About Me
I am a third-year Data Science student with a strong foundation in machine learning, NLP, and computer vision. I have experience deploying AI systems and contributing to academic research (CVPRW 2025), and I am passionate about building real-world AI solutions through practical projects and research.
Education
Bachelor of Science, Mathematics and Computer Science - Data Science Aug 2022 – Present
University of Science, Ho Chi Minh, Vietnam
GPA: 3.56/4
Skills
• Languages: English - VSTEP B2 (Fluent)
• Programming Languages: Python, SQL, C, C++.
• Concepts: PyTorch, Scikit-learn, FastAPI, Streamlit, LangChain, Hugging Face, Docker, Render, Qdrant, MongoDB, RAG, Self-Prompting, Function Calling, Fine-Tuning, API Development.
Soft Skills
• Problem-solving and Analytical Thinking.
• Effective Teamwork and Communication.
• Ability to Work Independently and Collaboratively.
• Time Management and Task Prioritization.
Course Work
• Data Structures and Algorithms
• Discrete Mathematics
• Database Fundamentals
• Object-Oriented Programming
• Statistical Theory
• Introduction to Data Science
• Linear Programming
• Pattern Recognition
• Introduction to Artificial Intelligence
• Python for Data Science
Experience
Research Assistant - Aisia Lab Sep – Present
Developing a multimodal sarcasm detection model for NLP and CV challenges.
Publication
Dual-Path Enhancements in Event-Based Eye Tracking: Augmented Robustness and Adaptive Temporal Modeling, CVPRW 2025. Accepted for publication. (2nd author)
Project Experience
ChatBot RAG - UIT DS Challenge 2024 Board A Jul 2024 – Oct 2024
Role: Team Leader
Demo and Report: View Project Demo and Report
• Extracted text from documents using OCR (pytesseract), refined via LLM prompt tuning, and indexed in Qdrant for fast retrieval.
• Built preprocessing pipelines for student-related data and optimized data indexing with chunk size adjustments for efficient similarity search.
• Developed a RAG pipeline using Hybrid Search, Long Context Reorder, BGE-M3, and BM25, incorporating Maximal Marginal Relevance (MMR) to enhance retrieval diversity.
• Enhanced chatbot context retention by storing the last 5 user queries, improving question generation and maintaining conversational coherence.
• Deployed a web application and API using FastAPI and Streamlit to provide real-time access to the RAG system, which achieved a BERTScore F1 of 0.7.
Sarcasm Detection in Multimodal - UIT DS Challenge 2024 Board B Sep 2024 – Nov 2024
Role: Team Leader
• Built a multimodal sarcasm detection model combining BGE-M3 and ViT.
• Fused text and image features with fully connected layers and applied gradient clipping during training.
• Addressed class imbalance with square-root class weighting.
• Improved macro-F1 from 0.42 to 0.54 (Top 1 team: 0.45).
• Currently working on a research paper based on this project.
House Price Prediction Apr 2024 – May 2024
Role: Member
• Developed web scraping tools to extract key house price features from Batdongsan.vn.
• Extracted and analyzed features using an LLM; handled multicollinearity, removed outliers, and filtered the data.
• Built a custom geocoding tool to handle OpenStreetMap API errors, improving location-based features.
• Trained and tuned Polynomial Regression, Ridge Regression, and XGBoost models.
Multi-Class Text Classification for Course Categorization March 2025
Role: Sole Contributor
• EDA: Analyzed data distribution and class imbalances.
• Machine Learning: Evaluated baseline models (Logistic Regression, Random Forest, SVM, XGBoost), which reached 20% accuracy.
• Deep Learning: Fine-tuned a pre-trained model and ensembled models, improving accuracy to 79%.
Event-based Eye Tracking March 2025
Role: Member
• Developed the model using the EfficientNet-BiGRU-LTVSSM architecture for robust event-based eye tracking.
• Preprocessed raw event-based eye tracking data by converting it into a voxel grid, then cached the processed data to facilitate efficient training.
• Paper accepted for publication at CVPRW 2025.
• Built and deployed an MLflow server with a database to store and manage model training logs, ensuring reproducibility and monitoring of training metrics.
Awards and Achievements
• Top 6 - Event-based Eye Tracking Challenge 2025 (Event-based Eye Tracking).
• Top 8 - UIT Data Science Challenge 2024, Board B (Sarcasm Detection in Multimodal).
• Top 18 - UIT Data Science Challenge 2024, Board A (ChatBot RAG).