
Machine Learning Data Science

Location: Ho Chi Minh City, Vietnam

Posted: April 05, 2025


Resume:

LY VINH THUAN

Data Scientist / AI Engineer

Email: **************@*****.*** Phone: +84-855******

GitHub: ThuanLy-0092 LinkedIn: Vinh Thuan Ly

About Me

I am a third-year Data Science student with a strong foundation in machine learning, NLP, and computer vision. Experienced in deploying AI systems and contributing to academic research (CVPRW 2025). Passionate about building real-world AI solutions through practical projects and research.

Education

Bachelor of Science, Mathematics and Computer Science - Data Science Aug 2022 – Present

University of Science, Ho Chi Minh City, Vietnam

GPA: 3.56/4

Skills

• Languages: English - VSTEP B2 (Fluent)

• Programming Languages: Python, SQL, C, C++.

• Tools and Concepts: PyTorch, Scikit-learn, FastAPI, Streamlit, LangChain, Hugging Face, Docker, Render, Qdrant, MongoDB, RAG, Self-Prompting, Function Calling, Fine-Tuning, API Development.

Soft Skills

• Problem-solving and Analytical Thinking.

• Effective Teamwork and Communication.

• Ability to Work Independently and Collaboratively.

• Time Management and Task Prioritization.

Course Work

• Data Structures and Algorithms

• Discrete Mathematics

• Database Fundamentals

• Object-Oriented Programming

• Statistical Theory

• Introduction to Data Science

• Linear Programming

• Pattern Recognition

• Introduction to Artificial Intelligence

• Python for Data Science

Experience

Research Assistant - Aisia Lab Sep – Present

Developing a multimodal sarcasm detection model for NLP and CV challenges.

Publication

Dual-Path Enhancements in Event-Based Eye Tracking: Augmented Robustness and Adaptive Temporal Modeling, CVPRW 2025. Accepted for publication. (2nd author)

Project Experience

ChatBot RAG - UIT DS Challenge 2024 Board A Jul 2024 – Oct 2024

Role: Team Leader

Demo and Report: View Project Demo and Report

• Extracted text from documents using OCR (pytesseract), refined via LLM prompt tuning, and indexed in Qdrant for fast retrieval.

• Built preprocessing pipelines for student-related data and optimized data indexing with chunk size adjustments for efficient similarity search.

• Developed a RAG pipeline using Hybrid Search, Long Context Reorder, BGE-M3, and BM25, incorporating Maximal Marginal Relevance (MMR) to enhance retrieval diversity.

• Enhanced chatbot context retention by storing the last 5 user queries, improving question generation and maintaining conversational coherence.

• Deployed a web application and API using FastAPI and Streamlit to provide real-time access to the RAG system, which achieved a BERTScore F1 of 0.7.

Sarcasm Detection in Multimodal - UIT DS Challenge 2024 Board B Sep 2024 – Nov 2024

Role: Team Leader

• Built a multimodal sarcasm detection model combining BGE-M3 and ViT.

• Fused text and image features with fully connected layers and applied gradient clipping during training.

• Addressed class imbalance with square-root class weighting.

• Improved macro-F1 from 0.42 to 0.54 (Top 1 team: 0.45).

• Currently working on a research paper based on this project.

House Price Prediction Apr 2024 – May 2024

Role: Member

• Developed web scraping tools to extract key house price features from Batdongsan.vn.

• Extracted and analyzed features using an LLM, then handled multicollinearity, outlier removal, and filtering.

• Built a custom geocoding tool to handle OpenStreetMap API errors, improving location-based features.

• Trained and tuned Polynomial Regression, Ridge Regression, and XGBoost models.

Multi-Class Text Classification for Course Categorization March 2025

Role: Sole Contributor

• EDA: Analyzed data distribution and class imbalances.

• Machine Learning: Evaluated baseline models (Logistic Regression, Random Forest, SVM, XGBoost), which reached 20% accuracy.

• Deep Learning: Fine-tuned a pre-trained model and ensembled models, improving accuracy to 79%.

Event-based Eye Tracking March 2025

Role: Member

• Developed the model using the EfficientNet-BiGRU-LTVSSM architecture for robust event-based eye tracking.

• Preprocessed raw event-based eye tracking data by converting it into a voxel grid, then cached the processed data to facilitate efficient training.

• Paper accepted for publication at CVPRW 2025.

• Built and deployed an MLflow server with a database to store and manage model training logs, ensuring reproducibility and monitoring of training metrics.

Awards and Achievements

• Top 6 - Event-based Eye Tracking Challenge 2025 (Event-based Eye Tracking).

• Top 8 - UIT Data Science Challenge 2024, Board B (Sarcasm Detection in Multimodal).

• Top 18 - UIT Data Science Challenge 2024, Board A (ChatBot RAG).


