Le Van Hoang — AI Engineer
(+**) * *** ***** • ********@**.***.***.** • hoanglvuit • hoanglvuit
Summary
A passionate AI/Machine Learning student with a strong foundation in Deep Learning (DL), Machine Learning (ML), and Natural Language Processing (NLP). Eager to gain hands-on experience through an internship in a production environment. Highly motivated to learn and grow.
Education
University of Information Technology (UIT), HCMC, Vietnam
Bachelor of Computer Science, 9/2022–1/2026 (expected)
GPA: 8.95/10.0 (current)
Technical Skills
Languages: Python, HTML, CSS
ML & Frameworks: PyTorch, scikit-learn, FastAPI
Tools: Git, Docker, CI/CD, Linux, Jenkins
Database: SQL (Microsoft SQL Server)
Projects
Chatbot about Vietnam criminal law QA Solo Project 05/2025 – 06/2025 [GitHub] [Website]:
Built a Vietnamese criminal law QA chatbot using a Retrieval-Augmented Generation (RAG) architecture.
Developed the backend with FastAPI and the frontend using Tailwind CSS and Vite.
Deployed the frontend on Vercel (free tier) and initially deployed the backend on Railway (free tier), then migrated the backend to a VPS because of Railway's auto-sleep limitations.
Resolved a Mixed Content issue that appeared after the VPS deployment by purchasing a custom domain and configuring Nginx as a reverse proxy with SSL.
Implemented CI/CD automation using Jenkins to build and push Docker images, SSH into the VPS, pull images, and run containers. The full deployment workflow is documented on GitHub.
RAG Chatbot-Machine learning QA Solo Project 10/2024 – 01/2025 [GitHub]:
Built a CLI-based Retrieval-Augmented Generation (RAG) chatbot to answer machine learning questions in Vietnamese.
Converted a 422-page textbook to Markdown, then applied recursive chunking based on the Markdown structure to segment the document.
Used BERT embeddings and semantic routing to ensure accurate passage retrieval.
Applied custom prompt engineering techniques to maintain factual consistency.
Legal Document Retrieval Team Size: 4 10/2025 – 12/2025 [GitHub]:
Led the team and served as main developer. Developed a legal document retrieval system that integrates a Bi-Encoder model for retrieval with Cross-Encoder models for re-ranking, enabling effective retrieval from text-based queries.
Addressed the limitation of having only question-answering datasets by fine-tuning the Bi-Encoder with Multiple Negatives Ranking Loss and improving Cross-Encoder training with negative mining techniques.
Drought Forecasting - TimeSeries Team Size: 5 04/2025 – 05/2025 [GitHub]:
Led the development of the AI core for the project.
Built and trained deep learning models (GRU, LSTM, Attention-LSTM, Transformer) for drought prediction using 20 years of data across 3,109 regions.
Enhanced prediction accuracy by integrating both dynamic and static features into the temporal modeling pipeline.
Adversarial Patch Attacks on Sign Classifiers Solo Project 03/2025 – 05/2025 [GitHub]:
Applied the CamoPatch technique to deceive a self-trained sign classifier, reducing its accuracy from 95% to 1%.
Enhanced the genetic algorithm by customizing the individual update mechanism for real-world constraints.
Certifications
Machine Learning Specialization and Deep Learning Specialization: Coursera - Andrew Ng (2024)
Applications of AI for Anomaly Detection: NVIDIA (2025)
TOEIC: Listening & Reading: 765 (2024), Speaking & Writing: 280 (2024)
Research and Competitions
SoICT Hackathon 2024: Top 3 - Legal Document Retrieval
Optimizing Legal Document Retrieval in Vietnamese with Semi-Hard Negative Mining: Submitted to ICCCI 2025 (first author - Under Review)
Prompt Manipulation for Targeted Adversarial Object Generation in Stable Diffusion: (first author - Manuscript ready for submission)