TRIEU-HUY PHAN
039******* ****************@*****.*** github.com/kiyotaka1102 District 2, Ho Chi Minh City, Viet Nam
EDUCATION
Ho Chi Minh City University of Technology and Education (HCMUTE), Viet Nam
• GPA: 3.2 / 4.0
• Major: Information Technology
• Year: 3rd-year student (entering 4th year in the upcoming semester)
EXPERIENCE
• Information Technology student at HCMUTE with experience in AI, deep learning, and large language models (LLMs).
• Research experience in computer vision and multi-modal learning, including work on multi-camera tracking, 3D scene understanding, and vision-language models.
• Familiar with recent LLM fine-tuning techniques such as LoRA and RLHF, and experienced in applying VLMs to real-world applications such as text-video retrieval.
• Comfortable working with modern AI tools and frameworks, and able to learn and adapt to new concepts and technologies when needed.
• Actively seeking opportunities as an AI Researcher or AI Engineer to contribute to ongoing work in multi-modal AI and generative models while gaining deeper practical experience.
PROJECTS
1. AI City Challenge 2025 – Track 1: Multi-Camera 3D Perception (January – June 2025)
Designed a real-time multi-camera, multi-class tracking system for synthetic indoor scenes using RGB, depth, and calibration data across 504 cameras and 19 layouts (e.g., warehouse, hospital, office). Our method achieved Top 4 on the public leaderboard with a 3D HOTA score of 25.40%, outperforming the baseline by over 11%.
Key contributions include:
• Trajectory-level Fréchet Distance Affinity: Captured long-term motion to improve association under occlusion (an illustrative sketch follows this entry).
• 3D IoU Affinity: Enforced spatial consistency across views to reduce identity duplication (a simplified sketch follows this entry).
• VGCR Module: Proposed View-Aware Geometric Center Refinement (VGCR), a module that fuses multi-view depth, geometry, and temporal filtering.
• Hybrid Orientation Estimation: Combined pose-based yaw with a model adapted from Orient Anything for reliable orientation prediction.
• 3D Box Estimation: Used ViTPose, YOLOv12x, FastReID (MGN), Orient-Anything (DINOv2), and camera calibration to produce accurate 3D bounding boxes.
Role: Lead Coder, AI Researcher
Technologies: PyTorch, OpenCV, YOLOv12x, ViTPose, FastReID (MGN), Orient-Anything (DINOv2), Multi-View Geometry, Kalman Filter, Tracking Algorithms
Paper: VGCRTrack: Multi-Camera 3D Tracking with View-Aware Geometric Center Refinement
(Published: 20/10/2025)
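To make the trajectory-level affinity concrete, here is a minimal, self-contained sketch based on the discrete Fréchet distance; the function names and the exponential score mapping are illustrative assumptions, not the project's exact implementation.

import numpy as np

def discrete_frechet(p: np.ndarray, q: np.ndarray) -> float:
    # Standard dynamic-programming form of the discrete Frechet distance
    # between two trajectories p (N, D) and q (M, D).
    n, m = len(p), len(q)
    dist = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)  # pairwise point distances
    ca = np.empty((n, m))
    ca[0, 0] = dist[0, 0]
    for i in range(1, n):
        ca[i, 0] = max(ca[i - 1, 0], dist[i, 0])
    for j in range(1, m):
        ca[0, j] = max(ca[0, j - 1], dist[0, j])
    for i in range(1, n):
        for j in range(1, m):
            ca[i, j] = max(min(ca[i - 1, j], ca[i - 1, j - 1], ca[i, j - 1]), dist[i, j])
    return float(ca[-1, -1])

def frechet_affinity(track_a, track_b, scale: float = 1.0) -> float:
    # Map the distance to a (0, 1] similarity; larger means more consistent motion.
    d = discrete_frechet(np.asarray(track_a, float), np.asarray(track_b, float))
    return float(np.exp(-d / scale))

In an association step, such a score could be blended with appearance and 3D IoU terms to build the cost matrix used for matching.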
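The 3D IoU affinity can be sketched in a similarly simplified way, assuming axis-aligned boxes given as (cx, cy, cz, w, l, h); the actual system works with yaw-oriented boxes, which additionally requires a rotated-rectangle intersection in the ground plane.

import numpy as np

def iou_3d_axis_aligned(box_a, box_b) -> float:
    # Intersection-over-union of two axis-aligned 3D boxes (center + size).
    a = np.asarray(box_a, float)
    b = np.asarray(box_b, float)
    a_min, a_max = a[:3] - a[3:] / 2.0, a[:3] + a[3:] / 2.0
    b_min, b_max = b[:3] - b[3:] / 2.0, b[:3] + b[3:] / 2.0
    overlap = np.clip(np.minimum(a_max, b_max) - np.maximum(a_min, b_min), 0.0, None)
    inter = float(np.prod(overlap))
    union = float(np.prod(a[3:])) + float(np.prod(b[3:])) - inter
    return inter / (union + 1e-9)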
2. HDmoVie (February – May 2025)
A movie review platform where users can explore movie information, write reviews, blogs, and manage personalized watchlists. Includes interactive features like liking, commenting, and following users to stay updated. Built with clean architecture and applied design patterns to support scalability and maintainability.
Role: System Designer, Full-Stack Developer
Technologies: Node.js, Express.js, TypeScript, TypeORM, MySQL, Vite, ReactJS, TailwindCSS, JWT, Redis, Git, Docker
GitHub: https://github.com/zzVu77/HDmoVie
Website: https://hdmovie-oose.netlify.app/
3. Text-Video Retrieval System (September – November 2024)
Developed and deployed a full-stack web application for text-to-video retrieval as a solo project for the IT Project course. The system addresses the challenge of matching natural language queries with relevant video content using multi-modal learning and vector similarity search. Built a three-stage architecture: video preprocessing (scene segmentation via AutoShot), feature indexing (image-text embeddings using Nomic, BLIP, and EasyOCR), and inference with re-ranking (ImageReward). Implemented FAISS-based vector search and a FastAPI backend with an interactive user interface.
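A minimal sketch of the indexing and search stage described above, with random vectors standing in for the Nomic/BLIP keyframe embeddings so it runs on its own; dimensions and variable names are illustrative.

import faiss
import numpy as np

dim, num_keyframes = 512, 1000
keyframe_embeddings = np.random.randn(num_keyframes, dim).astype("float32")  # placeholder for encoder outputs
faiss.normalize_L2(keyframe_embeddings)            # cosine similarity via inner product

index = faiss.IndexFlatIP(dim)                     # exact inner-product index
index.add(keyframe_embeddings)

query = np.random.randn(1, dim).astype("float32")  # a text-encoder embedding would go here
faiss.normalize_L2(query)
scores, ids = index.search(query, 10)              # top-10 candidate keyframes

A re-ranking model (ImageReward in this project) then reorders the retrieved candidates before results are returned to the user.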
Role: System Architect, Researcher, Full-Stack Developer
Team size: 1
Technologies: Python, PyTorch, Nomic, BLIP, EasyOCR, FAISS, SentenceTransformers, FastAPI, HTML/CSS, JavaScript
Features: Multimodal embedding, re-ranking with human preference model, Vietnamese-English translation support, dynamic reranking index switching, image and text search modes
GitHub: https://github.com/kiyotaka1102/IT_Project
SKILLS
• Programming Languages: Python, JavaScript, TypeScript
• Frontend: HTML, CSS, ReactJS, Bootstrap 4, Tailwind CSS
• Backend: Node.js, Express.js, FastAPI
• Database: MySQL
• Dev Tools: Git, Docker, VS Code, DogAPI
• Cloud Services: Cloudinary
• AI/ML & Deep Learning: PyTorch, TensorFlow, OpenCV, Transformers, CLIP, FAISS
• Multi-Modal AI: Vision-Language Models (e.g., CLIP, InternVideo, CLIP-ViP)
• Language Proficiency: Vietnamese (native), English (strong reading & listening comprehension)
AWARDS/RECOGNITIONS/VOLUNTEER WORK
• First Author, ICCV Workshops 2025 — "VGCRTrack: Multi-Camera 3D Tracking with View-Aware Geometric Center Refinement"
• Top 4, AI City Challenge 2025 – Multi-Camera 3D Perception (hosted by NVIDIA, ICCV Workshop)
• Finalist, AI Challenge 2024 – Text-Video Retrieval (DOST HCMC)
• Top 2, Git & GitHub Study Jam 2025, GDSC – HCMUTE
• Encouragement Award, Hackathon HCMUTE 2025