THANG PHAN CHAU
********@**.***.***.** 096*******
EDUCATION
University of Information Technology, Vietnam National University (VNU-HCM), Ho Chi Minh City, Vietnam. Sep. 2020 - Mar. 2025
Majoring in Data Science. GPA: 3.5/4.0.
Thesis: ViSoBERT: A Pre-Trained Language Model for Vietnamese Social Media Text Processing. Mark: 4.0/4.0.
WORK EXPERIENCE
AI Center, FPT Software, Vietnam. Mar. 2024 - Present
AI Research Resident
Topic: Large Language Models, Multimodal Learning, Agent-based Systems, AI4Code.
Project: AgileCoder: Dynamic Collaborative Agents for Software Development based on Agile Methodology.
• Proposed a novel multi-agent framework, powered by LLMs and Agile methodology, that allows flexible and dynamic progress, improving adaptability and the likelihood of success.
• Developed a dependency graph capturing the relationships among code files for efficient code generation.
• Conducted comprehensive experiments to demonstrate the efficacy of the proposed method. Our project on GitHub has received over 410 stars from the community.
Project: CodeMMLU: A Multi-Task Benchmark for Assessing Code Understanding Capabilities of CodeLLMs.
• Developed a synthesis pipeline to construct a new benchmark with nearly 20,000 questions, spanning diverse domains, to evaluate the depth of software and code comprehension in LLMs.
• Conducted extensive experiments to analyze the behavior of various LLMs in code understanding and reasoning, revealing that Chain-of-Thought (CoT) reasoning with GPT-4o or Claude fails in many scenarios.
Ongoing Project: Visual Long Program of Thought.
• Developing a vision-language model for agentic tasks to enhance long-form reasoning and tool-using code capabilities.
• Addressing limitations of Vision-Language Models (VLMs) such as GPT-4V, which fail at tasks such as fine-grained object detection and counting many objects.
The UIT@NLP Group, University of Information Technology, Vietnam. Nov. 2022 - Aug. 2024
Undergraduate NLP Research Student
Topic: NLP, Pretrained Language Models.
Project: ViSoBERT: A Pre-Trained Language Model for Vietnamese Social Media Text Processing.
• Introduced a new pre-trained language model for Vietnamese social media tasks.
• Achieved state-of-the-art (SOTA) performance on multiple Vietnamese social media benchmarks.
• Our public model on Hugging Face receives approximately 2.5K downloads per month, with 88K total downloads to date.
Project: Link Prediction for Wikipedia Articles as a Natural Language Inference Task.
• Introduced a novel approach that frames the prediction of links between Wikipedia articles as a Natural Language Inference task.
• Our method achieved a top-3 position on the private test leaderboard in the DSAA-2023 Competition.
SELECTED PUBLICATIONS
denotes equal contribution.
• CodeMMLU: A Multi-Task Benchmark for Assessing Code Understanding Capabilities of CodeLLMs.
  Dung Nguyen Manh, Chau-Thang Phan, Nam Le Hai, Thong T. Doan, Nam V. Nguyen, Quang Pham, and Nghi D. Q. Bui.
  Proceedings of the 13th International Conference on Learning Representations (ICLR 2025). [pdf]
• AgileCoder: Dynamic Collaborative Agents for Software Development based on Agile Methodology.
  Minh Huynh Nguyen, Chau-Thang Phan, Phong X. Nguyen, and Nghi D. Q. Bui.
  Proceedings of the 2nd ACM International Conference on AI Foundation Models and Software Engineering (FORGE 2025). [pdf]
• ViSoBERT: A Pre-Trained Language Model for Vietnamese Social Media Text Processing.
  Quoc-Nam Nguyen, Chau-Thang Phan, Duc-Vu Nguyen, and Kiet Van Nguyen.
  Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023, oral paper). [pdf]
• Link Prediction for Wikipedia Articles as a Natural Language Inference Task.
  Chau-Thang Phan, Quoc-Nam Nguyen, and Kiet Van Nguyen.
  Proceedings of the 2023 IEEE 10th International Conference on Data Science and Advanced Analytics (DSAA-2023, short paper). [pdf]
TECHNICAL SKILLS
Programming: C/C++, CMake, Git, Docker, Python.
Frameworks: PyTorch, TensorFlow, vLLM (serving VLMs and LLMs for agentic evaluation), Transformers.
AWARDS AND HONORS
• Outstanding Research Student (2021, 2022, 2023).
• University Academic Achievement Scholarship (2022).
• Third Prize at DSAA-2023 Competition.
• Student Volunteer Award: EMNLP 2023.
CONFERENCE PRESENTATIONS
EMNLP 2023 Oral Presentation Dec. 2023
• Paper: ViSoBERT: A Pre-Trained Language Model for Vietnamese Social Media Text Processing.
• Location: Singapore.
SERVICE & VOLUNTEERING
• Reviewer:
– EMNLP (2023).
• Student Volunteer:
– EMNLP (2023).