Data Scientist, AI Engineer

Location:

Quan 1, 710000, Vietnam

Posted:

February 05, 2023


Resume:

Phan Quốc Long

Data Scientist, AI Engineer/Researcher

***********@*****.***

**/**, ***** **** ***, Tan Phu, Ho Chi Minh City

Vietnam

https://github.com/phqlong

+848********

****/**/**

https://www.linkedin.com/in/phqlong/

https://orcid.org/0000-0003-2205-8812

Profile

I'm a Data Scientist and AI Engineer/Researcher with a passion for developing innovative solutions and delivering value to organizations. I have a strong foundation in Maths, Data Analysis, Machine Learning, Deep Learning, and especially NLP, and my ultimate goal is to become an expert and future thought leader in the field of AI. In this rapidly changing era, my enthusiasm and positive attitude drive my pursuit of continuous learning to expand my skills and use my expertise to tackle real-world problems and make a meaningful impact.

Education

2018 – 2022

Ho Chi Minh City,

Vietnam

Bachelor of Engineering - Major: Computer Science

Ho Chi Minh City University of Technology (HCMUT) - Vietnam National University, HCM

GPA: 8.26/10.0 (Very good degree), with excellent academic achievements:

•Machine Learning Course: 9.5/10.0

•Natural Language Processing Course: 9.5/10.0

•Graduation Thesis: 9.93/10.0

Professional Experience

2022/05 – 2022/11

Ho Chi Minh City,

Vietnam

Data Scientist Fresher

ZingMP3 - Zalo - VNG Corporation

•Main responsibilities:

•Data Mining and Collection: gathered data from sources such as user logs, song metadata, and playlists in Zing MP3, working mainly with PySpark and Pandas.

•Data Processing and Analysis: uncovered insights and trends that could be leveraged to improve the platform.

•Building Item Embeddings for Recommendation Systems: applied collaborative filtering approaches to better understand user preferences and recommend relevant artists.

•Network Analysis and Graph ML: built an Artist Network and used graph ML algorithms to analyze relationships between artists and recommend similar artists to users.

•Recognition and Gains:

•Improving User Experience: implemented solutions in the similar-song recommender system that reduced the skip rate, as seen in A/B testing.

•Acquiring Valuable Skills: gained hands-on experience and deep knowledge of the ML/AI techniques and big data processing used in dynamic, fast-paced industrial environments.

2021/06 – 2021/09

Ho Chi Minh City,

Vietnam

Odoo developer - Internship

NOVOBI Vietnam

•Understanding the Business Procedure, RMA process, and Python - Odoo Framework.

•Developing an Odoo Project: created and automated the RMA process for the business, built custom modules and extensions in Odoo, and integrated them into existing systems.

•Understanding Agile/SCRUM and QaS: ensure that the project is delivered on time and to a high standard of quality.

Certificates

ETS TOEIC Listening and Reading

2022/04

IIG Vietnam

Score: 900/990

Natural Language Processing Specialization.

2021/10 - 2022/03

Coursera & DeepLearning.AI

Includes 4 course certificates: NLP with Classification and Vector Spaces, NLP with Probabilistic Models, NLP with Sequence Models, and NLP with Attention Models

Publications

2022/03 Vietnamese Sentence Paraphrase Identification Using Sentence-BERT And PhoBERT

International Conference on Intelligence of Things - ICIT 2022

Publisher: Springer

Number of Authors: 5

Role: First/Corresponding Author.

This publication developed a Vietnamese SBERT model by combining PhoBERT with the Sentence-BERT architecture, then evaluated it on the sentence paraphrase identification task. The paper was accepted by ICIT 2022 and published by Springer with DOI: 10.1007/978-3-031-15063-0_40

Huggingface: https://huggingface.co/keepitreal/vietnamese-sbert

Awards

2022/11 Second place in 2022 VLSP - EVJVQA challenge

Association for Vietnamese Language and Speech Processing (VLSP)

Multilingual Visual Question Answering: EVJVQA Challenge - multilingual English-Vietnamese-Japanese Visual Question Answering (corpus released by VNUHCM-UIT)

My team put immense effort into building a solution for the Multilingual Visual Question Answering problem. We finished 2nd among 60 teams, with a result slightly below that of the 1st-place team.

Projects

2023/01 Meta Learning: Few-Shot Classification on Omniglot Dataset

Research project

•Dataset: Omniglot is a dataset containing a large number of handwritten characters from various alphabets.

•This project develops a MAML (Model-Agnostic Meta-Learning) model, a meta-learning algorithm, to perform few-shot image classification on the Omniglot dataset. The model is given a small number of examples (k shots) from each of n classes (n ways) and must learn to generalize from these examples to classify new instances. It does this by learning an initialization of the model's parameters that can be quickly adapted to new tasks.

•Used PyTorch Lightning to simplify the code, integrated with Hydra for config management, WandB for logging, Higher for higher-order optimization, and Torchmeta for meta-learning datasets and benchmarks.
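The n-way, k-shot setup described above can be illustrated with a minimal sketch of how one few-shot episode might be sampled. This covers only task construction, not MAML itself, and it is not the project's code: `sample_episode`, `data_by_class`, and `n_query` are hypothetical names introduced for illustration.

```python
import random


def sample_episode(data_by_class, n_way, k_shot, n_query, rng=None):
    """Sample one n-way, k-shot episode (hypothetical helper).

    data_by_class maps a class name to a list of examples.
    Returns (support, query) lists of (example, episode_label) pairs,
    where episode labels run from 0 to n_way - 1.
    """
    rng = rng or random.Random()
    # Pick n_way distinct classes for this episode.
    classes = rng.sample(sorted(data_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        # Draw disjoint support and query examples from the same class.
        examples = rng.sample(data_by_class[cls], k_shot + n_query)
        support += [(x, label) for x in examples[:k_shot]]
        query += [(x, label) for x in examples[k_shot:]]
    return support, query
```

In a MAML-style loop, the support set would typically drive the inner adaptation step and the query set the outer meta-update.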

2022/12 Multi-class disease-type prediction using Multi-omics data

Research project

•This project focuses on multi-class disease-type prediction using multi-omics data. The goal is to classify 5 different disease classes and 1 healthy control class based on a multi-view dataset that consists of 9 groups of features, each of which is high-dimensional and potentially noisy.

•Multi-view learning approaches are adopted to address the challenges posed by the multi-dimensional dataset. A transformation-based method, specifically the Multi-view Attention + KNN Pooling approach (Tianle et al., 2019), is applied in the project. The method uses a KNN attention pooling layer to analyze the neighborhood of each sample.

•The results show that the ensemble model approach outperforms single-model solutions and that the transformation-based method has potential for future development.

2022/10 – 2022/11 En-Vi-Ja Visual Question Answering - VLSP-2022

Second prize in EVJVQA Task on 2022 VLSP Competition

•Team size: 3

•Given an image and a question about it, we built a Multilingual Visual Question Answering (mVQA) system that predicts the answer in the same language as the question, across 3 languages.

•We experimented with multilingual models such as mT5, mBERT, and XLM-RoBERTa, and used the ViT model as the main image feature extractor. We then fine-tuned on the train set, evaluating with BLEU and F1 scores. We also tried image augmentation and warm-starting encoder-decoder techniques.

2022/07 – 2022/11 ZingMP3 Music Artist Network

Role: Data science team member

•Collected, processed, and ran extensive EDA on data from Google, Spotify, and ZingMP3 to gain insights for identifying co-listened artists.

•Built an Artist Network and researched how to leverage GNNs to predict related artists, evaluating with multiple metrics and testing in production.

2021/08 – 2022/05 Plagiarism Detection System - KeepItReal

Thesis, Research-oriented, Application Project - Role: Team leader

•Overview:

•The team consists of 3 members working on a web application to detect plagiarism in documents. The app is designed for students and instructors at HCMUT who are looking to avoid plagiarism in their educational environment.

•This is our thesis project with the goal of raising awareness about plagiarism among students throughout Vietnam.

•Front-end: ReactJS, Axios, and Redux Toolkit.

•Back-end: Django, integrated with JWT, OAuth2 for authentication and authorization.

•Database: MongoDB and PostgreSQL, allowing for efficient storage and retrieval of data.

•Advanced machine learning, deep learning, and natural language processing techniques are used to enhance the efficiency of plagiarism detection.

•The core of the plagiarism detection system is composed of 4 main modules:

•Pre-processing: read and prepare the data for analysis from documents.

•Candidate Retrieval: retrieve a pool of potentially suspicious documents from both online sources (using Bing search) and an offline database.

•Exhaustive Comparison and Analysis: This module performs an exhaustive comparison and analysis of the documents, using techniques such as SBERT for semantic similarity detection in English or Vietnamese and string-based techniques like N-grams for lexical and syntactic similarity detection.

•Post-processing: Combining the results from the previous steps and providing evidence for the detection of plagiarism.
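The string-based side of the Exhaustive Comparison module can be illustrated with a minimal sketch of word n-gram overlap between two passages. This is an assumption about how such a lexical check might look, not the KeepItReal implementation: the function names, the whitespace tokenization, and the choice of Jaccard similarity are all hypothetical.

```python
def word_ngrams(text, n=3):
    """Return the set of word n-grams in text (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_jaccard(a, b, n=3):
    """Jaccard similarity between the word n-gram sets of two passages.

    Returns 0.0 when either passage is too short to form an n-gram.
    """
    grams_a, grams_b = word_ngrams(a, n), word_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


if __name__ == "__main__":
    # Two of the four distinct trigrams are shared, so the score is 0.5.
    print(ngram_jaccard("the quick brown fox jumps",
                        "the quick brown fox sleeps"))
```

A score near 1.0 suggests near-verbatim copying, while paraphrased text would score low here and would need a semantic check such as the SBERT comparison.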
