Bùi Đức Nhân
AI/ML Engineer
Profile
***************@*****.***
LinkedIn: https://www.linkedin.com/in/duc-nhan-bui-a25648237/
GitHub: https://github.com/Narius2030
Thu Duc City, Ho Chi Minh City
Skills
Programming Languages
Python, Java, C#, C++, OOP
Data Modelling
SQL Server, Relational Database (SQL), SSAS
Machine Learning/Deep Learning
Scikit-learn, TensorFlow, PyTorch, Google Colab,
Kaggle, Regression, Classification, Clustering, NLP
(text generation, text classification, ...), Computer Vision (object detection)
Data Manipulation and Visualization
NumPy, Matplotlib, Pandas, Power BI, Excel
Data Pipeline and Collection
Apache Airflow, BeautifulSoup, SSIS
Big Data
Apache Hadoop, Apache Hive
Mathematics
Probability and Statistics, Linear Algebra
Objective
After graduation, I want to work full-time as an AI/ML Engineer, solving machine learning problems in Computer Vision and NLP and deploying the solutions in practical products such as web apps and IoT devices. Furthermore, I am passionate about processing data and building data pipelines and storage.
Education
HCMC University of Technology and Education 2021 - 2025
Data Engineering
GPA: 8.61
Certifications
09/2023 IELTS 6.0 Certificate, IDP Vietnam
08/2022 SQL (Intermediate) Certificate, HackerRank
03/2024 Supervised ML: Regression and Classification, Coursera
Projects
Applying Artificial Neural Networks to Build Vietnamese Text Generation Models as Part of the Generative AI Problem
Team: 1 (Individual Project)
4/2024 - 5/2024
- I developed a language model for automatic text generation. I collected news articles of various genres from the VnExpress website using BeautifulSoup and preprocessed the data with NLP techniques, including extracting sentences, determining the meaning of related phrases, building a corpus, and generating input sequences with the N-gram method.
- I implemented a deep learning architecture with embedding and LSTM layers, using TensorFlow and Keras for model development and evaluation. I deployed the model in a web app built with the Streamlit framework.
Source: https://github.com/Narius2030/Vietnamese-Text-Generator.git
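The N-gram input sequences mentioned above can be sketched as follows. This is a minimal illustration of one common reading of the method (every sentence prefix becomes a training sample whose last token is the prediction target), not necessarily the code in the repository. The toy sentence, the vocabulary, and the helper names `ngram_sequences` and `pad_left` are assumptions made for this example.

```python
def ngram_sequences(token_ids):
    """Return every prefix of length >= 2 of a tokenized sentence."""
    return [token_ids[: i + 1] for i in range(1, len(token_ids))]


def pad_left(sequences, pad_id=0):
    """Left-pad all sequences with pad_id to the length of the longest one."""
    max_len = max(len(s) for s in sequences)
    return [[pad_id] * (max_len - len(s)) + s for s in sequences]


# Hypothetical toy sentence; index 0 is reserved for padding.
sentence = "tôi yêu học máy".split()
vocab = {word: idx + 1 for idx, word in enumerate(dict.fromkeys(sentence))}
ids = [vocab[word] for word in sentence]

sequences = ngram_sequences(ids)
padded = pad_left(sequences)

# The model input is everything but the last token; the target is the last token.
inputs = [row[:-1] for row in padded]
targets = [row[-1] for row in padded]
```

In this form, `inputs` can feed an embedding layer followed by an LSTM, and `targets` are the next-word labels.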
Applying Deep Learning and Machine Learning to Build a Text Classification Model and Clustering Algorithm to Automatically Classify News Genres and Search for Similar News
Team: 1 (Individual Project)
4/2024 - 5/2024
- I implemented a data pipeline using Airflow to schedule the collection of news data from VnExpress, using BeautifulSoup for scraping. I applied several cleaning techniques to the dataframe and the text. I deployed the model in a web app built with the Streamlit framework.
English
IELTS 6.0 (9/2023)
Other
Streamlit, .NET Framework, Ubuntu Desktop/Server, Git
Activities
GDSC member 2022 - 2025
Google Developer Student Club Ho Chi Minh UTE
In this club, I regularly help organize webinars as a member of the logistics department. These webinars discuss modern and trending technologies in many fields such as Web, Cloud, and Data. We also host many academic activities such as Hackathon, BeCoder, and CTF.
Strength
• Teamwork and Planning
• Time Management
• Hardworking and curious
• Creative
- In the Text Classification task, I first applied natural language preprocessing techniques to normalize the text, including removing punctuation, stop words, and symbols, combining meaningful Vietnamese words, reformatting text, encoding words, and creating a corpus. I then designed neural networks using LSTM and a hybrid CNN-LSTM architecture to learn the features of each article.
- In the Text Clustering task, I implemented Word2Vec (skip-gram) to discover the relationships among words. I then embedded the words of each article and calculated the mean vector to represent the whole article. Finally, I calculated the cosine similarity between articles to cluster them into groups and search for similar articles.
Source: https://github.com/Narius2030/Vietnamese-Text-Classification-and-Clustering
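The mean-vector and cosine steps of the clustering task can be sketched as follows. This is a minimal illustration, assuming word vectors are already available. The two-dimensional vectors below are made-up stand-ins for trained Word2Vec embeddings, and the helper names `mean_vector` and `cosine_similarity` are hypothetical.

```python
import math


def mean_vector(tokens, word_vectors, dim):
    """Average the vectors of the known tokens; zero vector if none are known."""
    vectors = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Made-up 2-D embeddings standing in for trained Word2Vec vectors.
word_vectors = {
    "bóng": [1.0, 0.0],
    "đá": [0.8, 0.2],
    "cổ": [0.0, 1.0],
    "phiếu": [0.1, 0.9],
}

sports_a = mean_vector(["bóng", "đá"], word_vectors, dim=2)
sports_b = mean_vector(["đá", "bóng", "đá"], word_vectors, dim=2)
finance = mean_vector(["cổ", "phiếu"], word_vectors, dim=2)

same_topic = cosine_similarity(sports_a, sports_b)
cross_topic = cosine_similarity(sports_a, finance)
```

Articles on the same topic score close to 1, so ranking articles by this score gives a simple similar-news search.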
Building a Movie Recommender System Using the Content-based Method
Team: 2
2/2024 - 5/2024
- I used the TF-IDF technique (item profile) to quantify the importance of each category (word) across all movies (documents) based on movie content. I applied PCA and elimination of highly correlated features to reduce the dimensionality of the dataset.
- The recommender system is content-based. I used cosine similarity to measure the similarity among movies. In addition, I used a Ridge regression model to learn each user's ratings and derive the user profile, which consists of the weight matrix W and the bias b. I then selected the top N highest-rated movies for that user.
Source: https://github.com/Narius2030/Recommendation-System.git
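The user-profile step can be sketched for a single user as follows. This is a minimal illustration that swaps the library Ridge model for the closed-form ridge solution in NumPy, so it is not the project's actual code. The item profiles, ratings, `alpha` value, and the helper names `fit_user_profile` and `top_n` are assumptions made for this example.

```python
import numpy as np


def fit_user_profile(item_features, ratings, alpha=1.0):
    """Closed-form ridge regression for one user; returns weights w and bias b."""
    X = np.asarray(item_features, dtype=float)
    y = np.asarray(ratings, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean  # centering keeps the bias unpenalized
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def top_n(item_features, w, b, seen, n=2):
    """Rank unseen items by predicted rating and keep the n best."""
    scores = np.asarray(item_features, dtype=float) @ w + b
    ranked = [int(i) for i in np.argsort(-scores) if int(i) not in seen]
    return ranked[:n]


# Made-up item profiles: rows are movies, columns are (action, romance) weights.
items = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.1, 0.9],
    [0.8, 0.2],  # not yet rated by the user
    [0.2, 0.8],  # not yet rated by the user
])
rated = [0, 1, 2, 3]
ratings = [5.0, 4.5, 1.0, 1.5]  # this user prefers action movies

w, b = fit_user_profile(items[rated], ratings, alpha=0.1)
recommended = top_n(items, w, b, seen=set(rated), n=1)
```

Fitting one `(w, b)` pair per user and stacking the weights column by column gives the W matrix described above.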