HOA NGUYEN THI THANH
Data Scientist
Ho Chi Minh, Viet Nam
adxsjf@r.postjobfree.com
github.com/huuminh365
kaggle.com/nguynththanhho
Work experience
Researcher Feb. 2023 - May 2023
• Collected, processed, and analyzed data.
• Researched language modeling.
Education
Industrial University of Ho Chi Minh City 2019 - 2023
Bachelor of Science in Data Science
GPA: 3.35/4.0
Skills
Technical skills
• Programming languages: Python, C++, Java.
• System: Windows, Linux.
• Tools: GitHub, MongoDB, pgAdmin4, DBeaver, Microsoft Office.
• Proficient with Python libraries for data analysis and modeling: Pandas, NumPy, SciPy, scikit-learn, SymPy.
Data science skills
• Research.
• Statistics, Mathematics, Data Visualization.
• Deep learning (CNN, LSTM, RNN, Transformer, Conformer).
• Machine learning (Linear Regression, Logistic Regression, Decision Tree, Random Forest...).
• Data structures and algorithms.
• Web crawling with Beautiful Soup and Selenium.
• Big Data: Firebase, Apache Spark, Hadoop.
Communication
• Vietnamese (native), English (strong reading and listening; intermediate writing and speaking).
Projects
Diagnosis of breast cancer. Dec 2022 - May 2023
Project Computer Vision - link
• Leader
• Framework: PyTorch.
• Cancer diagnosis from mammography using deep learning.
Depth estimation from a single image using CNN with Fully Connected. Sep 2022 - Oct 2022
Project Computer Vision - link
• Leader
• Framework: PyTorch.
• Used a CNN with fully connected layers to estimate depth from a single image.
• Results (MSE loss): 16 on the validation set and 21 on the training set.
Spelling Correction May 2022 - Aug 2022
Project NLP - link
• Leader
• Framework: PyTorch.
• Crawled and processed data with Beautiful Soup and Selenium.
• Used a Transformer model.
• Achieved an average BLEU score of 74% on the test set and an accuracy of 97% on the validation set.
Anomaly Detection Dec 2021 - Mar 2022
YSC 2021 Eureka 2021, JST 2022
• Co-author
• Data analysis and visualization.
• Successfully applied Hypothesis Testing to an unsupervised Autoencoder for an Anomaly Detection task.
• Experimented on the NSL-KDD dataset; achieved 96% accuracy.
A Novel Approach for Vietnamese Speech Recognition Using Conformer. Aug 2022 - Nov 2022
Springer 2022 Eureka 2022, FDSE 2022
• Co-author
• Model: Conformer-CTC.
• Training data: 115 hours of speech from the Vivos dataset (15 hours) and the VLSP 2020 ASR Corpus (100 hours).
• Test data: 0.45 hours from the Vivos dataset.
• Result: WER of 20% on the test set.
Awards & Honors
Third Prize
ACM - ICPC Vietnam National Round. 2022
Scholarships
Industrial University of Ho Chi Minh City 2019-2022
Consolation prize
TDMU Entropy Data Analytics data mining competition. Apr 2021
Activities
Co-administrator of AI Club IUH 2021 - Present
Industrial University of Ho Chi Minh City. Ho Chi Minh City
Mentor of AI Club 2021 - Present
• Mentored Machine Learning fundamentals at AI Club IUH (2022).
• Mentored Python fundamentals for K16 and K17 at AI Club IUH (2021).