
Data Analysis Machine Learning

Location:
Ho Chi Minh City, Vietnam
Posted:
April 21, 2025


Resume:

Nguyen Phuc Thanh Danh

090-***-**** VN ****.*************@*****.***.** LinkedIn DanhNguyennene (GitHub)

EDUCATION

Ho Chi Minh City University of Technology (HCMUT)

Bachelor of Computer Science (Undergraduate) GPA: 3.0/4.0 Sept 2022 - May 2026 (Expected)

PROJECTS

Chart Data Extraction (Sponsored by HCMUT) Aug 2024 - Present

• Developed an Image-to-Text deep learning model for end-to-end data extraction from chart images, converting extracted data into structured JSON files for tabular representation.

• Data Analysis & Preprocessing: Conducted Exploratory Data Analysis (EDA) on open-source datasets (ChartQA, PlotQA, DVQA), balanced the distribution of chart types, augmented bounding boxes, and generated synthetic data to improve the model.

• Technology Stack: Fine-tuned the pretrained MatCha model (Math Reasoning and Chart Derendering Pretraining) using PyTorch, later migrating to PyTorch Lightning for more efficient training.

• Model Optimization: Froze 40% of the vision encoder layers during training to improve training efficiency.

• Advanced Techniques: Integrated Exponential Moving Average (EMA) and Stochastic Weight Averaging (SWA) for improved training stability and performance.

• Distributed Training: Implemented Distributed Data Parallel (DDP) training in PyTorch before migrating to PyTorch Lightning.

• Benchmark: Achieved an F1 score of 99% and a TED score of 78%.

• GitHub Repository: Link, Google Colab Notebook: Link, Dataset: Link, Presentation: Link

Interpolation Models July 2024 - August 2024

• Developed various interpolation models, including Least Squares Regression, Lagrange Interpolation, Chebyshev Polynomials, and Hermite Polynomials, leveraging Linear Algebra techniques.

• Technology Stack: Implemented interpolation algorithms exclusively using NumPy, applying mathematical formulations and numerical methods.

• Reference: Based on Crista Arangala - Linear Algebra with Machine Learning and Data (CRC Press, 2023).

• Google Colab Notebook: Link
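As an illustration of the methods listed above, here is a minimal NumPy-only sketch of three of them: Lagrange interpolation, Chebyshev nodes (the roots of the Chebyshev polynomial T_n, a common choice of interpolation points), and least-squares polynomial fitting on a Vandermonde matrix. The function names are hypothetical and this is not the code from the linked notebook; Hermite interpolation is omitted for brevity.

```python
import numpy as np


def lagrange_interpolate(x_nodes, y_nodes, x):
    """Evaluate the Lagrange interpolating polynomial through (x_nodes, y_nodes) at x.

    Assumes the x_nodes are distinct. Returns a 1-D array, even for scalar x.
    """
    x_nodes = np.asarray(x_nodes, dtype=float)
    y_nodes = np.asarray(y_nodes, dtype=float)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    result = np.zeros_like(x)
    n = len(x_nodes)
    for j in range(n):
        # Basis polynomial L_j(x) = prod_{m != j} (x - x_m) / (x_j - x_m)
        basis = np.ones_like(x)
        for m in range(n):
            if m != j:
                basis *= (x - x_nodes[m]) / (x_nodes[j] - x_nodes[m])
        result += y_nodes[j] * basis
    return result


def chebyshev_nodes(n, a=-1.0, b=1.0):
    """Return the n roots of T_n (Chebyshev nodes of the first kind), mapped to [a, b]."""
    k = np.arange(n)
    t = np.cos((2 * k + 1) * np.pi / (2 * n))
    return 0.5 * (a + b) + 0.5 * (b - a) * t


def least_squares_poly(x, y, degree):
    """Fit polynomial coefficients (lowest degree first) by linear least squares."""
    # Vandermonde matrix with columns 1, x, x^2, ..., x^degree
    A = np.vander(np.asarray(x, dtype=float), degree + 1, increasing=True)
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return coeffs
```

For example, interpolating exp(x) on 5 Chebyshev nodes in [-1, 1] and evaluating at 0 should give a value close to exp(0) = 1. Chebyshev nodes are used here because they keep interpolation error small compared with equally spaced points.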

Chinese MNIST - Digit Recognizer (Kaggle) June 2024 - July 2024

• Designed and implemented a deep learning model combining Convolutional Neural Networks (CNNs) with PyTorch Transformer layers to recognize Chinese digits from the Kaggle dataset.

• Optimization Techniques: Applied batch normalization, dropout, and multiple convolutional layers to improve model performance.

• Benchmark: Achieved 96% accuracy.

• Google Colab Notebook: Link

TECHNICAL SKILLS

• Programming Languages: Python, C++, SQL, R, JavaScript

• Frameworks: PyTorch, PyTorch Lightning, TensorFlow

• Libraries: Pandas, Matplotlib, Transformers, OpenCV, NumPy, Spark

• Tools: Docker, Git, Neovim, Vim, Linux, MySQL

• English: Proficient (IELTS 6.5)


