Data Science & ML Engineer - NLP & MLOps Enthusiast

Location:

Hoa Khanh Tay, Long An, Vietnam

Posted:

May 06, 2026


Resume:

Bùi Thị Cẩm Ngoan

Data Science & Machine Learning Engineer

********@*****.*** Ho Chi Minh City, Vietnam github.com/ngoanxd

Summary

Third-year Data Science student with a strong foundation in mathematics, statistics, and Machine Learning. Experienced in building end-to-end ML pipelines, fine-tuning transformer-based models, and developing NLP applications. Skilled in handling imbalanced datasets, optimizing model performance, and preventing data leakage to ensure reliable, generalizable results.

Education

Ho Chi Minh City Open University, Ho Chi Minh City, Vietnam

Bachelor of Data Science (3rd Year Student)

Technical Skills

• Programming: Python, C++

• Machine Learning: Logistic Regression, Decision Tree, Random Forest, XGBoost

• Deep Learning: RNN, LSTM, GRU, Transformer

• NLP: BERT, PhoBERT, Text Classification

• LLM Engineering: Fine-tuning (LoRA, QLoRA), RAG Systems

• Data Analysis & Visualization:

– Excel: Data cleaning, pivot tables, exploratory data analysis

– Power BI: Building interactive dashboards and visualizing data insights

– Strong ability to analyze and interpret numerical data patterns

• Imbalanced Data Handling:

– SMOTE, undersampling techniques

– Class weighting and threshold tuning

– Evaluation metrics: F1-score, ROC-AUC, PR-AUC

• Data Leakage Prevention:

– Applied proper train-test split before preprocessing

– Prevented leakage in feature engineering and scaling

– Used cross-validation for reliable evaluation

– Identified unusually high accuracy as potential leakage signal

• Tools & Frameworks: PyTorch, HuggingFace Transformers, Scikit-learn, Pandas, NumPy

• Workflow & Coding Practices:

– Git, Jupyter Notebook

– Writing clean, readable, and maintainable code

– Structured problem-solving and experiment-driven development

Projects

Cardiovascular Disease Prediction

• Built ML models: Logistic Regression, Random Forest, XGBoost

• Performed feature engineering and data preprocessing

• Handled imbalanced data and tuned classification threshold

• Prevented data leakage throughout training pipeline

• Evaluated models using F1-score and ROC-AUC

BERT for SMS Classification

GitHub

• Fine-tuned BERT model for text classification

• Improved model performance via hyperparameter tuning

PhoBERT for E-commerce Comment Classification

GitHub

• Built Vietnamese NLP classification system

• Performed text preprocessing and tokenization

RAG Chatbot (Vietnamese Law)

GitHub

• Developed Retrieval-Augmented Generation pipeline

• Integrated vector database for context-aware responses

Core Strengths

• Strong analytical and quantitative thinking

• Solid foundation in mathematics and statistics

• Data-driven problem-solving mindset

• Effective teamwork and communication

• Leadership and proactive initiative in projects

Languages

• Vietnamese: Native

• English: Basic communication; able to read technical documentation
