Bùi Thị Cẩm Ngoan
Data Science & Machine Learning Engineer
********@*****.*** | Ho Chi Minh City, Vietnam | github.com/ngoanxd
Summary
Third-year Data Science student with a strong foundation in mathematics, statistics, and machine learning. Experienced in building end-to-end ML pipelines, fine-tuning transformer-based models, and developing NLP applications. Skilled in handling imbalanced datasets, optimizing model performance, and preventing data leakage to ensure reliable, generalizable results.
Education
Ho Chi Minh City Open University, Ho Chi Minh City, Vietnam
Bachelor of Data Science (3rd Year Student)
Technical Skills
• Programming: Python, C++
• Machine Learning: Logistic Regression, Decision Tree, Random Forest, XGBoost
• Deep Learning: RNN, LSTM, GRU, Transformer
• NLP: BERT, PhoBERT, Text Classification
• LLM Engineering: Fine-tuning (LoRA, QLoRA), RAG Systems
• Data Analysis & Visualization:
– Excel: Data cleaning, pivot tables, exploratory data analysis
– Power BI: Building interactive dashboards and visualizing data insights
– Strong ability to analyze and interpret numerical data patterns
• Imbalanced Data Handling:
– SMOTE, undersampling techniques
– Class weighting and threshold tuning
– Evaluation metrics: F1-score, ROC-AUC, PR-AUC
• Data Leakage Prevention:
– Applied the train-test split before any preprocessing
– Prevented leakage in feature engineering and scaling
– Used cross-validation for reliable evaluation
– Treated unusually high accuracy as a potential leakage signal
• Tools & Frameworks: PyTorch, HuggingFace Transformers, Scikit-learn, Pandas, NumPy
• Workflow & Coding Practices:
– Git, Jupyter Notebook
– Writing clean, readable, and maintainable code
– Structured problem-solving and experiment-driven development
Projects
Cardiovascular Disease Prediction
• Built ML models: Logistic Regression, Random Forest, XGBoost
• Performed feature engineering and data preprocessing
• Handled imbalanced data and tuned the classification threshold
• Prevented data leakage throughout the training pipeline
• Evaluated models using F1-score and ROC-AUC
BERT for SMS Classification
GitHub
• Fine-tuned a BERT model for text classification
• Improved model performance via hyperparameter tuning
PhoBERT for E-commerce Comment Classification
GitHub
• Built a Vietnamese NLP classification system
• Performed text preprocessing and tokenization
RAG Chatbot (Vietnamese Law)
GitHub
• Developed a Retrieval-Augmented Generation pipeline
• Integrated vector database for context-aware responses
Core Strengths
• Strong analytical and quantitative thinking
• Solid foundation in mathematics and statistics
• Data-driven problem-solving mindset
• Effective teamwork and communication
• Leadership and proactive initiative in projects
Languages
• Vietnamese: Native
• English: Basic communication, able to read technical documentation