Khang Nguyen — Fresher Data Scientist
Thu Duc City, Ho Chi Minh City – Viet Nam
094******* • ****************@*****.*** • khang3004 • KhangDS
Biography
I am a recent Data Science graduate with a strong background in Mathematics, Statistics, Optimization, and Machine Learning. I build end-to-end data/ML pipelines grounded in rigorous math and practical engineering. My internship at Vietnam Silicon (May–Aug 2025) gave me hands-on experience in HR Intelligence and Agentic AI.
Experience
Vietnam Silicon Company Ho Chi Minh City
Data Scientist Intern May 2025–Aug 2025
Project: Talent Market Intelligence System (team size: 4 members)
Problem: Traditional recruitment was time-consuming and inaccurate. HR spent 80% of its time screening CVs manually, with no objective evaluation against the JD, no standardized scoring, no automated CV-JD matching, and fragmented candidate data.
Goal: Cut screening time by 70% and improve accuracy.
Solution:
Architecture: Microservices with API Gateway, Parsing, Extraction, and Benchmark services; async task distribution via RabbitMQ; Docker/Compose for scalable deployment.
AI/ML Pipeline: CV/JD parsing with Google Gemini + prompt engineering; FAISS + Sentence Transformers for semantic search; pgvector with HNSW for fast similarity search; multi-criteria scoring (Skills, Experience, Education, Language, Certificates).
Scoring System: Config-driven JSON rules; weighted algorithm (Skills 30%, Experience 30%, Education 15%, Language 15%, Certificates 10%); cosine similarity for skills matching; requirement-based logic for JD compliance.
Security: Enterprise-ready with Azure AD SSO, JWT, role-based authorization, and secure file storage on S3.
Results:
Reduced screening time by 75% (20 min/CV to 5 min/CV); batch-processed 100+ CVs in parallel.
Achieved 92% accuracy in skills matching; response time 3s for full pipeline.
Delivered multi-dimensional scoring (5 criteria, 100 pts each) with top-K ranking.
Ensured 99.5% uptime with health checks, optimized DB indexes, and robust error handling.
Tech Stack: FastAPI, Python 3.11, SQLAlchemy, Pydantic, PostgreSQL + pgvector, Redis, AWS S3, FAISS, Sentence Transformers, spaCy, PyTorch, RabbitMQ, Docker/Compose, Azure AD, JWT.
Extended Flow: ChatTMI Conversational Agent
Challenge: HR professionals still lacked a natural interface to interact with complex HR data. There was no AI assistant capable of reasoning, multi-turn memory, or real-time streaming for smooth conversations.
Approach:
MCP-Based Agent: Built on Model Context Protocol; implemented 50+ MCP Prompts, Resources, Tools for HR operations; integrated with LangChain MCP Adapters; applied ReAct Agent pattern orchestrated with LangGraph.
Advanced AI: Leveraged Gemini 2.5 Flash as the core LLM with intelligent routing; memory-enabled conversations with MemorySaver; 24 Gemini API keys with automatic rotation for high availability; real-time streaming with SSE; 8+ customizable system prompt personalities.
Data Intelligence: Autonomous schema learning; semantic relationship discovery; recursive JSON parsing for complex structures; anomaly detection and data profiling.
Modern Conversational UI: Glass-morphism responsive PWA; progressive markdown rendering; interactive quick actions; tables and charts embedded directly in chat responses.
Impact:
Fully automated HR database queries, eliminating manual SQL interactions.
Achieved sub-2s latency for real-time conversations with persistent multi-thread memory.
Delivered 50+ MCP tools, 24-API key failover rotation, 8+ conversation personalities, and 25+ production-ready endpoints.
Enhanced user experience with 60fps glass-morphism animations, mobile-first PWA design, and progressive markdown rendering.
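The weighted scoring listed under Scoring System for the Talent Market Intelligence project can be illustrated with a minimal sketch. This is not the production code: the weights and the 100-point scale come from the bullets above, while the function names, the clamping of negative cosine similarity to zero, and the sample candidate scores are assumptions made only for illustration.

```python
import math

# Weights as listed in the Scoring System bullet; they sum to 1.0.
WEIGHTS = {
    "skills": 0.30,
    "experience": 0.30,
    "education": 0.15,
    "language": 0.15,
    "certificates": 0.10,
}


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def skills_score(cv_vec, jd_vec):
    """Map cosine similarity to a 0-100 score (assumed mapping: negatives clamp to 0)."""
    return 100.0 * max(0.0, cosine_similarity(cv_vec, jd_vec))


def overall_score(criteria):
    """Weighted sum of the five per-criterion scores, each on a 0-100 scale."""
    return sum(WEIGHTS[name] * criteria[name] for name in WEIGHTS)


def top_k(candidates, k):
    """Return the k (name, overall score) pairs with the highest overall score."""
    ranked = sorted(
        ((name, overall_score(scores)) for name, scores in candidates.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:k]


# Hypothetical candidates with made-up per-criterion scores.
candidates = {
    "A": {"skills": 80, "experience": 70, "education": 90, "language": 60, "certificates": 50},
    "B": {"skills": 60, "experience": 90, "education": 70, "language": 80, "certificates": 100},
}
best = top_k(candidates, 1)
```

With these made-up inputs, candidate B ranks first because its experience, language, and certificate scores outweigh A's stronger skills and education under the listed weights.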
Education
HUTECH University Ho Chi Minh City
Bachelor in Data Science, GPA: 3.7/4.0 2021–2025
Class President; Bronze Medalist at Vietnam Mathematical Olympiad 2023–2024
AI Vietnam Remote
AIO2023 Program 2023–2024
Intensive program on advanced AI/ML & Data Science.
Ly Tu Trong High School for the Gifted Can Tho City
Mathematics (Gifted Program), GPA: 9.0/10.0 2018–2021
Encouragement Prize at Vietnam Mathematical Olympiad 2020
Selected Personal Projects
Booking.com Hotel Analytics (GitHub)
Problem: Predict hotel review scores, segment the market, and classify quality tiers using multi-modal data (text/image/metadata).
Solution: Designed a combined pipeline: ResNet18 for image features fused with tabular inputs for regression; unsupervised K-Means/DBSCAN for segmentation; multi-class quality classification with a stacking ensemble (SVM, KNN, Decision Tree, Random Forest, Logistic Regression). Included Docker scripts for reproducible tasks and a directory structure for raw/processed data, models, and results.
Results: Regression: RMSE = 0.85, R² = 0.78, MAE = 0.67. Classification: Accuracy = 0.84, F1 = 0.82, ROC-AUC = 0.89. Clustering: Silhouette = 0.76 with 3 optimal clusters.
Tech Stack: Python, PyTorch, Scikit-learn, Pandas, Seaborn/Matplotlib, Docker.
Vietnamese Financial Sentiment Analysis (GitHub)
Problem: Classify sentiment of Vietnamese stock-market headlines and link it to market movement signals.
Solution: Full NLP pipeline with Vietnamese tokenization (VnCoreNLP), feature engineering, and hybrid ML/DL modeling (GaussianNB/LogReg/RandomForest/XGBoost and LSTM/BiLSTM/PhoBERT). Managed data and predictions with MongoDB; provided environment/Docker specs and an out-of-sample evaluation protocol.
Results: Best model (PhoBERT): accuracy 89.2%; BiLSTM/Ensemble: 87.5%. Data splits reported: train 2,151 rows; test 1,668 rows; out-of-sample 365 rows with balanced labels.
Tech Stack: Python, Transformers/PhoBERT, scikit-learn, XGBoost, MongoDB, Docker.
Technical Skills
Programming Languages: Python (Intermediate, 3 yrs), R, Java & JavaScript (Basic)
ML/DL: PyTorch, FAISS, scikit-learn, Transformers, Statsmodels, CVXOPT
LLMs & NLP: A2A & MCP Protocols, LangChain, LangGraph, ReAct Agent, CrewAI, Ollama, NLTK
Data Analysis & Manipulation: Pandas, NumPy, Visualization (Matplotlib/Seaborn)
Big Data: Spark, Databricks, Kafka
Databases: PostgreSQL, MongoDB, MySQL, SQL Server, Milvus, Chroma
Tools: Git, Docker, n8n, Figma, Streamlit, Postman, Trello/Notion, Excel, LaTeX
Accomplishments
Certifications: DeepLearning.AI – CNN Specialization; DataCamp – Data Scientist Professional with Python
Scholarship: HUTECH Talent Scholarship Level 1 (2023–2024)
Competition: HCM-AIC 2024 with team AIO-Chef; 2 Bronze Medals at Vietnam Mathematical Olympiad (2023, 2024)
Languages
IELTS: Overall 6.0