Phạm Tấn Phước
Student ID: ********
082******* **************@*****.*** linkedin.com/in/tấn-phước-phạm-3363162b0 github.com/PhamTanPhuoc66
Summary
Detail-oriented Data Science student with a strong foundation in statistics, machine learning, and deep learning. Experienced in building end-to-end ML pipelines and computer vision systems. Passionate about applying data-driven methods to real-world problems and eager to grow as a Data Science Intern.
Skills
Programming: Python, R, SQL, C/C++
Libraries/Frameworks: scikit-learn, PyTorch, TensorFlow, Pandas, NumPy, Matplotlib, dbt, Airflow
Machine Learning: Regression, Classification, Clustering, PCA, Decision Tree, Random Forest, XGBoost, Deep Learning
Statistical Methods: Hypothesis Testing, Confidence Interval, ANOVA, Regression Analysis
Databases: SQL (PostgreSQL, MySQL), NoSQL (MongoDB, DynamoDB)
MLOps & Cloud: MLflow, Docker, AWS, KServe, Kubernetes
Education
University of Science (VNU-HCM) 2022 – Present
Bachelor of Science in Data Science
Certifications
– TOEIC Listening & Reading: 935/990; Speaking & Writing: 280/400
– Coursera Specializations: Data Analysis, Machine Learning, Deep Learning
Projects
End-to-End Machine Learning & Deep Learning Data Pipeline
– Built a complete ML pipeline using Databricks, Airflow, and dbt within a Lakehouse (Bronze–Silver–Gold) architecture.
– Developed churn prediction and recommendation models using scikit-learn, Surprise, and PyTorch (collaborative filtering & neural recommenders).
– Automated experiment tracking and model deployment with MLflow + KServe (auto-scaling to zero).
– Technologies: Databricks, Airflow, dbt, MLflow, KServe, PyTorch, scikit-learn, Surprise.
– github.com/PhamTanPhuoc66/End-to-end-olist-project
Body Performance Analysis Project
– Analyzed a 13k-record fitness dataset to study correlations between body metrics and performance levels (A–D).
– Conducted EDA, feature engineering (fitness_score, pulse_pressure), visualization, and statistical testing (t-test, ANOVA, permutation).
– Trained and compared several classification models (Naive Bayes, LDA, Decision Tree, Random Forest, XGBoost, etc.); Random Forest performed best.
– Technologies: R, tidyverse, ggplot2, caret, randomForest, xgboost, corrplot.
Real-Time Face Recognition & Attendance (YuNet + PCA)
– Developed a real-time face recognition system using YuNet (OpenCV) for detection and PCA (Eigenfaces) + KNN for recognition.
– Implemented FastAPI WebSocket streaming with async processing, batching, and multi-client support for low-latency inference.
– Designed a modular backend and integrated YOLOv8 (FP16, CUDA) for an optional object detection demo.
– Technologies: FastAPI, asyncio, OpenCV, PCA, KNN, YuNet, YOLOv8, WebSocket.