Nguyen Van Toan
Data Engineer
037*-***-*** · ***************@*****.*** · District 12, HCM City · github.com/Vantoan252003
CAREER OBJECTIVE
Developer with 1+ year of production experience building mobile applications, now focused on data engineering. Designed and built an end-to-end real-time fraud detection pipeline (Kafka streaming, PySpark Structured Streaming with 3-layer detection, Delta Lake storage, and Airflow batch automation), fully containerised with Docker and deployed to Oracle Cloud via GitHub Actions CI/CD. Seeking a Data Engineer (Fresher/Junior) role to apply these skills at scale.
KEY SKILLS
Streaming: Apache Kafka · Zookeeper · Kafdrop
Batch processing: PySpark Structured Streaming · PySpark MLlib
Orchestration: Apache Airflow · DAG authoring
Storage: PostgreSQL · Delta Lake · MinIO (S3-compatible)
DevOps: Docker · Docker Compose · GitHub Actions CI/CD · Oracle Cloud deploy
Languages: Python · SQL · Jupyter Notebook
Mobile (prior): Flutter/Dart · Bloc · WebSocket · Firebase · Spring Boot · Redis
WORK EXPERIENCE
Flutter Developer – Logistics Super App 02/2025 – Present di4l.vn
Built and shipped a production logistics Super App (ride-hailing, food delivery, parcel) with 3 user interfaces: Customer, Vendor, Driver.
Implemented real-time GPS order tracking via WebSocket and Google Maps SDK; integrated Firebase Auth, Firestore, and FCM.
Applied Bloc/Cubit state management; optimised widget rebuild cycles to reduce UI jank on mid-range Android devices.
Maintained Spring Boot backend APIs and Redis caching layer shared with the mobile team.
PERSONAL PROJECTS
Real-Time Fraud Detection Pipeline
Python · PySpark · Kafka · Airflow · Delta Lake · MLflow · Docker
github.com/Vantoan252003/BigDataFraudDetection
Ingests 6.3M PaySim financial transactions via a Kafka producer at 100 tx/s into the transactions-data topic; monitors lag and offsets with Kafdrop.
3-layer PySpark Structured Streaming detection: Layer 1: Redis blacklist lookup; Layer 2: rule engine (TRANSFER/CASH_OUT); Layer 3: Random Forest model loaded from the MLflow registry.
Dual-write sink: flagged transactions written to PostgreSQL for live dashboard queries and to Delta Lake (MinIO) in Parquet with ACID guarantees and time-travel versioning.
Fraud alerts published to a dedicated Kafka fraud-alerts topic for downstream consumers.
3 Airflow DAGs: blacklist_daily (reloads Redis), model_retrain (logs F1/AUC-ROC to MLflow), delta_compaction (runs OPTIMIZE + VACUUM on MinIO).
Streamlit dashboard — real-time fraud feed, transaction explorer with filter/pagination, Plotly histograms for fraud type analysis.
Fully containerised with Docker Compose (8 services: Zookeeper, Kafka, Kafdrop, PostgreSQL, MLflow, MinIO, Streamlit, Airflow); CI/CD via GitHub Actions with auto-deploy to Oracle Cloud.
Pianex – Piano Learning App with AI
Flutter · Spring Boot · Python · MySQL · Docker
Google Play Store
Sole developer of a published piano learning app: Flutter frontend, Spring Boot backend, and Python pitch-detection service, with real-time AI feedback at <100 ms latency.
Gamification: EXP system, daily streaks, global leaderboard, user-uploaded sheet music with community ratings.
Managed full Play Store release pipeline: versioning, signing, staged rollout, Shorebird OTA hotfixes.