
Data Engineer Real-Time Analytics Specialist

Location:
Long Hoa, Vinh Long, Vietnam
Posted:
June 04, 2026


Resume:

Nguyen Van Toan

Data Engineer

037*-***-*** · ***************@*****.*** · District 12, HCM City · github.com/Vantoan252003

CAREER OBJECTIVE

Developer with 1+ year of production experience building mobile applications, now focused on data engineering. Designed and built a real-time fraud detection pipeline end to end (Kafka streaming, PySpark Structured Streaming with 3-layer detection, Delta Lake storage, and Airflow batch automation), all containerised with Docker and deployed to Oracle Cloud via GitHub Actions CI/CD. Seeking a Data Engineer (Fresher/Junior) role to apply these skills at scale.

KEY SKILLS

Streaming: Apache Kafka · Zookeeper · Kafdrop

Processing: PySpark Structured Streaming · PySpark MLlib

Orchestration: Apache Airflow · DAG authoring

Storage: PostgreSQL · Delta Lake · MinIO (S3-compatible)

DevOps: Docker · Docker Compose · GitHub Actions CI/CD · Oracle Cloud deploy

Languages and tools: Python · SQL · Jupyter Notebook

Mobile (prior): Flutter/Dart · Bloc · WebSocket · Firebase · Spring Boot · Redis

WORK EXPERIENCE

Flutter Developer – Logistics Super App · di4l.vn · 02/2025 – Present

Built and shipped a production logistics Super App (ride-hailing, food delivery, parcel) with 3 user interfaces: Customer, Vendor, Driver.

Implemented real-time GPS order tracking via WebSocket and Google Maps SDK; integrated Firebase Auth, Firestore, and FCM.

Applied Bloc/Cubit state management; optimised widget rebuild cycles to reduce UI jank on mid-range Android devices.

Maintained Spring Boot backend APIs and Redis caching layer shared with the mobile team.

PERSONAL PROJECTS

Real-Time Fraud Detection Pipeline

Python · PySpark · Kafka · Airflow · Delta Lake · MLflow · Docker

github.com/Vantoan252003/BigDataFraudDetection

Ingests 6.3M PaySim financial transactions via a Kafka producer at 100 tx/s into the transactions-data topic; monitors lag and offsets with Kafdrop.

3-layer PySpark Structured Streaming detection: Layer 1, Redis blacklist lookup; Layer 2, rule engine (TRANSFER/CASH_OUT); Layer 3, Random Forest model loaded from the MLflow registry.

Dual-write sink: flagged transactions written to PostgreSQL for live dashboard queries and to Delta Lake (MinIO) in Parquet with ACID guarantees and time-travel versioning.

Fraud alerts published to a dedicated Kafka fraud-alerts topic for downstream consumers.

3 Airflow DAGs: blacklist_daily (reloads Redis), model_retrain (logs F1/AUC-ROC to MLflow), delta_compaction (runs OPTIMIZE + VACUUM on MinIO).

Streamlit dashboard: real-time fraud feed, transaction explorer with filtering and pagination, Plotly histograms for fraud type analysis.

Fully containerised with Docker Compose (8 services: Zookeeper, Kafka, Kafdrop, PostgreSQL, MLflow, MinIO, Streamlit, Airflow); CI/CD via GitHub Actions with auto-deploy to Oracle Cloud.

Pianex – Piano Learning App with AI

Flutter · Spring Boot · Python · MySQL · Docker

Google Play Store

Sole developer of a published piano learning app: Flutter frontend, Spring Boot backend, and a Python pitch-detection service that gives real-time AI feedback with <100 ms latency.

Gamification: EXP system, daily streaks, global leaderboard, user-uploaded sheet music with community ratings.

Managed full Play Store release pipeline: versioning, signing, staged rollout, Shorebird OTA hotfixes.


