NGUYEN TIEN TOAN
Data Engineer
Phone: 038*******
Email: ******************@*****.***
Github: https://github.com/nguyentientoanhaui
About
Apply my data knowledge and programming skills in a practical environment; learn, grow, and make meaningful contributions as a data engineer by supporting efficient data processing and adding value to the organization.
Skills & Competencies
• Databases: MySQL, MongoDB
• Operating system: Ubuntu
• English: Upper Intermediate
I. Project
1. Kafka to Hadoop
Description: This project simulates the collection and processing of sales transaction data for Company A from branches across the country. It uses Apache NiFi, Apache Kafka, and Hadoop HDFS to build a real-time data storage pipeline.
Simulated Scenario: Collect sales transaction data from branches and integrate real-time flight data from the Aviationstack API, enabling continuous processing and providing timely information to support business decision-making.
Technologies Used:
• Apache NiFi
• Apache Kafka
• Hadoop HDFS
• MongoDB
• Aviationstack API
Outcome: Provided a continuous and efficient data processing solution to support business decisions based on big-data analytics.
Github: https://github.com/nguyentientoanhaui/Kafka-to-Hadoop
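A minimal sketch of the producer side of such a pipeline, assuming the kafka-python client, a broker on localhost:9092, and an illustrative topic name ("sales-transactions" and the branch IDs are assumptions, not details from the project):

```python
# Hypothetical sketch of simulated branch sales transactions streamed to Kafka.
# Assumes kafka-python (`pip install kafka-python`) and a broker on localhost:9092.
import json
import random
import time
from datetime import datetime, timezone

def make_transaction(branch_id: str) -> dict:
    """Build one simulated sales transaction record."""
    return {
        "branch_id": branch_id,
        "amount": round(random.uniform(1.0, 500.0), 2),  # simulated sale amount
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

def stream_transactions(bootstrap_servers: str = "localhost:9092",
                        topic: str = "sales-transactions") -> None:
    from kafka import KafkaProducer  # imported lazily so the sketch loads without Kafka
    producer = KafkaProducer(
        bootstrap_servers=bootstrap_servers,
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    for _ in range(100):
        tx = make_transaction(branch_id=random.choice(["HN-01", "HCM-02", "DN-03"]))
        producer.send(topic, tx)
        time.sleep(0.1)  # throttle the simulated stream
    producer.flush()
```

Downstream, NiFi or a Kafka consumer would land these messages in HDFS for batch analytics.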
2. Predict results with continuous data streaming
Description: Developed a real-time credit card fraud detection system in a simulated transaction environment. Transaction data is continuously streamed to Apache Kafka, where an AI model detects potential fraudulent activities with high accuracy.
Technologies Used:
• Data Streaming: Apache Kafka
• Machine Learning: AI model for fraud detection
• Storage: MongoDB for stable data storage
• Data Pipeline: Real-time data processing and integration with Kafka
Outcome: Achieved real-time detection and secure storage of fraudulent transaction predictions, enabling timely insights and integration with other systems for enhanced business decision-making.
Github: https://github.com/nguyentientoanhaui/Predict-results-with-continuous-data-Streaming-
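An illustrative sketch of the consume-score-store loop. A simple threshold rule stands in for the project's trained AI model, and the topic, database, and collection names are assumptions:

```python
# Hypothetical sketch: consume transactions from Kafka, score them, and persist
# flagged predictions to MongoDB. The scoring rule below is a placeholder for
# the project's trained fraud-detection model.
import json

def score_transaction(tx: dict, amount_threshold: float = 1000.0) -> bool:
    """Placeholder fraud check: flag unusually large or non-VND transactions."""
    return tx.get("amount", 0.0) > amount_threshold or tx.get("currency", "VND") != "VND"

def run_detector(bootstrap_servers: str = "localhost:9092",
                 topic: str = "transactions") -> None:
    from kafka import KafkaConsumer    # requires kafka-python and a running broker
    from pymongo import MongoClient    # requires pymongo and a running MongoDB
    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=bootstrap_servers,
        value_deserializer=lambda m: json.loads(m.decode("utf-8")),
    )
    predictions = MongoClient("mongodb://localhost:27017")["fraud"]["predictions"]
    for message in consumer:
        tx = message.value
        if score_transaction(tx):
            predictions.insert_one({**tx, "fraud": True})  # store the flagged prediction
```

Keeping the scoring function pure makes it easy to swap in a real model and to test it without Kafka or MongoDB running.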
3. Data Crawling
Description: Developed a data crawling system to collect product data from the ShopeeFood API for market analysis.
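A sketch of the multithreaded collection step. The endpoint URL and response shape here are illustrative placeholders, not the actual ShopeeFood API contract:

```python
# Hypothetical sketch of multithreaded page crawling with requests and
# concurrent.futures. The URL and JSON layout are assumed for illustration.
from concurrent.futures import ThreadPoolExecutor

def parse_products(payload: dict) -> list:
    """Flatten one (assumed) API response into product records."""
    return [
        {"id": item.get("id"), "name": item.get("name"), "price": item.get("price")}
        for item in payload.get("items", [])
    ]

def fetch_page(page: int) -> list:
    import requests  # imported lazily so the parsing helper stays testable offline
    resp = requests.get(
        "https://example.com/shopeefood/api/products",  # placeholder endpoint
        params={"page": page},
        timeout=10,
    )
    resp.raise_for_status()
    return parse_products(resp.json())

def crawl(pages: int = 10, workers: int = 8) -> list:
    # Threads overlap network waits, which is where the speedup comes from.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(fetch_page, range(1, pages + 1))
    return [record for page in results for record in page]
```

The collected records would then be inserted into MongoDB, with the weekly MongoDB-to-Hadoop push handled by a scheduler.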
Technologies Used:
• Programming: Python
• Libraries: Requests, Pandas
• Storage: MongoDB
• Multithreading: Improved data collection performance
• Scheduling: Set up a weekly process to push data from MongoDB to Hadoop
Outcome: Collected and stored thousands of product details from ShopeeFood, providing a database for analysis and business decision-making.
Github: https://github.com/nguyentientoanhaui/data_crawling
II. Certifications
• SIC (Samsung Innovation Campus for Big Data)
• TOEIC: 735
III. Education
Hanoi University of Industry, B.Sc. in Information Technology (2021-2025)