Hoang Mai, Hanoi
***********@*****.***
Nguyen Huu Tam
Data Engineering
Portfolio: github.com/tamkadin
kaggle.com/tamkadin
PERSONAL INFORMATION
Birthplace: Duc Linh Commune, Vu Quang District, Ha Tinh Province
Date of Birth: 20/06/2003
Address: Hoang Mai District, Hanoi
Hobbies: Football, watching movies, cafe music, technology, traveling...

EDUCATION
Math 1, Ha Tinh High School for Gifted Students 2018 - 2021
Bachelor of Computer Science (IT1), School of Information and Communication Technology, Hanoi University of Science and Technology 2021 - Current
CPA: 3.42
EXPERIENCES
Teaching Assistant, School of Information and Communication Technology, Hanoi University of Science and Technology 2024
Software Intern, Samsung R&D Center (SRV) 6/2024 - 8/2024
• A division of Samsung Vietnam focusing on mobile platform development, AI, and software solutions for Galaxy devices and IoT integration.
AI Intern, VNPT AI 8/2024 - 11/2024
• Participated in AI training programs and hands-on projects on anomaly detection models.
Data Engineering Intern, STS, Openasia Group 11/2024 - 05/2025
• Worked in the Data Department of an in-house IT services company focused on high-end fashion and luxury accessories retail, serving brands such as Hermès, Bottega Veneta, and Tam Son Fashion.
Data Engineering, Viettel Digital Talent Program, Viettel Group 04/2025 - Current
• An elite training program designed by top Viettel experts to develop young talent in domains such as Cloud, Cybersecurity, Data Science & AI, IoT, 5G, and Software & Data Engineering. Participated in advanced courses and real-world projects focused on large-scale data infrastructure and analytics.
RESEARCH
Researcher, Artificial Intelligence and Big Data Laboratory, SOICT 02/2024 – Present
Research paper, "Prioritizing Cancer Therapeutic Genes Using BioRank", submitted to CSBJ (Elsevier), under review

ACHIEVEMENTS AND EXTRACURRICULAR ACTIVITIES
Second Prizes, Ha Tinh Provincial Contest in Mathematics and Informatics 2019, 2020, 2021
Participated in the VNOI 2019 reserve team 2018–2019
Achieved the title "STUDENT WITH FIVE GOOD CRITERIA", University level 2021–2022
Achieved the title "STUDENT WITH FIVE GOOD CRITERIA", University level 2022–2023
Member of Management Board, Vice Head of Communication, Volunteer Team 10/1985, Hanoi University of Science and Technology Students’ Association 2022–2023
Second Prize, University Level Scientific Research for Students 2023
Volunteer in "Green Summer Campaign 2023", Dinh Hoa District, Thai Nguyen Province 2023
Top 25 in "Young Creativity 2024" Competition 2024
Vietcombank Enterprise Scholarship 2025
ABILITIES
Technologies and Tools: Python, Git, Docker, MySQL, ClickHouse, dbt, Airbyte, Dagster, Databricks, WSL
Basic Knowledge: Kafka, Hadoop, Spark, Flink, C/C++
Knowledge and Skills: ELT/ETL, Data warehouse, Data lake, Data ingestion, SQL
Language: English
PROJECTS
Data Analytics Platform
- Role: Member
– Designed and implemented a data lakehouse architecture using ClickHouse.
– Built data pipelines using Airbyte to ingest data from various sources.
– Utilized Dagster for orchestrating data workflows and managing ELT processes.
– Developed dbt models to transform raw data into structured formats suitable for analysis.
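A minimal sketch of the raw-to-staged ELT pattern this project follows; sqlite3 stands in for ClickHouse, the SQL stands in for a dbt model, and all table and column names are invented for illustration, not taken from the actual project:

```python
import sqlite3

# Stand-in warehouse: sqlite3 plays the role of ClickHouse here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (id INTEGER, amount REAL, status TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, 120.0, "paid"), (2, 35.5, "refunded"), (3, 80.0, "paid")],
)

# "Staging model": the same raw -> staged flow a dbt project expresses
# as a SELECT over ingested source tables.
conn.execute(
    """
    CREATE TABLE stg_paid_orders AS
    SELECT id, amount FROM raw_orders WHERE status = 'paid'
    """
)
total = conn.execute("SELECT SUM(amount) FROM stg_paid_orders").fetchone()[0]
print(total)  # 200.0
```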
- Skills Used: Airbyte, ClickHouse, Dagster, dbt, Python, ELT.

Data Product Tools
- Role: Member
– Developed a data extraction and transformation pipeline to convert brand invoices into ERP-compatible formats.
– Built a Flask-based API for handling PDF data extraction using the invoice2data Python library.
– Integrated ML models to predict missing fields in ERP invoices based on extracted data.
– Designed a modular transformation framework to handle different brand invoice formats.
– Integrated MySQL for structured data storage and optimized queries for efficient data retrieval and processing.
– Automated data processing pipelines for handling large invoice datasets.
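A sketch of the regex-template extraction approach this pipeline builds on (the invoice2data library matches templates of per-field regexes against extracted PDF text); the field names and invoice layout below are invented, not an actual brand template:

```python
import re

# Hypothetical template in the spirit of an invoice2data YAML template:
# one regex per field, applied to the raw text pulled from a PDF.
TEMPLATE = {
    "invoice_number": r"Invoice No\.\s*(\S+)",
    "total": r"Total:\s*([\d.]+)",
}

def extract_fields(text: str) -> dict:
    """Apply each field's regex to the raw invoice text."""
    out = {}
    for field, pattern in TEMPLATE.items():
        m = re.search(pattern, text)
        if m:
            out[field] = m.group(1)
    return out

sample = "Invoice No. BV-2024-017\nTotal: 1250.00\n"
print(extract_fields(sample))  # {'invoice_number': 'BV-2024-017', 'total': '1250.00'}
```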
- Skills Used: Python, Flask, MySQL, Machine Learning, invoice2data, Pandas, ETL.

Customer 360 Data Platform (C360)
- Role: Individual
– Designed and implemented a lakehouse architecture to support Customer 360 use cases.
– Built a real-time data pipeline using Apache Flink to process event streams and customer interactions.
– Used Apache Hudi to store transactional customer data in an optimized lakehouse format.
– Implemented ELT pipelines to integrate batch and streaming data from multiple sources into a unified customer profile.
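An in-memory stand-in for the merge semantics this project relies on when batch and streaming records land in the same customer table (the "upsert on key, latest record wins" behavior Apache Hudi provides); the record fields are invented for illustration:

```python
def merge_profiles(records):
    """Fold batch and stream records into one profile per customer,
    later timestamps overwriting earlier field values."""
    profiles = {}
    for rec in sorted(records, key=lambda r: r["ts"]):
        cust = profiles.setdefault(rec["customer_id"], {})
        cust.update({k: v for k, v in rec.items() if k != "ts"})
    return profiles

batch = [{"customer_id": "c1", "ts": 1, "email": "old@example.com"}]
stream = [{"customer_id": "c1", "ts": 2, "email": "new@example.com",
           "last_event": "click"}]
merged = merge_profiles(batch + stream)
print(merged["c1"]["email"])  # new@example.com
```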
- Skills Used: Apache Flink, Apache Hudi, Apache Paimon, Kafka, MinIO, PyFlink, ELT, Lakehouse Architecture.

Movie Data Storage, Processing & Analysis System
- Role: Individual
– Developed a real-time data pipeline for collecting, processing, and analyzing movie-related data from The Movie Database API.
– Built a scalable data architecture using Kafka, Spark, MongoDB, and Elasticsearch for efficient data storage and real-time processing.
– Processed streaming data with Apache Spark and stored cleaned data in MongoDB and Elasticsearch for further analysis.
– Visualized movie trends, budgets, and popular actors using Kibana to explore and present data insights.
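A plain-Python sketch of the kind of aggregation the Spark stage performs before results reach Kibana; the field names mimic The Movie Database API but the records are invented:

```python
from collections import defaultdict

movies = [
    {"title": "A", "genre": "Drama", "budget": 10_000_000},
    {"title": "B", "genre": "Drama", "budget": 30_000_000},
    {"title": "C", "genre": "Action", "budget": 50_000_000},
]

def avg_budget_by_genre(records):
    """Group records by genre and average the budgets, the same
    groupBy/avg a Spark job would run over the Kafka stream."""
    totals = defaultdict(lambda: [0, 0])  # genre -> [sum, count]
    for m in records:
        totals[m["genre"]][0] += m["budget"]
        totals[m["genre"]][1] += 1
    return {g: s / n for g, (s, n) in totals.items()}

print(avg_budget_by_genre(movies))  # {'Drama': 20000000.0, 'Action': 50000000.0}
```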
- Technologies Used: Apache Kafka, Apache Spark, MongoDB, Elasticsearch, Kibana, Docker, Python.

Image Caption Generator
- Role: Leader
- Automatically generates captions for images to assist visually impaired users and to enhance natural-language image search systems.
- Skills Used: Vision Transformer, Deep Learning (CNN).

Test-taking Application for Students
- Role: Member
- Used for exams, tests, and student practice.
- Required handling of exam formats, data storage, and strong security.
- Implemented anti-cheating measures for exams using a face recognition system.