HUNG Q. NGUYEN
Ho Chi Minh City, VN ********@*****.*** 094******* https://github.com/hunglk25
EDUCATION
University of Information Technology (UIT), Ho Chi Minh City, VN | Major in Computer Science | Aug 2022 – June 2026
TECHNICAL SKILLS
● Programming Languages: Python, SQL, C++
● Database Systems: MySQL, PostgreSQL, Cassandra
● Big Data and Data Processing: Apache Airflow, Apache Kafka, PySpark, ETL
● Machine Learning Libraries: scikit-learn, TensorFlow
● Cloud & Storage Solutions: Amazon S3
● Containerization & Orchestration: Docker, Docker Compose
● Messaging & Coordination: Apache Kafka, Apache Zookeeper
ACADEMIC PROJECTS
Spark Data Streaming
● GitHub: Spark Data Streaming
● Tech stack: Python, Apache Kafka, Apache Zookeeper, Apache Spark, Docker, Amazon S3
● Description: Generated simulated vehicle data to exercise a real-time data pipeline (illustrative sketch below). Utilized Apache Kafka for data streaming and Zookeeper for broker coordination, achieving high-throughput, low-latency processing.
● Impact: Demonstrated the ability to handle high-velocity data and reliably store processed data in Amazon S3, showcasing data engineering skills in streaming, fault tolerance, and cloud storage.
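Illustrative sketch (not taken from the repository): a minimal Kafka producer in Python that emits simulated vehicle telemetry. The broker address, topic name, and record fields are assumptions for illustration only.

# Minimal, hypothetical producer of simulated vehicle telemetry.
import json
import random
import time
import uuid

from kafka import KafkaProducer  # kafka-python

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",          # assumed broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def generate_vehicle_event() -> dict:
    # Build one fake vehicle telemetry record.
    return {
        "vehicle_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "latitude": round(random.uniform(10.7, 10.9), 6),
        "longitude": round(random.uniform(106.6, 106.8), 6),
        "speed_kmh": round(random.uniform(0, 120), 1),
        "fuel_level": round(random.uniform(0, 100), 1),
    }

if __name__ == "__main__":
    # Continuously push events so Spark can consume them downstream.
    while True:
        producer.send("vehicle_data", value=generate_vehicle_event())
        time.sleep(1)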
Blockchain Data Streaming
● GitHub: Blockchain Data Streaming
● Tech stack: Apache Airflow, Python, PostgreSQL, Docker, Amazon S3
● Description: Orchestrated blockchain data ingestion from the CoinGecko API using Airflow for workflow management (illustrative sketch below). Leveraged Docker Compose for efficient deployment, enabling scalable, reproducible environments.
● Impact: Increased automation and reduced manual processing time for blockchain data analysis by 30%, enhancing ETL efficiency.
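Illustrative sketch (not taken from the repository): a minimal Airflow DAG that fetches spot prices from the public CoinGecko API and inserts them into PostgreSQL. The DAG id, schedule, connection id, and table name are assumptions for illustration only.

# Hypothetical hourly ingestion DAG: CoinGecko prices -> PostgreSQL.
from datetime import datetime

import requests
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.postgres.hooks.postgres import PostgresHook

API_URL = "https://api.coingecko.com/api/v3/simple/price"

def fetch_and_load():
    # Fetch BTC/ETH prices in USD, then insert one row per coin.
    resp = requests.get(
        API_URL,
        params={"ids": "bitcoin,ethereum", "vs_currencies": "usd"},
        timeout=30,
    )
    resp.raise_for_status()
    hook = PostgresHook(postgres_conn_id="postgres_default")  # assumed connection id
    for coin, data in resp.json().items():
        hook.run(
            "INSERT INTO coin_prices (coin, price_usd, fetched_at) VALUES (%s, %s, NOW())",
            parameters=(coin, data["usd"]),
        )

with DAG(
    dag_id="coingecko_ingestion",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    PythonOperator(task_id="fetch_and_load_prices", python_callable=fetch_and_load)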
Realtime Data Streaming
● GitHub: Realtime Data Streaming
● Tech stack: Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, Cassandra, PostgreSQL, Docker
● Description: Developed an end-to-end data pipeline using the randomuser.me API to simulate real-time user data streaming (illustrative sketch below). Integrated Airflow for orchestration, Kafka for real-time streaming, and Cassandra for durable storage.
● Impact: Showcased expertise in building scalable, resilient data pipelines and implementing fault-tolerant storage solutions.
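Illustrative sketch (not taken from the repository): the Kafka -> Spark -> Cassandra leg of such a pipeline using PySpark Structured Streaming. The topic, keyspace, table, checkpoint path, and schema fields are assumptions for illustration only, and the Kafka and Cassandra connector packages are assumed to be on the Spark classpath.

# Hypothetical streaming job: read user events from Kafka, write to Cassandra.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType

spark = (
    SparkSession.builder.appName("realtime-users")
    .config("spark.cassandra.connection.host", "localhost")  # assumed host
    .getOrCreate()
)

# Assumed shape of the user payload produced upstream.
schema = StructType([
    StructField("id", StringType()),
    StructField("first_name", StringType()),
    StructField("last_name", StringType()),
    StructField("email", StringType()),
])

users = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "users_created")                     # assumed topic
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("data"))
    .select("data.*")
)

def write_to_cassandra(batch_df, batch_id):
    # Append each micro-batch to the Cassandra table.
    (batch_df.write.format("org.apache.spark.sql.cassandra")
     .options(keyspace="spark_streams", table="created_users")
     .mode("append")
     .save())

(users.writeStream
 .foreachBatch(write_to_cassandra)
 .option("checkpointLocation", "/tmp/checkpoints/users")
 .start()
 .awaitTermination())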
CERTIFICATION
● IBM Data Science - Gained foundational knowledge in data science, further strengthening data engineering capabilities, particularly in data wrangling, cleaning, and analysis.
● IBM Data Engineering Foundations - Gained essential skills in data engineering, enhancing expertise in data lifecycle management, Python programming, data wrangling, and SQL for efficient data handling and analysis.