
Data Engineering Intern

Location:
Quan 1, 71000, Vietnam
Posted:
November 15, 2024


Resume:

HUNG Q. NGUYEN

Ho Chi Minh City, VN ********@*****.*** 094******* https://github.com/hunglk25

EDUCATION

University of Information Technology (UIT), Ho Chi Minh City, VN — Major in Computer Science, Aug 2022 – June 2026

TECHNICAL SKILLS

● Programming Languages: Python, SQL, C++

● Database Systems: MySQL, PostgreSQL, Cassandra

● Big Data and Data Processing: Apache Airflow, Apache Kafka, PySpark, ETL

● Machine Learning Libraries: scikit-learn, TensorFlow

● Cloud & Storage Solutions: Amazon S3

● Containerization & Orchestration: Docker, Docker Compose

● Messaging & Coordination: Apache Kafka, Apache Zookeeper

ACADEMIC PROJECTS

Spark Data Streaming

● Github: Spark Data Streaming

● Tech stack: Python, Apache Kafka, Apache Zookeeper, Apache Spark, Docker, Amazon S3

● Description: Simulated vehicle data generation to test a real-time data pipeline. Utilized Apache Kafka for data streaming and Zookeeper for broker management, achieving high-throughput, low-latency processing.
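The simulated vehicle stream can be sketched as a small generator producing the JSON events a Kafka producer would publish. This is a minimal, stdlib-only illustration; the field names, value ranges, and serialization format here are assumptions, not the project's actual schema:

```python
import json
import random
import time
import uuid

def generate_vehicle_event(vehicle_id: str) -> dict:
    """Build one simulated telemetry record; fields are illustrative."""
    return {
        "event_id": str(uuid.uuid4()),
        "vehicle_id": vehicle_id,
        "timestamp": time.time(),
        "speed_kmh": round(random.uniform(0, 120), 1),
        "latitude": round(random.uniform(10.7, 10.9), 6),    # near Ho Chi Minh City
        "longitude": round(random.uniform(106.6, 106.8), 6),
    }

def serialize_for_kafka(event: dict) -> bytes:
    """Kafka producers send raw bytes, so JSON-encode each record."""
    return json.dumps(event).encode("utf-8")

if __name__ == "__main__":
    event = generate_vehicle_event("vehicle-001")
    payload = serialize_for_kafka(event)
```

In the real pipeline, `payload` would be handed to a producer's `send(topic, value)` call; generating and serializing events separately keeps the simulator testable without a running broker.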

● Impact: Demonstrated the ability to handle high-velocity data and reliably store processed data in Amazon S3, showcasing data engineering skills in streaming, fault tolerance, and cloud storage.

Blockchain Data Streaming

● Github: Blockchain Data Streaming

● Tech stack: Apache Airflow, Python, PostgreSQL, Docker, Amazon S3

● Description: Orchestrated blockchain data ingestion from the CoinGecko API using Airflow for workflow management. Leveraged Docker Compose for efficient deployment, enabling scalable, reproducible environments.

● Impact: Increased automation and reduced manual processing time for blockchain data analysis by 30%, enhancing ETL efficiency.
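The transform step of such a pipeline can be sketched as a pure function that flattens an API payload into rows ready for a PostgreSQL insert. The payload shape below follows CoinGecko's `/simple/price` response; the row layout and table semantics are illustrative assumptions:

```python
import json
from datetime import datetime, timezone

def to_rows(payload: dict, fetched_at: datetime) -> list:
    """Flatten {"bitcoin": {"usd": 67000.0}, ...} into
    (coin, currency, price, fetched_at) tuples for a bulk insert."""
    rows = []
    for coin, prices in payload.items():
        for currency, price in prices.items():
            rows.append((coin, currency, float(price), fetched_at.isoformat()))
    return rows

# Example payload in the /simple/price shape (values are made up):
sample = json.loads('{"bitcoin": {"usd": 67000.0}, "ethereum": {"usd": 3500.0}}')
rows = to_rows(sample, datetime(2024, 11, 15, tzinfo=timezone.utc))
```

Keeping the transform free of Airflow and database dependencies means the same function can run inside a task callable and in plain unit tests.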

Realtime Data Streaming

● Github: Realtime Data Streaming

● Tech stack: Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, Cassandra, PostgreSQL, Docker

● Description: Developed an end-to-end data pipeline using randomuser.me API to simulate real-time user data streaming. Integrated Airflow for orchestration, Kafka for real-time streaming, and Cassandra for durable storage.

● Impact: Showcased expertise in creating scalable, resilient data pipelines, and implementing fault-tolerant storage solutions.
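The storage step can be sketched as flattening a nested randomuser.me-style record into a flat row and building a parameterized CQL insert for Cassandra. The table name and column choices here are assumptions for illustration:

```python
import uuid

def flatten_user(raw: dict) -> dict:
    """Map the nested randomuser.me shape onto a flat row
    keyed by a generated UUID (Cassandra needs an explicit key)."""
    return {
        "id": str(uuid.uuid4()),
        "first_name": raw["name"]["first"],
        "last_name": raw["name"]["last"],
        "email": raw["email"],
        "city": raw["location"]["city"],
    }

def to_cql(row: dict, table: str = "created_users") -> tuple:
    """Build a parameterized INSERT; the cassandra-driver
    uses %s placeholders with session.execute(query, params)."""
    cols = ", ".join(row)
    placeholders = ", ".join("%s" for _ in row)
    return f"INSERT INTO {table} ({cols}) VALUES ({placeholders})", tuple(row.values())
```

Separating the flatten and insert-building steps keeps the schema mapping testable without a live Cassandra cluster.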

CERTIFICATION

● IBM Data Science - Gained foundational knowledge in data science, further strengthening data engineering capabilities, particularly in data wrangling, cleaning, and analysis.

● IBM Data Engineering Foundations - Gained essential skills in data engineering, enhancing expertise in data lifecycle management, Python programming, data wrangling, and SQL for efficient data handling and analysis.
