DO HUU THAI - Data Engineer
**********@*****.*** 097******* Hanoi https://github.com/thaikun203
About
As a student, I regularly invest time in learning new technologies and best practices. I am eager to contribute and deliver as much value as possible to the business in an Intern/Fresher Data Engineer role.
Education
Information Technology, Hanoi University of Industry (2021 – Present)
Technical Skills
Main: Python, Spark, Kafka, Airflow, ETL/ELT, SQL, T-SQL, NoSQL, MySQL, MongoDB, Docker, Linux.
Others: SSIS, Power BI, AWS.
Personal Projects
DE-project-1: Real-time Data Streaming End-to-End Data Engineering Project (Link)
Project overview: Build an ETL pipeline that ingests real-time data from an API and processes it for both real-time analytics and further analysis using modern data engineering tools.
Requirements:
• Use Apache Airflow to fetch data from the Wikimedia API and store it in PostgreSQL.
• Stream data from PostgreSQL to the processing engine with Apache Kafka, using ZooKeeper for coordination.
• Process the streaming data with Apache Spark's distributed engine.
• Load the cleaned data into Cassandra for real-time analysis and into PostgreSQL for further analysis.
• Use Docker to containerize and deploy all components of the pipeline.
Technologies Used: Python, Airflow, Kafka, ZooKeeper, Spark, PostgreSQL, Cassandra, Docker.
DE-project-2: Real-time Data Streaming End-to-End Data Engineering Project (Link)
Project overview: Build a Data Warehouse, use SSIS for ETL to process real-time data, and create reports with Power BI.
Requirements:
• Use SQL Server to design the Data Warehouse.
• Implement SSIS workflows to process real-time data.
• Create reports using Power BI.
Certifications
HackerRank SQL (Advanced) Certificate