
Data Engineer Intern

Location:
Quan Tan Binh, 72100, Vietnam
Posted:
August 07, 2025


TA QUANG HUY

OBJECTIVE

Third-year Computer Science student with hands-on experience building cloud-native, real-time data platforms and strong proficiency in Python, SQL, and distributed systems. Skilled in designing data pipelines with Apache Kafka, Spark, and Airflow, with practical knowledge of cloud storage (Amazon S3, Google BigQuery) and both SQL and NoSQL database systems. Passionate about data systems and delivering reliable, production-grade pipelines.

EDUCATION

Open University of HCMC

Major: Computer Science

2022 - current

SKILLS

Programming & Scripting: C++, Python, SQL, Golang

Data Engineering & Processing: Apache Spark, Spark Streaming, Kafka, Kafka Streams, Airflow

Databases & Storage: PostgreSQL, MongoDB, Cassandra, MinIO

Containerization & Infrastructure: Docker, Docker Compose

Visualization Tools: Apache Superset, Power BI

Other Tools & Libraries: Pandas, NumPy, Matplotlib, BeautifulSoup, Scrapy

Cloud & Platforms: AWS (S3, Redshift), Google BigQuery

Version Control: Git, GitHub

**********.*******@*****.*** • +84-833-***-*** • Ho Chi Minh City, Vietnam

Data Engineer Intern

WORK EXPERIENCE

Icon Consulting Group, Jan 2024 - Mar 2024

Freelance Low Code Developer

Technology used: Elixir Tango.

Collaborated with cross-functional teams to implement document automation pipelines for insurance clients in the US/CA.

Utilized mapping tools and geographical data sources to accurately pinpoint locations for various tasks and projects.

Conducted comprehensive quality assurance (QA) for each project and prepared per-product reports to ensure functionality and reliability. Reviewed CSV files and applied the organizational rules to organize data onto the platform efficiently.

Designed and implemented data validation rules, transformations, and data flow processes within low-code platforms to streamline data management tasks.
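The CSV review and validation-rule work above can be sketched in plain Python. The rule set here is hypothetical (illustrative column names and predicates, not the client's actual configuration):

```python
import csv
import io

# Hypothetical validation rules: column name -> predicate on the cell value.
RULES = {
    "policy_id": lambda v: v.isdigit(),
    "state": lambda v: v in {"CA", "NY", "TX"},
    "premium": lambda v: v.replace(".", "", 1).isdigit(),
}

def validate_csv(text):
    """Split CSV rows into (valid, errors) according to RULES.

    errors holds (line_number, failing_columns) pairs; line numbers
    start at 2 because line 1 is the header.
    """
    valid, errors = [], []
    for line_no, row in enumerate(csv.DictReader(io.StringIO(text)), start=2):
        bad = [col for col, ok in RULES.items() if not ok(row.get(col, ""))]
        if bad:
            errors.append((line_no, bad))
        else:
            valid.append(row)
    return valid, errors

sample = "policy_id,state,premium\n1001,CA,250.50\nX2,ZZ,abc\n"
good, bad = validate_csv(sample)
```

Collecting failing rows with their line numbers, rather than rejecting the whole file, mirrors the per-product reporting described above.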

ACHIEVEMENTS

First runner-up of the IT Consultant Challenge, organised by CodeMely and Netcompany

Top 10 finish at the Swin Hackathon


PROJECTS

Bibox - Document Storage Solution, Jan 2025 - May 2025

Backend Developer & System Designer

Team size: 4

Tech stack: Go, Kafka, Redis, MongoDB, S3, JWT, OAuth2, Docker.

Designed a cloud-native file storage platform for secure file management, sharing, and collaboration. The system provides a comprehensive solution for file operations, including upload, download, permission management, and real-time collaboration. Key features:

System Design: Created data models, use case diagrams, and the system architecture at project initiation.

File Management System: Built hierarchical folder structures with presigned URL uploads for direct S3 integration and developed APIs for file operations (upload, download, delete, move), ensuring efficient storage interaction.

Authentication & Authorization: Integrated JWT and OAuth2 (Google, GitHub) with a granular, role-based permission system.

Real-time Features: Integrated Server-Sent Events (SSE) to provide live notifications and upload progress tracking, enhancing user experience.

File Permissions and Access Control: Implemented shareable links with expiration, hierarchical access control, and permission inheritance.

GitHub: https://github.com/baothaihcmut/BiBox

System Design: https://shorturl.at/VLtj8
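Expiring shareable links of the kind described above are commonly built by signing the file path together with an expiry timestamp. A minimal Python sketch (the secret and paths are placeholders; the actual BiBox implementation is in Go and may differ):

```python
import hashlib
import hmac
import time

SECRET = b"hypothetical-server-secret"  # placeholder, never hard-code in production

def make_share_link(path, ttl_seconds, now=None):
    """Return a share URL whose signature covers the path and expiry time."""
    expires = int(now if now is not None else time.time()) + ttl_seconds
    msg = f"{path}:{expires}".encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"/share{path}?expires={expires}&sig={sig}"

def verify_share_link(path, expires, sig, now=None):
    """Reject links that are expired or whose signature does not match."""
    current = now if now is not None else time.time()
    if current > int(expires):
        return False  # link has expired
    msg = f"{path}:{expires}".encode()
    expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)  # constant-time comparison
```

Because the signature covers both the path and the expiry, a client cannot extend a link's lifetime or point it at another file without invalidating it.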

CERTIFICATION

IELTS 5.5 (2021); aiming for 6.5 by 2025


IBM Python for Data Science, AI & Development, issued by Coursera

CONTACT

+84-833-***-***

**********.*******@*****.***

https://github.com/huyta1910


Real Time Voting Simulation, July 2025 - current

Data Engineer (Personal Project)

Tech stack: Python, Apache Kafka, Spark Streaming, PostgreSQL, Superset, Docker.

Designed and built a real-time voting system capable of handling high-throughput vote submissions with low latency and strong consistency.

Used Apache Kafka for scalable message streaming, ensuring reliable ingestion and decoupling between data producers and consumers.

Applied Spark Streaming for fast, distributed vote aggregation and real-time validation, achieving near-instant updates with fault tolerance.

Used PostgreSQL as the central store for both transactional and historical data, supporting strong consistency and easy querying for audit trails.

Integrated Apache Superset for real-time result dashboards, enabling continuous insights without reloading or downtime.

Containerized the entire system with Docker Compose, improving deployment speed, reproducibility, and scalability across environments.

GitHub: https://github.com/huyta1910/voting-simulation
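The Kafka-to-Spark aggregation described above can be approximated in plain Python. This stand-in replaces Kafka with an in-memory queue and Spark Streaming with a micro-batch loop; all names are illustrative, not taken from the repository:

```python
from collections import Counter
from queue import Empty, Queue

def consume_batch(queue, max_items=100):
    """Drain up to max_items votes from the queue, like one micro-batch."""
    batch = []
    for _ in range(max_items):
        try:
            batch.append(queue.get_nowait())
        except Empty:
            break
    return batch

def aggregate(totals, batch, valid_candidates):
    """Validate each vote and fold the batch into the running totals."""
    for vote in batch:
        if vote.get("candidate") in valid_candidates:  # drop invalid ballots
            totals[vote["candidate"]] += 1
    return totals

# Simulated producer: votes flow through the queue as they would through Kafka.
votes = Queue()
for name in ["alice", "bob", "alice", "eve_invalid", "alice"]:
    votes.put({"candidate": name})

totals = aggregate(Counter(), consume_batch(votes), {"alice", "bob"})
```

Keeping validation inside the aggregation step, as sketched here, is what lets invalid submissions be rejected before they ever reach the result store.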


