Data Engineer Intern: ELT & Lakehouse Pipelines

Location:

Long Hoa, Vinh Long, Vietnam

Posted:

June 17, 2026


Resume:

NGUYEN TRUNG KIEN

DATA ENGINEER INTERN

Thu Duc, Ho Chi Minh +84-865-***-*** ****************@*****.*** GitHub LinkedIn

SUMMARY

Data Engineer with a strong foundation in designing, automating, and optimizing scalable data pipelines and lakehouse architectures. Proficient in building robust ELT/ETL workflows using Python, SQL, dbt, and Apache Airflow/Dagster. Adept at handling data integration from diverse platform APIs, managing schema evolution, and delivering high-performance data models for downstream analytics.

EDUCATION

HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY AND ENGINEERING Expected 2026

Bachelor of Engineering in Data Engineering

EXPERIENCE

METAGRIT VIETNAM CO., LTD Ho Chi Minh, Viet Nam

BACKEND DEVELOPER INTERN 03/2026 – 06/2026

• Designed and maintained database schemas spanning 16 MySQL tables and 4 MongoDB collections to support core business domains.

• Built and documented 30+ RESTful CRUD APIs integrated with MySQL and MongoDB.

• Developed user tracking APIs to capture behavioral data (clickstream, session, user actions), enabling downstream funnel analysis, retention tracking, and user behavior analytics.

• Defined API contracts, request/response structures, and error handling conventions consumed by frontend clients.

PROJECTS

FMCG Sales & Operations Analytics Platform Github

• Architected and orchestrated a fully automated 3-layer ELT pipeline (Staging, Intermediate, Marts) using dbt, Apache Airflow, and PostgreSQL via custom DAGs. Implemented robust data testing and documentation in dbt, ensuring data integrity and zero processing failures across historical transaction records.

• Modeled data structures optimized for analytical queries, enabling sub-second rendering for interactive downstream Power BI dashboards tracking key business performance metrics.

eCommerce Data Lakehouse Github

• Engineered an end-to-end data pipeline processing large-scale e-commerce logs by integrating Apache Kafka for real-time ingestion and PySpark (Structured Streaming & Batch) for ETL transformations. Utilized Apache Iceberg as the open table format, optimizing storage layout with automated compaction and implementing SCD Type 2 for historical dimension tracking.

• Leveraged ClickHouse Materialized Views and Trino to power near-real-time dashboards (Cohort Retention, RFM analytics) in Apache Superset.

NovaBank Credit Risk Prediction Platform Github

• Architected a containerized (Docker) MLOps and feature-engineering pipeline using Dagster, dbt, DuckDB, and MLflow. Automated feature engineering and model scoring workflows, incorporating data validation steps (PSI/CSI metrics) to detect data drift post-deployment.

• Built an AI screening tool that isolated the 21.8% of customer profiles classified as high-risk, enabling targeted credit policy decisions and reducing manual review effort. Delivered analytical dashboards covering delinquency rates, DPD buckets, vintage curves, and portfolio segmentation to surface early risk signals across 32,581 historical credit records.

SKILLS

Programming & Query Languages: Python, SQL

Data Ingestion & Collection: BeautifulSoup, Selenium, RESTful APIs, Kafka

Transformation & Orchestration: dbt, Airflow, Spark, Dagster

Data Platforms & Storage: PostgreSQL, SQL Server, ClickHouse, MongoDB, BigQuery, MinIO, Trino, Iceberg

BI, Analytics & Visualization: Power BI, Metabase, Superset, Looker

Developer Tools: AI Coding Assistants, VS Code, DBeaver, Git

Machine Learning: scikit-learn, TensorFlow, MLflow, PyTorch

Core Skills: Data-driven Problem Solving, Analytical Thinking, Data Storytelling, Adaptability & Attention to Detail
