TRINH HA GIA PHU
087******* · *************@*****.*** · GitHub · LinkedIn
ABOUT ME
I have completed my Data Science degree and am currently awaiting my graduation certificate; I am available for full-time work and immediate onboarding. I have built real-time data systems in Python with modern data tools, and have worked with Docker, Kubernetes, and Terraform to deploy and manage cloud-native services. I enjoy optimizing performance and writing reliable code, and I am comfortable with SQL and NoSQL databases as well as data warehouses like BigQuery.
EDUCATION
Industrial University of Ho Chi Minh City (IUH)
Bachelor of Engineering in Data Science | GPA: 3.3/4.0
CERTIFICATIONS
- HackerRank Certificate of Accomplishment: SQL (Advanced) (View here)
- TOEIC Listening & Reading: 565 (View here)
SKILLS
Technical Skills:
- Coding Skills: SQL, Python (Docker images), Bash (automation scripts)
- Data Tools: Kafka, Spark, Airflow, Soda, dbt
- Data Warehouse: BigQuery (partitioning, clustering, cost optimization)
- Databases: PostgreSQL, Cassandra (NoSQL, distributed, real-time)
- Infrastructure as Code: Terraform, Helm charts, YAML (used for Soda quality checks)
Soft Skills: Problem-solving, Teamwork, Critical Thinking, Time Management
WORK EXPERIENCE
AI Intern – SmartChain AI Academy 03/2025 – 05/2025
* Supported feature engineering and data preprocessing to improve model input quality.
* Tested and fine-tuned models to improve accuracy.
* Created basic reports and dashboards.
Research Intern – Institute for Development and Research in Banking Technology, VNU-HCM 09/2024 – 11/2024
* Conducted applied research and development of a real-time data processing system using modern cloud-native tools.
* Studied modern data architectures including Lambda, Kappa, and Delta Lake, with a solid understanding of batch vs. stream processing paradigms.
* Main project: See “Building a Real-Time Data Stream Processing System on Kubernetes and Terraform” in Highlighted Personal Projects for details.
HIGHLIGHTED PERSONAL PROJECTS
Building a Real-Time Data Stream Processing System on Kubernetes and Terraform (GitLab)
Description: Designed and implemented a real-time data processing system using Kubernetes, Terraform, Kafka, and Spark, with a focus on performance, scalability, and fault tolerance.
· Kafka: Real-time data ingestion from Finnhub API.
· Spark Streaming: Processes high-frequency financial data in-memory.
· Cassandra: Low-latency storage for real-time queries.
· Kubernetes & Terraform: Scalable deployment and automated infrastructure management.
· Highlights: Orchestrated and optimized Spark jobs with Spark-Operator on Kubernetes. Built a fully containerized, distributed pipeline with high reliability and fault tolerance.
· Limitation: Terraform currently manages only local Kubernetes infrastructure due to budget constraints; I am actively expanding my skills to AWS.
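The Kubernetes deployment described above can be sketched with Terraform's Helm provider; the chart, repository, namespace, and values below are illustrative placeholders under the assumption of a local cluster (e.g. minikube or kind), not the project's actual configuration:

```hcl
# Sketch: deploying Kafka onto a local Kubernetes cluster via Helm.
provider "helm" {
  kubernetes {
    config_path = "~/.kube/config" # local cluster context
  }
}

resource "helm_release" "kafka" {
  name             = "kafka"
  repository       = "https://charts.bitnami.com/bitnami" # example chart repo
  chart            = "kafka"
  namespace        = "streaming"
  create_namespace = true

  set {
    name  = "replicaCount"
    value = "3" # fault tolerance via broker replication
  }
}
```

The same pattern extends to the Spark Operator and Cassandra releases, keeping the whole pipeline reproducible from one `terraform apply`.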
RFM Customer Segmentation & Customer Clustering with K-Means (GitHub)
Description: Developed a data pipeline using Airflow, dbt, Spark on Databricks, and Metabase to analyze customer behavior and drive targeted marketing campaigns.
· Modeled invoice fact table into customer, product, and datetime dimensions with dbt.
· Created SQL reports for revenue by country, top products, and monthly invoices.
· Applied RFM segmentation (9 segments) and trained K-Means with PySpark to identify 4 customer clusters on Databricks.
· Ensured data quality using Soda before transforming and modeling data with dbt.
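A minimal, dependency-free sketch of the RFM computation behind the segmentation step above; the toy invoices and customer IDs are illustrative (the actual project computed this with PySpark on Databricks):

```python
from collections import defaultdict
from datetime import date

# Toy invoice rows: (customer_id, invoice_date, amount). Illustrative only.
transactions = [
    ("C1", date(2024, 1, 5), 120.0),
    ("C1", date(2024, 3, 1), 80.0),
    ("C2", date(2023, 11, 20), 40.0),
    ("C3", date(2024, 2, 14), 300.0),
]

def rfm_scores(txns, today):
    """Return {customer: (recency_days, frequency, monetary)}."""
    last_seen = {}
    freq = defaultdict(int)
    spend = defaultdict(float)
    for cust, d, amt in txns:
        last_seen[cust] = max(last_seen.get(cust, d), d)  # most recent purchase
        freq[cust] += 1                                   # purchase count
        spend[cust] += amt                                # total spend
    return {
        c: ((today - last_seen[c]).days, freq[c], round(spend[c], 2))
        for c in last_seen
    }

scores = rfm_scores(transactions, date(2024, 3, 31))
# e.g. scores["C1"] -> (30, 2, 200.0)
```

Binning each of the three values into quantiles then yields the discrete R/F/M segments that feed the K-Means clustering.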