TRINH HA GIA PHU
087******* · *************@*****.*** · GitHub · LinkedIn
ABOUT ME
I have completed my Data Science degree and am currently awaiting my graduation certificate; I am available for full-time work and immediate onboarding. I have built real-time data systems in Python with modern data tools, and have worked with Docker, Kubernetes, and Terraform to deploy and manage cloud-native services. I enjoy optimizing performance and writing reliable code, and I am comfortable with SQL and NoSQL databases as well as data warehouses like BigQuery.
EDUCATION
Industrial University of Ho Chi Minh City (IUH)
Bachelor of Engineering in Data Science | GPA: 3.3/4.0
CERTIFICATIONS
- HackerRank Certificate of Accomplishment: SQL (Advanced) (View here)
- TOEIC Listening & Reading: 565 (View here)
SKILLS
Technical Skills:
- Coding Skills: SQL, Python (Docker images), Bash (automation scripts)
- Data Tools: Kafka, Spark, Airflow, Soda, dbt
- Data Warehouse: BigQuery (partitioning, clustering, cost optimization)
- Databases: PostgreSQL, Cassandra (NoSQL, distributed, real-time)
- Infrastructure as Code: Terraform, Helm charts, YAML (used for Soda quality checks)
Soft Skills: Problem-solving, Teamwork, Critical Thinking, Time Management
WORK EXPERIENCE
AI Intern – SmartChain AI Academy 03/2025 – 05/2025
* Supported feature engineering and data preprocessing to improve model input quality.
* Tested and fine-tuned models to improve accuracy.
* Created basic reports and dashboards.
Research Intern – Institute for Development and Research in Banking Technology, VNU-HCM 09/2024 – 11/2024
* Conducted applied research and development of a real-time data processing system using modern cloud-native tools.
* Studied modern data architectures including Lambda, Kappa, and Delta Lake, with a solid understanding of batch vs. stream processing paradigms.
* Main project: See “Building a Real-Time Data Stream Processing System on Kubernetes and Terraform” in Highlighted Personal Projects for details.
HIGHLIGHTED PERSONAL PROJECTS
Building a Real-Time Data Stream Processing System on Kubernetes and Terraform (GitLab)
Description: Designed and implemented a real-time data processing system using Kubernetes, Terraform, Kafka, and Spark, with a focus on performance, scalability, and fault tolerance.
· Kafka: Real-time data ingestion from Finnhub API.
· Spark Streaming: Processes high-frequency financial data in-memory.
· Cassandra: Low-latency storage for real-time queries.
· Kubernetes & Terraform: Scalable deployment and automated infrastructure management.
· Highlights: Orchestrated and optimized Spark jobs with Spark-Operator on Kubernetes. Built a fully containerized, distributed pipeline with high reliability and fault tolerance.
· Limitation: Terraform currently manages only local Kubernetes infrastructure due to budget constraints; I am actively expanding my skills to AWS.
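The Kubernetes deployment described above can be sketched with Terraform's Helm provider; the chart, repository, namespace, and values below are illustrative placeholders under the assumption of a local cluster (e.g. minikube or kind), not the project's actual configuration:

```hcl
# Sketch: deploying Kafka onto a local Kubernetes cluster via Helm.
provider "helm" {
  kubernetes {
    config_path = "~/.kube/config" # local cluster context
  }
}

resource "helm_release" "kafka" {
  name             = "kafka"
  repository       = "https://charts.bitnami.com/bitnami" # example chart repo
  chart            = "kafka"
  namespace        = "streaming"
  create_namespace = true

  set {
    name  = "replicaCount"
    value = "3" # fault tolerance via broker replication
  }
}
```

The same pattern extends to the Spark Operator and Cassandra releases, keeping the whole pipeline reproducible from one `terraform apply`.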
RFM Customer Segmentation & Customer Clustering with K-Means (GitHub)
Description: Developed a data pipeline using Airflow, dbt, Spark on Databricks, and Metabase to analyze customer behavior and drive targeted marketing campaigns.
· Modeled invoice fact table into customer, product, and datetime dimensions with dbt.
· Created SQL reports for revenue by country, top products, and monthly invoices.
· Applied RFM segmentation (9 segments) and trained K-Means with PySpark to identify 4 customer clusters on Databricks.
· Ensured data quality using Soda before transforming and modeling data with dbt.
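A minimal, dependency-free sketch of the RFM computation behind the segmentation step above; the toy invoices and customer IDs are illustrative (the actual project computed this with PySpark on Databricks):

```python
from collections import defaultdict
from datetime import date

# Toy invoice rows: (customer_id, invoice_date, amount). Illustrative only.
transactions = [
    ("C1", date(2024, 1, 5), 120.0),
    ("C1", date(2024, 3, 1), 80.0),
    ("C2", date(2023, 11, 20), 40.0),
    ("C3", date(2024, 2, 14), 300.0),
]

def rfm_scores(txns, today):
    """Return {customer: (recency_days, frequency, monetary)}."""
    last_seen = {}
    freq = defaultdict(int)
    spend = defaultdict(float)
    for cust, d, amt in txns:
        last_seen[cust] = max(last_seen.get(cust, d), d)  # most recent purchase
        freq[cust] += 1                                   # purchase count
        spend[cust] += amt                                # total spend
    return {
        c: ((today - last_seen[c]).days, freq[c], round(spend[c], 2))
        for c in last_seen
    }

scores = rfm_scores(transactions, date(2024, 3, 31))
# e.g. scores["C1"] -> (30, 2, 200.0)
```

Binning each of the three values into quantiles then yields the discrete R/F/M segments that feed the K-Means clustering.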