Contact
Education
Phone
*****.********@*****.***
Linh Dong, Thu Duc, Ho Chi Minh City
Address
Skills
TAO CHI VY
Data Engineer - Data Scientist
ABOUT ME
Data Engineering
(Primary Focus)
Big Data & Streaming
Apache Hadoop, Apache Spark,
Apache NiFi, Apache Kafka
HCMC UNIVERSITY OF TECHNOLOGY AND EDUCATION
Specialization: Information Systems
2021-2025
I am Tao Chi Vy, an Information Systems student
specializing in Data Engineering with experience in full-stack development using Django. I have a solid
foundation in ETL pipelines, data warehousing, and system optimization to enhance data processing
efficiency. I design scalable data architectures and integrate machine learning models into data-driven applications.
Additionally, I develop web applications and APIs that ensure smooth interaction between frontend and
backend.
WORK EXPERIENCE
Data Engineer - Data Scientist Intern
VNA Group, 3 months (9/2024 - 11/2024)
Built a centralized data warehouse for poor
households and people with revolutionary merits,
supporting data-driven policy development.
Improved data retrieval efficiency by 30% using
PostgreSQL and MySQL over two months.
Developed a real-time ETL pipeline with Apache NiFi, boosting processing speed by 40%.
CAREER OBJECTIVE
Aspiring to advance my career in Data Engineering by focusing on designing and optimizing ETL pipelines and data workflows. I have hands-on experience with
Apache NiFi, Kafka, Spark, Hadoop, and databases
including PostgreSQL, MySQL, and MongoDB. Leveraging my backend development skills with Django and
experience integrating machine learning models, I aim to build efficient, scalable data systems that drive business value.
Relevant coursework:
Big Data Analytics - Coursera
Samsung Innovation Campus - Big Data Course
Foundations of User Experience (UX) Design - Coursera
https://www.linkedin.com/in/taovy060103
https://github.com/Cloudy009
GitHub
https://chivy-mycv.onrender.com
Personal Website
Skills
Backend & API Development
(Supporting Role)
Backend Development
Python (Django, FastAPI, Flask)
Java (Spring Framework)
IntelliJ IDEA
API Development
RESTful APIs, Dialogflow
Deployment & DevOps:
Docker, AWS, Render, Railway,
MongoDB Atlas, Git, Supabase,
Neon
Frontend Development
Languages & Frameworks
JavaScript, React.js, HTML, CSS,
SCSS
Other Skills
Tools & Platforms
Jupyter Notebook, Dash, DBeaver,
pgAdmin, MongoDB Compass
Soft Skills
Communication
Problem-solving
Critical thinking
Solution design
Teamwork
Adaptability
PROJECTS
Oracle-Based Data Pipeline with Airflow, Spark & Power BI
Developed an end-to-end data pipeline using Oracle, Airflow, Spark, MinIO, and DBT. Automated ingestion from CSV/Excel into Oracle, enriched data from external APIs (stored in MinIO), performed transformations with Spark and DBT, and visualized KPIs in Power BI dashboards.
Real-Time Twitter Sentiment Analysis
Developed a real-time pipeline to analyze public sentiment from Twitter using AWS EC2, Apache NiFi, Kafka, Spark Streaming, and MongoDB.
Deployed a TF-IDF + Logistic Regression model for tweet classification.
Streamed and processed live tweets; stored results in MongoDB.
Visualized trends with Dash & Plotly; containerized with Docker Compose.
Real-Time E-commerce Analytics Dashboard with Spark, Kafka & Grafana
Built a real-time analytics system for e-commerce user behavior using Kafka, Spark Streaming, and MySQL.
Stored and visualized time-series data with InfluxDB and Grafana.
Integrated batch demographic data with streaming activity for combined analytics.
Deployed via Docker Compose with performance tuning and monitoring.
Delivered dashboards showing campaign performance, gender-based order distribution, and real-time insights.
Optimized RESTful APIs with FastAPI, cutting data retrieval time by 50%.
Deployed and managed services with Docker for efficient containerization and orchestration.
Data Engineer - Freelance Project
VNA Group, 2 months (3/2025 - 4/2025)
Designed and automated a centralized admissions data warehouse for high school applicants, using a hybrid Star–Snowflake schema and SCD Type 1.
Scheduled daily ETL jobs using Pentaho PDI.
Triggered real-time ETL updates via Pentaho Carte API, achieving data refresh within ~1 minute.
Ensured data accuracy and reliability through testing with Postman and JavaScript Fetch API.
Delivered ready-to-use data for strategic insights into high school admissions and student demand.
ETL & Data Processing
Machine Learning Integration, Data Processing,
Recommendation Systems, ETL Processes,
DBT, Pentaho PDI
Excel
Database & Storage
SQL Server, MySQL,
PostgreSQL, MongoDB,
Cassandra, Firebase, Neo4j,
CouchDB, MinIO, InfluxDB,
Oracle
Data Visualization:
BI Tools: Tableau, Power BI
Real-time Dashboards: Grafana
Orchestration & Automation
Power Automate
CERTIFICATES
Google Data Analysis
TOEIC 600+
Samsung Innovation Campus (SIC)
Full portfolio & source code on GitHub: github.com/Cloudy009
Build Sales Data Warehouse for Business Intelligence
Designed and implemented a data warehouse for sales reporting, order fulfillment, and inventory analysis using SQL Server and Visual Studio Community 2022.
Dockerized ETL Pipeline with Airflow & FastAPI
Created a Docker-based ETL pipeline with Airflow for a PostgreSQL data warehouse. Implemented multi-layer (Bronze–Gold) processing, integrated MinIO for storage, restored data with pgAdmin, and exposed FastAPI endpoints with REST-triggered DAGs for real-time synchronization.
Customer Segmentation & Recommendation System
Processed behavioral data and developed machine learning models for customer segmentation and product suggestion.
Sales Performance Dashboard with Power BI
Built an interactive Power BI dashboard for visualizing sales performance across regions, products, and time periods. Features dynamic slicers, auto-refresh with Power Automate, an interface designed in Figma, and real-time refresh timestamps.
Passport Renewal System with Oracle & Django
Built a web application for managing passport renewals using Django and Oracle. Included authentication, form validation, and data persistence features.
Food & Beverage Web App with Chatbot
Developed a Django-based F&B web application with a MySQL backend and a Dialogflow-integrated chatbot. Also deployed a version using MongoDB Atlas for flexible data storage. Hosted on Render and Railway for scalable deployment.
Language
English
Japanese
Hobby
Singing
Playing guitar
Sport