NGUYEN MINH KHOI
DATA ENGINEERING
Ho Chi Minh City, Vietnam ************@*****.*** +84-082******* LinkedIn GitHub
CAREER OBJECTIVE
I am a student at the University of Economics Ho Chi Minh City (UEH), majoring in E-commerce. In addition to a background in e-commerce, working on several data engineering projects has helped me improve my data analysis skills. I am ready to join a corporate environment and contribute my best.
EDUCATION
UNIVERSITY OF ECONOMICS HO CHI MINH CITY (UEH)
Bachelor’s in E-Commerce
Ho Chi Minh City, Vietnam
2022 - 2025
• GPA: 3.76/4.0
SKILLS
TECHNICAL SKILLS
● Relational databases: MySQL, SQL Server, PostgreSQL, Azure SQL.
● SQL: SQL Query, Window Function, CTE, Subquery.
● Python: ETL pipelines (Extract, Transform, Load), Pandas, NumPy, OS, Matplotlib, Seaborn (data visualization).
● PySpark: ETL pipelines (Extract, Transform, Load), transforming OLTP data from a data lake into OLAP data in a data warehouse.
● Power BI: Diagnostic Analytics, Data Transformation, Data Modeling, Data Visualization, basic DAX functions, Metrics and Measures creation, Power BI Service.
● Spreadsheets: Excel, Google Sheets, Power Query, Pivot Tables.
● Cloud Infrastructure: Azure.
● Others: Tableau, SPSS.
SOFT SKILLS
● Storytelling, Problem Solving, Analytical Thinking
● Teamwork and Independent Work, Communication Skills
LANGUAGES
● Vietnamese (Native)
● English
WORK EXPERIENCE
TAY A REAL ESTATE SERVICE TRADING CO., LTD
Data Entry Clerk (Part-time)
Ho Chi Minh City, Vietnam
June 2024 – August 2024
• Searched for real estate listings and seller profiles on the NhaTot website, entered the data into the appropriate columns, and used Excel functions to find duplicate entries.
• Managed company data, such as updating selling prices and whether each property had been traded.
PERSONAL PROJECTS
CUSTOMER BEHAVIOUR AND TRENDING ANALYSIS
Tools: PySpark (ETL, OLTP to OLAP), MySQL, Power BI, DAX
Using the LogContent and LogSearch data:
• Built a data pipeline (ETL from data lake to data warehouse) using PySpark and SQL, applying transformations and actions, window functions, and repartitioning to load the data into a MySQL database, then connected the MySQL database to Power BI to visualize the data and find insights.
• Calculated how much time each user spends on each category, determined which categories each user accesses, and identified the category each user spends the most time on. Also assessed customer activeness.
• Identified the most searched keywords and their corresponding categories for each user during two periods: June 1–14, 2022, and July 1–14, 2022. Also examined whether the categories users are interested in changed between the two months and, if so, how.
• Visit project: Here
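The per-user time calculation described in this project can be sketched roughly as follows. This is a minimal illustration in plain Python rather than the PySpark code used in the project; the sample rows, the column layout (user_id, category, seconds), and the function names are assumptions, since the LogContent schema is not given here.

```python
from collections import defaultdict

# Hypothetical sample rows: (user_id, category, seconds_spent).
# The real LogContent schema is not shown in this document.
LOG_ROWS = [
    ("u1", "movies", 120),
    ("u1", "sports", 300),
    ("u1", "movies", 60),
    ("u2", "kids", 45),
    ("u2", "movies", 30),
]


def time_per_category(rows):
    """Sum the time spent for each (user, category) pair."""
    totals = defaultdict(lambda: defaultdict(int))
    for user_id, category, seconds in rows:
        totals[user_id][category] += seconds
    return {user: dict(cats) for user, cats in totals.items()}


def top_category(totals):
    """Pick the category with the largest total time for each user."""
    return {user: max(cats, key=cats.get) for user, cats in totals.items()}


totals = time_per_category(LOG_ROWS)
favourites = top_category(totals)
print(totals)
print(favourites)
```

In PySpark the same idea would typically be a groupBy on user and category followed by a ranking window per user; the dictionary version above only shows the logic.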
CUSTOMER360 ANALYSIS
Tools: SQL, Power BI, DAX, OLTP to OLAP, RFM, Segmentation, Campaign Development.
• The Customer360 Analysis Report analyzes customer data using the Customer360 framework, the RFM model, the IQR method, and the BCG matrix to segment customers and provide insights into each segment.
• Used SQL to calculate the R (Recency), F (Frequency), and M (Monetary) metrics, applied the IQR method to determine the interquartile range for R, F, and M, and then combined R, F, and M into a three-digit composite score (e.g. customer A has R = 1, F = 3, M = 2, so RFM = 132).
• Used the BCG matrix to categorize customers into four groups: VIP Customers, Loyal Customers, Potential Customers, and Visiting Customers. Finally, connected the results to Power BI and used DAX to visualize the data and provide insights for each customer group.
• Visit project: Here
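The RFM scoring step above can be illustrated with a small sketch. It is written in plain Python rather than the SQL used in the project, and it assumes that the quartile step means binning each metric into scores 1 to 4 by its quartile cut points, with 1 being the lowest quartile of the raw value. The sample customers and function names are hypothetical, and whether recency should be inverted is not stated here, so the raw value is binned as-is.

```python
from statistics import quantiles

# Hypothetical customers: name -> (recency_days, frequency, monetary).
CUSTOMERS = {
    "A": (3, 7, 250),
    "B": (10, 2, 100),
    "C": (25, 5, 400),
    "D": (40, 1, 50),
    "E": (60, 9, 900),
}


def quartile_score(value, cuts):
    """Return 1..4: one plus the number of quartile cut points exceeded."""
    return 1 + sum(value > cut for cut in cuts)


def rfm_scores(customers):
    """Bin R, F and M by quartile and combine them into a three-digit code."""
    columns = list(zip(*customers.values()))  # (all R, all F, all M)
    # Three cut points (Q1, median, Q3) per metric.
    cuts = [quantiles(col, n=4, method="inclusive") for col in columns]
    result = {}
    for name, metrics in customers.items():
        r, f, m = (quartile_score(v, c) for v, c in zip(metrics, cuts))
        result[name] = r * 100 + f * 10 + m
    return result


rfm = rfm_scores(CUSTOMERS)
print(rfm)
```

With this sample data, customer A receives R = 1, F = 3, M = 2, which gives the composite code 132 used as the example in the project description.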
E-COMMERCE ANALYTICS
Tools: Excel, Power BI, DAX
Using data from a Shopee promotion program (30% off the total bill, up to a maximum of 20,000 VND), built a report that gives an overview of the program and answers three questions: Which total bill range has the highest number of transactions? Which 3 sellers are most likely to be fraudulent, and why? Which 3 buyers are most likely to be fraudulent, and why?
• Used Excel to review the data for errors, then transformed the data in Power BI to model and prepare it for analysis.
• Used DAX to calculate metrics such as median, max, and percentage per segment, then visualized them in charts and built a dashboard.
• Used the charts and statistics to find the total bill range with the highest number of transactions and to identify the 3 sellers and 3 buyers most likely to be fraudulent.
• Visit project: Here
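The promotion rule and the bill-range question in this project can be sketched as below. The 30% rate and the 20,000 VND cap come from the promotion description; the bucket width, the sample bills, and the function names are assumptions made only for illustration, and the project itself used Excel and Power BI rather than Python.

```python
from collections import Counter

DISCOUNT_PERCENT = 30    # from the promotion description
MAX_DISCOUNT = 20_000    # VND, from the promotion description
BUCKET_WIDTH = 50_000    # VND, hypothetical bucket size for bill ranges

# Hypothetical total bills in VND.
BILLS = [30_000, 66_000, 67_000, 70_000, 120_000, 45_000, 68_000]


def discount(total_bill):
    """30% of the bill, capped at the maximum discount (integer VND)."""
    return min(total_bill * DISCOUNT_PERCENT // 100, MAX_DISCOUNT)


def bill_range(total_bill):
    """Return the (low, high) bucket that the bill falls into."""
    low = (total_bill // BUCKET_WIDTH) * BUCKET_WIDTH
    return (low, low + BUCKET_WIDTH)


counts = Counter(bill_range(bill) for bill in BILLS)
busiest_range, transactions = counts.most_common(1)[0]
print(busiest_range, transactions)
```

Because the cap is reached at a bill of about 66,667 VND, bills clustered just above that amount are one pattern such a report might inspect when looking for suspicious sellers or buyers.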
PERSONAL AWARDS
• Google Data Analytics Professional Certificate (In progress).
• International English Language Testing System (IELTS): Overall 5.5 (L: 5.0, R: 5.5, W: 6.0, S: 4.5) (4/2022).
• "Student of 5 Merits" prize, faculty level, 2023.
• Award C for UEH Scientific Research 2024: The impact of After-sales service factors on electronic loyalty, and cosmetics on e-commerce platforms: a case study of students in HCM City.