BÙI KHẮC KIÊN
Data Engineer
Male
*************@*****.***
Dormitory Area B, Mac Dinh Chi Street, Dong Hoa Ward, Di An City, Binh Duong
https://github.com/BuiKien15
Skills
Data Preprocessing
Data Processing
Spark and PySpark
Python Programming
Analysis and Interpretation
Data Visualization
Machine Learning
Data Mining
CAREER GOALS
I want to take part in real-world Data Engineering projects where I can apply my knowledge of programming, databases, data visualization, and Machine Learning to solve concrete business problems. With the knowledge gained at university, experience from course projects, diligence, eagerness to learn, and the ability to work in a team, handle pressure, and communicate effectively, I believe I can help build a strong and effective Data Engineering team while continuing to learn and develop myself.
EDUCATION
2021 - 2025
UNIVERSITY OF INFORMATION TECHNOLOGY (UIT) - NATIONAL UNIVERSITY HCMC
Majoring in Information Systems
GPA: 3.05/4.0
PROJECTS
Semester 2 - 2024
UNIVERSITY OF INFORMATION TECHNOLOGY Student
Title: Building a Retail Sales Data Warehouse
Link: https://drive.google.com/drive/folders/1HzCtz_qf_xjotTN9izRabtkIIP6GzJOT?usp=sharing
Objective: Developed a data warehouse for analyzing retail sales.
Technologies: SQL Server, SSIS, SSAS, Power BI, Python.
Highlights:
• Designed data models and implemented ETL processes.
• Created OLAP cubes for analysis and generated reports in Power BI.
Outcome: Improved data accessibility and insights for business decisions.
Semester 2 - 2024
UNIVERSITY OF INFORMATION TECHNOLOGY Student
Title: Analyze Data on Virtual Currency Prices
Link: https://github.com/canhlong1430/PTDLK
Objective: Developed predictive models for cryptocurrency prices using time series analysis.
Technologies: ARIMA, Linear Regression, RNN, GRU, LSTM, VAR, TBATS, PatchTST.
Highlights:
• Analyzed Binance Coin (BNB), Dogecoin (DOGE), and Ethereum (ETH) datasets.
• Forecast prices over 30-, 60-, and 90-day horizons using the models listed above.
• Evaluated models using MSE, MAE, RMSE, and MAPE metrics.
Outcome: Provided insights into cryptocurrency price trends to assist investors in decision-making.
Semester 2 - 2024
UNIVERSITY OF INFORMATION TECHNOLOGY Student
Title: Predicting the Likelihood of Hotel Booking Cancellations from Hotel Activity Data
Link: https://github.com/BuiKien15/Data-Mining
Objective: Develop a machine learning model to predict hotel booking cancellations using hotel activity data.
Technologies: Python, Pandas, Scikit-learn, Seaborn, Plotly.
Highlights:
• Analyzed the Hotel Booking Demand dataset with 119,390 records and 36 attributes.
• Preprocessed data to handle missing values, outliers, and normalization.
• Implemented various classification algorithms including Decision Tree, Naive Bayes, and Logistic Regression.
• Utilized K-Fold Cross Validation to evaluate model performance.
Outcome: Provided insights into factors influencing booking cancellations, assisting hotels in improving reservation management and reducing cancellation rates.
Skills (continued)
Data Warehousing Concepts (Understanding of Data Models, Star Schema, etc.)
Data Transformation (ETL Process - Using SSIS)
Exploratory Data Analysis
Time Series Analysis
Model Evaluation and Comparison
OLAP Cube Development (Using SSAS)
MDX Querying
Deep Learning
Hobbies
● Football
OTHER SKILLS
Programming languages: C#, C++, Python, R
Query language: Oracle
Big data: Hadoop, Apache Ant
Social network analysis: Gephi, network structure analysis, centrality measures, information spread model
Soft skills: Teamwork, flexible communication, ability to communicate
CERTIFICATE
IELTS 6.0
ACTIVITY
Member of the executive committee of MMCL Association
PROJECTS (continued)
Semester 1 - 2025
UNIVERSITY OF INFORMATION TECHNOLOGY Student
Title: Research and Implementation of Clustering Algorithms on Terrorist Attacks in the UK Using PySpark
Link: https://drive.google.com/drive/folders/13wQiJjUYWVynJFakE9SIS3WqG6afHEL?fbclid=IwY2xjawIafxJleHRuA2FlbQIxMAABHbhvE01bqehO8daUtxTH4nrNEzNZRDBvK8uFGGyX1qOsE5DpMrPoPU3QGg_aem_iTDndYZmZec7LTm7hH8PZg
Objective: Develop a machine learning model to cluster terrorist attacks in the UK to propose effective strategies for counter-terrorism.
Technologies: PySpark, K-Means.
Highlights:
• Analyzed the Global Terrorism Dataset comprising 5,235 records with 135 attributes.
• Preprocessed data to handle missing values and irrelevant columns.
• Implemented K-Means clustering to identify patterns in terrorist attacks.
• Evaluated clusters based on attack types, casualties, and targets.
Outcome: Provided insights into the characteristics of terrorist attacks, aiding in the development of targeted counter-terrorism strategies.
Semester 1 - 2025
UNIVERSITY OF INFORMATION TECHNOLOGY Student
Title: Network of Supply Chains/Supermarkets in the States of the US
Link: https://drive.google.com/drive/folders/119a5yAgZ88jeUihJXP-YM5QjiujTL7nT
Objective: Analyze the relationships between product categories and states within supermarket chains to enhance business strategies.
Technologies: Python, Gephi, Louvain Algorithm, Girvan-Newman.
Highlights:
• Analyzed a dataset of supermarket sales with 9,994 records and 13 attributes.
• Cleaned and processed data to eliminate duplicates and handle missing values.
• Converted data from DataFrame to graph format for community detection.
• Implemented Louvain and Girvan-Newman algorithms to identify clusters in sales data.
• Evaluated the networks using metrics such as PageRank, Eigenvector, Closeness, and Betweenness centralities.
Outcome: Provided insights into sales trends across different states, facilitating the development of targeted marketing strategies for various product categories.