Vo Huu Nghia
DATA ENGINEER
Phone: +84-397****** Email: ***********@*****.*** GitHub: HuuNghia1406
SUMMARY
I am a data engineer experienced in designing, building, and optimizing robust data pipelines. I have worked with database systems and platforms such as SQL Server, MongoDB, Azure, MySQL, PostgreSQL, and Oracle, and with big data processing tools such as Hadoop, Spark, and Kafka. My goal is to specialize in database optimization and big data processing, in a professional environment where I can apply my abilities fully and contribute to the company's development.
EDUCATION
• HCM University of Technology and Education
• DATA ENGINEERING Aug 2020 - Aug 2024
PROJECTS
Recommendation System Aug 2023 - Apr 2024
• Project Description:
- Developed a book recommendation system built on the library database of Ho Chi Minh City University of Technology and Education. Using machine learning algorithms, the system reached 85% accuracy in its recommendations, helping users save time and find relevant materials efficiently.
• Responsibilities:
- Data Preparation: Explored and preprocessed more than 200 undocumented tables (16.5 GB). The volume, complexity, and lack of documentation made the individual tables difficult to understand and extended the preprocessing phase to 3 months.
- Data Warehouse: Used SSIS to design a data warehouse with 2 fact tables and 15 dimension tables, increasing concurrent processing capability by 25%.
- Data Visualization Dashboard: Used Power BI to connect to the data warehouse and import its data.
- Association Rule Mining: Applied the Apriori algorithm to mine association rules, reducing the dataset size by one sixth and improving the efficiency of the analysis.
- Recommendation System: Optimized the book recommendation system, raising its accuracy from 70% to 85% so that users can more easily find books that match their preferences.
• Technology: SQL, Apriori, FP-Growth, Word2Vec, TF-IDF, KNN, Python, Power BI, Streamlit.
Data Mining Retail Sales & Bank Data Feb 2023 - May 2023
• Project Description: This project aims to analyze customer data to gain valuable insights into customer behavior, identify customer segments, and build predictive models for targeted marketing and decision-making.
• Responsibilities:
- Data processing and cleaning.
- Building a clustering model using Microsoft Clustering for evaluating customer clusters.
- Mining association rules using the Association Rules algorithm.
- Building a prediction model and making decisions using Microsoft Decision Trees.
- Using Power BI and Python to visualize data.
• Technology: SQL, Python, SSIS, SSAS, Power BI, Seaborn, Scikit-learn, Matplotlib, Pandas.
Python for Data Analyst - 10 Databases Sep 2023 - Dec 2023
• Project Description: This project explores datasets from Kaggle, visualizes their key characteristics, and uncovers insights through data analysis. By applying appropriate visualization techniques and analyzing the results, it seeks to identify patterns, trends, and anomalies within the data.
• Responsibilities:
- Obtain datasets from Kaggle.
- Preprocess data by cleaning, transforming, and organizing it.
- Select appropriate visualization techniques based on dataset requirements.
- Utilize various charts and plots to visualize the data.
- Analyze patterns and relationships revealed by the visualizations.
- Identify interesting observations, patterns, and outliers.
- Draw meaningful conclusions from the analyzed data.
• Python Libraries: NumPy, Pandas, Matplotlib, Seaborn, Scikit-learn, SciPy.
SKILLS
Databases: SQL Server, MySQL, Azure, PostgreSQL, MongoDB.
Big Data tools: Spark, Kafka, Hadoop.
ETL tools: PySpark, Airflow.
Programming languages: Python, SQL, C/C++, C#.
Infrastructure: Docker.
Analysis tools: Excel, Power BI, Tableau.
ADDITIONAL SKILLS AND CERTIFICATES
Completed courses in C++, Python, and OOP on Codelearn
Panasonic Soft Skills Certification
ETL in Python and SQL
English: Good