Nguyen Ngoc Hai Luan
Data Engineer Intern
090******* *************************@*****.*** LinkedIn GitHub Ho Chi Minh, Viet Nam
Working Experience
Quoc Bao Software Ho Chi Minh, Viet Nam
Data Analysis - Internship Jul 2024 – Dec 2024
• Collected, processed, and analyzed a large-scale university dataset containing 230,000 rows and 23 columns.
• Performed data cleaning and transformation, including handling missing values, removing duplicates, and standardizing data formats.
• Conducted exploratory data analysis (EDA) to identify key patterns and insights that support decision-making processes.
• Built machine learning models such as Logistic Regression and Decision Trees to predict scholarship eligibility.
• Designed and built interactive dashboards and reports using Power BI, providing stakeholders with actionable insights through visual analytics.
• Generated scholarship eligibility lists with 68% predictive accuracy, enhancing the efficiency and accuracy of the selection process.
Mindx Technology School Ho Chi Minh, Viet Nam
Programming Teacher (Part-time) Aug 2023 – May 2024
• Taught basic SQL and Python to students aged 12 to 30, in online classes of 10 to 20 students.
• Helped students become familiar with basic query statements.
• Assisted students in completing their final projects.
Projects
ETL Pipeline for Retail Data Warehouse GitHub
• Processed the Online Retail.xlsx dataset, containing over 500,000 transaction records from multiple countries.
• Designed a Data Warehouse with Dimension tables (Product, Customer, Date) and a Fact table (Sales).
• Utilized SSIS for data extraction, transformation, and loading (ETL), incorporating Conditional Split, Data Conversion, and Lookup Transformation.
• Calculated the TotalPrice column, mapped data into the SQL Server Data Warehouse, and optimized ETL performance.
IT Salary Analysis GitHub
• Analyzed IT salary data using EDA and visualization in Python.
• Applied ANOVA, Chi-Square tests, and machine learning models (Linear Regression, Logistic Regression, Decision Tree).
• Identified key salary factors based on experience, industry, location, and company size.
• Built interactive salary dashboards in Power BI.
Diabetes Analysis and Prediction GitHub
• Developed machine learning models to predict diabetes using logistic regression, decision trees, support vector machines, and random forest.
• Implemented data processing and analysis techniques to identify the causes of diabetes.
• Implemented a Flask web application that allows users to input health metrics and receive diabetes predictions.
• Optimized model performance through feature engineering and hyperparameter tuning, reducing overfitting.
Education
Ho Chi Minh City University of Industry and Trade Ho Chi Minh, Viet Nam
Bachelor of Data Science; GPA: 3.08/4.00 Oct 2021 – Present
Language: Japanese - JLPT: N5
Skills
Languages: C/C++, Python, SQL, R, Java
Technologies: MySQL, MongoDB, PostgreSQL, Neo4j, Kafka, Spark, Airflow, PyTorch, TensorFlow