Bui Thi Hue
Data Scientist
Dob
Phone
*************@*****.***
Address
Le Van Luong Street, Nha Be District, Ho Chi Minh City
Objective
I help businesses understand their customer data so they can optimize their business strategy.
SKILLS
Technical Skills
• SQL (Advanced) – Complex Queries, CTE, Window Functions
• Power BI (Intermediate) – Create an Automated Dashboard Using DAX Functions
EDUCATION
University of Transport Ho Chi Minh City 2020 - 2024
Information Technology
GPA: 3.37
Achieved high grades in Database, Data Mining, and Artificial Intelligence courses
PROJECTS
09/2023 - 10/2023
Data Scientist
Languages: Python
Data Description: The available dataset includes four attributes: TV, Radio, Newspaper and Sales
Data Cleaning & Basic Statistical Analysis:
• Tool/Language: Python (using Pandas library).
• Tasks:
Data cleaning.
Basic statistical analysis (mean, standard deviation, maximum, and minimum values).
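The cleaning and summary steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the sample rows are hypothetical, and dropping duplicates and missing values is an assumed cleaning strategy.

```python
import pandas as pd

# Hypothetical sample rows; the project's real advertising data is not shown here.
df = pd.DataFrame({
    "TV": [230.1, 44.5, 17.2, 151.5, 44.5],
    "Radio": [37.8, 39.3, 45.9, 41.3, 39.3],
    "Newspaper": [69.2, 45.1, 69.3, None, 45.1],
    "Sales": [22.1, 10.4, 9.3, 18.5, 10.4],
})

# Data cleaning (assumed strategy): drop exact duplicate rows, then rows with missing values.
clean = df.drop_duplicates().dropna().reset_index(drop=True)

# Basic statistics: mean, standard deviation, minimum, and maximum per column.
summary = clean.agg(["mean", "std", "min", "max"])
print(summary)
```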
Data Visualization:
• Tool/Language: Python (using Seaborn library).
• Tasks:
Create a Pairplot to visualize relationships between TV, Radio, Newspaper, and Sales
Generate a Correlation Heatmap to identify correlations between variables.
Model Building:
• Tool/Language: Python (Scikit-learn)
• Environment: Jupyter Notebook, VS Code
• Models:
Multiple Linear Regression.
Polynomial Regression.
Model Evaluation:
• Multiple Linear Regression:
Train the Multiple Linear Regression model on the training set and predict Sales on the test set.
Evaluation Metrics:
Mean Absolute Error (MAE): 1.5117
R-squared (R²): 0.86
Visualization: Plot the relationship between actual Sales and predicted Sales using Matplotlib in Python.
• Polynomial Regression:
Train the Polynomial Regression model on the training set and predict Sales on the test set.
Evaluation Metrics:
• Python (Intermediate) – Use Pandas, NumPy, Matplotlib, and Seaborn for Data Processing and Visualization
• Machine Learning (Basic) – Use Scikit-Learn for Predictive Analysis
Probability & Statistics
Descriptive Statistics
• Measures of central tendency
• Measures of variability
Soft Skills
• Communication Skill
• Problem Solving Skill
• Self-Study Skill
• Time Management Skill
Mean Absolute Error (MAE): 0.5906
R-squared (R²): 0.98
Visualization: Plot the relationship between actual Sales and predicted Sales using Matplotlib in Python.
Conclusion:
The Polynomial Regression model outperforms the Multiple Linear Regression model on the test set (MAE 0.5906 vs. 1.5117; R² 0.98 vs. 0.86).
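A comparison of this kind could be set up roughly as sketched below with Scikit-learn. Everything specific here is an assumption for illustration: the data is synthetic, the polynomial degree is 2, and the split is 80/20, none of which is stated in this project description.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in data (assumption): three features and a target with an interaction term.
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(200, 3))  # columns play the role of TV, Radio, Newspaper
noise = rng.normal(0, 0.5, 200)
y = 3 + 0.05 * X[:, 0] + 0.1 * X[:, 1] + 0.002 * X[:, 0] * X[:, 1] + noise

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "linear": LinearRegression(),
    # Degree 2 is an assumed choice, not taken from the project.
    "polynomial": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)      # train on the training set
    pred = model.predict(X_test)     # predict the target on the test set
    scores[name] = {
        "mae": mean_absolute_error(y_test, pred),
        "r2": r2_score(y_test, pred),
    }
    print(name, scores[name])
```

Wrapping `PolynomialFeatures` and `LinearRegression` in one pipeline keeps the feature expansion and the fit together, so the same transformation is applied to both the training and the test data.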
11/2023 - 11/2023
Data Scientist
Languages: Python
Data Collection: Used the "Boston Housing" dataset from the Scikit-learn library.
Data Cleaning & Basic Statistical Analysis:
• Tool/Language: Python with Pandas library.
• Tasks:
Check for missing values and count unique values.
Perform basic statistical analysis (mean, standard deviation, maximum, and minimum values).
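The missing-value and unique-value checks listed above might look like the following in Pandas. The rows and the column selection are hypothetical and serve only to show the calls.

```python
import pandas as pd

# Hypothetical housing-style rows; values and column selection are for illustration only.
df = pd.DataFrame({
    "RM": [6.5, 6.4, None, 7.0],
    "CHAS": [0, 0, 1, 0],
    "MEDV": [24.0, 21.6, 34.7, 33.4],
})

missing_per_column = df.isna().sum()  # number of missing values in each column
unique_per_column = df.nunique()      # number of distinct non-null values in each column

print(missing_per_column)
print(unique_per_column)
```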
Data Visualization:
• Tool/Language: Seaborn and Matplotlib in Python.
• Create a heatmap to visualize correlations between attributes.
Model Building:
• Tool/Language: Python (Scikit-learn)
• Environment: Jupyter Notebook, VS Code
• Model: Linear Regression.
• Model Training & Evaluation:
Training Data Performance:
R²: 0.746
Adjusted R²: 0.736
MAE (Mean Absolute Error): 3.089
RMSE (Root Mean Squared Error): 4.367
Test Data Performance:
R²: 0.712
Adjusted R²: 0.685
MAE: 3.859
RMSE: 5.482
Conclusion:
• The model achieves an R² score of 71.2% on the test set, indicating good predictive capability. The MAE and RMSE values on the test set do not differ significantly from those on the training set, which suggests that the model does not suffer from severe overfitting and maintains stable accuracy when applied to new data.
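For reference, adjusted R² and RMSE can be computed as in the sketch below. The helper names `adjusted_r2` and `rmse` are hypothetical. The sample and feature counts are also assumptions, since the split is not stated above. With 13 features and 152 test rows, the formula gives about 0.685 from R² = 0.712.

```python
import math


def adjusted_r2(r2: float, n_samples: int, n_features: int) -> float:
    """Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


def rmse(y_true, y_pred) -> float:
    """Root mean squared error over paired actual and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Assumed sizes (not stated in the project description): 13 features, 152 test rows.
print(round(adjusted_r2(0.712, n_samples=152, n_features=13), 3))

# Tiny made-up example: errors are 2 and 0, so RMSE is sqrt((4 + 0) / 2).
print(rmse([3.0, 5.0], [1.0, 5.0]))
```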
CERTIFICATIONS
Python for Data Analyst 2023
HackerRank SQL (Advanced) 2025