
Data Scientist Ho Chi

Location:
Ho Chi Minh City, Vietnam
Posted:
March 24, 2025


Resume:

Bui Thi Hue Data Scientist

Dob

**/**/****

Phone

036*******

Email

*************@*****.***

Address

Le Van Luong Street, Nha Be District, Ho Chi Minh City

Objective

I can help businesses understand customer data, thereby optimizing their business strategy.

SKILLS

Technical Skills

• SQL (Advanced) – Complex Queries, CTE, Window Functions

• Power BI (Intermediate) – Create an Automated Dashboard Using DAX Functions

EDUCATION

University of Transport Ho Chi Minh City 2020 - 2024 Information Technology

GPA: 3.37

Achieved strong academic results in Database, Data Mining, and Artificial Intelligence.

PROJECTS

09/2023 - 10/2023

Data Scientist

Languages: Python

Data Description: The available dataset includes four attributes: TV, Radio, Newspaper and Sales

Data Cleaning & Basic Statistical Analysis:

• Tool/Language: Python (using Pandas library).

• Tasks:

Data cleaning.

Basic statistical analysis (mean, standard deviation, maximum, and minimum values).
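The cleaning and summary steps above can be sketched roughly as follows. The rows are made-up stand-ins for the advertising data, and the specific cleaning rules (dropping duplicates and rows with missing values) are assumptions, since the résumé does not say which rules were applied.

```python
import pandas as pd

# Hypothetical stand-in rows; the real dataset is not included in the résumé.
df = pd.DataFrame({
    "TV": [230.1, 44.5, 17.2, 151.5, 151.5, None],
    "Radio": [37.8, 39.3, 45.9, 41.3, 41.3, 10.8],
    "Newspaper": [69.2, 45.1, 69.3, 58.5, 58.5, 58.4],
    "Sales": [22.1, 10.4, 9.3, 18.5, 18.5, 12.9],
})

# Assumed cleaning rules: drop exact duplicate rows, then rows with missing values.
clean = df.drop_duplicates().dropna().reset_index(drop=True)

# The four statistics named in the task list, one row per statistic.
summary = clean.agg(["mean", "std", "max", "min"])
print(summary)
```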

Data Visualization:

• Tool/Language: Python (using Seaborn library).

• Tasks:

Create a Pairplot to visualize relationships between TV, Radio, Newspaper, and Sales

Generate a Correlation Heatmap to identify correlations between variables.

Model Building:

• Tool/Language: Python (Scikit-learn).

• Environment: Jupyter Notebook, VS Code.

• Models:

Multiple Linear Regression.

Polynomial Regression.
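The two models might be built in Scikit-learn along these lines. This is a sketch under assumptions: the polynomial degree (2), the 80/20 split, and the synthetic data with an interaction term are all illustrative, because the résumé states none of them.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic features standing in for TV, Radio, Newspaper (assumption).
rng = np.random.default_rng(42)
X = rng.uniform(0, 100, size=(200, 3))
# Synthetic target with a TV x Radio interaction that a purely linear model cannot capture.
y = 3 + 0.05 * X[:, 0] + 0.1 * X[:, 1] + 0.002 * X[:, 0] * X[:, 1] + rng.normal(0, 0.5, 200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Multiple linear regression: one coefficient per feature plus an intercept.
linear = LinearRegression().fit(X_train, y_train)

# Polynomial regression: expand features to degree 2 (assumed), then fit a linear model.
poly = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
).fit(X_train, y_train)

print("linear R^2:", linear.score(X_test, y_test))
print("poly   R^2:", poly.score(X_test, y_test))
```

Wrapping the feature expansion and the regressor in one pipeline keeps the expansion fitted on the training set only, so the test set is transformed the same way at prediction time.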

Model Evaluation:

• Multiple Linear Regression:

Train the Multiple Linear Regression model on the training set and predict Sales on the test set.

Evaluation Metrics:

Mean Absolute Error (MAE): 1.5117

R-squared (R²): 0.86

Visualization: Plot the relationship between actual Sales and predicted Sales using Matplotlib in Python.
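For reference, the two metrics reported above can be computed by hand and cross-checked against Scikit-learn. The arrays below are toy values chosen for easy arithmetic, not the project's predictions.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Toy actual and predicted values (illustrative only).
y_true = np.array([10.0, 12.0, 15.0, 20.0])
y_pred = np.array([11.0, 11.0, 16.0, 18.0])

# MAE: average absolute difference between actual and predicted values.
mae = np.mean(np.abs(y_true - y_pred))

# R^2: one minus the ratio of residual to total sum of squares.
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(mae)  # 1.25
print(r2, r2_score(y_true, y_pred))
```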

• Polynomial Regression:

Train the Polynomial Regression model on the training set and predict Sales on the test set.

Evaluation Metrics:

Mean Absolute Error (MAE): 0.5906

R-squared (R²): 0.98

Visualization: Plot the relationship between actual Sales and predicted Sales using Matplotlib in Python.

Conclusion:

The Polynomial Regression model fits the test set better than the Multiple Linear Regression model (MAE 0.5906 vs. 1.5117; R² 0.98 vs. 0.86).

SKILLS (continued)

• Python (Intermediate) – Use Pandas, NumPy, Matplotlib, and Seaborn for Data Processing and Visualization

• Machine Learning (Basic) – Use Scikit-Learn for Predictive Analysis

Probability & Statistics

Descriptive Statistics

• Measures of central tendency

• Measures of variability

Soft Skills

• Communication Skill

• Problem Solving Skill

• Self-Study Skill

• Time Management Skill

PROJECTS (continued)

11/2023 - 11/2023

Data Scientist

Languages: Python

Data Collection: Used the "Boston Housing" dataset from the Scikit-learn library.

Data Cleaning & Basic Statistical Analysis:

• Tool/Language: Python with Pandas library.

• Tasks:

Check for missing values and count unique values.

Perform basic statistical analysis (mean, standard deviation, maximum, and minimum values).
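The missing-value and unique-value checks could look something like this in Pandas. `RM`, `CHAS`, and `MEDV` are column names from the Boston Housing data, but the rows here are invented for illustration.

```python
import pandas as pd

# Invented rows using three Boston Housing column names (illustrative only).
df = pd.DataFrame({
    "RM": [6.5, 6.4, None, 7.1],
    "CHAS": [0, 1, 0, 0],
    "MEDV": [24.0, 21.6, 34.7, 33.4],
})

# Number of missing entries per column.
missing = df.isna().sum()

# Number of distinct non-missing values per column.
unique = df.nunique()

print(missing)
print(unique)
```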

Data Visualization:

• Tool/Language: Seaborn and Matplotlib in Python.

• Create a heatmap to visualize correlations between attributes.

Model Building:

• Tool/Language: Python (Scikit-learn).

• Environment: Jupyter Notebook, VS Code.

• Model: Linear Regression.

• Model Training & Evaluation:

Training Data Performance:

R²: 0.746

Adjusted R²: 0.736

MAE (Mean Absolute Error): 3.089

RMSE (Root Mean Squared Error): 4.367

Test Data Performance:

R²: 0.712

Adjusted R²: 0.685

MAE: 3.859

RMSE: 5.482
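As a rough check on how adjusted R² relates to R², the sketch below applies the standard formula. It assumes a 70/30 split of the 506-row, 13-feature dataset (354 training and 152 test rows); the split is not stated in the résumé, although it is consistent with the reported figures. The `rmse` helper is a hypothetical name added for illustration.

```python
import math

def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Assumed split sizes: 354 training rows and 152 test rows, 13 features.
print(round(adjusted_r2(0.746, 354, 13), 3))  # 0.736
print(round(adjusted_r2(0.712, 152, 13), 3))  # 0.685
```

The penalty grows as the number of samples shrinks relative to the number of features, which is why the gap between R² and adjusted R² is larger on the smaller test set.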

Conclusion: The model achieves an R² of 0.712 on the test set (about 71.2% of the variance explained), indicating good predictive capability. The MAE and RMSE on the test set (3.859 and 5.482) do not differ significantly from those on the training set (3.089 and 4.367), suggesting that the model does not suffer from severe overfitting and maintains stable accuracy when applied to new data.

CERTIFICATIONS

Python for Data Analyst 2023

HackerRank SQL (Advanced) 2025

© topcv.vn


