Machine Learning Data Analyst

Location:

District 1, 71000, Vietnam

Posted:

August 27, 2024

Resume:

NGUYỄN PHƯƠNG TÙNG

Data Scientist Intern

Data Analyst Intern

**/**/****

Male

078*-***-***

*******************@*****.***

Đông Hoà, Dĩ An, Bình Dương Province

https://github.com/NgPhTung

PROFESSIONAL SKILLS

• Data Cleaning

• Data Exploration & Visualization

• Data Analysis & Modeling

• Computer Vision, NLP

• Machine Learning, Deep Learning

PROGRAMMING LANGUAGES

• SQL (MySQL): Proficient in data querying, analysis, and optimization.

• Python (NumPy, Pandas, Matplotlib, Seaborn, SciPy, Scikit-learn, TensorFlow, Keras, ...): Skilled in data extraction, transformation, processing, visualization, analysis, and machine learning.

• Java, JavaScript, C#, C++, C: Understanding of object-oriented programming concepts and proficiency in programming.

TECHNOLOGIES & TOOLS

Technologies: Apache Hadoop, Apache Hive, PySpark, HTML, CSS.

Tools:

• MySQL Server, Azure Data Studio

• VMware Workstation

• Google Colab, Kaggle, Jupyter Notebook

OBJECTIVE

Short-term goals:

• Graduate from university ahead of schedule.

• Gain solid experience through practical work and projects to strengthen my skills.

Long-term goals:

• Develop substantial expertise to become a data analyst/scientist, aiming to contribute significantly to the company's success.

EDUCATION

October 2021 - Present

UNIVERSITY OF INFORMATION TECHNOLOGY - VNU HCMC

Computer Science 4th-year student - Classification: Good (GPA: 8.34/10)

COURSE PROJECTS

02/2024 - 06/2024

DROPOUT PREDICTION Data Mining

• Using the MOOC dataset from Tsinghua University in China, which records students' learning activity in online courses, my team explored and preprocessed the data and extracted insights. We built models with Azure Machine Learning to predict student dropout and deployed the predictions as a web app.

• Using: Python

• My responsibilities: Data exploration, data analysis, data preprocessing, feature engineering and data visualization.

• Link: https://drive.google.com/drive/folders/1KcWKjE3AAISOwZVaQ0Q9bYTwjM_6Ozd0?usp=sharing

02/2024 - 06/2024

CUSTOMER SEGMENTATION USING K-MEANS CLUSTERING Data Analyst

• With a dataset available on Kaggle, treated as Big Data, our team utilized PySpark DataFrame for preprocessing and implemented the K-means algorithm to cluster customers using PySpark RDD.

• Using: Python, PySpark.

• My responsibilities:

Researching the meaning and information of the columns in the dataset.

Conducting exploratory data analysis (EDA) and visualization using Python.

Processing the data (transformation, encoding, etc.) using PySpark DataFrame.

• Link: https://github.com/NgPhTung/Bigdata_Kmeans

02/2024 - 07/2024

REAL-TIME RESTAURANT REVIEWS ASPECT SENTIMENT QUAD PREDICTION Data Engineer

• Developed a system for real-time prediction of aspect-based sentiment using Aspect Sentiment Quad Prediction (ASQP) from restaurant reviews. Utilizing Big Data tools such as Apache Spark and Apache Kafka, the team successfully trained a T5 model, achieving an F1-Score of 0.5883 on the test set.

• Using: Apache Spark, PySpark, Apache Kafka, T5 Model.

• My responsibilities:

Integrated Spark Streaming with Kafka to ingest data from the source in batches.

Utilized PySpark to query and preprocess data from Kafka using SQL.

Trained various T5 models (small, base, etc.) using a different dataset.

• Link: https://github.com/NgPhTung/DS200_Bigdata_Analysis

• Brackets

• Microsoft: Word, Excel, PowerPoint

SOFT SKILLS

• Attention to Detail

• Problem solving

• Time management

• Collaboration

Hobbies

• Walking, watching movies, reading books, swimming, playing badminton

09/2023 - 12/2023

PYTHON FOR MACHINE LEARNING Machine Learning

• Analyzed and preprocessed two datasets, Diabetes and COMPAS, covering both the training and testing phases. For each dataset, trained a range of machine learning models: Logistic Regression, KNN, SVM, Gaussian Naive Bayes, MLP, Decision Tree, Random Forest, Gradient Boosting, LightGBM, XGBoost, AdaBoost, and SGD Classifier. Tuned the models with GridSearchCV and cross-validation, and used StandardScaler, RobustScaler, and OrdinalEncoder for scaling and encoding during data preparation.

• Using: Python (Google Colab, Kaggle, Jupyter Notebook)

• My responsibilities:

Data preprocessing, encoding, feature engineering, exception handling, visualization and reporting.

Model training, fine-tuning and experimentation with normalization techniques.

• Link: https://github.com/NgPhTung/CS116_Python_for_machine_learning.git

02/2024 - 06/2024

FRUIT CLASSIFICATION Computer Vision

• Implemented KNN, SVM, and Random Forest models to classify fruit images into different categories. Of the feature extraction techniques I experimented with, HOG yielded the best results. I also combined the models in a voting ensemble to improve prediction accuracy.

• Using: Python

• My responsibilities: Data collection, model building, training, fine-tuning, and experimenting with different techniques.

• Link: https://github.com/NgPhTung/CS231_Computer_Vision.git

09/2023 - 12/2023

TEXT CLASSIFICATION OF NEWS ARTICLES NLP

• Implemented models such as Naive Bayes, Logistic Regression, MLP, and Text-CNN using the Bag of Words technique to classify article titles and summaries into one of four categories: World, Sport, Business, or Sci/Tech.

• Using: Python

• Responsibilities: Data preprocessing (e.g., HTML tag removal, tokenization, lemmatization), model building, training, and fine-tuning Naive Bayes, Logistic Regression, MLP, and Text-CNN for text classification.

• Link: https://github.com/NgPhTung/CS221.git

CERTIFICATES

30/09/2023

TOEIC L&R: 625
