NGUYỄN PHƯƠNG TÙNG
Data Scientist Intern
Data Analyst Intern
Male
*******************@*****.***
Đông Hoà, Dĩ An, Bình Dương Province
https://github.com/NgPhTung
PROFESSIONAL SKILLS
• Data Cleaning
• Data Exploration & Visualization
• Data Analysis & Modeling
• Computer Vision, NLP
• Machine Learning, Deep Learning
PROGRAMMING LANGUAGES
• SQL (MySQL): Proficient in data querying, analysis, and optimization.
• Python (NumPy, Pandas, Matplotlib, Seaborn, SciPy, Scikit-learn, TensorFlow, Keras, ...): Skilled in data extraction, transformation, processing, visualization, analysis, and machine learning.
• Java, JavaScript, C#, C++, C: Familiar with object-oriented programming concepts; proficient in programming with these languages.
TECHNOLOGIES & TOOLS
Technologies: Apache Hadoop, Apache Hive, PySpark, HTML, CSS.
Tools:
• MySQL Server, Azure Data Studio
• VMware Workstation
• Google Colab, Kaggle, Jupyter Notebook
• Brackets
• Microsoft: Word, Excel, PowerPoint
SOFT SKILLS
• Attention to Detail
• Problem Solving
• Time Management
• Collaboration
HOBBIES
• Walking, watching movies, reading books, swimming, playing badminton
OBJECTIVE
Short-term goals:
• Graduate from university ahead of schedule.
• Gain solid hands-on experience through practical work and projects to strengthen my skills.
Long-term goals:
• Build substantial expertise as a data analyst/scientist and contribute significantly to the company's success.
EDUCATION
October 2021 - Present
UNIVERSITY OF INFORMATION TECHNOLOGY - VNU HCMC
Computer Science, 4th-year student
Classification: Good (GPA: 8.34/10)
COURSE PROJECTS
02/2024 - 06/2024
DROPOUT PREDICTION Data Mining
• Working with the MOOC dataset from Tsinghua University (China), which records students' learning activity in online courses, my team explored and preprocessed the data and extracted insights. We built models with Azure Machine Learning to predict student dropout and deployed the predictions as a web app.
• Using: Python
• My responsibilities: Data exploration, data analysis, data preprocessing, feature engineering, and data visualization.
• Link: https://drive.google.com/drive/folders/1KcWKjE3AAISOwZVaQ0Q9bYTwjM_6Ozd0?usp=sharing
02/2024 - 06/2024
CUSTOMER SEGMENTATION USING K-MEANS CLUSTERING Data Analyst
• With a Kaggle dataset treated as Big Data, our team preprocessed the data with PySpark DataFrames and implemented the K-means algorithm on PySpark RDDs to cluster customers.
• Using: Python, PySpark.
• My responsibilities:
Researching the meaning of each column in the dataset.
Conducting exploratory data analysis (EDA) and visualization in Python.
Processing the data (transformation, encoding, etc.) with PySpark DataFrames.
• Link: https://github.com/NgPhTung/Bigdata_Kmeans
02/2024 - 07/2024
REAL-TIME RESTAURANT REVIEWS ASPECT SENTIMENT QUAD PREDICTION Data Engineer
• Developed a system that performs Aspect Sentiment Quad Prediction (ASQP) on restaurant reviews in real time. Using Big Data tools such as Apache Spark and Apache Kafka, the team trained a T5 model that achieved an F1-score of 0.5883 on the test set.
• Using: Apache Spark, PySpark, Apache Kafka, T5 Model.
• My responsibilities:
Integrated Spark Streaming with Kafka to ingest data from the source in batches.
Used PySpark SQL to query and preprocess the data consumed from Kafka.
Trained T5 models of several sizes (small, base, etc.) on a separate dataset.
• Link: https://github.com/NgPhTung/DS200_Bigdata_Analysis
09/2023 - 12/2023
PYTHON FOR MACHINE LEARNING Machine Learning
• Analyzed and preprocessed two datasets, Diabetes and COMPAS, covering both the training and test splits. For each dataset, trained a range of machine learning models: Logistic Regression, KNN, SVM, Gaussian Naive Bayes, MLP, Decision Tree, Random Forest, Gradient Boosting, LightGBM, XGBoost, AdaBoost, and SGD Classifier. Tuned model performance with GridSearchCV and cross-validation, and used StandardScaler, OrdinalEncoder, and RobustScaler to improve data preparation and model accuracy.
• Using: Python (Google Colab, Kaggle, Jupyter Notebook)
• My responsibilities:
Data preprocessing, encoding, feature engineering, exception handling, visualization and reporting.
Model training, fine-tuning and experimentation with normalization techniques.
• Link: https://github.com/NgPhTung/CS116_Python_for_machine_learning.git
02/2024 - 06/2024
FRUIT CLASSIFICATION Computer Vision
• Implemented KNN, SVM, and Random Forest models to classify fruit images into categories. After experimenting with several feature extraction techniques, HOG (Histogram of Oriented Gradients) features gave the best results. I also combined the models with a voting ensemble to improve prediction accuracy.
• Using: Python
• My responsibilities: Data collection, model building, training, fine-tuning, and experimenting with different techniques.
• Link: https://github.com/NgPhTung/CS231_Computer_Vision.git
09/2023 - 12/2023
TEXT CLASSIFICATION OF NEWS ARTICLES NLP
• Implemented Naive Bayes, Logistic Regression, MLP, and Text-CNN models with Bag-of-Words features to classify article titles and summaries into one of four categories: World, Sport, Business, or Sci/Tech.
• Using: Python
• My responsibilities: Data preprocessing (e.g., HTML tag removal, tokenization, lemmatization), plus building, training, and fine-tuning Naive Bayes, Logistic Regression, MLP, and Text-CNN models for text classification.
• Link: https://github.com/NgPhTung/CS221.git
CERTIFICATES
30/09/2023
TOEIC L&R: 625