Machine Learning, Deep Learning

Location:

Quan 1, 710000, Vietnam

Posted:

July 10, 2023


Resume:

PHẠM LÊ TRƯỞNG

LOCATION: Linh Xuan, Thu Duc City, Ho Chi Minh City

PHONE: 036*******

EMAIL: ***************@*****.***

GITHUB: https://github.com/PhamLeTruong

About

I am a final-year Computer Science student at the University of Information Technology, National University of Ho Chi Minh City. I have a strong passion for working with data and models, which I find both challenging and rewarding, and I am always looking for new things to learn to improve my skills. I am a dedicated person and a cooperative teammate. I am currently seeking an internship in machine learning, where I hope to gain experience with real-world problems and contribute my skills and knowledge to your company.

Education

University of Information Technology, Ho Chi Minh City, Vietnam

Cumulative GPA: 8.3/10

Certifications

• HackerRank Problem Solving (Basic) Certificate

• HackerRank Problem Solving (Intermediate) Certificate

• Coursera Problem Solving Using Computational Thinking

Projects

SENTIMENT ANALYSIS OF STUDENT FEEDBACK

• Input: a feedback comment from a Vietnamese student.

• Output: labels 0, 1, and 2, corresponding to negative, neutral, and positive.

• Preprocessing: remove elongated (stretched) characters, teencode, acronyms, and special characters, then tokenize.

• Feature extraction: TfidfVectorizer with tuned parameters (ngram_range, max_df, min_df, max_features, norm, ...).

• Class imbalance handling: data augmentation by oversampling (ADASYN, SMOTE, KMeansSMOTE, BorderlineSMOTE, SVMSMOTE).

• Models: Naive Bayes, XGBoost, Logistic Regression, Support Vector Machine, VotingClassifier, LSTM.

• Hyperparameter tuning: GridSearchCV for the traditional machine learning models. For the deep learning network, each parameter (input_dim, output_dim, filters, kernel, activation, units, loss, optimizer) was adjusted individually, combined with EarlyStopping to retain good results during training.

• Evaluation metrics: Precision, Recall, F1-score.

• Deployment: built a web application with Streamlit to classify student feedback.

• Source code: link
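
For illustration only, the evaluation metrics listed above could be computed per class as in the sketch below. It is not the project's actual code: the function names and the toy labels are assumptions, and it uses plain Python so that it is self-contained. In practice a library routine such as scikit-learn's precision_recall_fscore_support would normally be used.

```python
from collections import Counter


def per_class_scores(y_true, y_pred, labels=(0, 1, 2)):
    """Return {label: (precision, recall, f1)} for each class label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    scores = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(scores):
    """Unweighted mean of the per-class F1 values."""
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Toy labels (made up): 0 = negative, 1 = neutral, 2 = positive.
y_true = [0, 0, 1, 2, 2, 2]
y_pred = [0, 1, 1, 2, 2, 0]
scores = per_class_scores(y_true, y_pred)
print(scores)
print(macro_f1(scores))
```

Macro averaging weights the three classes equally, which is one common choice when the classes are imbalanced.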

GOLD PRICES - TIME SERIES FORECASTING

• Input: a time series of dates and the corresponding gold prices.

• Output: the gold price at the next time step.

• Preprocessing: sort the time series chronologically in a fixed date format (using Excel and Python); scale gold prices (MinMaxScaler, StandardScaler).

• Deep learning models: LSTM, CNN-LSTM.

• Hyperparameter tuning: each parameter (input_dim, output_dim, filters, kernel, activation, units, loss, optimizer) was adjusted individually, combined with EarlyStopping to retain good results during training.

• Deployment: built a web application that allows users to:

  - upload datasets;

  - train models directly on the website;

  - view the forecast gold prices for the next 30 days, data visualization charts, and evaluation metrics (RMSE, MAPE, MAE).

• Source code: link
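
As a rough sketch of the data preparation and error metrics named above (not the project's actual code), the snippet below scales a price series to [0, 1], slices it into fixed-length input windows paired with the next value, and computes RMSE, MAE, and MAPE. All function names, the window length of 3, and the sample prices are assumptions made for illustration.

```python
import math


def min_max_scale(values):
    """Scale values to [0, 1]; also return (min, max) for inverting later."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values], (lo, hi)
    return [(v - lo) / span for v in values], (lo, hi)


def make_windows(series, window):
    """Pair each run of `window` consecutive values with the value after it."""
    inputs, targets = [], []
    for i in range(len(series) - window):
        inputs.append(series[i:i + window])
        targets.append(series[i + window])
    return inputs, targets


def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Made-up prices, already sorted chronologically.
prices = [100.0, 110.0, 120.0, 130.0, 140.0]
scaled, (lo, hi) = min_max_scale(prices)
X, y = make_windows(scaled, window=3)

# Made-up actual/forecast pairs to exercise the metrics.
actual, forecast = [100.0, 200.0], [110.0, 190.0]
print(rmse(actual, forecast), mae(actual, forecast), mape(actual, forecast))
```

Each row of X would be one input sequence for a model such as an LSTM, with the matching entry of y as its training target.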

CRAWLING COMMENT DATA FROM TIKI AND LAZADA FOR SENTIMENT ANALYSIS

• Crawl comment data from Tiki and Lazada (BeautifulSoup, Selenium).

• Data Annotation.

• Data preprocessing: convert to lowercase; remove elongated characters, links, special characters, stop words, and emoji.

• Feature extraction: CountVectorizer and TfidfVectorizer (unigrams and bigrams).

• Machine learning model: Naïve Bayes.

• Evaluation metrics: Precision, Recall, F1-score.

• Source code: link
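
The preprocessing steps above might look roughly like the following sketch. It is an illustration, not the project's actual code: the function name, the regular expressions, the three-word stop-word list, and the sample comment with its URL are all hypothetical. It also assumes that "elongated characters" means a letter repeated three or more times, which is collapsed to a single letter.

```python
import re

# Hypothetical, tiny stop-word list; a real project would load a full one.
STOP_WORDS = {"va", "la", "thi"}


def clean_comment(text, stop_words=STOP_WORDS):
    """Lowercase, strip links, collapse elongated letters, drop symbols and stop words."""
    text = text.lower()
    # Remove links.
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)
    # Collapse a letter repeated 3+ times ("nhanhhhh" -> "nhanh"); digits are left alone.
    text = re.sub(r"([^\W\d_])\1{2,}", r"\1", text)
    # Replace anything that is not a word character or whitespace (punctuation, emoji).
    text = re.sub(r"[^\w\s]", " ", text)
    tokens = [tok for tok in text.split() if tok not in stop_words]
    return " ".join(tokens)


sample = "Giao hang NHANHHHH va dep!!! Xem https://tiki.vn/abc"
print(clean_comment(sample))  # giao hang nhanh dep xem
```

The order matters here: links are removed before punctuation so that a URL is dropped as a whole rather than broken into word fragments.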

TEXT INFORMATION RETRIEVAL SYSTEM

• Dataset: Cranfield.

• Document preprocessing: convert to lowercase, remove punctuation and stop words, normalize numbers, and apply stemming (LancasterStemmer).

• Build a vocabulary list and index for the document set.

• Calculate TF-IDF weights according to the SMART system.

• Query preprocessing: same as document preprocessing, plus query expansion.

• Models: Vector Space Model (VSM), Best Matching 25 (BM25).

• Query-document similarity: cosine similarity (VSM), BM25 scoring formula (BM25).

• Rank the returned documents from high to low.

• Evaluation metrics: Precision, Recall, interpolated MAP (TREC).

• Source code: link
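
As a minimal sketch of the BM25 scoring and ranking steps above (not the project's actual code), the snippet below scores pre-tokenized documents against a query using the common Okapi BM25 formula. The parameter values k1 = 1.5 and b = 0.75, the "+ 1" inside the IDF logarithm (a widely used variant that keeps IDF non-negative), the function name, and the toy documents are all assumptions.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`."""
    n_docs = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))

    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query:
            if term not in term_freq:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Longer-than-average documents are penalized through b.
            norm = term_freq[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy documents (made up), already lowercased and tokenized.
docs = [
    ["boundary", "layer", "flow"],
    ["heat", "transfer", "flow"],
    ["wing", "design"],
]
query = ["boundary", "flow"]

scores = bm25_scores(query, docs)
# Rank document indices from highest to lowest score.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [0, 1, 2]
```

Documents that share no term with the query receive a score of 0 and fall to the bottom of the ranking.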

Skills

• Language: Python.

• Web: Streamlit.

• Visualization: matplotlib, seaborn, plotly.

• Others: Excel, Word, PowerPoint.

• Pursuing: Statistics, Machine Learning, Deep Learning.
