PHẠM LÊ TRƯỞNG
LOCATION: Linh Xuan, Thu Duc City, Ho Chi Minh City
PHONE: 036*******
EMAIL: ***************@*****.***
GITHUB: https://github.com/PhamLeTruong
About
I am a final-year Computer Science student at the University of Information Technology, National University of Ho Chi Minh City. I am passionate about working with data and models, which I find both challenging and rewarding, and I look forward to gaining experience by solving real-world problems in a company setting. I am a dedicated person and a cooperative teammate, always looking for new things to learn and ways to improve my skills. I am currently seeking an internship in machine learning, where I would be glad to join your team and contribute my skills and knowledge.
Education
University of Information Technology, Ho Chi Minh City, Viet Nam
Cumulative GPA: 8.3/10
Certifications
• HackerRank Problem Solving (Basic) Certificate
• HackerRank Problem Solving (Intermediate) Certificate
• Coursera: Problem Solving Using Computational Thinking
Projects
SENTIMENT ANALYSIS OF STUDENT FEEDBACK
• Input: a feedback comment written by a Vietnamese student.
• Output: label 0, 1, or 2, corresponding to negative, neutral, or positive.
• Preprocessing: remove stretched characters, teencode symbols, acronyms, and special characters, then tokenize.
• Feature extraction: TfidfVectorizer with tuned parameters (ngram_range, max_df, min_df, max_features, norm, ...).
• Class imbalance: oversampling with ADASYN, SMOTE, KMeansSMOTE, BorderlineSMOTE, and SVMSMOTE.
• Machine learning models: Naive Bayes, XGBoost, Logistic Regression, Support Vector Machine, VotingClassifier, LSTM.
• Hyperparameter tuning: GridSearchCV for the traditional machine learning models; for the deep learning networks, tuning each parameter by hand (input_dim, output_dim, filters, kernel, activation, units, loss, optimizer), combined with EarlyStopping to stop training once validation performance stops improving.
• Evaluation metrics: Precision, Recall, F1-score.
• Deployment: a Streamlit web application that classifies student feedback.
• Source code: link
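A minimal sketch of the preprocessing step, assuming simple `re` rules in plain Python: characters repeated three or more times are collapsed, special characters are stripped, and tokens are split on whitespace. The `TEENCODE` table, and the choice to replace teencode and acronyms with standard words rather than delete them, are hypothetical; a real Vietnamese pipeline would likely use a dedicated word segmenter instead of whitespace splitting.

```python
import re
import unicodedata

# Hypothetical teencode/acronym lookup; the project's actual mapping is not shown.
TEENCODE = {"ko": "không", "dc": "được", "sv": "sinh viên"}


def preprocess(comment):
    """Lowercase, collapse stretched characters, strip symbols, tokenize, normalize teencode."""
    # NFC keeps Vietnamese letters with diacritics as single \w characters.
    text = unicodedata.normalize("NFC", comment).lower()
    # Collapse any character repeated 3+ times ("hayyyy" -> "hay").
    text = re.sub(r"(\w)\1{2,}", r"\1", text)
    # Replace anything that is not a word character or whitespace with a space.
    text = re.sub(r"[^\w\s]", " ", text)
    # Whitespace tokenization (a simplification; see the lead-in).
    tokens = text.split()
    return [TEENCODE.get(token, token) for token in tokens]
```

For example, `preprocess("Giang vien day hayyyy!!! ko chan")` yields the tokens `giang`, `vien`, `day`, `hay`, `không`, `chan`, which would then be fed to `TfidfVectorizer`.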
GOLD PRICES - TIME SERIES FORECASTING
• Input: a time series of dates and the corresponding gold prices.
• Output: the gold price at the next time step.
• Preprocessing: convert dates to one fixed format and sort them in ascending order (using Excel and Python); scale gold prices (MinMaxScaler, StandardScaler).
• Deep learning model: LSTM, CNN-LSTM.
• Hyperparameter tuning: tuning each parameter (input_dim, output_dim, filters, kernel, activation, units, loss, optimizer), combined with EarlyStopping to stop training once validation performance stops improving.
• Deployment: a web application that lets users:
  - upload datasets,
  - train models directly on the website,
  - view the gold price forecast for the next 30 days, data visualization charts, and evaluation metrics (RMSE, MAPE, MAE).
• Source code: link
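The preprocessing and evaluation steps of this project can be sketched in plain Python. This is an illustrative sketch, not the project's code: the helper names, the `%d/%m/%Y` date format, and the window length are assumptions. It sorts records by date, min-max scales prices (the same transform as scikit-learn's `MinMaxScaler` with its default [0, 1] range), builds the sliding windows an LSTM typically consumes, and computes MAE, RMSE, and MAPE.

```python
import math
from datetime import datetime


def sort_by_date(rows, fmt="%d/%m/%Y"):
    """Parse (date_string, price) rows in one fixed format and sort them oldest-first."""
    parsed = [(datetime.strptime(d, fmt).date(), float(p)) for d, p in rows]
    return sorted(parsed, key=lambda row: row[0])


def min_max_scale(values):
    """Scale values to [0, 1]; also return (lo, hi) so predictions can be inverted later."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot min-max scale a constant series")
    return [(v - lo) / (hi - lo) for v in values], lo, hi


def inverse_min_max(scaled, lo, hi):
    """Map a scaled value back to the original price range."""
    return scaled * (hi - lo) + lo


def make_windows(series, window):
    """Build (X, y): each X row holds `window` consecutive values, y is the value right after."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])
        y.append(series[i + window])
    return X, y


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent (assumes no zero prices)."""
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)
```

Predictions made on the scaled series would be mapped back with `inverse_min_max` before computing the metrics, so RMSE and MAE are reported in price units.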
CRAWLING COMMENT DATA FROM TIKI AND LAZADA FOR SENTIMENT ANALYSIS
• Crawl comments from Tiki and Lazada (BeautifulSoup, Selenium).
• Data annotation.
• Data preprocessing: convert to lowercase; remove extended characters, links, special characters, stop words, and emoji.
• Feature extraction: CountVectorizer and TfidfVectorizer (unigrams and bigrams).
• Machine learning model: Naive Bayes.
• Evaluation metrics: Precision, Recall, F1-score.
• Source code: link
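To illustrate how a Naive Bayes model uses unigram and bigram counts, here is a from-scratch multinomial Naive Bayes with add-alpha (Laplace) smoothing. It is a sketch assuming already preprocessed token lists; the project lists scikit-learn's vectorizers, and the class and function names here are hypothetical. As with a fitted `CountVectorizer`, n-grams never seen in training are ignored at prediction time.

```python
import math
from collections import Counter


def unigrams_and_bigrams(tokens):
    """Return unigram features followed by space-joined bigram features."""
    return list(tokens) + [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


class NaiveBayesSketch:
    """Multinomial Naive Bayes over n-gram counts with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        # Per-class n-gram counts.
        self.counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.counts[label].update(unigrams_and_bigrams(tokens))
        # Vocabulary = every n-gram seen in any class.
        self.vocab = set()
        for c in self.classes:
            self.vocab.update(self.counts[c])
        label_freq = Counter(labels)
        self.log_prior = {c: math.log(label_freq[c] / len(labels)) for c in self.classes}
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, tokens):
        # Drop n-grams never seen in training, as a fitted vectorizer would.
        feats = [f for f in unigrams_and_bigrams(tokens) if f in self.vocab]
        vocab_size = len(self.vocab)

        def log_score(c):
            denom = self.totals[c] + self.alpha * vocab_size
            return self.log_prior[c] + sum(
                math.log((self.counts[c][f] + self.alpha) / denom) for f in feats
            )

        # Class with the highest log-posterior (up to a shared constant).
        return max(self.classes, key=log_score)
```

Working in log space avoids numeric underflow when a comment contains many n-grams.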
TEXT INFORMATION RETRIEVAL SYSTEM
• Dataset: Cranfield.
• Document preprocessing: lowercase conversion, punctuation and stop-word removal, number normalization, and stemming (LancasterStemmer).
• Build a vocabulary list and index for the document set.
• Calculate TF-IDF weights according to the SMART weighting scheme.
• Query preprocessing: the same steps as document preprocessing, plus query expansion.
• Models: Vector Space Model (VSM) and Best Matching 25 (BM25).
• Query-document similarity: cosine similarity (VSM) and the BM25 scoring function (BM25).
• Rank the returned documents by score, from highest to lowest.
• Evaluation metrics: Precision, Recall, and interpolated MAP (TREC).
• Source code: link
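The BM25 scoring and ranking steps can be sketched as follows, assuming documents and queries are already preprocessed token lists. The values k1 = 1.2 and b = 0.75 are common defaults, not values taken from the project, and the IDF uses the non-negative variant log((N - df + 0.5) / (df + 0.5) + 1) found in Lucene; the original project may use a different IDF form. The VSM cosine-similarity path is not shown.

```python
import math
from collections import Counter


def bm25_rank(query, docs, k1=1.2, b=0.75):
    """Score every document (a token list) against the query and rank high to low.

    Returns (doc_index, score) pairs sorted by descending score.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))
    query_terms = list(dict.fromkeys(query))  # de-duplicate, keep order

    ranked = []
    for idx, d in enumerate(docs):
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            f = tf[term]
            if f == 0:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term-frequency saturation (k1) and document-length normalization (b).
            norm = f + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * f * (k1 + 1) / norm
        ranked.append((idx, score))
    # Highest score first; the sort is stable, so ties keep the original document order.
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked
```

With toy documents `[["wing", "flow", "pressure"], ["heat", "transfer"], ["wing", "wing", "lift"]]` and the query `["wing", "lift"]`, the third document ranks first because it matches both terms.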
Skills
• Language: Python.
• Web: Streamlit.
• Visualization: matplotlib, seaborn, plotly.
• Others: Excel, Word, PowerPoint.
• Pursuing: Statistics, Machine Learning, Deep Learning.