Technical Skills: Java, Python (NLTK, scikit-learn, pandas, NumPy), Machine Learning (TensorFlow, MXNet), NLP, Data Structures, Software Development, R, Linux, Microsoft Office, SQL
Languages: Mandarin (native), English (fluent), Portuguese (fluent)
Academic Experience
Brandeis University Sept. 2018 - Present
Master of Computational Linguistics
Relevant Coursework: Information Extraction, Statistical Approaches to Natural Language Processing, Fundamentals of Computational Linguistics, Natural Language Annotation for Machine Learning, Introduction to Natural Language Processing with Python, Formal Semantics, Syntactic Theory, Mathematical Methods in Linguistics
University of International Business and Economics Sept. 2013 - July 2017
Bachelor of Portuguese Language (Major) / International Trade (Minor)
Course Projects
1. Neural Network Parser Trained on Penn Treebank Data Dec 2019
● Developed a Python project to train a neural network parser on Penn Treebank data.
● Used an encoder-decoder framework and experimented with various attention mechanisms to measure their effect on parser performance.
● Achieved 93% accuracy on the dataset with scaled Luong attention.
2. POS Tagging with the Averaged Perceptron Algorithm Nov 2019
● Developed a project to label sentences with part-of-speech tags.
● Trained the model with the perceptron and averaged perceptron algorithms and decoded with the Viterbi algorithm, using bigrams and affixes as features.
● Achieved around 92% accuracy on the dataset with affix features.
3. Discourse Relation Detection with a Convolutional Neural Network Oct 2019
● Developed a Python program that builds a CNN model on Penn Discourse Treebank data.
● Preprocessed the raw data with pretrained Google News word vectors and built the model with TensorFlow and Keras.
● Achieved around 74% accuracy on explicit discourse relations.
4. Logistic Regression Model for Sentiment Analysis Sept 2019
● Developed a Python program that builds a logistic regression model from scratch by modeling the posterior probability of each class.
● Optimized the parameters with mini-batch stochastic gradient descent, evaluated on the development set after each epoch, and applied L2 regularization to reduce overfitting.
5. Song Lyric Mood Detection Annotation May 2019
● Developed a project to annotate song lyrics with four mood types and their intensity levels.
● Wrote the annotation guidelines, annotated the data with MAE, created the gold standard, and computed inter-annotator agreement with the NLTK metrics package.
● First author of a paper being prepared for submission to a relevant workshop.
6. Building an Association for Computational Linguistics Corpus Dec 2018
● Collected papers published by the Association for Computational Linguistics in 2018 and developed a Python program that converts them from PDF to plain text.
● Loaded the converted text into NLTK as a corpus.
China Radio International July 2016 - Aug. 2016
Translated daily news from Portuguese to Chinese for publication on the China Radio International website. Collected and analyzed public sentiment.