Machine Learning / Data Scientist - Natural Language Processing

Location:

Bellevue, WA

Posted:

November 17, 2015


Resume:

Alice Leung

+1-425-***-**** *************@*****.*** +33-069*******

Objective

I am actively looking for a full-time or internship opportunity beginning in February 2016, where I can contribute to the company's work in NLP. The only remaining task in my final year of study is the Master’s thesis, whose topic will be the work I carry out at the company.

Education

Master in Language and Communication Technologies (Erasmus Mundus Masters) (current GPA: 4.0 of 4.0)

Second year of competitive two-year joint master’s degree to be completed in two member universities of the EM consortium, in two different countries. Provides students with education and training to prepare for a career in NLP research or industry.

University of Lorraine – Nancy, France 2016 (Expected)

Studying grammar formalisms of syntax and semantics, discourse analysis, machine learning, formal logic and language theory, NLP tools and linguistic resources, knowledge representations and discovery, advanced web technologies, data mining

University of the Basque Country – San Sebastian, Spain (GPA: 4.0 of 4.0) 2015

Studying computational syntax and semantics, machine learning, corpus analysis, machine translation, statistical analysis, artificial intelligence, information retrieval, applications of NLP technologies

Master of Arts in Language Studies (GPA: 3.44 of 4.0) 2012
Hong Kong Baptist University - Hong Kong, China

Studied properties of language, phonetics, phonology, morphology, semantics, syntax, second language acquisition, language education, pragmatics, discourse analysis, English as a world language

Bachelor of Arts in Communication, French (GPA: 3.66 of 4.0) 2011
University of Washington - Seattle, WA USA

Double Major, honored consecutively on Dean’s List

Technical Skills

Programming Languages: Python, Java, Prolog, HTML/CSS, JavaScript, jQuery, AJAX, PHP, SQL

NLP Tools: NLTK, OpenNLP, FOMA, GATE, InfoMap, UKB, Protégé

Machine Learning Tools: Apache Spark, Weka, Java Mallet, Python SciKit

Human Languages: English, Cantonese and Mandarin Chinese, Spanish, French

Experience

Research Intern
US Army Research, Multilingual Computing Branch – Adelphi, MD USA July-Sep 2015

Contributed to ongoing research projects at the Multilingual Computing section of the Computational and Information Sciences Directorate. Individual contributions include: writing a high-performing POS tagger using machine learning methods, and scaling it to handle large data sets or large feature sets by using Apache Spark to parallelize the operations across multiple machines; editing and extending a dependency tree editor program for illustrating and annotating the dependencies of a text, using mxGraph; and applying various new supervised and unsupervised machine learning algorithms and software, such as Corbit for Chinese Treebank parsing and Morphochains for automatic morphological segmentation, to perform NLP tasks and compare performance.

Research Assistant (Intern)
IXA NLP Research Group - San Sebastian, Spain Jan-June 2015

Worked with the NewsReader project, a multinational project to build structured event indexes from millions of daily news articles to help professionals in any sector make well-informed decisions. Contributed to the evaluation and annotation of entities and events based on knowledge of English semantics and annotation guidelines, and refined the overall semantic annotation for the production of a gold standard; reviewed and defended annotation disputes.

Bilingual and Native English Teacher
Various schools and academies – Hong Kong and Spain 2011 – 2015

Delivered advanced English courses to Spanish working adults, particularly exam prep for the Cambridge Advanced Exam. Also delivered various courses at primary and secondary schools in Hong Kong, including IELTS, A-Level English Writing prep, public speaking, and general English. Exercised class management and communication with and among students. Courses taught in English, but occasionally in Cantonese, Mandarin Chinese, or Spanish as needed by the student. Responsible for explaining advanced grammar and vocabulary, and training students' writing, listening, and speaking skills. Successfully drove most students to achieve a passing grade in all sections of the exams.

Chinese Teacher
Academia Acces – Barcelona, Spain 2014

Taught beginning Mandarin Chinese in a small-group setting to young students. Courses taught only in Spanish. Planned and delivered effective lessons and enriched students’ progress. Succeeded in bringing students from zero Chinese background to a basic conversational level of Chinese.

Public Relations Officer
Hong Kong Student Association at University of Washington - Seattle, WA USA 2010 – 2011

Promoted the club and its mission to the greater university community to spread awareness of the club. Engaged in negotiations with diverse businesses to recruit sponsorships and increase membership to fund activities. Mediated with the federal government to negotiate monetary settlements. Responded to inquiries/complaints from members and liaised with management officers.

Academic Projects

Training a hypertagger for syntactic label prediction: Created a parallel corpus of the GMB (Groningen Meaning Bank) corpus containing a) semantic descriptions extracted from the DRS XML files and b) syntactic labels automatically extracted from CCG sequences, based on a set of mapping rules from CCG sequences to syntactic labels. Trained a high-performing hypertagger, using Java Mallet, which correctly predicted syntactic labels given an input of semantic descriptions.

Real-time Tweet Sentiment Analyzer: Using Twitter4J to access Twitter’s API, designed and implemented a Java program that requested hundreds of real-time tweets from multiple major English- and Spanish-speaking cities around the world, ran the tweets through NLP pipelines, and analyzed them for sentiment to produce snapshots of those cities’ sentiment at that instant. The NLP pipeline included the tokenizer, POS tagger, and sentiment analyzer from IXA pipes in both languages. Also implemented a preliminary GUI so that a user may select specific cities and compare their sentiment scores.

Automatic multiple-choice question generator: Programmed an automatic generator that, given a text as input, generated relevant multiple-choice questions. Questions were formed from sentences using Hobbs’ algorithm to replace answer phrases with corresponding question words and invert word order as necessary. NLP tools used include NLTK, Stanford Parser, UKB for information from WordNet, and semantic vectors.

Highest similarity sentence retriever in Python: Programmed a highest similarity sentence retriever using six different metrics in Python. Search terms are defined by user input, and the most similar sentence in the text file is returned for each metric.

Web crawler and keyword counter in Python: Designed and implemented a web crawler in Python to crawl websites and return aggregated metrics, such as keyword count. Scope of web crawl limited to parsing the top thirty links per domain.

T9 predictive text algorithm and spell checker in FOMA: Wrote a program with FOMA to return the word match or matches corresponding to a number entered by the user on a T9 3x4 numeric keypad. Potential matches were retrieved from a database of dictionary words. Also implemented a spell checker with FOMA to recognize misspelled words and to recommend potential word choices based on the “cost” of correction.

Machine accent predictor and verb conjugator for Spanish in FOMA: Built a transducer in FOMA that marks the spoken stress on the correct vowel in regular words (which do not carry a stress mark in the orthography), based on rules regarding Spanish strong and weak vowels, syllable count, and diphthongs. Designed and implemented a transducer that returned the correct verb conjugation of regular and irregular Spanish verbs in various tenses and participles for different subject pronouns. Also designed a similar transducer for English adjective derivations for comparative and superlative forms and verb derivations for subject pronouns.

XML file dependency reader in Python: Wrote a Python program that takes an XML file as input and retrieves all lemmas related to a given lemma by a dependency. The user inputs a dependency type (SBJ, AMOD, PMOD, etc.) and a lemma, and the program returns all lemmas related to that lemma by the given dependency.

Grammar design of morphophonological analyzer for Basque and Koromfe in FOMA: Performed a morphological analysis of Basque and of the Niger-Congo language Koromfe. Programmed an analyzer in FOMA with the morphophonological rules, which correctly produced 100% of inflected forms in both target languages.
