Data Engineer

Location:

Posted:

November 19, 2018

Resume:

Xinrou (Rose) Li

******@********.*** https://github.com/xinrouli https://www.linkedin.com/in/xinrouli 631-***-**** 515 W 38th Street, New York, NY, United States, 10018

EDUCATION

Columbia University New York, NY

M.A. – Statistics (Quality for STEM) Expected Graduation Date: Dec. 2018 GPA: 3.50/4.00

Relevant Courses: Probability, Statistical Inference, Linear Regression Models, Statistical Machine Learning, Applied Data Science, Introduction to Databases, Advanced Data Analysis, Advanced Machine Learning.

Stony Brook University (SBU) Stony Brook, NY

Double Major: B.S. – Applied Mathematics and Statistics (AMS); Economics (ECO) May 2017 AMS Major GPA: 3.96/4.00; ECO Major GPA: 3.95/4.00

Relevant Courses: Mathematical Statistics; Introduction to Quantitative Finance; Econometrics; Financial Mathematics.

TECHNICAL SKILLS

Highly skilled – Python (NumPy, Pandas, scikit-learn, TensorFlow, gensim, NLTK), R, MySQL, Git, HTML, Flask, R Shiny

Intermediate – AWS, Google Cloud, JavaScript, CSS

SAS Certified Base Programmer for SAS 9 – SAS

PROFESSIONAL EXPERIENCE

OneConnect Lab, Pactera New York, NY

AI Machine Learning Engineer Intern Oct. 2018 – Present

• Collaborate with other engineers to build a data-driven chatbot powered by machine learning models, and improve the product by enabling the chatbot to retain the context of the dialogue history.

• Redefine the evaluation methods and add them to the current pipeline.

• Replace low-level DB-API calls with an ORM to make the codebase easier to maintain and manage.

Tencent Holdings Limited Shenzhen, China

Summer Intern, Data and AI team, Research and Development Center, Customer Services Department Jun. 2018 – Aug. 2018

• Gained insight into the technology industry and natural language processing.

• Data Preprocessing: used Python to clean 300,000+ noisy text records, perform word segmentation, and train a word2vec model.

• Data Modeling: classified text into hundreds of intent labels and increased accuracy from 70% to 90% by applying Naïve Bayes, SVM, Random Forest, AdaBoost, Logistic Regression, and Neural Networks; implemented generative models such as seq2seq in TensorFlow.

• Data Visualization: created a variety of charts using matplotlib, ggplot, Plotly, and word clouds in Python.

PROJECT EXPERIENCE

Database Project (SQL, Python, Flask, Google Cloud), Columbia University New York, NY

Group Member Oct. 2018 – Present

• Design the database ER diagram based on US flight data, with a user-information entity that stores each user’s feedback.

• Implement the database by translating the ER diagram into a database schema in Google Cloud (SQL).

• Implement the application that accesses and modifies the database (Flask).

Twitter – Trade War Analysis (Python, HTML), Columbia University New York, NY

Group Member May 2018

• Data Preprocessing: used the Twitter API to collect tweets with "Trade War" as a keyword and cleaned the raw data in Python.

• Data Manipulation and Analysis: 1. identified the leading opinions according to retweet counts; 2. performed sentiment analysis using the VADER model; 3. found the main arguments of the positive and negative views with an LDA model; 4. compared the Dow Jones index with positive sentiment about the trade war and analyzed the relationship between the two.

• Data Visualization: built a website using PythonAnywhere, HTML, JavaScript, and other languages, and implemented real-time sentiment analysis of text on a sub-page.

Predictive Analytics (Image Data) Project (R), Columbia University New York, NY

Group Member Mar. 2018

• Identified the object in the test data set using a Random Forest model based on SIFT, PCA, HOG, and GRAY features from 3000 images, and compared the results with those of teammates’ models, such as SVM, AdaBoost, and Logistic Regression.

NYC Crime and Party Event Project (R, R Shiny), Columbia University New York, NY

Group Member Feb. 2018

• Business Goal: helped people find the safest place to host a party in New York City, given its high crime rate.

• Data Analysis: provided analysis and information about six party types and seven crime types.

• Data Visualization: visualized NYC party and crime locations on the main page of an R Shiny app, using datasets with over 100,000 observations, and showed interactive bar charts, line charts, and other plots on sub-pages.

Additional Information

Honors: Award of Honor (Applied Mathematics and Statistics Graduates GPA Top 10, May 2017, SBU), Dean’s List (All Semesters, SBU), Outstanding Academic Achievement Award (GPA 4.0, Fall 2015, Fall 2016, SBU)

Interests: Travel (Coldfoot inside the Arctic Circle, Nov. 28, 2015)
