Data Scientist

Posted: January 07, 2018

Resume:

WEN LIANG

**** ******* ******, *** **** La Jolla, CA, 92092
Phone: 858-***-****
Email: **************@*****.***
Links: github.com/excelmaxx/ www.linkedin.com/in/wen-liang-bb7a2a129/

Objective

Seeking a Data Scientist position.

Education Background

University of California, San Diego - Master of Science, 09/2016-present
Jacobs School of Engineering, Electrical and Computer Engineering (GPA 3.6/4.0)
Coursework: Machine Learning, Web Mining, Recommender Systems, Big Data Analytics, Statistics, Algorithms

Nankai University - Bachelor of Science, 09/2012-06/2016
School of Electronic Information and Optical Engineering, Electrical Engineering
Coursework: C++ Programming, OO Design, Embedded System Development, Probability Theory and Basic Statistics

Skills

Programming Languages: Python (SciPy/NumPy/scikit-learn/pandas/PySpark), Java, SQL (proficient), Bash, MATLAB, NoSQL, R (familiar), C, C++, Ruby (working knowledge)
Tools and Techniques: Hadoop, Hive, Spark, MapReduce, Git, Docker, MongoDB, TensorFlow, Keras, AWS

Work & Internship Experience

Data scientist intern, “Y” (AI supply chain management) department, JD.com, Inc., 06/2017-09/2017
- Built models to predict GMV and item volume by SKU category and improved the accuracy by 8%
- Used Spark, Hive and Pandas for feature engineering and promotion analysis; Pyplot and Seaborn for data visualization; XGBoost, LR, RNN and ensembles for modeling; and cross-validation for evaluation and tuning
- Researched feature learning (“SKU2vec”) using relations between items in users’ purchase histories

Software engineer intern, APPs department, Spreadtrum Communications, Inc., 10/2015-01/2016
- Developed an easy-to-use web application for CPU design testing and benchmarking, with various benchmark test cases, adaptive process control, and results uploading and archiving, using Python, shell script and C

Project Experience

Chatbot (Powered by Deep Learning and NLP), 08/2017-present
- Developed a chatbot that supports basic conversations, trained on data from Reddit, Microsoft and others
- Implemented a seq2seq (Sequence to Sequence) model with word2vec word embeddings, a multi-layer bidirectional LSTM and an additive attention mechanism using TensorFlow
- Used question classification to ensemble models for different situations

Movie Recommender System, 01/2017-03/2017
- Developed a movie recommender system that provides personalized recommendations suited to users’ tastes
- Implemented MapReduce jobs to process the large Netflix movie rating dataset using Hadoop and Java
- Used the item-based collaborative filtering algorithm, obtaining and merging the user rating matrix and the item co-occurrence matrix to get recommendation results

Review Helpfulness and Product Rating Prediction, 02/2017-03/2017
- Built latent-factor models and applied SVM, PCA and AdaBoost methods in prediction, placing in the top 15% of the class
- Processed more than 8 GB of Amazon review and item data
- Used ANOVA, chi-square tests and Pearson correlation in data analysis and feature selection

Word Auto-complete and N-Gram Model, 10/2016-12/2016
- Constructed an n-gram word library from a Wikipedia corpus using MapReduce and connected the model to interfaces to implement real-time word auto-complete on a webpage
- Built the language model based on relative n-gram frequencies and stored it in a MySQL database

Wearable Human Motion Track Recording and Evaluation Device, 04/2014-02/2016
- Led the design and development of a wearable device that records and analyzes human motion data and coaches users’ motion (2 published papers and a patent)
- Developed the interfaces between the TI MCU and the sensors and screen; implemented the Kalman filter algorithm in C
