Data Scientist

Location:
Champaign, IL
Posted:
March 03, 2021

Resume:

Tianqi Wu

tianqi-wu *****@********.*** wutianqidx 217-***-****

OBJECTIVE

To leverage data analysis to support decision making and deliver efficient solutions to business problems

EDUCATION

University of Illinois at Urbana-Champaign Champaign, IL

MS in Statistics GPA: 4.00/4.00 Aug 2019 - Dec 2020

MS in Industrial Engineering (Advanced Analytics) GPA: 3.85/4.00 Aug 2017 - May 2019

BS in Industrial Engineering (Math, CS minor) Aug 2013 - May 2017

EXPERIENCE

Xiaomi Technology Beijing, China

Applied Scientist Intern Jun - Aug 2019

Prepared and processed raw text of over 70,000 Chinese poems

Implemented a Seq2Seq model with attention in TensorFlow to generate acrostic poems

Outperformed an RNN baseline by 30% on artistic conception, fluency and diversity

Modified regular expressions for the chatbot's dialogue feature, increasing its F1 score by 20%

Sina Corporation Beijing, China

Data Scientist Intern Jun - Aug 2018

Implemented a web crawler to collect the daily HTML source code of 10 Sina webpages

Built a Python analyzer that detects potential cyber attacks by computing cosine similarities between past and current versions of the webpages

Created a MySQL database to manage the monitoring and contact lists

PROJECT

LendingClub Loan Status Prediction Aug - Dec 2019

Performed feature selection through exploratory data analysis on 1.5 million records with imbalanced classes

Preprocessed data with standardization, categorical variable encoding and missing data imputation

Implemented Logistic Regression, Naive Bayes, Decision Tree, Random Forest and MLP for comparison

Improved F1 score from 0.08 to 0.68 (with accuracy going from 0.78 to 0.65) through undersampling, regularization, cross-validation and parameter tuning

Top Skills Employers Look For Aug - Dec 2020

Crawled and examined HTML source code of 5,000 job postings from Amazon.jobs

Extracted keywords from basic and preferred qualifications

Visualized skill sets for different job titles with WordCloud and bar charts using Python

Deployed the Dash web application using AWS Elastic Beanstalk

RESEARCH

Medical Text Generation Jan - May 2020

Processed 16,950 brain activity images and their corresponding medical reports

Used a CNN to extract image features and encode them as embeddings for text generation

Implemented a Transformer model to align CNN features with medical descriptions and generate reports

Achieved 0.561 BLEU@1 on test data with a PyTorch implementation

Yelp Review Sentiment Analysis Aug - Dec 2019

Preprocessed data with NLP techniques such as stemming, lemmatization and TF-IDF vectorization to transform unstructured review text into a numeric dataset

Implemented LSTM, BiLSTM and BiLSTM+Attention models on the Yelp Review Polarity dataset

Compared performance with TF-IDF + logistic regression, bag-of-words (BOW), RNN and BERT models

Achieved 96.2% accuracy on 560,000 training and 38,000 testing samples

ADDITIONAL

Relevant Coursework: Data Structures and Algorithms, Machine Learning, Deep Learning

Statistical Knowledge: A/B Testing, Probability Distributions, Regression, Forecasting

Programming Languages & Tools: Python, SQL, R, Git, SAS, PyTorch, TensorFlow

Models: Linear/Logistic Regression, Random Forest, KNN, SVM, CNN, RNN, BERT
