Python Software Engineer

Location:

Danville, CA

Posted:

April 13, 2019

Resume:

Jiayang Tian

linkedin.com/in/jiayang-tian 314-***-**** **************@*****.*** San Ramon, CA jiayangmike.com

SKILLS

• Solid math and engineering background; 3 years of experience in data science with a concentration in machine learning

• Python (Scikit-Learn, TensorFlow, Keras, PyTorch), Java, SQL, JavaScript, Shell, R, MATLAB

• Database (MySQL, MongoDB), Big Data (Hadoop, Spark, Hive), Web Development (HTML & CSS, Flask, D3.js), AWS, Git, Docker

EDUCATION

M.S. in Information Systems Washington University in St. Louis Saint Louis, MO Jan. 2017 - Dec. 2018

• Coursework: Machine Learning, Deep Learning, Cloud Computing, Advanced Algorithms, Nonlinear Optimization, Prescriptive Analytics

B.E. in Electrical Engineering Beijing University of Posts and Telecommunications Beijing, China Sep. 2012 - June 2016

• Coursework: Java Programming, Data Structures & Algorithms, Mathematical Statistics, Database Systems, Computer Networks

WORK EXPERIENCE

Data Scientist Intern Didi Technology - China's biggest ride-sharing service Beijing, China May 2018 - Aug. 2018

• Responsible for the driver churn prediction system, including tagging analysis, data pipeline, feature engineering, model development, post-analysis, and online A/B tests, which resulted in 3.5% GMV growth and promotion strategies for millions of drivers (Python, Hive)

• Built and tuned the churn prediction model with XGBoost, achieving 0.71 recall, and deployed it to production

• Improved the driver capacity monitoring system with city/driver segmentation (rule-based and clustering)

• Participated in driver conversion research and built a many-to-many RNN model for driver activeness prediction

TA & Software Engineer (volunteer) JulyEDU - an online AI education community Remote Jan. 2017 - Dec. 2017

• Developed a library from scratch (Python, NumPy) for educational purposes, which provides machine learning methods (SVM/GBDT/KMeans, etc.) with scikit-learn-style APIs and a Keras-style deep learning framework (ANN/CNN/LSTM, etc.)

• Built a toolkit that supports automated data cleaning and feature engineering (generation and selection) to simplify common tasks

• Wrote tests, documentation, and tutorials, and ran Q&A sessions and weekly online lectures; reached thousands of class participants

PROJECTS

Search Ads Web Service and Click-Through Rate (CTR) Prediction Sep. 2018 - Dec. 2018

• Implemented a web crawler to collect product data from an e-commerce website for use as ad products (Java, jsoup)

• Built a search ads web service, including query understanding, ad selection, ranking, filtering, and pricing (Java, MySQL, Memcached)

• Implemented query parsing and query rewriting algorithms using Word2Vec and PageRank (Python, Spark)

• Built an ad ranking algorithm based on CTR prediction (GBDT+LR), bid price, and query-ad relevance (Python, Spark)

• Also implemented and evaluated multiple models (DeepFM, Wide&Deep, etc.) for CTR prediction and ad ranking (Python, TensorFlow)

Real-time Personalized News Reading and Analysis Platform Jan. 2018 - May 2018

• Implemented a data pipeline that monitors websites and scrapes the latest news (Python, MongoDB, Redis, RabbitMQ)

• Built a web app (React, Node.js) for news browsing and a logging system to track user behavior for preference modeling (Python)

• Evaluated multiple models (TextCNN/RNN/RCNN/HAN, etc.) to classify news categories and achieved 88% accuracy (Python, Keras)

• Used Latent Dirichlet Allocation (LDA) for news topic modeling and visualization (PCA/t-SNE) to discover latent structure in the news

• Developed a news keyword extraction algorithm (TF-IDF, TextRank, Word2Vec, NER) to improve the system

Scalable and Deep Learning-based Movie Recommendation System Sep. 2017 - Oct. 2017

• Built a scalable collaborative filtering movie recommendation engine and its web service (Spark, Python, Flask, AWS)

• Used neural networks with user and movie attributes as embeddings for content-based recommendation (Python, TensorFlow)

• Also developed an autoencoder-based recommendation model, which reduced RMSE by 8% relative to the baseline

A Helper for Reviews Text Analysis based on NLP Research Jan. 2017 - May 2017

• Applied polarity classifiers (SVM/LR) and a dictionary-based method for review sentiment analysis, achieving 93% accuracy

• Also built a CNN-based model with Word2Vec/GloVe/POS to extract key aspects from reviews for fine-grained sentiment analysis

• Developed automatic review text summarization with an attention-based Seq2Seq model (Python, TensorFlow)

• Implemented APIs for the trained models and built a simple Android app to help users analyze reviews

COMPETITIONS

Online Loan Default Risk Controlling - predict the probability of users repaying their loans on time @DataCastle, TOP 1%

• Explored multiple discretization and imputation methods for feature engineering; used SVM and XGBoost as baselines

• Built a bagged ensemble of multiple XGBoost models with different features to prevent overfitting; used semi-supervised learning for data augmentation

User Profiling Challenge - discover user attributes (e.g., age) from their search engine queries @DataFountain, TOP 2%

• Built a stacking model of Neural Network/Logistic Regression/Naive Bayes with TF-IDF/Word2Vec/Doc2Vec/pattern analysis

Supply Chain Demand Forecast - predict the next 5 weeks' sales volume for an e-commerce platform @DataFountain, TOP 5%

• Imported external data (e.g., weather); engineered time-series and ranking features; modeled with linear regression, rule-based methods, and LightGBM

TalkingData AdTracking Fraud Detection - predict whether users will download an app after clicking ads @Kaggle, TOP 5%

• Applied feature engineering (click pattern, conversion) to more than 190 million records and used LightGBM to build the model
