Data Software Engineer

Location:

Sunnyvale, CA

Posted:

April 04, 2019

Resume:

Zhiqi Guo

*** ** **** *******, *********, CA ****6 • Cell: 917-***-**** • ********@***.***

EDUCATION

NEW YORK UNIVERSITY New York, NY

Master of Science in Data Science June 2018

- CAS/GSAS TUITION PROGRAM (scholarship)

NEW YORK UNIVERSITY New York, NY

Bachelor of Science in Mathematics June 2016

Minors: Computer Science, Business Studies

TECHNICAL SKILLS

• Programming & Languages: Python, R, MATLAB, Java, JavaScript, HTML/CSS, C/C++

• Core: Machine Learning, Deep Learning, NLP, Computer Vision, SQL/MySQL, Databases, Algorithms, Big Data

• Toolkits, Software & Analytics: PostgreSQL, Hadoop, Apache Spark, scikit-learn, Tableau, pandas, NumPy, PyTorch, TensorFlow, Keras, NLTK, Cloud9, Linux/Unix, AWS (EC2, S3), Excel, GitHub/Bitbucket

PROFESSIONAL EXPERIENCE

AILaw Inc. Mountain View, CA

Software Engineer 08/2018 – present

• Contribute to the company’s new platform development project, focusing mainly on re-engineering the landing page

• Integrate, test, and document code changes. Collaborate with UI designers to develop and implement HTML, CSS, and JavaScript pages for the company website and user-interaction pages using the React framework

• Write Python scripts to load and save client data and to run data analytics and modeling. Integrate modeling results into the backend infrastructure to support better-informed real-time decision making

Multimer Data New York, NY

Data Science Intern 01/2018 – 07/2018

• Tested the environment with neural network tools and data access for the analysis team

• Wrote neural analysis and text processing scripts to analyze frequency bands and to determine which statistical tests and machine learning methods to apply. Built a data pipeline to run these tasks automatically

• Ran parametric and nonparametric statistical tests on Power Band and eSense data at the within-subject and group levels. Analyzed neural time-series data using regression and neural network techniques

PROJECTS/ACTIVITIES

YouTube Video Comments Emotional and Sentiment Analysis in Apache Spark

• Analyzed YouTube users’ historical comments and behavior patterns on channels and videos related to animals and/or pets

• Built an ETL pipeline to process, clean, and analyze the comments dataset. Designed metrics and rules, based on NLP, Spark DataFrames, and Spark SQL, to label users as dog owners, cat owners, or users without pets

• Addressed class imbalance in the labels by downsampling. Implemented feature extractors using RegexTokenizer and Word2Vec embeddings. Built supervised learning models from an initially unlabeled dataset. Built a pipeline in Spark ML to train and tune Logistic Regression and Random Forest classifiers that predict cat/dog owners

• Classified all users with the trained models. After cross-validation and hyperparameter tuning, the best model reached a precision of 0.84, a recall of 0.87, and an area under the ROC curve of 0.93. Extracted insights about dog and cat owners by exploring topic frequency

Fraud Detection System in E-Commerce Industry

• Developed algorithms in Python and Apache Spark for an e-commerce site to raise alerts on illegal activity and to predict the probability that each transaction is fraudulent

• Built a modeling pipeline to preprocess and transform data, merge and aggregate inputs from multiple data sources, and check for outliers

• Designed and engineered features based on domain knowledge. Tried several approaches to handle class imbalance, including upsampling, downsampling, SMOTE, and class-weight tuning

• Trained supervised classifiers (Logistic Regression, Random Forest, Gradient Boosting Machine) and tuned hyperparameters with grid search. Evaluated models on precision, recall, ROC-AUC, and F1 score

• Derived insights from model outcomes and feature importances. Used model results to set up alert/decline rules for transactions

Large-Scale Movie Recommendation Engine Development in Apache Spark

• Built collaborative-filtering movie recommender systems on the MovieLens dataset (20 million ratings and 465,000 tags)

• Built an ETL pipeline to analyze the movie rating dataset and ran online analytical processing (OLAP) queries with Spark SQL

• Implemented an Alternating Least Squares (ALS) model in PySpark ML, based on user-item matrix factorization, to predict movie ratings and provide personalized recommendations for users

• Tuned the model’s hyperparameters with Spark ML’s cross-validation tools to select the best model. Monitored training by visualizing learning curves and training loss. Reduced RMSE to 0.889 on the validation set and 0.89333 on the test set