Data Python

San Francisco, CA
October 23, 2018


Xianqiao Li

Foster City, CA • • 551-***-****

Profile Links: LinkedIn; GitHub; Tableau Visualizations

SKILLS

Programming Skills: Python, SQL, R

Data Science Packages: Pandas, NumPy, SciPy, Scikit-Learn

Tools & Technologies: PostgreSQL, MySQL, Tableau, Spark, Regex, Git/GitHub, AWS S3

Analytical Skills: Supervised & Unsupervised Learning Models, Principal Component Analysis, Regularization, Model Evaluation, A/B Testing


Stevens Institute of Technology - Research Assistant (Hoboken, NJ) Jul 2017 – Sep 2017

Developed a predictive model pipeline to detect stock trading patterns and forecast dark pool liquidity based on historical trading records

Implemented feature engineering and applied the SMOTE oversampling technique, reaching 90% accuracy

Combined SQL scripts with Python to pull data from the PostgreSQL database, preprocessed the data, and performed exploratory data analysis to select the top features influencing model accuracy

Utilized Tableau to analyze and visualize trade volumes under different configurations, including market capitalizations, sectors, venues and trading time

Dun & Bradstreet - Technology Engineer Intern (Short Hills, NJ) Sep 2016 – May 2017

Deployed an end-to-end ETL data pipeline for D&B Hoovers’ content intelligence platform on AWS using Python, SQL and Git, saving 3 months of budget and project time

Developed SQL and Python scripts to pull data and generate XML files, loaded data into an Oracle database, and implemented log rotation and archiving to ensure data integrity

Used Python to call APIs and acquire Fortune 500 company data from JSON files to upgrade functionality

Implemented an internal Python tool to validate URLs, leading to a 90% increase in team productivity

EDUCATION

Stevens Institute of Technology (Hoboken, NJ) May 2017

Master of Science in Business Intelligence and Analytics GPA: 3.8/4.0

University of Sydney (Sydney, Australia) Jul 2014

Master of Commerce in Finance

University of Adelaide (Adelaide, Australia) Dec 2012

Master of Commerce


NLP and Topic Modeling on Customer Reviews (Python, ETL)

Used unsupervised learning to cluster 20,000 unlabeled user reviews

Implemented a data pipeline to extract, transform and load (ETL) user review data into a SQLite database

Preprocessed data using tokenization, stemming and stop-word removal, and used TF-IDF to extract features

Trained a K-Means clustering algorithm to identify topics and keywords for each cluster and visualized the results

Wisconsin Breast Cancer Diagnostic Accuracy Prediction (Python)

Developed classification models to predict breast cancer diagnoses based on labeled data

Analyzed and evaluated the performance of various algorithms, including Logistic Regression, SVM, Random Forests and Naive Bayes, using ROC curves and AUC

Achieved a 3% increase in accuracy via ensemble learning and hyperparameter optimization with Scikit-Learn

Customer Churn Prediction in Telecommunication Industry (Python)

Developed supervised machine learning models to predict customer churn probability via Python

Prepared training data through preprocessing, categorical feature transformation and standardization

Trained Logistic Regression, Random Forests and K-NN, and applied regularization to overcome overfitting

Evaluated classification performance via K-fold cross-validation and analyzed feature importance to identify the top factors influencing the results

Applied grid search to find the optimal hyperparameters, reaching 93% test accuracy
