Xianqiao Li
Foster City, CA • ac7g8g@r.postjobfree.com • 551-***-****
Profile Links: LinkedIn; GitHub; Tableau Visualizations
SKILLS
Programming Skills: Python, SQL, R
Data Science Packages: Pandas, NumPy, SciPy, Scikit-Learn
Tools & Technologies: PostgreSQL, MySQL, Tableau, Spark, Regex, Git/GitHub, AWS S3
Analytical Skills: Supervised & Unsupervised Learning Models, Principal Component Analysis, Regularization, Model Evaluation, A/B Testing
EXPERIENCE
Stevens Institute of Technology - Research Assistant (Hoboken, NJ) Jul 2017 – Sep 2017
Developed a predictive model pipeline to detect stock trading patterns and forecast dark pool liquidity based on historical trading records
Implemented feature engineering and applied an oversampling technique (SMOTE), reaching 90% accuracy
Integrated SQL scripts with Python to pull data from the PostgreSQL database, preprocessed the data, and performed exploratory data analysis to select the top features influencing model accuracy
Utilized Tableau to analyze and visualize trade volumes under different configurations, including market capitalizations, sectors, venues and trading time
Dun & Bradstreet - Technology Engineer Intern (Short Hills, NJ) Sep 2016 – May 2017
Deployed an end-to-end data pipeline for D&B Hoovers’ content intelligence platform via ETL on AWS cloud by leveraging Python, SQL and Git, saving 3 months of budget and project time
Developed SQL and Python scripts to pull data and generate XML files, loaded data into an Oracle database, and implemented log rotation and archiving to ensure data integrity
Used Python to call APIs and acquire Fortune 500 company data from JSON files to upgrade functionality
Implemented an internal tool in Python to validate URLs, leading to a 90% increase in team productivity
EDUCATION
Stevens Institute of Technology (Hoboken, NJ) May 2017
Master of Science in Business Intelligence and Analytics GPA: 3.8/4.0
University of Sydney (Sydney, Australia) Jul 2014
Master of Commerce in Finance
University of Adelaide (Adelaide, Australia) Dec 2012
Master of Commerce
PROJECTS
NLP and Topic Modeling on Customer Reviews (Python, ETL)
Used an unsupervised learning algorithm to cluster 20,000 unlabeled user reviews
Implemented data pipeline to extract, transform and load (ETL) user reviews data into SQLite database
Preprocessed data using tokenization, stemming and stop-words removal and used TF-IDF to extract features
Trained K-Means clustering algorithm to identify topics and keywords for each cluster and visualized results
Wisconsin Breast Cancer Diagnostic Accuracy Prediction (Python)
Developed classification models to predict breast cancer diagnoses based on labeled data
Analyzed and evaluated the performance of various algorithms, including Logistic Regression, SVM, Random Forests, and Naive Bayes, using ROC curves and AUC
Achieved a 3% increase in accuracy via ensemble learning and hyperparameter optimization with Scikit-Learn
Customer Churn Prediction in Telecommunication Industry (Python)
Developed supervised machine learning models to predict customer churn probability
Prepared training data through preprocessing, categorical feature transformation and standardization
Trained Logistic Regression, Random Forests and K-NN models, and applied regularization to reduce overfitting
Evaluated classification performance via K-fold cross-validation and analyzed feature importance to identify the top factors influencing the results
Implemented Grid Search to find the optimal parameters, yielding 93% testing accuracy