
Data Assistant

Location:
New York, NY
Posted:
February 23, 2021


Resume:

Tianyi (Stella) Wang

******@********.*** 858-***-**** linkedin.com/in/tianyi-stella-wang/ 70 West 93rd Street, New York, NY

EDUCATION

Columbia University New York, NY Aug. 2019 - Dec. 2020 Master of Science in Data Science, GPA: 3.67

Relevant Coursework: Deep Learning, Machine Learning, Algorithms for Data Science, Exploratory Data Analysis & Visualization, Natural Language Processing, Statistical Inference & Modeling, Computer Systems, ML for Financial Modelling

University of California, San Diego La Jolla, CA Sept. 2015 - June 2019 Bachelor of Science in Cognitive Science, specialization in Machine Learning, GPA: 3.85 (Honors: Magna Cum Laude)

SKILLS

Programming Languages/Tools: Python, SQL, R, MATLAB, JavaScript, HTML, Tableau, Power BI, MS Excel

Frameworks: TensorFlow, Keras, Scikit-learn, Pandas, Matplotlib, SciPy, ggplot2, Spark, Hadoop, D3.js, PyTorch

EXPERIENCE

Scripps Research La Jolla, CA Jun. 2020 - Sept. 2020 Data Science Research Intern

• Developed CNN and FFNN models for polygenic disease risk prediction with Python, TensorFlow, and CUDA

• Built a VCF preprocessing pipeline for the Atherosclerosis Risk in Communities dataset with scikit-allel, Pandas, and cyvcf2 to generate numeric representations of genomic variants

• Improved the AUC score by 7% and accuracy by 2% compared with common polygenic risk prediction techniques by adjusting the model structure, tuning model parameters, and analyzing hyperparameters with Keras Tuner and HiPlot

• Reduced months of preparatory work by automating the process of selecting features from 697K+ genomic variants and building an autoencoder with Keras to obtain an encoded representation of the genomic variants for training

The Mortimer B. Zuckerman Mind Brain Behavior Institute New York, NY Oct. 2019 - Mar. 2020 Graduate Research Assistant

Analyzing Songbirds’ Selectivity towards Different Stimuli

• Built a data preprocessing pipeline with MATLAB to convert 300+ audio recordings of songbird callbacks to numeric data

• Visualized the data and conducted statistical tests (likelihood-ratio tests, t-tests, etc.) using MATLAB to demonstrate that the songbirds are more selective toward certain stimuli and that the selectivity increases with age

Predicting Language Proficiency from a Small EEG Dataset

• Employed an SVM to predict the English proficiency of native Mandarin speakers based on EEG data recorded during an English listening test; used backward elimination and PCA for feature selection and dimensionality reduction

• Obtained a correlation coefficient of 0.83 between predicted and actual proficiency, which indicated that language proficiency changes the encoding of linguistic hierarchy in the EEG response

PROJECTS

Reinforcement Learning for Taxi Driver Re-Positioning Problem DiDi Global, Sept. 2020 – Dec. 2020

• Conducted EDA on NYC taxi datasets and built a reinforcement learning environment to simulate taxi demand

• Increased drivers’ income by 9% by applying the SARSA algorithm to guide taxi driver repositioning decisions

Residual Attention Network for Image Classification Columbia University, Fall 2019

• Implemented Residual Attention Networks with trunk depths of 56 and 92 layers in TensorFlow for the CIFAR datasets

• Increased the accuracy of the models by 30% by integrating batch normalization and L2 regularization

Stock Analysis and Forecasting Columbia University, Fall 2019

• Forecasted individual stocks in the S&P 500 and the return of an entire portfolio through multiple-factor models and Monte Carlo simulations with R

• Visualized trends and forecasts of stocks with ggplot2 to improve the stock selection process and achieve better returns

• Built an interactive stock price visualization with D3.js to enhance the clarity of the selection process


