Data Analyst

Location: New York, NY

Posted: March 08, 2021

Resume:

ZIQING HUANG

Manhattan, New York 646-***-**** ******.*@********.*** LinkedIn

SUMMARY

Research & Industry Experience: 4+ years of research experience in data manipulation and building machine learning models in Python and R; 1+ year of work experience as a machine learning engineer; 3+ years of experience with SQL

Tools: Spark, AWS (EC2, RDS, S3), GCP (VM instances, APIs, BigQuery), PostgreSQL, Hive, Tableau, Databricks, Git

Programming: Python (scikit-learn, TensorFlow, PyTorch, PySpark), SQL, R (dplyr, ggplot2, rvest), JavaScript

EDUCATION

Columbia University Sep 2019 - Feb 2021

M.S. in Data Science - Quantitative Methods, GPA: 3.73

Relevant Courses: Algorithms & Data Structures, Machine Learning, Data Visualization, Social Network Analysis, Database, Deep Learning, Probability and Mathematical Statistics

Georgia Institute of Technology Aug 2020 - Jun 2021

Online M.S. in Computer Science (part-time)

Nankai University Sep 2015 - Jun 2019

B.S. in Quantitative Psychology, GPA: 3.8

WORK EXPERIENCE

Graphen. AI Jan 2020 - Present

Data Scientist Intern New York, U.S.

• Developed ADME (drug Absorption, Distribution, Metabolism, and Excretion) models with PyTorch Geometric; trained deep learning models, including a Graph Convolutional Network (GCN), to improve drug development

• Created company blog websites with JavaScript and used Pelican to automatically generate and update static HTML pages

KPMG Sep 2020 - Dec 2020

Data Scientist Intern, Practicum Project New York, U.S.

• Built SEIR models on daily data, including deaths, a lockdown index, and population, to predict short-term new COVID-19 cases and long-term transmission trends for individual U.S. states

• Developed time series models to predict the long-term economic impact of COVID-19

• Created slides, a webpage, and dashboards to visualize geospatial mobility predictions for financial clients

Tencent May 2020 - Sep 2020

Data Scientist Intern, Interactive Entertainment Group (IEG) Remote

• Worked with the product team to predict the lifetime value (LTV) of global users; segmented users by features including region, 7-day pay rate, and channel; developed ML models, including a multi-task DNN and LightGBM, for each segment and tuned hyperparameters with random search; chose evaluation metrics, including mean absolute error (MAE) and mean absolute percentage error (MAPE), for each model to measure and optimize performance

• Conducted churn analysis for users in Indonesia; developed XGBoost models to predict churn probability; helped the product team run personalized interventions that increased the retention rate by 4.6%

• Built and optimized SQL pipelines with PySpark to efficiently extract data (millions of rows) from Tencent's distributed systems

• Implemented A/B tests and statistical hypothesis testing to optimize the in-game prop recommendation algorithms of PUBG Mobile (PUBGM)

• Designed and implemented Tableau dashboards for global games to monitor user activity and game revenue

Epim Network Jan 2020 - Apr 2020

Data Scientist Intern New York, U.S.

• Collected million-record datasets for clients by calling APIs with Requests and building web crawlers with BeautifulSoup

• Detected outliers using standard deviations and boxplots, and imputed randomly missing values with MICE (Multivariate Imputation by Chained Equations)

• Trained ML models that improved defect detection accuracy to 89.3% and visualized the models with Graphviz

PROJECTS

Medical appointment and results application for COVID-19 testing Sep 2020 - Dec 2020

• Designed the entity-relationship (ER) diagram; implemented and populated the PostgreSQL database on a GCP VM instance

• Website Development: Developed and maintained an interactive Flask website that lets users select a time slot, room, and doctor at available test sites and view the results of previous tests

Kaggle Competition - Two Sigma: Predict Stock Movements (Top 10%) Dec 2018 - Jan 2019

• Engineered features from key metrics and financial indicators, including trading volume, opening price, and moving averages, using the past decade of U.S. stock data; filled missing values with linear interpolation

• Split the news dataset with time-series splits (TimeSeriesSplit) to keep training and validation data contiguous in time

• Used stacking to ensemble 5 models, each trained on a different feature set, achieving a 4% relative improvement over the best single model