
Data Scientist

Location:
Los Angeles, CA
Posted:
March 09, 2020

Resume:

Hongxia Dai, Ph.D.

adb74r@r.postjobfree.com

540-***-****

Los Angeles, CA

linkedin.com/in/hongxia-dai

github.com/daihongxia

EDUCATION

Ph.D. in Physics

Virginia Tech

Dec 2019 Blacksburg, VA

B.Sc. in Physics

University of Science and Technology of China (USTC)

Jun 2014 Hefei, China

SKILLS

Python SQL C++ R

Bash Git Linux

Object-oriented Programming

Pandas Apache Spark

PyTorch Scikit-Learn

TensorFlow Keras

Computer Vision

Natural Language Processing

Data Visualization

Statistical Modeling

Signal Processing

Monte Carlo Simulation

Google Cloud Platform (GCP)

WORK EXPERIENCE

Research Assistant

Center for Neutrino Physics, Virginia Tech

May 2015 – Oct 2019 Blacksburg, VA

Published 6 papers in leading particle physics journals and presented on behalf of the collaboration at international conferences. See my Google Scholar profile: bit.ly/3a82aa5

Analyzed experimental data to extract nuclear cross-sections for different atoms in order to accurately model the structure of the atomic nucleus;

Developed Monte Carlo simulation programs in C++ to predict the efficiency of particle detectors;

Mentored junior PhD researchers within the team.

Visiting Researcher

Thomas Jefferson National Accelerator Facility

Feb 2017 – May 2017 Newport News, VA

Participated in the experimental setup and took shifts monitoring data collection, working closely with hardware engineers to detect and report potential issues in the facility;

Developed and implemented a data processing workflow to analyze and visualize over 10 TB of raw data in Python and ROOT (a C++ data analysis framework).

CERTIFICATES

Deep Learning Specialization

deeplearning.ai, Coursera

CNN ResNets YOLO

RNN LSTM Word2Vec

Optimization And More

Advanced Data Science with IBM Specialization

IBM, Coursera

Spark Pipeline SystemML

DeepLearning4j And More

PROJECTS

Object Detection For Self-driving Car

Researched and trained CenterNet (a CNN-based model) using PyTorch on Google Cloud (GPU instances, cloud storage, etc.) for object detection and pose prediction of vehicles in photos;

Wrote a modular and portable Python pipeline for data preprocessing, training, and evaluation in computer vision tasks.

Judgements Legal Area Classification

Built long-document encoders with TF-IDF, LSTM, BERT, etc. Trained classifiers using Scikit-Learn and Hugging Face's Transformers. Achieved a classification accuracy of 0.68 across 41 classes;

Packaged the project with PyScaffold.

311 NYC Service Requests Data Analysis And Web App

Analyzed the 311 requests dataset (12GB+) stored in AWS Redshift with SQL queries and Pandas. Visualized the data with Matplotlib, Seaborn, and Bokeh. Used a GloVe encoder to process texts, and clustered similar texts using K-Means and DBSCAN;

Built models to predict daily requests (regression) and request types (50-class classification) using LightGBM, Logistic Regression, and Decision Tree, achieving an L1 loss of 7.7 and an accuracy of 0.63, respectively;

Built a web app for 311 service daily requests prediction, using Flask for back-end and React for front-end.
