Data Computer Engineering

Cleveland, Ohio, United States
January 17, 2018

Kaijin Zhang

**** ****** **, *** ***, Cleveland, OH, 44120

662-***-****

SUMMARY

Ph.D. candidate with solid research ability, problem-solving skills, motivation to learn, and coding skills

EDUCATION

Case Western Reserve University Cleveland, OH USA

Ph.D. in Computer Engineering (GPA: 3.7/4.0) 2015 -

Mississippi State University Mississippi State, MS USA

M.S. in Electrical and Computer Engineering (GPA: 3.8/4.0) 2012 - 2015

University of Science and Technology of China Hefei, China

B.E. in Information Security (GPA: 3.5/4.0) 2008 - 2012

• Honors: Outstanding Student Scholarship, USTC (Top 10%)

SKILLS

- Programming: Python, C, Scala, Java, MATLAB, SQL, Shell Scripting, LaTeX

- Machine Learning: TensorFlow, scikit-learn, Keras, NumPy, SciPy, pandas, NLTK, OpenCV, imblearn, statsmodels.api, Gensim

- Big Data: Apache Spark, MLlib, Hadoop, HDFS, MapReduce, OpenMPI, Apache Solr

- Visualization: Seaborn, Matplotlib, Tableau, D3.js

- Database / Tools: MySQL, SQLite3, PostgreSQL, LaTeX, Git, BeautifulSoup, HDF5, NFS, CPLEX, ARIMA, AWS EC2/S3/EMR, Boto3, Jupyter, requests, urllib2, tidylib, csv, json, psycopg2, sqlalchemy

- Knowledge: OOP/OOD, NLP, Big Data, Machine Learning, Data Mining, Deep Learning, Statistics, Computer Vision


Case Western Reserve University Cleveland, OH USA

Research Assistant & Teaching Assistant, NEST Research Group

o Research focus: Security, Optimization Theory, Cyber-Physical Systems, Networking

o Privacy-preserving outsourcing schemes for large-scale basic computation and their applications

o Designed algorithms for outsourcing computationally intensive basic operations, including matrix/tensor convolution, QR/LU matrix decomposition, and cloud-assisted medical image registration, while preserving data privacy

o Implemented and deployed secure algorithms for convolution and decomposition tasks on an AWS cluster with a specially designed parallel processing scheme (good scalability validated for up to 16 nodes) using Spark MapReduce and OpenMPI

o Used h5py, scipy, and numpy for data formatting, NFS for file sharing, and Boto3 for AWS management

o Deployed privacy-preserving automatic point extraction and matching, and TPS-based deformable medical image registration, on an AWS cluster

o Used OpenCV and scikit-image for image preprocessing and feature extraction

o Achieved lower computational and I/O complexity than traditional homomorphic-encryption-based algorithms and the naive scheme while guaranteeing security under chosen-ciphertext attack (5/6 reduction in local computational cost)

o Cyber-system-assisted transit route network (TRN) operating scheme design

o Designed an efficient, novel cyber-system-assisted TRN system that continually drives the network toward a less congested state by dynamically altering vehicle routes and the passenger allocation policy

o Based on Lyapunov queuing theory, formulated a mixed-integer programming problem that integrates queue status, a fairness penalty, and passengers' utility, and solved it by decomposing it into distributable components with a bounded performance gap from the optimal solution and guaranteed strong network stability

o Achieved a 3x improvement in network throughput in simulation (Python, MATLAB, CPLEX, NetworkX)

o Data hub site building for an NGO (OneCommunity, Cleveland)

o Developed a website using CKAN (a Python-based toolkit) and deployed it on Amazon EC2

o Applied Solr for dataset searching and PostgreSQL for database management; enabled dataset publishing, sharing, online previewing, indexing, and quick visualization of spatial-temporal datasets

o Wrote scripts for customized automatic file uploading; used psycopg2 and sqlalchemy to communicate with the database

o Machine learning pipeline for browser type/version prediction on user-agent strings

o Extracted features using Python regex and trained a random forest with a specially designed extra processing step for less confident classes (sklearn, numpy)

o Used imblearn for imbalanced-data preprocessing (oversampling and downsampling)

o Achieved a 98.33% F1-macro score and a 99.5% F1-micro score on classifying 18 different browser types with a training dataset of 422k records

o Text classification on the natural language dataset "IMDB ratings"

o Designed a multi-task learning algorithm using LSTM (written in Keras) and an attention mechanism that simultaneously classifies reviews in several categories, achieving 90.8% accuracy

o Implemented a Naïve Bayes-based algorithm with a TF-IDF BoW representation and SVM-based classification with a Doc2Vec distributed representation (using Gensim) for sentiment analysis on movie reviews

o Used NLTK for word stemming and its stop-word set, and sklearn for training

o Content development for a data science course

o Designed course materials, lab handouts, project tutorials, and assignments, and served as TA and lab instructor for the course "DSCI 133: Data Science and Engineering" (40 students)

o Developed course content that covers the whole data engineering pipeline (PyData), including data collection, cleaning, statistical inference, analytics, predictive models, big data techniques, database management, and visualization (requests, urllib2, tidylib, BeautifulSoup, pandas, matplotlib, seaborn, statsmodels, Spark, EMR, MySQL, Tableau, Jupyter, sklearn)

EXTRACURRICULAR ACTIVITIES & OTHER INFORMATION

• USTC Robot Competition (Rank 8/64): built a bipedal fighting robot and a line-tracking vehicle

• 2017 Cleveland Marathon: completed

• President of Mississippi State University Chinese Students and Scholars Association: led a team of members to hold several festival galas that drew over 400 guests

• Interests: basketball, board games, consumer electronics, reading top AI publications, hiking & fishing
