YI JIA
San Jose, CA *****
Email: *******@*******.***
Phone: 785-***-****
EDUCATION
Ph.D. COMPUTER SCIENCE, UNIVERSITY OF KANSAS, 12/2012, specialized in Machine Learning, Statistical Learning and Data Mining
Master INFORMATION SCIENCE, RITSUMEIKAN UNIVERSITY, JAPAN, specialized in Distributed Computing
Master COMPUTER SCIENCE, SHANGHAI JIAO TONG UNIVERSITY, CHINA, specialized in Parallel Computing
HONORS
Graduate School Award for Outstanding International Graduate Student (University of Kansas, U.S.)
Japanese Government (Monbukagakusho) Scholarship for Outstanding Foreign Student (Ritsumeikan University, Japan)
SPECIALIZED KNOWLEDGE
Track record of transferring state-of-the-art machine learning techniques into real-world products.
Extensive, self-motivated research experience on interdisciplinary problems using machine learning, statistical learning, and data mining techniques (spectral clustering, Dirichlet mixture models, Bayesian networks, topic modeling in natural language processing, time series classification, frequent subgraph pattern mining, etc.).
Extensive experience with big data solutions and tools: Hadoop MapReduce, Apache Spark, Hive, etc.
More than 15 years of programming experience in Java, Python, R, and Matlab.
IMMIGRATION STATUS
U.S. Permanent Resident
WORKING EXPERIENCES
03/2018 – 01/2020 DATA SCIENTIST 2
EMPLOYER: eBay Inc., San Jose, CA
RESPONSIBILITY: Build the Price Guidance Service (which estimates selling prices of eBay products) based on various NLP models.
Application: eBay Global Market Place, Selling Experience
Implemented in: Python, Scala, Hadoop, Hive, Apache Spark, Pig
08/2014 – 02/2018 SENIOR DATA SCIENTIST
EMPLOYER: GroundTruth Inc., formerly xAd Inc., Mountain View, CA
RESPONSIBILITY: (1) Design and build the fundamental location data layer of the GroundTruth global location market places.
(2) Optimize and model mobile online advertising bidding strategy.
(3) Optimize and model mobile online advertising KPIs (CTR, SAR).
Application: Mobile Online Advertising, Location based Advertising, Audience Targeting
Implemented in: Python, Java, Scala, Hadoop, Hive, Apache Spark, S3, PostGIS DB, SQL
10/2012 – 08/2014 MACHINE LEARNING DATA SCIENTIST
EMPLOYER: InterTrust Technologies, Sunnyvale, CA
RESPONSIBILITY: Worked on the Personagraph project to build a secure mobile user inference platform. The platform infers personal profiles from various sources of user cell phone data, such as location data, app usage data, and social networking data. I applied various techniques in the product, including Natural Language Processing (NLP), supervised learning, semi-supervised learning, clustering, and time series change-point detection.
Application: Mobile Advertising, Online Target Audience Advertising
Implemented in: Java, Python, R
RESEARCH EXPERIENCES
08/2005 – 09/2012 GRA, UNIVERSITY OF KANSAS, USA
Graph mining:
(1) I developed a novel graph mining algorithm that uses stochastic matrices to incorporate evolutionary prior knowledge when recognizing approximate subgraph patterns in large, noisy graph databases. The mined pattern features are then used for graph structure classification with a Support Vector Machine.
Application: Bioinformatics (immunity protein functional classification)
Implemented in: C++, Perl, MPI
Graphical statistical modeling:
(1) I developed a novel Bayesian Network method for structure inference in non-stationary time series data. The approach is based on multivariate change-point detection and Markov Chain Monte Carlo (MCMC) sampling.
Application: Bioinformatics (reconstruction of gene regulatory networks)
Implemented in: Java, Matlab
(2) I worked on a novel Bayesian Network method, based on hierarchical topic modeling, for inferring students' learning pathways from incomplete student testing data and unstructured skill text data. The model is based on Latent Dirichlet Allocation and MCMC sampling.
Application: dynamic student skill testing and assessment in special education (sponsored by U.S. Department of Education)
Implemented in: Java, Matlab, Mallet (Natural Language Processing Package)
Online clustering:
(1) I designed a novel auto-adaptive online spectral clustering method for unbounded, large network streams.
Application: event detection in large social networks, recommender systems in e-commerce, and anomaly detection in telecommunication networks
Implemented in: Matlab, Perl
PATENTS
US 62/251,090 Systems and Methods for Creating and Using Geo-Blocks
PUBLICATIONS
1) Yi Jia and Tom Walsh, Bayesian Network Structural Inference with Text Regularity, Technical Report
2) Yi Jia, Jun Huan and Hongguo Xu, ISSUER: an online evolutionary spectral clustering method for large network streams, Technical Report
3) Yi Jia, Jun Huan, and Wenrong Zeng, Discrete non-stationary Bayesian networks based on perfect simulation, ACM International Conference on Information and Knowledge Management, 2012
4) Yi Jia and Jun Huan, Constructing non-stationary Bayesian Networks method with flexible time lag choosing mechanism, BMC Bioinformatics, 2010
5) Yi Jia, Jun Huan, and Jintao Zhang, An efficient graph mining method for complicated and noisy data with real-world applications, Knowledge and Information Systems: An International Journal (KAIS), Springer, 2010
6) Yi Jia and Jun Huan, The Analysis of Arabidopsis Thaliana Circadian Network Based on Non-stationary DBNs Approach with Flexible Time Lag Choosing Mechanism, in Proceedings of the IEEE International Conference on Bioinformatics & Biomedicine (BIBM'09), 2009
7) Aaron Smalter, Jun Huan, Yi Jia, Gerald Lushington, GPD: A Graph Pattern Diffusion Kernel for Accurate Graph Classification with Applications in Cheminformatics, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2009
8) Yi Jia, Vincent Buhr, Jintao Zhang, Jun Huan, Leonidas N. Carayannopoulos, Towards Comprehensive Structural Motif Mining for Better Fold Annotation in the "Twilight Zone" of Sequence Dissimilarity, BMC Bioinformatics, Jan. 2009
9) Yi Jia, Vincent Buhr, Jintao Zhang, Jun Huan, Leonidas N. Carayannopoulos, Comprehensive Structural Motif Mining for Better Fold Annotation, APBC2009 Conference (Asia Pacific Bioinformatics Conference)
10) Yi Jia, Josef Jurek, Kamen Nikolov, and Terry Clark, Integration of genomic data from multiple sources using XMLGUS, Technical Report, 2006