Dayu Yuan
Ph.D. candidate in Computer Science Email: ******@***.***
The Pennsylvania State University Phone: 510-***-****
University Park, PA 16802 Homepage: http://www.cse.psu.edu/~duy113
Objective:
Software Engineer
Background Summary:
Graph mining and indexing
Information retrieval
Data mining and Machine learning
Education:
The Pennsylvania State University, University Park, PA. (Aug. 2008 - present)
Ph.D. candidate in Computer Science and Engineering
Advisors: Prasenjit Mitra and C. Lee Giles
Estimated Graduation Time: Summer 2013
Zhejiang University, Hangzhou, China. (Aug. 2004 - July 2008)
B.Eng. in Software Engineering
Thesis: Blending Feature Suppression of CAD Models.
(Excellent Undergraduate Dissertation Award)
Industry Experience:
Twitter (May 2012 - Aug. 2012)
Worked with the revenue team
Time-series analysis of revenue-related metrics: fetched data using Cascading (Scalding)
on a big data platform, analyzed it with R, and designed a dashboard for visualization.
Research In Motion (Redwood City) (May 2011 - Aug. 2011)
Worked on the CCL (Content Collection Library) project
Mined user behavior patterns using Hadoop (with support from Sqoop, Pig, Hive, and
Mahout)
Projects:
Information Retrieval Related:
Built a chemical-document search engine with the open-source indexer Solr/Lucene.
Built a feature-based chemical-molecule search engine supporting both graph
containment search and similarity search.
Data Mining and Knowledge Discovery Related:
Proposed a graph-feature-selection algorithm to represent graph data as vectors. The
effectiveness of this feature-mining algorithm was tested with various classifiers.
Used the EM algorithm to cluster a set of synthetic location data.
System Related:
Develop and maintain a resource management system for the ChemXSeer project based on
the Spring framework.
Developed an interface to visualize the behavior of Tor, an open network that
defends against network surveillance.
Technical Skills:
Programming Languages: Java, R, C++, C, JavaScript, Ruby
Big Data Platforms: Hadoop, Sqoop, Hive, Pig, Mahout, Cascading
Others: MATLAB, Weka, Spring
Research Experience:
Pennsylvania State University (Aug. 2008 - present)
Streaming graph feature mining: Designed a streaming algorithm to mine graph features.
Proposed a submodular objective function and designed a greedy algorithm that
maximizes it with an approximation guarantee.
Graph indexing and query optimization: Designed an innovative index structure for
the subgraph search problem that outperformed all existing index structures by
up to 100 times in time efficiency.
Graph feature mining: Designed a graph-mining algorithm that substantially reduced the
time of subgraph-feature mining for indexing and classification.
Chemical document information retrieval: Designed an algorithm to mine entities from
chemical documents to facilitate search on chemical entities.
State Key Lab of CAD and CG, Zhejiang University (Aug. 2007 - June 2008)
CAD model simplification and information hiding
Publications:
Dayu Yuan and Prasenjit Mitra, Lindex: a lattice-based index for graph databases, VLDB
Journal, accepted 2012
Dayu Yuan, Prasenjit Mitra, Huiwen Yu, and C. Lee Giles, Iterative Mining Graph Features
for Graph Indexing, in Proceedings of the 28th IEEE International Conference on Data
Engineering (ICDE 2012)
Dayu Yuan and Prasenjit Mitra, A Lattice-based Graph Index for Subgraph Search,
WebDB 2011
Dayu Yuan, Prasenjit Mitra, and C. Lee Giles, A Lightweight Index for SuperStructure
Search, in submission
Dayu Yuan, Huiwen Yu, Prasenjit Mitra, and C. Lee Giles, Subgraph Pattern Mining via
Streaming Max-Coverage, in submission