An RDF Data Model for the
Semantic Web
*th Oracle Life Sciences User Group meeting
May 16-17, 2005
Agenda
Introduction 5 min
Susie Stephens
Semantic Web for Life Sciences 25 min
Susie Stephens
Oracle support of RDF in RDBMS 25 min
Souripriya Das
Demo of Siderean's Seamark Navigation Server 25 min
Mike DiLascio, David LaVigna & Joanne Luciano
Discussion 10 min
Susie Stephens
Semantic Web for Life Sciences
Susie Stephens
What is the Semantic Web?
A machine-readable format that is Web
compatible
The Semantic Web adds definition tags to
information in Web pages
Enables computers to discover data more
effectively
Allows new associations to form between pieces
of information
Resource Description Framework
W3C standard for the common data format
Based on triples (subject predicate object)
Everything has a URI
Ontologies used to label the RDF tagged elements
Image Source: W3C
Image Source: W3C
Enterprise Integration Hub
Image Source: W3C
Semantic Web Stack
Image Source: W3C
Pharma Productivity
Source: PhRMA & FDA 2003
Critical Path Initiative
Source: Innovation or Stagnation, FDA Report, March 2004
Ontology Frameworks for Integration
[Diagram: linked ontology concepts: Protein, Gene, mRNA, Cascade pathway, Localization, Disease, Intervention point, Bio-process, Drug, Microarray experiment, Target model, Treatment]
Biological Pathways
Image Source: Cytoscape
Beyond the Dead Graphical Model
Image Source: KEGG
Assigning Trust Values to Data
Image Source: SWANS
Inferencing
If Gene G is implicated in Disease D, and its Protein
Product P is a functional component of only Pathway
P2 -> then Disease D directly perturbs Pathway P2
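The inference rule above can be sketched in a few lines of Python. This is only an illustration: the predicate names (`implicatedIn`, `hasProteinProduct`, `componentOf`, `perturbs`) and the sample facts are assumptions made for the sketch, not part of any ontology named in the talk.

```python
# Facts are (subject, predicate, object) tuples; all names are illustrative.
facts = {
    ("GeneG", "implicatedIn", "DiseaseD"),
    ("GeneG", "hasProteinProduct", "ProteinP"),
    ("ProteinP", "componentOf", "PathwayP2"),
}


def infer_perturbs(facts):
    """Derive (disease, 'perturbs', pathway) when the protein product of an
    implicated gene is a component of exactly one pathway."""
    inferred = set()
    for gene, pred, disease in facts:
        if pred != "implicatedIn":
            continue
        proteins = {o for s, p, o in facts
                    if s == gene and p == "hasProteinProduct"}
        for protein in proteins:
            pathways = {o for s, p, o in facts
                        if s == protein and p == "componentOf"}
            if len(pathways) == 1:  # the "only Pathway P2" condition
                inferred.add((disease, "perturbs", next(iter(pathways))))
    return inferred


print(infer_perturbs(facts))
```

If the protein were a component of a second pathway, the "only" condition would fail and nothing would be inferred.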
Why Semantic Web for Life Sciences?
Heterogeneous data integration using explicit
semantics
Expressing well-defined and rich models of
biological systems
Annotating findings and interpretations formally and
sharing with other scientists
Embedding models and semantics within papers
Applying logic to infer additional insights and to
propose and/or capture new hypotheses
QUESTIONS
ANSWERS
RDF Support in Oracle RDBMS
Souripriya Das, Ph.D.
Consultant Member of Technical Staff
Oracle New England Development Center
Overview
Three types of database objects
Model: RDF graph consisting of a set of triples
Rulebase: Set of (user-defined) rules
Rule Index: Entailed RDF graph
We discuss the following aspects for each type of object
DDL
DML
Views
Security
RDF Query (with Inference)
RDF Models
Model: Overview
Each RDF Model (graph) consists of a set of
triples
A triple (statement) consists of three
components
Subject: URI or blank node
Predicate: URI
Object: URI or literal or blank node
A statement itself can be a resource (allowing
nested graphs)
Model: Example
Family:
(:John :brotherOf :Mary)
(:John :age "16"^^xsd:integer)
(:Mary :parentOf :Matt)
(:John :name "John")
(:Mary :name "Mary")
Reification:
(:John :thinks _:S1)
(_:S1 rdf:subject :Sue)
(_:S1 rdf:predicate :livesIn)
(_:S1 rdf:object "NYC")
[Graph diagram: :John linked to 16 by age, to :Mary by brotherOf, and to the statement (:Sue livesIn NYC) by thinks; :Mary linked to :Matt by parentOf]
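As a rough illustration of the data model, the family triples and the reified statement can be held as plain tuples. The `Triple` type and the `reify` helper are hypothetical names introduced for this sketch; they are not part of the Oracle API.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI or blank node
    predicate: str  # URI
    obj: str        # URI, literal, or blank node


family = [
    Triple(":John", ":brotherOf", ":Mary"),
    Triple(":John", ":age", '"16"^^xsd:integer'),
    Triple(":Mary", ":parentOf", ":Matt"),
    Triple(":John", ":name", '"John"'),
    Triple(":Mary", ":name", '"Mary"'),
]


def reify(statement, node):
    """Describe a statement as a resource, using the three RDF
    reification properties attached to a blank node."""
    return [
        Triple(node, "rdf:subject", statement.subject),
        Triple(node, "rdf:predicate", statement.predicate),
        Triple(node, "rdf:object", statement.obj),
    ]


# "John thinks that Sue lives in NYC": the inner statement becomes _:S1.
belief = Triple(":Sue", ":livesIn", '"NYC"')
reified = [Triple(":John", ":thinks", "_:S1")] + reify(belief, "_:S1")

for t in reified:
    print(t)
```

Note that the inner statement (:Sue :livesIn "NYC") is described but not asserted: it does not appear in `reified` as a triple of its own.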
RDF Query
SDO_RDF_MATCH Table Function
Arguments
Graph pattern
A sequence of triple patterns
Triple patterns typically use variables
RDF data set: a set of models
Filter
Aliases
FROM TABLE(SDO_RDF_MATCH(
'(?x :brotherOf ?y) (?y :parentOf ?z)',
SDO_RDF_Models('family'),
...
)) t
SDO_RDF_MATCH: return
Columns (of type VARCHAR2) in each returned row:
For each variable ?x in the graph pattern
x
x$rdfVTYP: URI, Literal, or Blank node
x$rdfLTYP: Specific literal type (e.g., xsd:integer)
x$rdfCLOB: Contains the actual value, if ?x matches a CLOB value
x$rdfLANG: Language tag, if any (e.g., "en-us")
If there is no variable in the graph pattern
A dummy column
SDO_RDF_MATCH: matching
Matching multiple representations
The same point in value space may have
multiple representations
"10"^^xsd:integer
"10"^^xsd:positiveInteger
"010"^^xsd:integer
"000010"^^xsd:integer
SDO_RDF_MATCH automatically resolves
these
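One way to picture this resolution is to map each literal to a canonical point in value space before comparing. The sketch below is a simplification that covers only the two integer datatypes listed above; how SDO_RDF_MATCH does this internally is not described in the slides, and the `canonical` function is a hypothetical name.

```python
# Datatypes that share the integer value space (subset, for illustration).
INTEGER_TYPES = {"xsd:integer", "xsd:positiveInteger"}


def canonical(lexical, datatype):
    """Map a typed literal to a canonical (value space, value) pair."""
    if datatype in INTEGER_TYPES:
        return ("integer", int(lexical))  # int() ignores leading zeros
    return (datatype, lexical)            # other types: compare as written


representations = [
    ("10", "xsd:integer"),
    ("10", "xsd:positiveInteger"),
    ("010", "xsd:integer"),
    ("000010", "xsd:integer"),
]

# All four representations collapse to a single canonical value.
print({canonical(lex, dt) for lex, dt in representations})
```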
RDF Query: Example
Find salary and hiredate of all the uncles
SELECT emp.name, emp.salary, emp.hiredate
FROM emp,
TABLE(SDO_RDF_MATCH(
'(?x :brotherOf ?y)
(?y :parentOf ?z)
(?x :name ?name)',
SDO_RDF_Models('family'),
...
)) t
WHERE emp.name=t.name;
Use of SDO_RDF_MATCH allows embedding a
graph query in a SQL query
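Conceptually, the table function turns a graph pattern into rows of variable bindings, which SQL then joins with ordinary tables. The following sketch mimics that idea in plain Python; the `match` function is a naive illustrative matcher, not Oracle's algorithm, and the `emp` rows are made-up sample data.

```python
def match(patterns, triples, binding=None):
    """Yield one dict of variable bindings per way the whole sequence of
    triple patterns can be matched against the triples."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):          # variable: bind or check
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:               # constant: must be equal
                ok = False
                break
        if ok:
            yield from match(rest, triples, new)


family = [
    (":John", ":brotherOf", ":Mary"),
    (":Mary", ":parentOf", ":Matt"),
    (":John", ":name", "John"),
    (":Mary", ":name", "Mary"),
]

# Hypothetical relational table, standing in for emp.
emp = [
    {"name": "John", "salary": 50000, "hiredate": "2001-03-15"},
    {"name": "Mary", "salary": 60000, "hiredate": "1999-07-01"},
]

pattern = [
    ("?x", ":brotherOf", "?y"),
    ("?y", ":parentOf", "?z"),
    ("?x", ":name", "?name"),
]
rows = list(match(pattern, family))

# The WHERE emp.name = t.name join from the SQL example.
uncles = [(e["name"], e["salary"], e["hiredate"])
          for e in emp for t in rows if e["name"] == t["?name"]]
print(uncles)
```

Here the pattern itself encodes what an uncle is (a brother of a parent); no inference is involved.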
RDF Query: Example 2
Find pairs of persons residing at the same
address where the first person rents a truck and
the second person buys a fertilizer
SELECT t3.x name1, t3.y name2
FROM AddrTable t1, AddrTable t2,
TABLE(SDO_RDF_MATCH(
'(?x :rents ?a) (?a rdf:type :Truck)
(?y :buys ?b) (?b rdf:type :Fertilizer)',
SDO_RDF_Models('Activities'),
...
)) t3
WHERE t1.name=t3.x and t2.name=t3.y and
t1.addr=t2.addr;
RDF Rulebases
Rulebase: Overview
Each RDF rulebase consists of a set of rules
Each rule consists of
Antecedent: graph pattern
Filter condition (optional)
Consequent: graph pattern
One or more rulebases may be used with
relevant RDF models (graphs) to obtain
entailed graphs
Rulebase: Example
Rules in a rulebase family_rb:
Antecedent: (?x :brotherOf ?y) (?y :parentOf ?z)
Filter: NULL
Consequent: (?x :uncleOf ?z)
Antecedent: (?x :age ?a)
Filter: a >= 65
Consequent: (?x :ageGroup "Senior")
Antecedent: (?x :parentOf ?y) (?y :parentOf ?z)
Filter: NULL
Consequent: (?x :grandParentOf ?z)
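To make the antecedent / filter / consequent structure concrete, here is a minimal forward-chaining sketch over the three family_rb rules. It is only an illustration of the idea of entailment: the `match`, `substitute`, and `entail` helpers are hypothetical, filters are written as Python callables rather than SQL conditions, and the `:Ann` triples are invented sample data.

```python
def match(patterns, triples, binding=None):
    """Yield variable bindings for a sequence of triple patterns."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, new)


def substitute(pattern, binding):
    """Replace variables in a consequent pattern by their bound values."""
    return tuple(binding.get(term, term) for term in pattern)


def entail(triples, rules):
    """Apply the rules until nothing new appears; return inferred triples."""
    known = set(triples)
    while True:
        new = set()
        for antecedent, condition, consequent in rules:
            for b in match(antecedent, known):
                if condition is None or condition(b):
                    new.add(substitute(consequent, b))
        if new <= known:
            return known - set(triples)
        known |= new


# (antecedent, filter, consequent), mirroring the family_rb rules above.
family_rb = [
    ([("?x", ":brotherOf", "?y"), ("?y", ":parentOf", "?z")],
     None, ("?x", ":uncleOf", "?z")),
    ([("?x", ":age", "?a")],
     lambda b: b["?a"] >= 65, ("?x", ":ageGroup", "Senior")),
    ([("?x", ":parentOf", "?y"), ("?y", ":parentOf", "?z")],
     None, ("?x", ":grandParentOf", "?z")),
]

family = [
    (":John", ":brotherOf", ":Mary"),
    (":Mary", ":parentOf", ":Matt"),
    (":John", ":age", 16),
    (":Ann", ":age", 70),              # invented for the example
    (":Ann", ":parentOf", ":Mary"),    # invented for the example
]

inferred = entail(family, family_rb)
print(sorted(inferred))
```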
RDF Rule Indexes
Rule Index: Overview
A rule index represents an entailed graph
A rule index is created on an RDF dataset
(consisting of a set of RDF models and a set
of RDF rulebases)
Rule Index: Example
A rule index may be created on a dataset
consisting of
family RDF data, and
family_rb rulebase (shown earlier)
The rule index will contain inferred triples
showing uncleOf and ageGroup information
RDF Query with Inference
SDO_RDF_MATCH with
Rulebases
Arguments
Graph pattern
A sequence of triples (with variables)
RDF Data set
a set of models
a set of rulebases
Filter
Aliases
FROM TABLE(SDO_RDF_MATCH(
'(?x :uncleOf ?y)',
SDO_RDF_Models('family'),
SDO_RDF_Rulebases('rdfs', 'family_rb')
)) t
RDF Query w/ Inference:
Example
Find salary and hiredate of all the
uncles
SELECT emp.name, emp.salary, emp.hiredate
FROM emp,
TABLE(SDO_RDF_MATCH(
'(?x :uncleOf ?y) (?x :name ?name)',
SDO_RDF_Models('family'),
SDO_RDF_Rulebases('rdfs', 'family_rb'),
...
)) t
WHERE emp.name=t.name;
RDF Query w/ Inference:
Example 2
Find pairs of persons residing at the same
address where the first person rents a truck and
the second person buys a fertilizer
SELECT t3.x name1, t3.y name2
FROM AddrTable t1, AddrTable t2,
TABLE(SDO_RDF_MATCH(
'(?x :rents ?a) (?a rdf:type :Truck)
(?y :buys ?b) (?b rdf:type :Fertilizer)',
SDO_RDF_Models('Activities'),
SDO_RDF_Rulebases('rdfs'),
...
)) t3
WHERE t1.name=t3.x and t2.name=t3.y and
t1.addr=t2.addr;
RDF Models
Model: DDL
Procedures provided as part of the API may be used
to
Create a model
Drop a model
When a user creates a model, a database view gets
created automatically
rdfm_family
A model corresponds to a column of type
SDO_RDF_TRIPLE_S in a base table
Each model has exactly one base table associated
with it
Model: DDL Creating a Model
Create an Application Table
CREATE TABLE family_table (
id NUMBER, family_triple SDO_RDF_TRIPLE_S);
Create a Model
EXEC SDO_RDF.CREATE_RDF_MODEL(
'family', 'family_table', 'family_triple');
Automatically creates the following database view:
rdfm_family
Loading RDF Data into Oracle
Java API provided to load NTriple into NDM
Sample XSLs provided
To convert RDF to NTriple
To convert RDF to INSERT statements
Model: DML
SQL DML commands may be used to do DML
operations on a base table to effect DML (i.e., triple
insert, delete, and update) on the corresponding
model
Insert Triples
INSERT INTO family_table VALUES (1,
SDO_RDF_TRIPLE_S('family',
'',
'',
Model: Security
The creator of the base table corresponding to a
model can grant privileges to other users
To perform DML to a model, a user must have DML
privileges for the corresponding base table
The creator of a model can grant QUERY privileges
on the corresponding database view to other users
A user can query only those models for which s/he
has QUERY privileges on the corresponding database views
Only the creator of a model can drop the model
Model: Views
Database views corresponding to the models
RDF Rulebases
Rulebase: DDL
Procedures provided as part of the API may
be used to
Create a rulebase
create_rulebase('family_rb');
Drop a rulebase
drop_rulebase('family_rb');
When a user creates a rulebase, a database view
gets created automatically
rdfr_family_rb (rule_name,
antecedent, filter, consequent, aliases)
Rulebase: DML
SQL DML commands may be used on the
database view corresponding to a target
rulebase to insert, delete, and update rules
insert into mdsys.rdfr_family_rb values(
'uncle_rule',
'(?x :brotherOf ?y) (?y :parentOf ?z)',
NULL,
'(?x :uncleOf ?z)',
SDO_RDF_Aliases(...));
Rulebase: Security
Creator of a rulebase can grant privileges to
the corresponding database view to other
users
Performing DML operations requires invoker
to have appropriate privileges on the
database view
Only the creator of a rulebase can drop the
rulebase
Rulebase: Views
RDF_RULEBASE_INFO
Contains the list of rulebases
For each rulebase, contains additional
information (such as creator, view name, etc.)
Content of each rulebase is available from the
corresponding database view
RDF Rule Indexes
Rule Index: DDL
Procedures provided as part of the API may be used
to
Create a rule index
create_rules_index ('family_rb_rix_family',
SDO_RDF_Models('family'),
SDO_RDF_Rulebases('rdfs','family_rb'));
Drop a rule index
drop_rules_index ('family_rb_rix_family');
When a user creates a rule index, a database view
gets created automatically
rdfi_family_rb_rix_family
Rule Index: Security
To create a rule index on an RDF dataset
(models and rulebases), user needs to have
QUERY privileges on those models and
rulebases
Creator of a rule index holds QUERY privilege
on the rule index and may grant this privilege
to other users
Only the creator of a rule index can drop it
Rule Index: Views
RDF_RULEINDEX_INFO
Contains the list of rule indexes
For each rule index, contains additional
information (such as creator, status, etc.)
RDF_RULEINDEX_DATASETS
For every rule index, stores the names of its
models and rulebases
Rule Index: Dependencies
Content of a rule index depends upon the
content of each element of its dataset
Any modification to the models or rulebases in its
dataset invalidates the rule index
Dropping a model or rulebase will drop
dependent rule indexes automatically.
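The dependency behaviour can be pictured as a cache of inferred triples that is flagged whenever its source data changes. The sketch below is an assumption-laden toy: the `Model` and `RuleIndex` classes, the `"VALID"` / `"INVALID"` status strings, and the explicit `rebuild` step are all hypothetical and are not taken from the Oracle API.

```python
def uncle_rule(triples):
    """(?x :brotherOf ?y) (?y :parentOf ?z) => (?x :uncleOf ?z)"""
    return {(x, ":uncleOf", z)
            for (x, p1, y) in triples if p1 == ":brotherOf"
            for (y2, p2, z) in triples if p2 == ":parentOf" and y2 == y}


class Model:
    def __init__(self, name):
        self.name = name
        self.triples = set()
        self.dependents = []  # rule indexes built on this model

    def insert(self, triple):
        self.triples.add(triple)
        for index in self.dependents:
            index.status = "INVALID"  # any modification invalidates


class RuleIndex:
    """Holds the entailed triples for one model and one rule."""

    def __init__(self, model, rule):
        self.model, self.rule = model, rule
        model.dependents.append(self)
        self.rebuild()

    def rebuild(self):
        self.inferred = self.rule(self.model.triples) - self.model.triples
        self.status = "VALID"


family = Model("family")
family.insert((":John", ":brotherOf", ":Mary"))
family.insert((":Mary", ":parentOf", ":Matt"))

index = RuleIndex(family, uncle_rule)
print(index.status, index.inferred)

family.insert((":Mary", ":parentOf", ":Sue"))  # illustrative extra triple
print(index.status)  # the cached entailment is now stale

index.rebuild()
print(index.status, sorted(index.inferred))
```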
Summary
RDF Data Model
Models (Graphs)
RDF Query using SDO_RDF_MATCH Table Function
RDF Data Model with (user-defined) Rules
Models (Graphs)
Rulebases
Rule Indexes
RDF Query on entailed RDF graphs
Management (DDL, DML, Security, ...)
Models, Rulebases, and Rule Indexes
RDF Data Model Demo
Demo: Family Schema
Demo: Family Schema 2
Demo: Family Model Data
Demo: Family Model Data (Alt)
Demo: Query without Inference
select m from TABLE(SDO_RDF_MATCH(
'(?m rdf:type :Male)',
SDO_RDF_Models('family'),
null,
SDO_RDF_Aliases(
SDO_RDF_Alias('', 'http://www.example.org/family/')),
null));
M
http://www.example.org/family/Jack
http://www.example.org/family/Tom
Demo: Query w/ RDFS Inference
select m from TABLE(SDO_RDF_MATCH(
'(?m rdf:type :Male)',
SDO_RDF_Models('family'),
SDO_RDF_Rulebases('RDFS'),
SDO_RDF_Aliases(
SDO_RDF_Alias('', 'http://www.example.org/family/')),
null));
M
http://www.example.org/family/Jack
http://www.example.org/family/Tom
http://www.example.org/family/John
http://www.example.org/family/Matt
http://www.example.org/family/Sammy
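The extra rows appear because RDFS entailment adds rdf:type triples that were never stated directly. The family schema slides did not survive as text, so the sketch below uses an invented schema (a `:fatherOf` property whose domain is `:Male`, and a `:Father` class that is a subclass of `:Male`) purely to show the mechanism; it implements only the RDFS domain rule (rdfs2) and subclass rule (rdfs9), and does not attempt to reproduce the demo output.

```python
def rdfs_closure(triples):
    """Apply RDFS rules rdfs2 (domain) and rdfs9 (subClassOf) to a fixpoint."""
    known = set(triples)
    while True:
        new = set()
        for s, p, o in known:
            for s2, p2, o2 in known:
                # rdfs9: (x type C) and (C subClassOf D) => (x type D)
                if p == "rdf:type" and p2 == "rdfs:subClassOf" and s2 == o:
                    new.add((s, "rdf:type", o2))
                # rdfs2: (x P y) and (P domain C) => (x type C)
                if p2 == "rdfs:domain" and s2 == p:
                    new.add((s, "rdf:type", o2))
        if new <= known:
            return known
        known |= new


def males(triples):
    return sorted(s for s, p, o in triples
                  if p == "rdf:type" and o == ":Male")


# Hypothetical schema and data; only Jack and Tom are typed directly.
family = {
    (":Jack", "rdf:type", ":Male"),
    (":Tom", "rdf:type", ":Male"),
    (":fatherOf", "rdfs:domain", ":Male"),
    (":John", ":fatherOf", ":Suzie"),
    (":Father", "rdfs:subClassOf", ":Male"),
    (":Sammy", "rdf:type", ":Father"),
}

print(males(family))                # without inference
print(males(rdfs_closure(family)))  # with RDFS inference
```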
Demo: Family Rulebase
Antecedent: (?x :parentOf ?y) (?y :parentOf ?z)
Filter: NULL
Consequent: (?x :grandParentOf ?z)
Demo: Query w/ Family and RDFS
Inference
select x, y from TABLE(SDO_RDF_MATCH(
'(?x :grandParentOf ?y) (?x rdf:type :Male)',
SDO_RDF_Models('family'),
SDO_RDF_Rulebases('RDFS','family_rb'),
SDO_RDF_Aliases(
SDO_RDF_Alias('', 'http://www.example.org/family/')),
null));
X Y
http://www.example.org/family/John http://www.example.org/family/Cindy
http://www.example.org/family/John http://www.example.org/family/Tom
http://www.example.org/family/John http://www.example.org/family/Jack
http://www.example.org/family/John http://www.example.org/family/Cathy
QUESTIONS
ANSWERS
Demo of Siderean's Seamark
Navigation Server
Mike DiLascio & Joanne Luciano
Agenda
About Siderean Software & Predictive
Medicine, Inc.
Introducing Seamark Navigation Server v.3.6
Seamark & Oracle 10g RDF Data Model
Demonstration of Seamark / Oracle 10g
integration
Lessons Learned / Q&A
About Siderean Software
Aggregate, organize and navigate information
-the way users think
-to improve analysis and decision making.
Founded in 2001 and based in El Segundo, CA
Venture-backed in 2004
Delivering RDF-centric navigation and analysis capabilities
for end users (a.k.a. "the last mile")
Active W3C member leveraging Semantic Web standards
Demonstrating integrated Seamark navigation layer over
Oracle 10g RDF Data Model in collaboration with
Predictive Medicine, Inc.
Current solutions
"50,000 results! Now what?" "I give up!" "Hello? Get me an apple!" "Why do I get oranges when I'm looking for apples?"
IT: "As soon as I fix his, hers stops working."
CONTENT PRODUCER: "I just produced three apples last week!"
Enterprise search: a brute force approach
Knowledge management: breathtakingly expensive
Introducing Seamark Navigation Server
"I can see the big picture!" "No more staring at a blank text box." "I can drill down quickly to what I want."
IT: "I can take my coffee break now."
CONTENT PRODUCER: "I knew we had an apple in here somewhere."
Seamark: layering organization to deliver pinpoint navigation
How it works: process
1. Metadata about data and content is aggregated
2. Organized into a unified information architecture
3. Analyzed to generate on-demand views
4. Providing pinpoint navigation across the data and content
[Diagram labels: View, Term, Person, Text, Place, Event]
How it works: architecture
[Architecture diagram components: Unstructured Content and Data Feeds; Search Engines; Feed Aggregators; Structured Content Sources; Metadata Aggregator; Navigation Metadata; Navigation Web Services; Web Browsers & Portals; User Alerts; User Navigation and User Tagging]
Seamark/Oracle integration
architecture: Phase 1
[Architecture diagram components: Oracle 10g RDF Data Model for scalable persistence of metadata; Batch RDFMatch Query issued from Seamark at index time; Cached Navigation Metadata; Navigation Web Services; Feed Aggregators; Web Browsers & Portals; User Alerts; User Navigation and User Tagging]
Seamark/Oracle integration
architecture: Phase 2
[Architecture diagram components: Oracle 10g RDF Data Model for scalable persistence of metadata; Federated RDFMatch Queries issued from Seamark at query time; Dynamic Navigation Metadata; Navigation Web Services; Feed Aggregators; Web Browsers & Portals; User Alerts; User Navigation and User Tagging]
Seamark Demo: Background & Concepts
Life Sciences demonstration premise
RDF offers high value during early stage research
Leveraging strengths of Oracle 10g & Seamark v3.6
Oracle: large datasets / scalability
Seamark: useful subsets / flexible navigation & insights
Project elapsed time - about one week
Locating and identifying data sources represented the
greatest time element
Data sources in RDF required minimal integration time
Non-RDF data sources required transformation and linking
values (non-trivial but straightforward)
Seamark Demonstration: Identification of new drug candidates
1. Differentiate different forms of disease
2. Identify patient subgroups
3. Identify top biomarkers
4. Identify function
5. Identify biological and chemical properties and disease associations of biomarker
6. Identify documents
7. Identify role in metabolic pathways
8. Identify compounds that interact
9. Identify and compare function in other organisms
10. Identify any prior art
[Diagram, data sources: GO2Keyword.rdf, Keywords.rdf, ProbeSet.rdf, GO2OMIM.rdf, GO2UniProt.rdf, OMIM.rdf, IntAct.rdf, GO.rdf, GO2Enzyme.rdf, UniProt.rdf, Taxonomy.rdf, Enzymes.rdf, PubMed.xml, KEGG.rdf]
[Diagram, linked entities: Keyword, Probe, Protein, Gene, MIM Id, Enzyme, Organism, Citation, Compound, Pathway]
Live Seamark Life Sciences
Demonstration:
Sample Screenshots
Seamark application start page shows integration of OMIM, GO, KEGG, UniProt and NCBI
Select: Probe Set ID: M18255_cds2_s_at
Results: 9 Matches on M18255_cds2_s_at to the Gene Ontology
Cytoplasm 1st of 9 Matches
Cellular Location Via Gene Ontology
Cytoplasm 1st of 9 Matches
Page Scroll
Cytoplasm 1st of 9 Matches
Page Scroll
Plasma Membrane, 2nd of 9 Matches
Cellular Location Via Gene Ontology
Page Scroll for more results, etc.
Start Page: Optionally search across entire collection based upon
keywords from the integrated data sources
Seamark Lessons Learned
RDF offers multiple unconstrained views of
data/relationships
Provides maximum flexibility during early stage research
Later stages can leverage OWL to constrain known
relationships
Data providers: the timing is right to publish in RDF format
Cut your customers' integration costs
Speed discovery time
Even with one week of effort
Proof of Concept demonstrates value of broad & deep
integration
Additional value in extending POC in customer pilot initiatives
Siderean Seamark Conclusion
Getting the precise
information we need from
today's data glut is
profoundly difficult
Solving this problem
requires a solution that
works the way you think
Siderean is the world's first
turnkey navigation server
for the enterprise and
people at large
Thank You!
To arrange a demonstration of Seamark or
for more information, please contact:
Mike DiLascio
Office: +1-781-***-****
Mobile: +1-781-***-****
*********@********.***
Siderean Software, Inc.
390 North Sepulveda Blvd., Suite 2070
El Segundo, CA 90245-4475 USA
http://www.siderean.com