Symmetric Multi Processor Evaluation of an RDF Query

I've been doing alot of thinking (and prototyping) of Symmetric Multi Processor evaluation of RDF queries against large RDF datasets. How could you Map/Reduce an evaluation of SPARQL / Versa (ala HbaseRDF), for instance? Entailment-free Graph matching is the way to go at large volumes. I like to eat my own dog food, so I'll start with my toolkit.

Let us say you had a large RDF dataset of historical characters modelled in FOAF, where the Graph names match the URI's assigned to the characters. How would you evaluate a query like the follwing (both serially and in concurrent fashion)?

BASE  <>
PREFIX rel: <>
PREFIX rdfg: <>
SELECT ?mbox
    ?kingWen :name "King Wen";
             rel:parentOf ?son . 
    GRAPH ?son { [ :mbox ?mbox ] }.

Written in a the SPARQL abstract syntax:

Join(BGP(?kingWen foaf:name "King Wen". ?kingWen rel:parentOf ?son),Graph(?son,BGP(_:a foaf:mbox ?mbox)))

If you are evaluating it niavely, it would reduce (via algebra) to:

Join(eval(Dh, BGPa), eval(Dh, Graph(?son,BGPb)))

Where Dh denotes the RDF dataset of historical literary characters, BGPa denotes the BGP expression

?kingWen foaf:name "King Wen". ?kingWen rel:parentOf ?son

BGPb denotes

_:a :mbox ?mbox

The current definition of the Graph operator as well as the one given by Perez, Jorge seem (to me) amenable to parallel evaluation. Let us take a look at the operational semantics of evuating the same query in Versa:

   (*|-foaf:name->"King Wen")-rel:parentOf->*,

Versa is a Graph-traversal-based RDF querying language. This has the advantage a computational graph theory that we can rely on for analysis of a query's complexity. Parallel evaluation of (directed) Graph traversals appear to be an open problem for a deterministic turing machine. The input to the map function would be the URIs of the sons of King Wen. The map function would be the evaluation of expression:


This would seem to be the equivalent of the evaluation of the Graph(..son URI..,BGPb) SPARQL abstract expression. So far, so good. Parallel evaluation can be implemented in a manner that is transparent to the application. An analysis of the evaluation using some complexity theory would be interesting to see if RDF named graph traversal queries have a data complexity that scales.

Chimezie Ogbuji

via Copia