I've been doing alot of thinking (and prototyping) of Symmetric Multi Processor evaluation of RDF queries against large RDF datasets. How could you Map/Reduce an evaluation of SPARQL / Versa (ala HbaseRDF), for instance? Entailment-free Graph matching is the way to go at large volumes. I like to eat my own dog food, so I'll start with my toolkit.
Let us say you had a large RDF dataset of historical characters modelled in FOAF, where the Graph names match the URI's assigned to the characters. How would you evaluate a query like the follwing (both serially and in concurrent fashion)?
BASE <http://xmlns.com/foaf/0.1/> PREFIX rel: <http://purl.org/vocab/relationship/> PREFIX rdfg: <http://www.w3.org/2004/03/trix/rdfg-1/> SELECT ?mbox WHERE { ?kingWen :name "King Wen"; rel:parentOf ?son . GRAPH ?son { [ :mbox ?mbox ] }. }
Written in a the SPARQL abstract syntax:
Join(BGP(?kingWen foaf:name "King Wen". ?kingWen rel:parentOf ?son),Graph(?son,BGP(_:a foaf:mbox ?mbox)))
If you are evaluating it niavely, it would reduce (via algebra) to:
Join(eval(Dh, BGPa), eval(Dh, Graph(?son,BGPb)))
Where Dh denotes the RDF dataset of historical literary characters, BGPa denotes the BGP expression
?kingWen foaf:name "King Wen". ?kingWen rel:parentOf ?son
BGPb denotes
_:a :mbox ?mbox
The current definition of the Graph operator as well as the one given by Perez, Jorge et.al. seem (to me) amenable to parallel evaluation. Let us take a look at the operational semantics of evuating the same query in Versa:
map( (*|-foaf:name->"King Wen")-rel:parentOf->*, 'scope(".-foaf:mbox->*",.)', )
Versa is a Graph-traversal-based RDF querying language. This has the advantage a computational graph theory that we can rely on for analysis of a query's complexity. Parallel evaluation of (directed) Graph traversals appear to be an open problem for a deterministic turing machine. The input to the map function would be the URIs of the sons of King Wen. The map function would be the evaluation of expression:
scope(".-foaf:mbox->*",.)
This would seem to be the equivalent of the evaluation of the Graph(..son URI..,BGPb) SPARQL abstract expression. So far, so good. Parallel evaluation can be implemented in a manner that is transparent to the application. An analysis of the evaluation using some complexity theory would be interesting to see if RDF named graph traversal queries have a data complexity that scales.