Copia

I've been doing alot of thinking (and prototyping) of Symmetric Multi Processor evaluation of RDF queries against large RDF datasets. How could you Map/Reduce an evaluation of SPARQL / Versa (ala HbaseRDF), for instance? Entailment-free Graph matching is the way to go at large volumes. I like to eat my own dog food, so I'll start with my toolkit.

Let us say you had a large RDF dataset of historical characters modelled in FOAF, where the Graph names match the URI's assigned to the characters. How would you evaluate a query like the follwing (both serially and in concurrent fashion)?

BASE  <http://xmlns.com/foaf/0.1/>
PREFIX rel: <http://purl.org/vocab/relationship/>
PREFIX rdfg: <http://www.w3.org/2004/03/trix/rdfg-1/>
SELECT ?mbox
WHERE {
    ?kingWen :name "King Wen";
             rel:parentOf ?son . 
    GRAPH ?son { [ :mbox ?mbox ] }.
}

Written in a the SPARQL abstract syntax:

Join(BGP(?kingWen foaf:name "King Wen". ?kingWen rel:parentOf ?son),Graph(?son,BGP(_:a foaf:mbox ?mbox)))

If you are evaluating it niavely, it would reduce (via algebra) to:

Join(eval(Dh, BGPa), eval(Dh, Graph(?son,BGPb)))

Where Dh denotes the RDF dataset of historical literary characters, BGPa denotes the BGP expression

?kingWen foaf:name "King Wen". ?kingWen rel:parentOf ?son

BGPb denotes

_:a :mbox ?mbox

The current definition of the Graph operator as well as the one given by Perez, Jorge et.al. seem (to me) amenable to parallel evaluation. Let us take a look at the operational semantics of evuating the same query in Versa:

map(
   (*|-foaf:name->"King Wen")-rel:parentOf->*,
   'scope(".-foaf:mbox->*",.)',
)

Versa is a Graph-traversal-based RDF querying language. This has the advantage a computational graph theory that we can rely on for analysis of a query's complexity. Parallel evaluation of (directed) Graph traversals appear to be an open problem for a deterministic turing machine. The input to the map function would be the URIs of the sons of King Wen. The map function would be the evaluation of expression:

scope(".-foaf:mbox->*",.)

This would seem to be the equivalent of the evaluation of the Graph(..son URI..,BGPb) SPARQL abstract expression. So far, so good. Parallel evaluation can be implemented in a manner that is transparent to the application. An analysis of the evaluation using some complexity theory would be interesting to see if RDF named graph traversal queries have a data complexity that scales.

Chimezie Ogbuji

via Copia

Copia

Ogbujis on an abundance of topics

Tag parallel

Symmetric Multi Processor Evaluation of an RDF Query