Symmetric Multi Processor Evaluation of an RDF Query

I've been doing alot of thinking (and prototyping) of Symmetric Multi Processor evaluation of RDF queries against large RDF datasets. How could you Map/Reduce an evaluation of SPARQL / Versa (ala HbaseRDF), for instance? Entailment-free Graph matching is the way to go at large volumes. I like to eat my own dog food, so I'll start with my toolkit.

Let us say you had a large RDF dataset of historical characters modelled in FOAF, where the Graph names match the URI's assigned to the characters. How would you evaluate a query like the follwing (both serially and in concurrent fashion)?

BASE  <http://xmlns.com/foaf/0.1/>
PREFIX rel: <http://purl.org/vocab/relationship/>
PREFIX rdfg: <http://www.w3.org/2004/03/trix/rdfg-1/>
SELECT ?mbox
WHERE {
    ?kingWen :name "King Wen";
             rel:parentOf ?son . 
    GRAPH ?son { [ :mbox ?mbox ] }.
}

Written in a the SPARQL abstract syntax:

Join(BGP(?kingWen foaf:name "King Wen". ?kingWen rel:parentOf ?son),Graph(?son,BGP(_:a foaf:mbox ?mbox)))

If you are evaluating it niavely, it would reduce (via algebra) to:

Join(eval(Dh, BGPa), eval(Dh, Graph(?son,BGPb)))

Where Dh denotes the RDF dataset of historical literary characters, BGPa denotes the BGP expression

?kingWen foaf:name "King Wen". ?kingWen rel:parentOf ?son

BGPb denotes

_:a :mbox ?mbox

The current definition of the Graph operator as well as the one given by Perez, Jorge et.al. seem (to me) amenable to parallel evaluation. Let us take a look at the operational semantics of evuating the same query in Versa:

map(
   (*|-foaf:name->"King Wen")-rel:parentOf->*,
   'scope(".-foaf:mbox->*",.)',
)

Versa is a Graph-traversal-based RDF querying language. This has the advantage a computational graph theory that we can rely on for analysis of a query's complexity. Parallel evaluation of (directed) Graph traversals appear to be an open problem for a deterministic turing machine. The input to the map function would be the URIs of the sons of King Wen. The map function would be the evaluation of expression:

scope(".-foaf:mbox->*",.)

This would seem to be the equivalent of the evaluation of the Graph(..son URI..,BGPb) SPARQL abstract expression. So far, so good. Parallel evaluation can be implemented in a manner that is transparent to the application. An analysis of the evaluation using some complexity theory would be interesting to see if RDF named graph traversal queries have a data complexity that scales.

Chimezie Ogbuji

via Copia

Patterns and Optimizations for RDF Queries over Named Graph Aggregates

In a previous post I used the term 'Conjunctive Query' to refer to a kind of RDF query pattern over an aggregation of named graphs. However, the term (apparently) has already-established roots in database querying and has a different meaning that what I intended. It's a pattern I have come across often and is for me a major requirement for an RDF query language, so I'll try to explain by example.

Consider two characters, King (Wen) and his heir / son (Wu) of the Zhou Dynasty. Let's say they each have a FOAF graph about themselves and the people they know within a larger database which holds the FOAF graphs of every historical character in literature.

The FOAF graphs for both Wen and Wu are (preceeded by the name of each graph):

<urn:literature:characters:KingWen>

@prefix : <http://xmlns.com/foaf/0.1/>.
@prefix rel: <http://purl.org/vocab/relationship/>.

<http://en.wikipedia.org/wiki/King_Wen_of_Zhou> a :Person;
    :name “King Wen”;
    :mbox <mailto:kingWen@historicalcharacter.com>;
    rel:parentOf [ a :Person; :mbox <mailto:kingWu@historicalcharacter.com> ].

<urn:literature:characters:KingWu>

@prefix : <http://xmlns.com/foaf/0.1/>.
@prefix rel: <http://purl.org/vocab/relationship/>.

<http://en.wikipedia.org/wiki/King_Wu_of_Zhou> a :Person;
    :name “King Wu”;
    :mbox <mailto:kingWu@historicalcharacter.com>;
    rel:childOf [ a :Person; :mbox <mailto:kingWen@historicalcharacter.com> ].

In each case, Wikipedia URLs are used as identifiers for each historical character. There are better ways for using Wikipedia URLs within RDF, but we'll table that for another conversation.

Now lets say a third party read a few stories about “King Wen” and finds out he has a son, however, he/she doesn't know the son's name or the URL of either King Wen or his son. If this person wants to use the database to find out about King Wen's son by querying it with a reasonable response time, he/she has a few thing going for him/her:

  1. foaf:mbox is an owl:InverseFunctionalProperty and so can be used for uniquely identifying people in the database.
  2. The database is organized such that all the out-going relationships (between foaf:Persons – foaf:knows, rel:childOf, rel:parentOf, etc..) of the same person are asserted in the FOAF graph associated with that person and nowhere else.
    So, the relationship between King Wen and his son, expressed with the term ref:parentOf, will only be asserted in
    urn:literature:characters:KingWen.

Yes, the idea of a character from an ancient civilization with an email address is a bit cheeky, but foaf:mbox is the only inverse functional property in FOAF to use to with this example, so bear with me.

Now, both Versa and SPARQL support restricting queries with the explicit name of a graph, but there are no constructs for determining all the contexts of an RDF triple or:

The names of all the graphs in which a particular statement (or statements matching a specific pattern) are asserted.

This is necessary for a query plan that wishes to take advantage of [2]. Once we know the name of the graph in which all statements about King Wen are asserted, we can limit all subsequent queries about King Wen to that same graph without having to query across the entire database.

Similarly, once we know the email of King Wen's son we can locate the other graphs with assertions about this same email address (knowing they refer to the same person [1]) and query within them for the URL and name of King Wen's son. This is a significant optimization opportunity and key to this query pattern.

I can't speak for other RDF implementations, but RDFLib has a mechanism for this at the API level: a method called quads((subject,predicate,object)) which takes three terms and returns a tuple of size 4 which correspond to the all triples (across the database) that match the pattern along with the graph that the triples are asserted in:

for s,p,o,containingGraph in aConjunctiveGraph.quads(s,p,o):
  ... do something with containingGraph ..

It's likely that most other QuadStores have similar mechanisms and given the great value in optimizing queries across large aggregations of named RDF graphs, it's a strong indication that RDF query languages should provide the means to express such a mechanism.

Most of what is needed is already there (in both Versa and SPARQL). Consider a SPARQL extension function which returns a boolean indicating whether the given triple pattern is asserted in a graph with the given name:

rdfg:AssertedIn(?subj,?pred,?obj,?graphIdentifier)

We can then get the email of King Wen's son efficiently with:

BASE  <http://xmlns.com/foaf/0.1/>
PREFIX rel: <http://purl.org/vocab/relationship/>
PREFIX rdfg: <http://www.w3.org/2004/03/trix/rdfg-1/>

SELECT ?mbox
WHERE {
    GRAPH ?foafGraph {
      ?kingWen :name “King Wen”;
                       rel:parentOf [ a :Person; :mbox ?mbox ].
    }  
     FILTER (rdfg:AssertedIn(?kingWen,:name,”King Wen”,?foafGraph) ).
}

Now, it is worth noting that this mechanism can be supported explicitly by asserting provenance statements associating the people the graphs are about with the graph identifiers themselves, such as:

<urn:literature:characters:KingWen> 
  :primaryTopic <http://en.wikipedia.org/wiki/King_Wen_of_Zhou>.

However, I think that the relationship between an RDF triple and the graph in which it is asserted, although currently outside the scope of the RDF model, should have it's semantics outlined in the RDF abstract syntax instead of using terms in an RDF vocabulary. The demonstrated value in RDF query optimization makes for a strong argument:

BASE  <http://xmlns.com/foaf/0.1/>
PREFIX rel: <http://purl.org/vocab/relationship/>
PREFIX rdfg: <http://www.w3.org/2004/03/trix/rdfg-1/>

SELECT ?kingWu,  ?sonName
WHERE {
    GRAPH ?wenGraph {
      ?kingWen :name “King Wen”;
                       :mbox ?wenMbox;
                       rel:parentOf [ a :Person; :mbox ?wuMbox ].
    }  
    FILTER (rdfg:AssertedIn(?kingWen,:name,”King Wen”,?wenGraph) ).
    GRAPH ?wuGraph {
      ?kingWu :name ?sonName;
                     :mbox ?wuMbox;
                     rel:childOf [ a :Person; :mbox ?wenMbox  ].
    }  
     FILTER (rdfg:AssertedIn(?kingWu,:name,?sonName,?wuGraph) ).
}

Generally, this pattern is any two-part RDF query across a database (a collection of multiple named graphs) where the scope of the first part of the query is the entire database, identifies terms that are local to a specific named graph, and the scope of the second part of the query is this named graph.

Chimezie Ogbuji

via Copia

Comprehensive RDF Query API's for RDFLib

[by Chimezie Ogbuji]

RDFLib's support for SPARQL has come full circle and I wasn't planning on blogging on the developments until they had settled some – and they have. In particular, the last piece was finalizing a set of APIs for querying and result processing that fit well within the framework of RDFLib's various Graph API's. The other issue was for the query APIs to accomodate eventual support for other querying languages (Versa for instance) that are capable of picking up the slack where SPARQL is wanting (transitive closures, for instance – try composing a concise SPARQL query for calculating the transitive closure of a given node along the rdfs:subClassOf property and you'll immediately see what I mean).

Querying

Every Graph instance now has a query method through which RDF queries can be dispatched:

def query(self, strOrQuery, initBindings={}, initNs={}, DEBUG=False, processor="sparql")
    """
    Executes a SPARQL query (eventually will support Versa queries with same method) against this Conjunctive Graph
    strOrQuery - Is either a string consisting of the SPARQL query or an instance of rdflib.sparql.bison.Query.Query
    initBindings - A mapping from variable name to an RDFLib term (used for initial bindings for SPARQL query)
    initNS - A mapping from a namespace prefix to an instance of rdflib.Namespace (used for SPARQL query)
    DEBUG - A boolean flag passed on to the SPARQL parser and evaluation engine
    processor - The kind of RDF query (must be 'sparql' until Versa is ported)
    """

The first argument is either a query string or a pre-compiled query object (compiled using the appropriate BisonGen mechanism for the target query language). Pre-compilation can be useful for avoiding redundant parsing overhead for queries that need to be evaluated repeatedly:

from rdflib.sparql.bison import Parse
queryObject = Parse(sparqlString)

The next argument (initBindings) is dictionary that maps variables to their values. Though variables are common to both languages, SPARQL variables differ from Versa queries in that they are string terms in the form “?varName”, wherease in Versa variables are QNames (same as in Xpath). For SPARQL queries the dictionary is expected to be a mapping from variables to RDFLib terms. This is passed on to the SPARQL processor as initial variable bindings.

initNs is yet another top-level parameter for the query processor: a namespace mapping from prefixes to namespace URIs.

The debug flag is pretty self explanatory. When True, it will cause additional print statements to appear for the parsing of the query (triggered by BisonGen) as well as the patterns and constraints passed on to the processor (for SPARQL queries).

Finally, the processor specifies which kind of processor to use to evaluate the query: 'versa' or 'sparql'. Currently (with emphasis on 'currently'), only SPARQL is supported.

Result formats

SPARQL has two result formats (JSON and XML). Thanks to Ivan Herman's recent contribution the SPARQL processor now supports both formats. The query method (above) returns instances of QueryResult, a common class for RDF query results which define the following method:

def serialize(self,format='xml')

The format argument determines which result format to use. For SPARQL queries, the allowable values are: 'graph' – for CONSTRUCT / DESCRIBE queries (in which case a resulting Graph object is returned), 'json',or 'xml'. The resulting object also behaves as an iterator over the bindings for manipulation in the host language (Python).

Versa has it's own set of result formats as well. Primarily there is an XML result format (see: Versa by Example) as well as Python classes for the various internal datatypes: strings,resources,lists,sets,and numbers. So, the eventual idea is for using the same method signature for serializing Versa queries as XML as you would for SPARQL queries.

SPARQL Limitations

The known SPARQL query forms that aren't supported are:

  • DESCRIBE/CONSTRUCT (very temporary)
  • Nested Group Graph Patterns
  • Graph Patterns can only be used (once) by themselves or with OPTIONAL patterns
  • UNION patterns can only be combined with OPTIONAL patterns
  • Outer FILTERs which refer to variables within an OPTIONAL

Alot of the above limitations can be addressed with formal equivalence axioms of SPARQL semantics, such as those mentioned in the recent paper on the complexity and semantics of SPARQL. Since there is very little guidance on the semantics of SPARQL I was left with the option of implementing only those equivalences that seemed obvious (in order to support the patterns in the DAWG test suite):

1) { patternA { patternB } } => { patternA. patternB }
2) { basicGraphPatternA OPTIONAL { .. } basicGraphPatternB }
  =>
{ basicGraphPatternA+B OPTIONAL { .. }}

It's still a work in progress but has come quite a long way. CONSTRUCT and DESCRIBE are already supported by the underlying processor, I just need to find some time to hook it up to the query interfaces. Time is something I've been real short of lately.

Chimezie Ogbuji

via Copia

Extension Functionality and Set Manipulation in RDF Query Languages

A recent bit by Andy Seaborne (on Property Functions in ARQ – Jena's query engine) got me thinking about general extension mechanisms for RDF querying languages.
In particular, he mentions two extensions that provide functionality for processing RDF lists and collections which (ironically) coincide with functions I had requested be considered for later generations of Versa.

The difference, in my case, was that the suggestions were for functions that cast RDF lists into Versa lists (or sets) – which are data structures native to Versa that can be processed with certain built-in functions.

Two other extensions I use quite often in Versa (and mentioned briefly in my XML.com article) are scope, and scoped-subquery. These have to do with identifying the context of a resource and limiting the evaluation of a query to within a named graph, respectively. Currently, the scoped function returns a list of the names of all graphs in which the resource is asserted as a member of any class (via rdf:type). I could imagine this being expanded to include the names of any graph in which statements about the resource are asserted. scoped-subquery doesn't present much value for a host language that can express queries as limited to a named context.

I also had some thoughts about an extension function mechanism that allowed an undefined function reference (for functions of arity 1 – i.e. functions that take only a single argument) to be interpreted as a request for all the objects of statements where the predicate is the function URI and the subject is the argument

I recently finished writing a SPARQL grammar for BisonGen and hope to conclude that effort (at some point) once I get over a few hurdles. I was pleasantly surprised to find that the grammar for function invocation is pretty much identical for both query languages. Which suggests that there is room for some thought about a common mechanism (or just a common set of extension functionality – similar to the EXSLT effort) for RDF querying or general processing.

CWM has a rich, and well documented set of RDF extensions. The caveat is that the method signatures are restricted to dual input (subject and object) since the built-ins are expressed as RDF triples where the predicate is the name of the function and the subject and object are arguments to is. Nevertheless, it is a good source from which an ERDF specification could be drafted.

My short wish-list of extension functions in such an effort would include:

  • List comprehension (intersection, union, difference, membership, indexing, etc.)
  • Resolving the context for a given node: context(rdfNode) => URI (or BNode?)
  • an is-a(resource) function (equivalent to Versa's type function without the entailment hooks)
  • a class(resource) which is an inverse of is-a
  • Functions for transitive closures and/or traversals (at the very least)
  • A fallback mechanism for undefined functions that allowed them to be interpreted as unary 'predicate functions'

Of course, functions which return lists instead of single resources would be problematic for any host language that can't process lists, but it's just some food for thought.

Chimezie Ogbuji

via Copia

Corrections to RDF Query Language Comparison

I recently came upon this dated comparison of features in RDF querying languages. It predates SPARQL and as a result prompted Dan Connoly to attempt to demonstrate how SPARQL fares in this matrix. In reading it, I realized some of the features marked as 'No' under Versa are incorrect. So here is my attempt to demonstrate how (current Versa specification) would implement these requirements:

5 Quantification: Return the persons who are authors of all publications

The section name is misleading as it suggests the use of FOPL semantics to resolve a pattern that can be solved without FOPL semantics:

filter(
        'all()',
        'eq(length(difference(all()-dc:creator->*. - dc:creator->*)),0)'
    )

10 Namespace: Return all resources whose namespace starts with "http://www.aifb.uni-karlsruhe.de/".

filter(
        'all()',
        'starts-with("http://www.aifb.uni-karlsruhe.de/")'
    )

14 Entailment: Return all instances of that are members of the class Publication.

The current specification says (about rdf:type entailment):

Returns a list of all resources of a specified type, as defined by RDFS and optionally DAML schema specifications ...

So, the implementation has the option to account for entailment rules (as the 4Suite implementation does):

type(resource('#Publication'))

Chimezie Ogbuji

via Copia

Hooking up an IRC Agent to a Query Interface

Uche gave an excellent suggestion to augment Emeka to work with Triclops. After finishing Triclops, I had realized that most of the functionality Emeka provided was now redundant since it could be performed using Triclops (with the added advantage of being able to diagram/navigate RDF Graphs). Triclops URIs are unfortunately very long for queries submitted through HTTP GET. This is unfortunately unavoidable mostly because the parameters to the query service are Versa queries (which have their own syntax that includes characters that may not be very friendly for URIs) and URIs, so the prospect of submitting URIs to a service identified to a URI will probably always be problematic and might be as fundamental as submitting XML content through an XML-based messaging format. I'm digressing, but the solution that worked for me (which Uche suggested) was to make use of a URI 'shortening' service (such as MakeAShorterLink - which I used) to generate a 'hashed' or 'compacted' form of the URI from the parts submitted from , say, IRC. For example, lets say I was interested in Daniel Krech's redfoot framework and wanted to navigate the ontology (available from the URL: http://redfoot.net/2005/redfoot) as an RDF diagram in order to get an idea of the components and how they related. In particular, lets say I knew to start with the redfoot:Code class, I could interactively ask Emeka to return to execute the query:

redfoot:Code

against the URL which returns the RDF/XML serialization of the redfoot ontology. I could also ask Emeka to have the result rendered as a JPEG diagram with an associated image map that allows me to navigate to other aspects of the ontology to learn more about redfoot. Below is how I would do so with the new .triclops service:

<chimezie> .triclops "redfoot:Code" http://redfoot.net/2005/redfoot jpeg
<Emeka> Shortened link to Triclops request: http://makeashorterlink.com/?I1F7262EB

Below is the help associated with the triclops service.

<chimezie> .help triclops
<Emeka> .triclops "..Versa Query.." <source graph> <triples|jpeg|svg|versa>
<Emeka> The first argument is quoted Versa query to execute.  The second is a URL which points to the RDF graph (N3 or RDF/XML) to query against
<Emeka> The third argument is one of "triples", "jpeg", or "svg" and specifies how to return the query result
<Emeka> "triples" - raw triples in a tabled-grid, "jpeg" or "svg" - as navigable RDF graphs, and "versa" - raw Versa datatypes (rendered as html)
<Emeka> The result is a uri (courtesy of http://makeashorterlink.com) which redirects to the appropriate Triclops request

[Uche Ogbuji]

via Copia

BNode Drama for your Mama

You know you are geek when it's 5am in the morning and you are wrestling with existential quantification and their value in querying. This was triggered originally by the ongoing effort to extend an already expressive pattern-based RDF querying language to cover more usecases. The motivation being that such patterns should be expressive beyond just the level of triple-matching since the core RDF model consists of a level of granularity below statements (you have literals, resources, and bnodes, ..). I asked myself if there was a justifiable reason why Versa at it's core does not include BNodes?:

Blank nodes are treated as simply indicating the existence of a thing, without using, or saying anything about, the name of that thing. (This is not the same as assuming that the blank node indicates an 'unknown' URI reference; for example, it does not assume that there is any URI reference which refers to the thing. The discussion of Skolemization in appendix A is relevant to this point.)

I don't remember the original motivation for leaving BNodes out of the core query data types, but in retrospect I think it it was a good decision and not only because the SPARQL specification does something similar (in interpreting BNodes as an open-ended variable). But it's worth noting that the section on blank nodes appearing in a query as opposed to appearing to a query result (or existing in the underlying knowledge base) is quite short:

A blank node can appear in a query pattern. It behaves as a variable; a blank node in a query pattern may match any RDF term.

Anyways, at the time I noticed this lack of BNodes in query languages, I had a misconception about BNodes. I thought they represented individual things we want to make statements about but don't know their identification or don't want to have to worry about assigning identification about them (this is probably 90% of the way BNodes are used in reality). This confusion came from the practical way BNodes are almost always handled by RDF data stores (Skolemization):

Skolemization is a syntactic transformation routinely used in automatic inference systems in which existential variables are replaced by 'new' functions - function names not used elsewhere - applied to any enclosing universal variables. In RDF, Skolemization amounts to replacing every blank node in a graph by a 'new' name, i.e. a URI reference which is guaranteed to not occur anywhere else. In effect, it gives 'arbitrary' names to the anonymous entities whose existence was asserted by the use of blank nodes: the arbitrariness of the names ensures that nothing can be inferred that would not follow from the bare assertion of existence represented by the blank node.

This misconception was clarified when Bijan Parsia ({scope(PyChinko)} => {scope(FuXi)}) expressed that he had issue with my assertion(s) that there are some compromising redundancies with BNodes, Literals, and simple entailment with regards to building programmatic APIs for them.

Then the light bulb went off that the semantics of BNodes are (as he put it) much stronger than they are most often used. Most people who use BNodes don't mean to use it to state that there is a class of things which have the asserted set of statements made about them. Consider the difference between:

  1. Who are all the people Chime knows?
  2. There is someone Chime knows, but I just don't know his/her name right now
  3. Chime knows someone! (dudn't madder who)

The first scenario is the basic use case for variable resolution in an RDF query and is asking for the resolution of variable ?knownByChime in:

<http://metaacognition.info/profile/webwho.xrdf#chime> foaf:knows ?knownByChime.

Which can be [expressed] in Versa (currently) as:

resource('http://metacognition.info/profile/webwho.xrdf#chime')-foaf:knows->*

Or eventually (hopefully) as:

foaf:knows(<http://metacognition.info/profile/webwho.xrdf#chime>)

And in SPARQL as:

select 
  ?knownByChime 
where  
{
  <http://metacognition.info/profile/webwho.xrdf#chime> foaf:knows ?knownByChime
}

The second case is the most common way people use BNodes. You want to say Chime knows someone but don't know a permanent identifier for this person or care to at the time you make the assertion:

http://metaacognition.info/profile/webwho.xrdf#chime foaf:knows _:knownByChime

But RDF-MT specifically states that BNodes are not meant to be interpreted in this way only. Their semantics are much stronger. In fact, as Bijaan pointed out to me, the proper use for BNodes is as scoped existentials within ontological assersions. For example owl:Restrictions which allow you to say things like: The named class KnowsChime consists of everybody who knows Chime:

@prefix mc <http://metaacognition.info/profile/webwho.xrdf#>.
  @prefix owl <http://www.w3.org/2002/07/owl#>.
  :KnowsChime a owl:Class;
        rdfs:subClassOf 
        [
          a owl:Restriction;
          owl:onProperty foaf:knows;
          owl:hasValue mc:chime
        ];
        rdfs:label "KnowsChime";
        rdfs:comment "Everybody who knows Chime";

The fact that BNodes aren't meant to be used in the way they often are leads to some suggested modifications to allow BNodes to be used as 'temporary identifiers' in order to simplify query resolution. But as clarified in the same thread, BNodes in a query doesn't make much sense - which is the conclusion I'm coming around to: There is no use case for asserting an existential quantification while querying for information against a knowledge base. Using a variable (in the way SPARQL does) should be sufficient. In fact, all RDF querying usecases (and languages) seem to be reducable to variable resolution.

This last part is worth noting because it suggests that if you have a library that handles variable resolution (such as rdflib's most recent addition) you can map any query language to (Versa/SPARQL/RDFQueryLanguage_X) it by reducing it to a set of triple patterns with the variables you wish to resolve.

So my conclusions?:

  • Blank Nodes are a neccessary component in the model (and any persistence API) that unfortunately have much stronger semantics (existential quanitifcation) than their most common use (as temporary identifiers)
  • The distinction between the way BNodes are most often used (as a syntactic shorthand for a single resource for which there is no known identity - at the time) and the formal definition of BNodes is very important to note - especially to those who are very much wed to their BNodes as Shelly Powers has shown to be :).
  • Finally, BNodes emphatically do not make sense in the context of a query - since they become infinitely resolvable variables: which is not very useful. This confusion is further proof that (once again), for the sake of minimizing said confusion and misinterpretation of some very complicated axioms there is plenty value in parenthetically (if not logically) divorcing (pun intended) RDF model theoretics from the nuts and bolts of the underlying model

Chimezie Ogbuji

via Copia

Triclops - Resurrected

Like a pheonix from the flames, I've resurrected an old RDF querying interface called Triclops. It used to be coupled with the 4Suite repository's defunct Dashboard (which I've also tried to resurrect as an XForms interface to a live 4Suite repository - there is more to come on that front, thanks to FormFaces) but I've broken it out into it's own stand alone application. It's driven by this stylesheet which makes use of two XSLT extensions (http://metacognition.info/extensions/remote-query.dd#query and http://metacognition.info/extensions/remote-query.dd#graph) both of which are defined here.

I've updated the tabled,triple result page to include ways to navigate graphs by clicking on subjects (which takes you to a subsequent triple result page for that particular subject), predicates (which takes you to another triple result page with statements which use that predicate), and objects (which allows you to jump from graph to graph along rdfs:seeAlso / rdfs:isDefinedBy relationships). Note that it assumes the objects of rdfs:seeAlso and rdfs:isDefined by are live URIs that return an RDF graph (the most common use for these is for FOAF networks and relationships between ontologies).

I've also included buttons for common, 'canned' queries that can be executed on any graph, such as:

  • All classes: type(list(rdfs:Class,owl:Class))
  • rss:items: type(rss:item)
  • Dated Resources: all()|-dc:date->*
  • Today's Items: all()|-dc:date->contains(.,".. today's date ..")
  • Annotated Resources: all()|-list(rdfs:label,rdfs:comment,dc:title,dc:description,rss:title,rss:description)->*
  • Ontologies: type(owl:Ontology)
  • People: type(foaf:Person)
  • Everything: all()

In addition, I've added some documentation

P.S.: The nodes in the RDF diagrams generated by Triclops (as an alternative to raw triples) are live links. The JPEG diagrams are associated with image maps (generated by graphviz) that allow you to click on the nodes and the SVG diagrams are rendered with links as well (depending on the level of maturity of your SVG viewer - this might or might not be the prefered diagram format). The SVG diagrams have alot potential for things such as styling and the other possibilities that are standard to SVG.

In all, I think it provides a very user-friendly way to quickly traverse, whittle, and circumnavigate the "Semantic Web" - whoops, there is that phrase again :)

4Suite Repository and 4Suite RDF have become sort of bastard children of recent 4Suite development and I've been focusing my efforts lately in moving the latter along - The former (the Repository) only lacks documentation IMHO as Metacognition is run entirely as a 4Suite Repository instance.

Chimezie Ogbuji

via Copia

Itinerant Binds - Better Software Documentation

It was brought to my attention that my recent entry about Sparta/Versa/rdflib possibilities was a little vague/unclear. This tends to happen when I get caught up in an interest. Anyways,.. I renamed the module to Itinerant Binds (I liked the term), created a page on Metacognition for the recent rdflib/4Suite RDF work I've been doing with some more details on how the components works. I added an example that better demonstrates isolating RDF resources through targeted Versa queries and using the bound python result objects to modify / extend the underlying graph.

Chimezie Ogbuji

via Copia

Wrapping rdflib's Graph around a 4RDF Model

Well, for some time I had pondered what it would take fo provide SPARQL support in 4Suite RDF. I fell upon sparql-p, earlier and noticed it was essentially a SPARQL query processor w/out a parser to drive it. It works over a deprecated rdflib interface: TripleStore. The newly suggested interface is Graph, which is as solid suggestion for a generic RDF:API as any. So, I wrote a 4Suite RDF model backend for rdflib, that allows the wrapping of Graph around a live 4Suite RDF model. Finally, I used this backend to execute a sparql-p query over http://http://del.icio.us/rss/chimezie:

SELECT
  ?title
WHERE {
  ?item rdf:type rss:item;
        dc:subject ?subj;
        rss:title ?title.
        FILTER (REGEX(?subj,".*rdf")).
}

The corresponding python code:

#Setup FtRDF Model
Memory.InitializeModule()   
db = Memory.GetDb('rules', 'test')
db.begin()
model = Model.Model(db)

#Parse my del.icio.us rss feed
szr = Dom.Serializer()
domStr=urllib2.urlopen('http://del.icio.us/rss/chimezie').read()        
dom = Domlette.NonvalidatingReader.parseString(domStr,'http://del.icio.us/rss/chimezie')
szr.deserialize(model,dom,scope='http://del.icio.us/rss/chimezie')

#Setup rdflib.Graph with FtRDF Model as Backend, using FtRdf driver
g=Graph(FtRdf(model))

#Setup sparql-p query processor engine
select = ("?title")

#Setup term
copia = URIRef('http://del.icio.us/chimezie')
rssTitle = URIRef('http://purl.org/rss/1.0/title')
versaWiki = URIRef('http://en.wikipedia.org/wiki/Versa')
dc_subject=URIRef("http://purl.org/dc/elements/1.1/subject")

#Filter on objects of statements (dc:subject values) - keep only those containing the string 'rdf'
def rdfSubFilter(subj,pred,obj):
    return bool(obj.find('rdf')+1)

#Execute query
where = GraphPattern([("?item",rdf_type,URIRef('http://purl.org/rss/1.0/item')),
                       ("?item",dc_subject,"?subj",rdfSubFilter),
                       ("?item",rssTitle,"?title")])    
tStore = myTripleStore(FtRdf(model))
result = tStore.query(select,where)
pprint(result)

The result (which will change daily as my links shift thru my del.icio.us channel queue:

[chimezie@Zion RDF-API]$ python FtRdfBackend.py
[u'rdflibUtils',
 u'Representing Specified Values in OWL: "value partitions" and "value sets"',
 u'Sparta',
 u'planner-rdf',
 u'RDF Template Language 1.0',
 u'SIOC Vocabulary Specification',
 u'SPARQL in RDFLib',
 u'MeetingRecords - ESW Wiki',
 u'Enumerated datatypes (OWL)',
 u'Defining N-ary Relations on the Semantic Web: Use With Individuals']

Chimezie Ogbuji

via Copia