Extension Functionality and Set Manipulation in RDF Query Languages

A recent bit by Andy Seaborne (on Property Functions in ARQ – Jena's query engine) got me thinking about general extension mechanisms for RDF querying languages.
In particular, he mentions two extensions that provide functionality for processing RDF lists and collections which (ironically) coincide with functions I had requested be considered for later generations of Versa.

The difference, in my case, was that the suggestions were for functions that cast RDF lists into Versa lists (or sets) – which are data structures native to Versa that can be processed with certain built-in functions.

Two other extensions I use quite often in Versa (and mentioned briefly in my XML.com article) are scope, and scoped-subquery. These have to do with identifying the context of a resource and limiting the evaluation of a query to within a named graph, respectively. Currently, the scoped function returns a list of the names of all graphs in which the resource is asserted as a member of any class (via rdf:type). I could imagine this being expanded to include the names of any graph in which statements about the resource are asserted. scoped-subquery doesn't present much value for a host language that can express queries as limited to a named context.

I also had some thoughts about an extension function mechanism that allowed an undefined function reference (for functions of arity 1 – i.e. functions that take only a single argument) to be interpreted as a request for all the objects of statements where the predicate is the function URI and the subject is the argument

I recently finished writing a SPARQL grammar for BisonGen and hope to conclude that effort (at some point) once I get over a few hurdles. I was pleasantly surprised to find that the grammar for function invocation is pretty much identical for both query languages. Which suggests that there is room for some thought about a common mechanism (or just a common set of extension functionality – similar to the EXSLT effort) for RDF querying or general processing.

CWM has a rich, and well documented set of RDF extensions. The caveat is that the method signatures are restricted to dual input (subject and object) since the built-ins are expressed as RDF triples where the predicate is the name of the function and the subject and object are arguments to is. Nevertheless, it is a good source from which an ERDF specification could be drafted.

My short wish-list of extension functions in such an effort would include:

  • List comprehension (intersection, union, difference, membership, indexing, etc.)
  • Resolving the context for a given node: context(rdfNode) => URI (or BNode?)
  • an is-a(resource) function (equivalent to Versa's type function without the entailment hooks)
  • a class(resource) which is an inverse of is-a
  • Functions for transitive closures and/or traversals (at the very least)
  • A fallback mechanism for undefined functions that allowed them to be interpreted as unary 'predicate functions'

Of course, functions which return lists instead of single resources would be problematic for any host language that can't process lists, but it's just some food for thought.

Chimezie Ogbuji

via Copia