Versa: Pattern Matching (article)

My Versa article (Versa: Path-Based RDF Query Language) is up, but I've recently been tinkering with Emeka and haven't been able to post about it. I wanted Emeka functional so people could familiarize themselves with the language by example instead of by deciphering the specification. Simply saying ".help" in a channel where he is located (#swig, #swhack, #4suite, #foaf) should be sufficient. If his commands interfere with an existing bot's, please let me know.

The article is based (in part) on an earlier paper I wrote on Versa. I reworked it to focus more on the usage patterns it shares with other existing query languages (primarily SPARQL) to make the point that RDF querying is truly not in its infancy any more. I also wanted to use it as a springboard to suggest some possible enhancements to an already (IMHO) expressive syntax (mostly borrowed from N3).

My hope is to spark some conversation between the opposing camps, as well as to get people familiar with the language, for the betterment of RDF and RDF querying.

See an exchange between Dave Beckett and me on the #swig scratchpad.

[Uche Ogbuji]

via Copia

RDF IRC Agent - Emeka

I've recently been working on an IRC bot (based on Sean Palmer's phenny) called Emeka which is meant as a tool for demonstrating Versa and other related RDF facilities. Currently, it supports two functions:

  • .query <arbitrary URI> " .. Versa Query .. "
  • .query-deep <arbitrary URI> steps " .. Versa Query .. "

The first causes Emeka to parse the RDF document (associated with the given URI) as RDF/XML and then as N3 (if the first attempt fails). He then evaluates the submitted Versa query against the graph and responds with the result. The second function does the same, with the exception that it recursively loads RDF graphs (following rdfs:seeAlso statements) N times, where N is the second argument. This is useful for extracting FOAF communities from a starting document (which was my original motivation for this).
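
For anyone curious about the mechanics, below is a minimal sketch of that recursive loading behaviour. It uses rdflib rather than Emeka's actual phenny/4Suite code, and the starting URI in the final comment is made up; it is meant only to illustrate the RDF/XML-then-N3 fallback and the rdfs:seeAlso crawl described above.

# Minimal sketch (not Emeka's code): parse a document as RDF/XML, fall back
# to N3, then follow rdfs:seeAlso links for a fixed number of steps.
from rdflib import Graph
from rdflib.namespace import RDFS

def parse_either(graph, uri):
    """Try RDF/XML first, then N3, mirroring the fallback described above."""
    try:
        graph.parse(uri, format="xml")
    except Exception:
        graph.parse(uri, format="n3")

def load_deep(start_uri, steps):
    graph = Graph()
    parse_either(graph, start_uri)
    seen = {start_uri}
    for _ in range(steps):
        # Gather rdfs:seeAlso targets found so far and load any new ones.
        targets = {str(o) for o in graph.objects(None, RDFS.seeAlso)} - seen
        if not targets:
            break
        for uri in targets:
            seen.add(uri)
            try:
                parse_either(graph, uri)
            except Exception:
                pass  # skip documents that fail to load or parse
    return graph

# graph = load_deep("http://example.org/foaf.rdf", steps=2)  # hypothetical URI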

By default Emeka has the following namespace prefixes bound:

daml, rdf, rdfs, owl, xsd, log (N3's log), dc, rss, foaf
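
(For readers unfamiliar with prefix bindings, the sketch below shows roughly what such a mapping looks like when set up with rdflib. The namespace URIs are the commonly used ones, not necessarily the exact URIs Emeka binds, and Emeka itself is built on 4Suite rather than rdflib.)

from rdflib import Graph, Namespace

g = Graph()
# Commonly used namespace URIs; Emeka's actual bindings may differ.
g.bind("daml", Namespace("http://www.daml.org/2001/03/daml+oil#"))
g.bind("log", Namespace("http://www.w3.org/2000/10/swap/log#"))
g.bind("dc", Namespace("http://purl.org/dc/elements/1.1/"))
g.bind("rss", Namespace("http://purl.org/rss/1.0/"))
g.bind("foaf", Namespace("http://xmlns.com/foaf/0.1/"))
# rdf, rdfs and xsd are pre-bound by rdflib; bind owl similarly if needed.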

Emeka is a work in progress and is currently idling in #swhack and #4suite (as we speak), with #foaf and #swig to follow eventually. Some ideas for other services this bot could provide:

  • augmenting its default namespace mapping
  • stored queries (for example: query for retrieving the latest rss:item in a feed)
  • Rule invocation (through FuXi's prospective-query function)
  • Interactive question and example demonstration of Versa function(s)
  • More sophisticated interaction with Del.icio.us RSS feeds (for web page cataloging)

Other suggestions are welcome.

See the #swhack logs for an example.

Chimezie Ogbuji

via Copia

FuXi: FOAFed and DOAPed

I just upgraded and repackaged FuXi (v0.7): added some extra prints to the Versa extension function, added a 'test' directory to the source tree with an appropriate example of how FuXi can be used to make a prospective query against OWL rules and a source graph, created a DOAP instance for FuXi and a FOAF instance for myself, created a permanent home for FuXi, and added FuXi to the SemanticWebDOAPBulletinBoard wiki. This was primarily motivated by Libby's post on Semantic Web Applications and Demos, which I thought would be a perfect forum for FuXi. I probably need to move it into a CVS repository when I can find the time.

Below is the output of running the test case:

loaded model with owl/rdfs minimal rules and initial test facts
executing Versa query:
prospective-query('urn:uuid:MinimalRules',
                  'urn:uuid:InitialFacts',
                  'distribute(list(test:Lion,test:Don_Giovanni),\'. - rdf:type -> *\')',
                  'urn:uuid:InitialFacts')
extracted 35 rules from urn:uuid:MinimalRules
extracted 16 facts from source model (scoped by urn:uuid:InitialFacts) into interpreter. Executing...
inferred 15 statements in 0.0526609420776 seconds
time to add inferred statements (to scope: urn:uuid:InitialFacts): 0.000159025192261
compiled prospective query, executing (over scope: urn:uuid:InitialFacts)
time to execute prospective query:  0.0024938583374
time to remove inferred statements:  0.0132219791412
[[[u'http://copia.ogbuji.net/files/FuXi/test/Animal',
   u'http://copia.ogbuji.net/files/FuXi/test/LivingBeing']],
 [[u'http://copia.ogbuji.net/files/FuXi/test/DaPonteOperaOfMozart']]]

This particular test extracts inferred statements from an initial graph using a functional subset of the original owl-rules.n3 and executes the Versa query (which essentially asks: what classes do the resources test:Lion and test:Don_Giovanni belong to?) to demonstrate OWL reasoning (class membership extension by owl:oneOf and owl:unionOf, specifically).
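
To make that class-membership inference concrete without FuXi installed, here is a loose, hypothetical reconstruction of the kind of axioms involved, using rdflib plus the owlrl reasoner as a stand-in for FuXi's rule engine (the class and member names merely echo the test output above; the actual test data may differ).

from rdflib import Graph, Namespace, BNode
from rdflib.collection import Collection
from rdflib.namespace import RDF, OWL
from owlrl import DeductiveClosure, OWLRL_Semantics

TEST = Namespace("http://copia.ogbuji.net/files/FuXi/test/")
g = Graph()

# Membership by enumeration: test:Animal owl:oneOf (test:Lion)
animals = BNode()
Collection(g, animals, [TEST.Lion])
g.add((TEST.Animal, OWL.oneOf, animals))

# Membership by union: test:LivingBeing owl:unionOf (test:Animal)
living = BNode()
Collection(g, living, [TEST.Animal])
g.add((TEST.LivingBeing, OWL.unionOf, living))

# Compute the OWL RL closure and ask what classes test:Lion now belongs to.
DeductiveClosure(OWLRL_Semantics).expand(g)
print(sorted(str(c) for c in g.objects(TEST.Lion, RDF.type)))
# Expect test:Animal and test:LivingBeing among the results.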

See previous posts on FuXi, N3, and 4Suite.

Chimezie Ogbuji

via Copia

Identifying BNodes via RDF Query

Sorry, I couldn't help but commence round 3 (I believe it is) of Versa vs. SPARQL. In all honesty, however, this has more to do with RDF itself than with either query language. It is primarily motivated by a very informative and insightful take (by Benjamin Nowack) on the problems of identifying BNodes uniquely in a query. His final conclusion (as I understood it) is that although the idea of identifying BNodes directly by URI seems counter-intuitive to the very nature of BNodes (anonymous resources), it is a practical necessity: one I have had to rely on more often than not with Versa, and one that caused him to venture outside the boundaries of the SPARQL specification for a solution. This is especially the case when you don't have much identifying metadata associated with the BNode in question (if you did, you could rely on inferencing, explicit or otherwise).

Well, ironically, the reason why this issue never occurred to me is that in Versa, you refer to resources (for identification purposes) by URI regardless of whether they are blank nodes or not. I guess I would interpret this functionality as leaving it up to the author of the query to understand the exact nature of BNode URIs (that they are transient, possibly inconsistent, etc.).
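
To make the "transient" caveat concrete, here is a small illustration with rdflib (used here purely because it is handy, not because it reflects what Versa or 4RDF does internally): a BNode's label can address the node within one graph instance, but it is not guaranteed to survive a serialize/re-parse round trip.

from rdflib import Graph, BNode, Literal, Namespace

FOAF = Namespace("http://xmlns.com/foaf/0.1/")
g = Graph()
person = BNode()                      # anonymous resource
g.add((person, FOAF.name, Literal("Chimezie")))

# Within this graph instance, the internal label works as an identifier:
label = str(person)
assert (BNode(label), FOAF.name, Literal("Chimezie")) in g

# After serializing and re-parsing, the anonymous node may carry a different
# label, so a query that hard-codes the old label can silently stop matching.
g2 = Graph().parse(data=g.serialize(format="turtle"), format="turtle")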

Chimezie Ogbuji

via Copia

Versa by Deconstruction

I was recently compelled to write an introductory companion to the Versa specification. The emphasis of this document (located here) is on readers with little to no experience with formal language specifications and/or with the RDF data model. It is inspired by its predecessors, which make good follow-up material.

I initially started using OpenOffice Writer to compose an OpenOffice document and export it to HTML, but I eventually decided to write it in Markdown and use pymarkdown to render it to an HTML document stored on Copia.
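
For what it's worth, the rendering step is essentially a one-liner. Here is a hedged sketch using the widely available markdown package (not necessarily the exact pymarkdown tool mentioned above), with hypothetical filenames.

import markdown  # generic Python Markdown package, standing in for pymarkdown

with open("versa-by-deconstruction.md") as src:      # hypothetical filename
    html_body = markdown.markdown(src.read())

with open("versa-by-deconstruction.html", "w") as out:
    out.write(html_body)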

The original Markdown source is here.

-- Chimezie

[Uche Ogbuji]

via Copia

Rewriting Source Content Descriptions as Versa Queries

I recently read Morten Frederiksen's blog entry about implementing Source Content Descriptions as SPARQL queries in Redland and was quite interested, especially in the consideration that such queries could be automatically generated and that the set of queries you would want to ask is small and straightforward. Even more interesting was Morten's step-by-step walk-through of how such queries would be translated to SQL queries on a Redland triple store sitting on top of MySQL (my favorite RDBMS deployment for 4RDF as well).

However, I couldn't help but wonder how such a set of queries would be expressed in Versa (in my opinion, a language more aligned with the data model it queries than its SQL/RDQL counterparts). So below is my attempt to port the queries to Versa:

Classes used in the store

SPARQL
SELECT DISTINCT ?Class
WHERE { ?R rdf:type ?Class }
Versa
set(all() - rdf:type -> *)
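
For anyone who wants to try these against their own data, the sketch below evaluates the SPARQL form with rdflib (neither 4Suite's Versa engine nor Redland is shown here, and "data.rdf" is a hypothetical local file).

from rdflib import Graph
from rdflib.namespace import RDF

g = Graph()
g.parse("data.rdf", format="xml")     # hypothetical source document

classes_in_use = g.query(
    "SELECT DISTINCT ?Class WHERE { ?R rdf:type ?Class }",
    initNs={"rdf": RDF},
)
for (cls,) in classes_in_use:
    print(cls)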

Predicates that are used with instances of each class

SPARQL
SELECT DISTINCT ?Class ?Property
  WHERE { ?R rdf:type ?Class .
        OPTIONAL { ?R ?Property ?Object .
                   FILTER (?Property != rdf:type) } }
Versa
difference(
  properties(set(all() - rdf:type -> *)),
  set(rdf:type)
)

Do all instances of each class have a statement with each predicate?

It wasn't clear to me whether the intent was to check that all classes have a statement with each predicate as specified by an ontology, or just to count how many properties each class instance has. The latter interpretation is the one I went with (it's also simpler). This particular query returns a list of lists, each inner list consisting of two values: the URI of a distinct class instance and the number of distinct properties described in statements about it (excluding rdf:type).

Versa
distribute(
  set(all() |- rdf:type -> *),
  '.',
  'length(
    difference(
      properties(.),
      set(rdf:type)
    )
  )'
)
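
As a sanity check on what the distribute expression computes, here is a rough pure-Python equivalent over an rdflib graph (again just an illustration; "data.rdf" is a hypothetical file): for each resource that has an rdf:type, count its distinct predicates other than rdf:type.

from rdflib import Graph
from rdflib.namespace import RDF

g = Graph()
g.parse("data.rdf", format="xml")     # hypothetical source document

for instance in set(g.subjects(RDF.type, None)):
    predicates = {p for p in g.predicates(instance, None) if p != RDF.type}
    print(instance, len(predicates))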

Is the type of object in a statement with each class/predicate combination always the same?

I wasn't clear on the intent of this query, either. I wasn't sure whether he meant to ask this of the combination with all predicates defined in an ontology, or with all predicates on class instances in the graph being queried.

But there you have it.

NOTE: The set function was used to guarantee that only distinct values are returned; it may be redundant with functions and expressions that already account for duplicates.

[Uche Ogbuji]

via Copia

SPARQL versus Versa

Booyakasha! In a few simple examples, Chime illustrates just why I was so annoyed when I read the SPARQL spec drafts. Eric also has some good words on the matter. Sure, I'm biased as one of the inventors of Versa, but my reaction has more to do with SPARQL than Versa. Frankly, SPARQL bends my brain and twists my gut. Before I continue with my rant, I should say that I'm not blameless in this matter. I have a huge respect for the people working on SPARQL, and a lot of them (Dan Brickley, Libby Miller and Kendall Clark come to mind) were very polite in trying to get me more directly engaged in the standardization process. I just never had the time for more than the informal discussions I had with these folks, and apparently those who prefer SQLish syntax ended up dominating the important discussions and decisions.

It has never been Versa or the highway for me, but I was never going to swallow an RDF query language that used SQLish syntax. I always wanted a path-like language, preferably with a very "composable" syntax (which is why I went with such a functional language flavor in Versa). I'm far from alone in this. There have been many other respectable "pathy" RDF query proposals, and the feedback on Versa has been almost universally positive.

Apparently some people are very tied to their "SELECT"s. Isn't there room for those of us who just find it way too much of a conceptual mismatch from SQL conventions to RDF graphs? I have no choice but to make my own room. I'll continue working on Versa: it's time to start gathering my Versa 2.0 thoughts together. I'll implement Versa 2.0 for 4Suite, and help anyone who wants to implement it for any other tool (I hope that encourages Eric a bit). I may work on a Versa to SPARQL converter, but honestly, that's as much as I expect to ever have to do with SPARQL. No offense to any of the fine people involved. It just doesn't come close to fitting my head.

Chimezie Ogbuji

via Copia