N3 Deserialization in 4RDF (and other possibilities)

Motivated by the idea that 4RDF (and 4Suite) could benefit greatly from being able to parse Notation 3 documents (see bottom), I attempted to write an N3 Deserializer for 4RDF that makes use of Sean B. Palmer's n3processor.

Ft.Rdf.Serializers.N3

The module simply needs to be added to 4RDF's Ft/Rdf/Serializers directory. I hesitate to check it in, since 4Suite is now in a feature-frozen beta release cycle. It implements a sink that captures generated triples and adds them to a 4RDF model:

class FtRDFSink:
  """
  An n3proc sink that captures statements produced from
  processing an N3 document
  """
  def __init__(self, scope, model):
     self.stmtTuples = []
     self.scope = scope
     self.model = model
     self.bnodes = {}
     self.resources = {}

  def start(self, root):
     self.root = root

  def statement(self, s, p, o, f):
     #First check if the subject is a bnode (via n3proc convention)
     #If so, use 4RDF's bnode convention instead
     #Use self.bnodes as a map from n3proc bnode uris -> 4RDF bnode urns
     if s[:2] == '_:':
        if s in self.bnodes:
           s = self.bnodes[s]
        else:
           newBNode = self.model.generateBnode()
           self.bnodes[s] = newBNode
           s = newBNode

     #Make the same check for the statement's object
     if o[:2] == '_:':
        if o in self.bnodes:
           o = self.bnodes[o]
        else:
           newBNode = self.model.generateBnode()
           self.bnodes[o] = newBNode
           o = newBNode

     #Mark the statement's subject as a resource (used later for objectType)
     self.resources[s] = None

     if f == self.root:
        #Regular, in scope statement
        stmt=(s,p,o,self.scope)
        self.stmtTuples.append(stmt)
     else:
        #Special case
        #This is where the features of N3 beyond standard RDF can be harvested
        #In particular, a statement with a different scope / context than
        #that of the containing N3 document is a 'hypothetical statement'
        #Such statements are mostly used to specify implication via log:implies
        #Such implication rules can be persisted (by flattening the formulae)
        #and later interpreted by a backward-chaining inference process
        #triggered from Versa or from within the 4RDF Model retrieval interfaces
        #Formulae are assigned a bnode uri by n3proc which needs to be mapped
        #to a 4RDF bnode urn
        if f in self.bnodes:
           f = self.bnodes[f]
        else:
           newBNode = self.model.generateBnode()
           self.bnodes[f] = newBNode
           f = newBNode

        self.resources[f] = None

        self.flatten(s, p, o, f)

  def flatten(self, s, p, o, f):
     """
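     Adds a 'Reified' hypothetical statement (associated with the formula f)
     N3R is the namespace object for the reification predicates; it is
     defined elsewhere in the module and is not shown in this excerpt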
     """
     fs = self.model.generateUri()
     self.stmtTuples.append((f,
                             N3R.statement,
                             fs,
                             self.scope))
     self.stmtTuples.append((fs,
                             N3R.subject,
                             s,
                             self.scope))
     self.stmtTuples.append((fs,
                             N3R.predicate,
                             p,
                             self.scope))
     self.stmtTuples.append((fs,
                             N3R.object,
                             o,
                             self.scope))
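
The three bnode lookups in statement() follow the same pattern, so they could be folded into a single helper. The sketch below shows that idea in isolation and is not part of the module above. StubModel and map_bnode are hypothetical names used for illustration only; StubModel stands in for a 4RDF model and mimics nothing but generateBnode(), and the URN format it returns is made up. Note that the sink maps formula identifiers unconditionally, so the '_:' prefix check here corresponds to the subject and object cases.

```python
import itertools


class StubModel:
    """Hypothetical stand-in for a 4RDF model; only generateBnode() is mimicked."""

    def __init__(self):
        self._counter = itertools.count(1)

    def generateBnode(self):
        # Assumption: the real model returns a bnode URN; this format is invented.
        return 'urn:example:bnode:%d' % next(self._counter)


def map_bnode(node, bnodes, model):
    """Return the model-side bnode for an n3proc '_:' label, minting one if needed."""
    if not node.startswith('_:'):
        return node  # not a bnode label: pass it through unchanged
    if node not in bnodes:
        bnodes[node] = model.generateBnode()
    return bnodes[node]


model = StubModel()
bnodes = {}
first = map_bnode('_:a', bnodes, model)
again = map_bnode('_:a', bnodes, model)   # same label, same mapped bnode
other = map_bnode('_:b', bnodes, model)   # new label, new bnode
named = map_bnode('http://example.org/x', bnodes, model)
```

With a helper like this, each of the subject and object checks in statement() would shrink to a single call.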

In addition, I made a patch to the 4RDF command that adds 'n3' as an input format. See my previous blog for an example of using this command to generate diagrams of 4RDF graphs.

For example, this diagram is of rdfs-rules, rendered via the 4rdf command line (patched to be able to deserialize N3 documents)

Advantages

First, deserializing N3 will almost always be faster than deserializing from RDF/XML (especially for larger graphs) since it's a text parse vs. an XML parse. So, if 4Suite repository XSLT Document Definitions are augmented to be able to deserialize into the model via N3, repository operations on documents with such Document Definitions will be significantly faster.

Finally, by allowing the deserialization of SWAP constructs such as log:implies, formulae reification, existential and universal variables, reasoners capable of interpreting N3 rule semantics (such as Sean's pyrple Graph class - see a demonstration of its inference capabilities) can perform inference externally (without having to build it into 4RDF or Versa) on a 4RDF store containing RDF deserialized from N3 documents with appropriate rules.

One thing to note about this implementation is that the default baseUri of N3 documents is http://nowhere when the specified scope is a URN (since the n3processor is unable to handle URNs). Otherwise, the given scope is used as the baseUri.
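
The baseUri rule just described can be pictured as a small function. This is a minimal sketch and not the module's actual code: choose_base_uri is a hypothetical name, and treating any scope that starts with 'urn:' (in any letter case) as a URN is an assumption about how the check might be made.

```python
def choose_base_uri(scope, fallback='http://nowhere'):
    """Pick the base URI handed to the N3 parser for a given scope."""
    # Assumption: a scope counts as a URN when it starts with 'urn:'.
    if scope.lower().startswith('urn:'):
        return fallback
    return scope


urn_case = choose_base_uri('urn:uuid:1234')
url_case = choose_base_uri('http://example.org/rules.n3')
```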

[Chimezie Ogbuji]

via Copia

Rewriting Source Content Descriptions as Versa Queries

I recently read Morten Frederiksen's blog entry about implementing Source Content Descriptions as SPARQL queries in Redland and was quite interested, especially by the observation that such queries could be automatically generated and that the set of queries you would want to ask is small and straightforward. Even more interesting was Morten's step-by-step walk-through of how such queries would be translated to SQL queries on a Redland triple store sitting on top of MySQL (my favorite RDBMS deployment for 4RDF as well).

However, I couldn't help but wonder how such a set of queries would be expressed in Versa (in my opinion, a language more aligned with the data model it queries than its SQL-RDQL counterparts). Below is my attempt to port the queries into Versa:

Classes used in the store

SPARQL
SELECT DISTINCT ?Class
WHERE { ?R rdf:type ?Class }
Versa
set(all() - rdf:type -> *)
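
Read in plain Python terms, both queries above amount to collecting the distinct objects of rdf:type statements. The sketch below runs that over an in-memory list of triples; the sample data and the classes_used name are made up for illustration, and no Versa or SPARQL engine is involved.

```python
RDF_TYPE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'

# Hypothetical sample graph as (subject, predicate, object) tuples.
triples = [
    ('ex:alice', RDF_TYPE, 'foaf:Person'),
    ('ex:bob', RDF_TYPE, 'foaf:Person'),
    ('ex:doc', RDF_TYPE, 'foaf:Document'),
    ('ex:alice', 'foaf:name', 'Alice'),
]


def classes_used(triples):
    """Distinct objects of rdf:type statements, i.e. the classes in use."""
    return {o for s, p, o in triples if p == RDF_TYPE}


classes = classes_used(triples)
```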

Predicates that are used with instances of each class

SPARQL
SELECT DISTINCT ?Class, ?Property
  WHERE { ?R rdf:type ?Class .
        OPTIONAL { ?R ?Property ?Object .
                   FILTER ?Property != rdf:type } }
Versa
difference(
  properties(set(all() - rdf:type -> *)),
  set(rdf:type)
)

Do all instances of each class have a statement with each predicate?

It wasn't clear to me whether the intent was to check that all classes have a statement with each predicate as specified by an ontology, or just to count how many properties each class instance has. The latter interpretation is the one I went with (it's also simpler). This particular query will return a list of lists, each inner list consisting of two values: the URI of a distinct class instance and the number of distinct properties described in statements about it (except rdf:type).

Versa
distribute(
  set(all() |- rdf:type -> *),
  '.',
  'length(
    difference(
      properties(.),
      set(rdf:type)
    )
  )'
)
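
For readers who don't know Versa, the same computation can be sketched in Python over an in-memory list of triples. The sample data and the property_counts name are hypothetical, and the results are sorted by subject only to make the output deterministic; the text above does not specify an ordering for the Versa result.

```python
RDF_TYPE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'

# Hypothetical sample graph as (subject, predicate, object) tuples.
triples = [
    ('ex:alice', RDF_TYPE, 'foaf:Person'),
    ('ex:alice', 'foaf:name', 'Alice'),
    ('ex:alice', 'foaf:mbox', 'mailto:alice@example.org'),
    ('ex:alice', 'foaf:mbox', 'mailto:a@example.org'),
    ('ex:bob', RDF_TYPE, 'foaf:Person'),
    ('ex:carol', 'foaf:name', 'Carol'),  # untyped, so not reported
]


def property_counts(triples):
    """For each typed subject, count its distinct predicates other than rdf:type."""
    typed = {s for s, p, o in triples if p == RDF_TYPE}
    result = []
    for subject in sorted(typed):
        props = {p for s, p, o in triples if s == subject and p != RDF_TYPE}
        result.append([subject, len(props)])
    return result


counts = property_counts(triples)
```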

Is the type of object in a statement with each class/predicate combination always the same?

I wasn't clear on the intent of this query, either. I wasn't sure if he meant to ask this of the combination with all predicates defined in an ontology or all predicates on class instances in the graph being queried.

But there you have it.

NOTE: The use of the set function was in order to guarantee that only distinct values were returned and may have been used redundantly with functions and expressions that already account for duplication.

[Uche Ogbuji]

via Copia

SPARQL versus Versa

Booyakasha! In a few simple examples, Chime illustrates just why I was so annoyed when I read the SPARQL spec drafts. Eric also has some good words on the matter. Sure, I'm biased as one of the inventors of Versa, but my reaction has more to do with SPARQL than Versa. Frankly, SPARQL bends my brain and twists my gut. Before I continue with my rant, I should say that I'm not blameless in this matter. I have a huge respect for the people working on SPARQL, and a lot of them (Dan Brickley, Libby Miller and Kendall Clark come to mind) were very polite in trying to get me more directly engaged in the standardization process. I just never had the time for more than the informal discussions I had with these folks, and apparently those who prefer SQLish syntax ended up dominating the important discussion or decisions.

It has never been Versa or the highway for me, but I was never going to swallow an RDF query language that used SQLish syntax. I always wanted a path-like language, preferably with a very "composable" syntax (which is why I went with such a functional language flavor in Versa). I'm far from alone in this. There have been many other respectable "pathy" RDF query proposals, and the feedback on Versa has been almost universally positive.

Apparently some people are very tied to their "SELECT"s. Isn't there room for those of us who just find it way too much of a conceptual mismatch from SQL conventions to RDF graphs? I have no choice but to make my own room. I'll continue working on Versa: it's time to start gathering my Versa 2.0 thoughts together. I'll implement Versa 2.0 for 4Suite, and help anyone who wants to implement it for any other tool (I hope that encourages Eric a bit). I may work on a Versa to SPARQL converter, but honestly, that's as much as I expect to ever have to do with SPARQL. No offense to any of the fine people involved. It just doesn't come close to fitting my head.

[Chimezie Ogbuji]

via Copia

Contexts, and Scopes, and Provenance, Oh My!

Being the KR theory masochist I am, I've lately been wrestling with the concept of RDF Graph Contexts - yeah, ouch! The motivation is to determine the optimal RDF statement vector size or database configuration for representing RDF sufficiently and in a scalable way. Graph / SubGraph identification seems advantageous for query optimization for large (> 5 Million Triples) RDF stores - especially those built on RDBMSs. At any rate, below are some good references I found on the subject(s):

I'm sure I've missed some other useful ones

[Uche Ogbuji]

via Copia

Using 4RDF's Triclops from the Command Line

I recently merged an old RDF graphing library (within 4Suite) into the 4Suite 4RDF command-line in preparation for the beta release. The library is called Triclops. With this addition, the 4RDF command-line can now render RDF graphs into a representative SVG or JPEG diagram.

Using Graphviz

Triclops makes use of Graphviz to render .dot graphs (generated from RDF serializations) into various formats. One of the advantages is that Graphviz's neato can be used to apply a spring graph layout algorithm to the final graph. This often results in a more informative layout of the final graph than the default. The downside is the large amount of processing needed by this function.
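
To give a feel for the intermediate .dot form, here is a minimal sketch that turns a list of triples into Graphviz DOT text. It is not Triclops's code: triples_to_dot and the sample triple are hypothetical, and the styling Triclops applies to nodes and edges is not reproduced.

```python
def triples_to_dot(triples, name='rdf'):
    """Render (subject, predicate, object) tuples as a Graphviz digraph."""

    def quote(text):
        # DOT quoted strings need backslashes and double quotes escaped.
        return '"%s"' % text.replace('\\', '\\\\').replace('"', '\\"')

    lines = ['digraph %s {' % name]
    for s, p, o in triples:
        # One edge per statement, labelled with the predicate.
        lines.append('  %s -> %s [label=%s];' % (quote(s), quote(o), quote(p)))
    lines.append('}')
    return '\n'.join(lines)


dot_text = triples_to_dot([('ex:alice', 'foaf:knows', 'ex:bob')])
```

Saved to a file, text like this can be handed to Graphviz directly, for example with dot -Tsvg for the default layout or neato -Tsvg for the spring layout.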

4RDF Options

Below is a listing of the 4RDF command-line options

The Triclops integration consists of 3 additions:

First, the additional values to the -s / --serialize option:

  • svg
  • jpeg

Second, the -g / --graphviz option (which is required when either of the above values is used) takes the path to the dot / neato executables. And finally, the -l / --spring option requests that neato be used instead of dot. This results in the spring algorithm being applied to the graph.

FOAF example

To demonstrate, I'm using the Dan Brickley FOAF example (as listed in the specification). In the terminal below, I list the content of the FOAF document, then convert it to a JPEG diagram first and then an SVG diagram right afterwards (using neato to lay out the graph). On my machine, the dot and neato executables are located in /usr/bin, so I set the -g option accordingly:

The generated jpg diagram is below while its svg alternative is here.

Another example

Below are jpeg and svg diagrams of the 4Suite Repository Ontology (modeled in OWL):

My long-term plan is to make Triclops completely configurable so that the generated graphs are tailored to the user's specification for things such as the font used for text, how to format blank nodes, etc. Porting it to use Pydot might go a long way in this regard.

[Chimezie Ogbuji]

via Copia