FuXi v0.6 - Rearchitected / Repackaged

I've been experimenting with FuXi as an alternative in situations where I had been manipulating application-specific RDF content with Versa from within a host language (XSLT). In some cases I've been able to reduce a very complex set of XSLT logic to one or two queries on RDF models extended via a handful of very concise rules (expressed as N3). I'm hoping to build some use cases to demonstrate this later.

The result is that I've rearchitected FuXi to work as a black box directly with a 4RDF Model instance (it is now query-agnostic, so it can be plugged in as an extension library to any other or future RDF query language bound to a 4RDF model). Prior to this version, it extracted formula statements via Versa queries instead of directly through the Model interfaces.

Right now I use it primarily through a single Versa function, prospective-query. Below is an excerpt from the README.txt describing its parameters:

prospective-query

prospective-query( ruleGraph, targetGraph, expr, qScope=None)

Using FuXi, it takes all the facts from the current query context (which may or may not be scoped) and the rules from the <ruleGraph> scope, and invokes/executes the Rete reasoner. It adds the inferred statements to the <targetGraph> scope. Then it performs the query <expr> within the <qScope> (or the entire model if None), removing the inferred statements upon exit.
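
For instance, a hypothetical invocation (the scope URIs and inner query are illustrative, not from the README):

prospective-query(
  'urn:x-rules',
  'urn:x-inferred',
  'type(ex:SleepingPerson)'
)

This would run the Rete reasoner over the current facts using the rules persisted in the urn:x-rules scope, stage the inferred statements in urn:x-inferred, evaluate type(ex:SleepingPerson) against the entire model, and retract the inferences on exit.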


FuXi is now a proper Python package (with a setup.py), and I've moved it (permanently, I hope) to: http://copia.ogbuji.net/files/FuXi

I was a little unclear on Pychinko's specific dependencies with rdflib and cwm in my previous post, but Yarden Katz cleared up the confusion in his comments (thanks).

The installation and use of FuXi should be significantly easier than before, with the recent inclusion of the N3 deserializer/parser in 4Suite.

Chimezie Ogbuji

via Copia

FuXi - Versa / N3 / Rete Expert System

Pychinko is a Python implementation of the classic Rete algorithm, which provides the inferencing capabilities needed by an expert system. Part of Pychinko works on top of cwm / afon out of the box. However, its Interpreter relies only on rdflib to formally represent the terms of an RDF statement.

FuXi relies only on Pychinko itself, the N3 deserializer for persisting N3 rules, and rdflib's Literal and URIRef (see the brief sketch after the component list) to formally represent the corresponding terms of a Pychinko Fact. FuXi consists of 3 components (in addition to a 4RDF model for Versa queries):

I. FtRdfReteReasoner

Uses Pychinko and N3RuleExtractor to reason over a scoped 4RDF model.

II. N3RuleExtractor

Extracts Pychinko rules from a scoped model with statements deserialized from an N3 rule document.

III. 4RDF N3 Deserializer

see: N3 Deserializer
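
For orientation, here is a trivial sketch of the rdflib term classes in question (the URIs and literal are arbitrary):

from rdflib import Literal, URIRef

subject = URIRef('http://foo/bar#chimezie')
predicate = URIRef('http://foo/bar#is')
label = Literal('snoring')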

The rule extractor reverses the reification of statements contained in formulae/contexts performed by the N3 processor. It uses three Versa queries for this, relying on the log: and n3r: namespace mappings (the SWAP log and N3 reification vocabularies):

Extract antecedent statements of logical implications:

distribute(
  all() |- log:implies -> *,
  '.',
  '. - n3r:statement -> *'
)

Extract implied / consequent statements of logical implications:

distribute(
  all() - log:implies -> *,
  '.',
  '. - n3r:statement -> *'
)

Extract the terms of an N3 reified statement:

distribute(
  <statement>,
  '. - n3r:subject -> *',
  '. - n3r:predicate -> *',
  '. - n3r:object -> *'
)
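
For reference, here is a sketch of the flattened form these queries traverse (the bnode labels and statement URI are illustrative; this mirrors what the deserializer's flatten method, shown in the entry below, produces):

_:ante log:implies _:cons .
_:ante n3r:statement <urn:uuid:stmt-1> .
<urn:uuid:stmt-1> n3r:subject   <s> .
<urn:uuid:stmt-1> n3r:predicate <p> .
<urn:uuid:stmt-1> n3r:object    <o> .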

The FuXi class provides methods for performing scoped Versa queries on a model extended via inference, or on just the inferred statements.

For example, take the following fact document deserialized into a model:

@prefix : <http://foo/bar#> .
:chimezie :is :snoring .

Now consider the following rule:

@prefix ex: <http://foo/bar#> .
{?x ex:is ex:snoring} => {?x a ex:SleepingPerson} .

Below is a snapshot of FuXi performing the Versa query “type(ex:SleepingPerson)” on a model extended by inference using the above rule:
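
Given the fact and rule above, the inferred statement is :chimezie a ex:SleepingPerson, so the query should return just that resource; schematically (the result formatting is illustrative):

type(ex:SleepingPerson)
=> [ http://foo/bar#chimezie ]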

Who was FuXi? The author of the predecessor to the King Wen Sequence.

Chimezie Ogbuji

via Copia

N3 Deserialization in 4RDF (and other possibilities)

Motivated by the idea that 4RDF (and 4Suite) could benefit greatly from being able to parse Notation 3 documents (see bottom), I attempted to write an N3 Deserializer for 4RDF that makes use of Sean B. Palmer's n3processor.

Ft.Rdf.Serializers.N3

The module simply needs to be added to 4RDF's Ft/Rdf/Serializers directory. I hesitate to check it in, since 4Suite is now in a feature-frozen beta release cycle. It implements a sink that captures generated triples and adds them to a 4RDF model:
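
# Note: N3R, referenced in flatten() below but not shown in this excerpt,
# is the namespace binding for the N3 reification vocabulary
# (n3r:statement, n3r:subject, n3r:predicate, n3r:object).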

class FtRDFSink:
  """
  A n3proc sink that captures statements produced from
   processing an N3 document
  """
  def __init__(self, scope,model):
     self.stmtTuples = []
     self.scope = scope
     self.model = model
     self.bnodes = {}
     self.resources = {}

  def start(self, root):
     self.root = root

  def statement(self, s, p, o, f):
     #First check if the subject is a bnode (via n3proc convention)
     #If so, use 4RDF's bnode convention instead
     #Use self.bnodes as a map from n3proc bnode uris -> 4RDF bnode urns
     if s[:2] == '_:':
        if s in self.bnodes:
           s = self.bnodes[s]
        else:
           newBNode = self.model.generateBnode()
           self.bnodes[s] = newBNode
           s = newBNode

     #Make the same check for the statement's object
     if o[:2] == '_:':
        if o in self.bnodes:
           o = self.bnodes[o]
        else:
           newBNode = self.model.generateBnode()
           self.bnodes[o] = newBNode
           o = newBNode

     #Mark the statement's subject as a resource (used later for objectType)
     self.resources[s] = None

     if f == self.root:
        #Regular, in scope statement
        stmt = (s, p, o, self.scope)
        self.stmtTuples.append(stmt)
     else:
        #Special case
        #This is where the features of N3 beyond standard RDF can be harvested
        #In particular, a statement with a different scope / context than
        #that of the containing N3 document is a 'hypothetical statement'
        #Such statements are mostly used to specify implication via log:implies
        #Such implication rules can be persisted (by flattening the formulae)
        #and later interpreted by a backward-chaining inference process
        #triggered from Versa or from within the 4RDF Model retrieval interfaces
        #Formulae are assigned a bnode URI by n3proc, which needs to be mapped
        #to a 4RDF bnode urn
        if f in self.bnodes:
           f = self.bnodes[f]
        else:
           newBNode = self.model.generateBnode()
           self.bnodes[f] = newBNode
           f = newBNode

        self.resources[f] = None

        self.flatten(s, p, o, f)

  def flatten(self, s, p, o, f):
     """
     Adds a 'Reified' hypothetical statement (associated with the formula f)
     """
     fs = self.model.generateUri()
     self.stmtTuples.append((f,
                             N3R.statement,
                             fs,
                             self.scope))
     self.stmtTuples.append((fs,
                             N3R.subject,
                             s,
                             self.scope))
     self.stmtTuples.append((fs,
                             N3R.predicate,
                             p,
                             self.scope))
     self.stmtTuples.append((fs,
                             N3R.object,
                             o,
                             self.scope))
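
As a quick sanity check, here is a minimal sketch that drives the sink by hand, with a stub standing in for the 4RDF model. Only the two generator methods the sink actually calls are stubbed, and all names here are illustrative rather than 4RDF API:

class StubModel:
    # Stands in for a 4RDF model; just the two methods FtRDFSink uses
    def __init__(self):
        self.counter = 0
    def generateBnode(self):
        self.counter += 1
        return 'urn:bnode:%d' % self.counter
    def generateUri(self):
        self.counter += 1
        return 'urn:uuid:%d' % self.counter

sink = FtRDFSink('urn:test:scope', StubModel())
sink.start('doc-formula')
#An in-scope statement; its bnode subject is remapped to a 4RDF bnode urn
sink.statement('_:a', 'http://foo/bar#is', 'http://foo/bar#snoring',
               'doc-formula')
print(sink.stmtTuples)
#-> [('urn:bnode:1', 'http://foo/bar#is', 'http://foo/bar#snoring',
#     'urn:test:scope')]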

In addition, I made a patch to the 4RDF command that adds 'n3' as an input format. See my previous blog entry for an example of using this command to generate diagrams of 4RDF graphs.

For example, this diagram is of rdfs-rules, rendered via the 4rdf command line (patched to be able to deserialize N3 documents).

Advantages

First, deserializing N3 will almost always be faster than deserializing RDF/XML (especially for larger graphs), since it's a text parse versus an XML parse. So, if 4Suite repository XSLT Document Definitions are augmented to be able to deserialize into the model via N3, repository operations on documents with such Document Definitions will be significantly faster.

Finally, by allowing the deserialization of SWAP constructs such as log:implies, formula reification, and existential and universal variables, reasoners capable of interpreting N3 rule semantics (such as Sean's pyrple Graph class; see a demonstration of its inference capabilities) can perform inference externally (without having to build it into 4RDF or Versa) on a 4RDF store containing RDF deserialized from N3 documents with appropriate rules.

One thing to note about this implementation: the default baseUri of N3 documents is http://nowhere when the specified scope is a URN (since the n3processor is unable to handle URNs). Otherwise, the given scope is used as the baseUri.

[Chimezie Ogbuji]

via Copia

XML recursive directory listing, part 4

"Its hard to finish; or that Pythons tail's a long way away!", by Dave Pawson

Well, I posted Dave's dirlist.py as a little example, in part, of how quickly an XML expert/Python newbie could get something useful whipped up in 4Suite. Based on the very detail-oriented comments, it seems people in general have found it useful, and have run into limitations from the Python-newbie side of that equation. Another example of people taking the code very seriously is Lars Trieloff's posting, "Your filesystem is an XML document".

As I mentioned in the posting, I have not put Dave's code through proper code review: I merely tweaked the command line code a bit to get it to work on my Linux box well enough for me to post an example of its workings. Dave has taken it all a bit to heart, but he shouldn't. He got very far in a short amount of time, and it's always the case in learning any new language or platform that the last 10% of polish is very hard won, and yet worth the experience.

I'm passing all the comments on the other posting along to Dave, and he's already sent me an updated version that fixes some issues. I'll post his version if he wishes, but I'll also give his code a proper, full review this weekend, and post that, for the folks who seem to want to use the code practically. The first thing I'll do is make it conform to PEP 8.

[Uche Ogbuji]

via Copia

XSLT 2.0 might be worth a second look, if...

XSLT 2.0 Is Way Cool, by Micah Dubinko

Micah. Kimber. Pawson. A handful of the folks who have, like me, turned up their noses at XSLT 2.0 are starting to reconsider. This is not a massive drugging campaign by XSLT 2.0 boosters: it seems all these folks still don't want anything to do with the oppressive type system of XPath and XSLT 2.0, and all balk at the stupendous complexity of the specifications. The key for me is that they see these specs as usable without choking on the types mess. Some folks were claiming this was possible two years or so ago, but when I checked, I wasn't convinced. Perhaps things have improved since then.

So I may be up for reconsidering my shunning of XSLT 2.0, but as Micah mentions, I'm not about to wade into 9 documents to work on an implementation. (OK, so it would really be 4 or so, but those are 4 huge documents, compared to the 1.0 series, which was 2 modestly sized documents.) If someone comes up with a coherent spec that omits the type info, it could somehow make its way into 4Suite post-1.0.

Micah says, "XSLT 2.0 is a power tool. I don't think it will displace XSLT 1.0, which is remarkable for its power in a small package." For a while I've wanted to write a series of comparisons between XSLT 2.0 and Amara code (which includes XPath 1.0 support). Amara is my power tool, for when XSLT 1.0 + EXSLT is not enough, and I find it hard to imagine XSLT 2.0 as offering more power.

And I really need to get back to work on EXSLT. Folks are getting very restless with the fact that work on EXSLT has been fallow for most of 2005. I just wish I could count on some help. Part of what impedes me is a shrinking back from all the demands of the EXSLT community without many offers of help.

[Uche Ogbuji]

via Copia

Python/XML community:

lxml 0.6.0
Picket (updated)

lxml 0.6.0 is an alternative, more Pythonic binding for the libxml2 and libxslt XML processing libraries. Martijn Faassen says "lxml 0.6 contains important bugfixes, in particular better namespace support while handling attributes, as well as a fix for what turned out to be totally broken behavior for etree.tostring(). An upgrade is recommended."
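
For the unfamiliar, lxml follows the familiar ElementTree API; a minimal sketch (the document content here is arbitrary):

from lxml import etree

root = etree.fromstring('<doc><item n="1"/></doc>')
#Round-trips the tree back to a serialized document
print(etree.tostring(root))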

Sylvain Hellegouarch updated Picket, a simple CherryPy filter for processing XSLT as a template language. It uses 4Suite to do the job. He incorporated feedback, including my own thoughts on Processor object management. A CherryPy filter "is an object that has a chance to work on a request as it goes through the usual CherryPy processing chain."

[Uche Ogbuji]

via Copia

XML recursive directory listing, part 3

In parts 1 and 2 I discussed code to use Python to recursively walk a directory and emit a nested XML representation of the contents.
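
For anyone landing on this part first, the core technique is a short recursive walk; below is a minimal sketch (not Dave's dirlist.py, which layers options and amenities on top of something like this):

import os
import sys
from xml.sax.saxutils import quoteattr

def emit_dir(path, out, depth=0):
    #One <directory> element per folder, one <file> element per plain file
    out.write('%s<directory name=%s>\n'
              % ('  ' * depth, quoteattr(os.path.abspath(path))))
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if os.path.isdir(full):
            emit_dir(full, out, depth + 1)
        else:
            out.write('%s<file name=%s/>\n'
                      % ('  ' * (depth + 1), quoteattr(name)))
    out.write('%s</directory>\n' % ('  ' * depth))

if __name__ == '__main__':
    sys.stdout.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    emit_dir(sys.argv[1], sys.stdout)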

Dave Pawson built on my basic techniques and came up with dirlist.py, a fully tricked-out version with all sorts of options and amenities. Well, he wasn't even finished. He sent me a further version today in which he "tidied up [the] program, and added options [for file] date and size."

Cool. I've posted it here: dirlist2.py. If further versions are forthcoming, I'll move it into my CVS. Dave is a self-confessed Python newbie. I had to make some quick fixes just to get it to work on my machine, but I haven't had time to carefully vet the entire program. Please let us know if you run into trouble (a comment here should suffice).

Usage example:

$ mkdir foo
$ mkdir foo/bar
$ touch foo/a.txt
$ touch foo/b.txt
$ touch foo/bar/c.txt
$ touch foo/bar/d.txt
$ python dirlist2.py foo/
Processing /home/uogbuji/foo
<?xml version="1.0" encoding="UTF-8"?>
<directory name="/home/uogbuji/foo">
  <file name="a.txt"/>
  <file name="b.txt"/>
  <directory name="/home/uogbuji/foo/bar">
    <file name="c.txt"/>
    <file name="d.txt"/>
  </directory>
</directory>

$ python dirlist2.py -d foo
Adding file dates
Processing /home/uogbuji/foo
<?xml version="1.0" encoding="UTF-8"?>
<directory name="/home/uogbuji/foo">
  <file date="2005-05-09" name="a.txt"/>
  <file date="2005-05-09" name="b.txt"/>
  <directory name="/home/uogbuji/foo/bar">
    <file date="2005-05-09" name="c.txt"/>
    <file date="2005-05-09" name="d.txt"/>
  </directory>
</directory>

$ python dirlist2.py foo/ foo.xml
Processing /home/uogbuji/foo
$ cat foo.xml
<?xml version="1.0" encoding="UTF-8"?>
<directory name="/home/uogbuji/foo">
  <file name="a.txt"/>
  <file name="b.txt"/>
  <directory name="/home/uogbuji/foo/bar">
    <file name="c.txt"/>
    <file name="d.txt"/>
  </directory>
</directory>

[Uche Ogbuji]

via Copia

Using 4RDF's Triclops from the Command Line

I recently merged an old RDF graphing library (within 4Suite) into the 4Suite 4RDF command line in preparation for the beta release. The library is called Triclops. With this addition, the 4RDF command line can now render RDF graphs into a representative SVG or JPEG diagram.

Using Graphviz

Triclops makes use of Graphviz to render .dot graphs (generated from RDF serializations) into various formats. One of the advantages is that Graphviz's neato can be used to apply a spring graph layout algorithm to the final graph. This often results in a more informative layout of the final graph than the default. The downside is the large amount of processing needed by this function.
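
The same trade-off is visible with Graphviz's standard command-line flags, independent of 4RDF (JPEG output assumes a Graphviz build with that format enabled):

$ dot -Tsvg graph.dot -o graph.svg     # default hierarchical layout
$ neato -Tjpg graph.dot -o graph.jpg   # spring-model layout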

4RDF Options

Below is a listing of the 4RDF command-line options.

The Triclops integration consists of 3 additions:

First, the additional values to the -s / --serialize option:

  • svg
  • jpeg

Second, the -g / --graphviz option (required when either of the above values is used) takes the path to the dot / neato executables. Finally, the -l / --spring option requests that neato be used instead of dot, which results in the spring algorithm being applied to the graph.
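
Put together, a hypothetical invocation might look like this (the positional source argument and output redirection are illustrative; the options are as described above):

$ 4rdf -s svg -g /usr/bin -l foaf.rdf > foaf.svg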

FOAF example

To demonstrate, I'm using the Dan Brickley FOAF example (as listed in the specification). In the terminal below, I list the content of the FOAF document, then convert it first to a JPEG diagram and then to an SVG diagram (using neato to lay out the graph). On my machine, the dot and neato executables are located in /usr/bin, so I set the -g option accordingly:

The generated JPEG diagram is below, while its SVG alternative is here.

Another example

Below are JPEG and SVG diagrams of the 4Suite Repository Ontology (modeled in OWL):

My long-term plan is to make Triclops completely configurable so that the generated graphs are tailored to the user's specification for things such as the font used for text, how to format blank nodes, etc. Porting it to use Pydot might go a long way in this regard.

[Chimezie Ogbuji]

via Copia

4Suite 1.0b1 for Fedora Core 4 Test 3

Hooray! The 4Suite RPM shipped with the Fedora Core 4 Test 3 release is "4Suite-1.0-8.b1.i386.rpm", according to the RPM list. As Dave Pawson and I found out a few days ago, this is 4Suite 1.0b1. I was worried it might not make it all the way through FC quality control in time, but it seems it did. I need to try out FC4T3 on one of my non-critical machines this weekend. I also need yet another non-critical machine so I can check out all the hype about Ubuntu.

[Uche Ogbuji]

via Copia

SSH CVS access for 4Suite developers

I've lost count of the number of times I've gone looking for this post by Jeremy which describes how to add additional SSH keys to our accounts on the cvs server. Having several machines I develop 4Suite on, I keep having to add new SSH keys. The instructions are below:

All CVS developers,

There now exists an automatic system for adding additional SSH keys to your existing account. The login message has been updated to display this information as well. To add SSH keys to your account, use:

  • [version 1] "scp identity.pub <username>@cvs.4suite.org:sshkey"
  • [version 2] "scp id_dsa.pub <username>@cvs.4suite.org:sshkey2"

My CVS root string is: :ext:cogbuji@cvs.4suite.org:/var/local/cvsroot

[Uche Ogbuji]

via Copia