Progress on two Reference Implementations for RETE and GRDDL

Whew! During the moments I was able to spare while at ISWC (I was only there on monday and tuesday, unfortunately), I finished up two 'reference' implementations I've been enthusiastically hacking on quite recently. By reference implementation, I mean an implementation that attempts to follow a specification verbatim as an exercise to get a better understanding of it.

I still need a write up on the engaging conversations I had at ISWC (it was really an impressive conference, even from just the 2 days worth I got to see) as well as the presentation I gave on using GRDDL and Web architecture to meet the requirements of Computer-based Patient Records.

FuXi and DLP

The first milestone was with FuXi, which I ended up rewriting completely based on some exchanges with Peter Lin and Charles Young.

This has probably been the most challenging piece of software I've ever written and I was quite niave in the beginning about my level of understanding of the nuances of RETE. Anyone interested in the formal intersection of Notation 3 / RDF syntax and the RETE algorithm(s) will find the exchanges in the comments of the above post very instructive - and Peter Lin's blog in general. Though he and I have our differences in the value of mapping RETE to RDF/N3 his insights into my efforts have been very helpful.

In the process, I discovered Robert Doorenbos PhD thesis "Production Matching for Large Learning Systems" which was incredibly valuable in giving me a comprehensive picture of how RETE (and RETE/UL) could be 'ported' to accomodate Notation 3 reasoning.

The primary motivation (which has led to many sleepless nights and late night hackery) is what I see as a lack of understanding within the community of semantic web developers of the limitations of Tableux-based reasoning and the common misconception that the major Tableaux-based reasoners (Fact++, Racer, Pellet, etc..) represent the ceiling of DL reasoning capability.

The reality is that logic programming has been around the block much longer than DL and has much more mature algorithms available (the primary one being RETE). I've invested quite a bit of effort in what I believe will (eventually) demonstrate very large-scale DL reasoning performance that will put Tableaux-based reasoning to shame - if only to make it clear that more investigation into the intersection of LP and DL is crucial for making the dream of the SW a verbatim reality.

Ofcouse, there is a limit to what aspects of DL can be expressed as LP rules (this subset is called Description Logic Programming). The 'original' DLP paper does well to explain this limitation, but I believe this subset represents the more commonly used portions of OWL and the portions of OWL 1.0 (and OWL 1.1 for that matter) left out by such an intersection will not be missed.
Ivan Herman pointed me to a paper by Horst, Herman which is quite comprehensive in outlining how this explicit intersection can be expressed axiomatically and the computational consequences of such an axiomatization. Ivan used this a guide for his RDFSClosure module.

Not enough (IMHO) has been done to explore this intersection because people are comfy with the confines of non-LP algorithms. The trail (currently somewhat cold) left by the Mindswap work on Pychinko needs to be picked up, followed and improved.

So, I rolled up my sleeves, dug deep and did my best to familiarize myself with the nuances of production system optimization. Most of the hard work has already been done, thanks to Robert Doorenbos subsetting (and extension) of the original Charles Forgy algorithm. FuXi, gets through a large majority the OWL tests using a ruleset that closely implements what Horst lays out in his paper and does so with impressive times - even with more optimizations pending.

The most recent changes include a command-line interface for launching it:

chimezie@Zion:~/devel/Fuxi$ python --out=n3 --ruleFacts
Time to build production rule (RDFLib): 0.0172629356384 seconds
Time to calculate closure on working memory: 224.906921387 m seconds

@prefix owl: <>.
@prefix rdf: <>.
@prefix test: <>.

 test:Animal test:in _:XRfZIlKy56,

 test:Cosi_fan_tutte a test:DaPonteOperaOfMozart;
    test:in _:XRfZIlKy47,

 test:Don_Giovanni a test:DaPonteOperaOfMozart;
    test:in _:XRfZIlKy47,

 .... snip ...

FuXi still doesn't support 'built-ins' (or custom comparisons), but luckily the thesis includes a section on how to implement non equality testing of rule constraints that should be (relatively) easy to add. The theseis also includes a section on how negated conditions can be implemented (which is probably the most glaring axiom missing from DLP). Finally, Robert's paper includes a clever mechanism for hashing the Alpha network that has yet to be implemented (and is simple enough to implement) that should contribute significant performance gains.

There are other pleasant surprises with the current codebase. The rule compiler can be used to identify inefficencies in rule patterns, the command-line program can be used to serialize the closure delta (i.e., only the triples inferred from the ruleset and facts), and (my favorite) a Notation 3 ruleset can be exported as a graphviz diagram in order to visualize the rule network. Having 'browsed' various rule-sets in this way, I must say it helps in understanding the nuances of optimization when you can see the discrimination network that the triples are propagated through.

I don't have a permanent home for FuXi yet but have applied for a sourceforge project (especially since SF now supports SVN, apparently). So, until then, FuXi can be downloaded from:

GRDDL Client for 4Suite and RDFLib

During the same period, I've also been working on a 'reference' implementation of GRDDL (per the recently released Working Draft) for 4Suite and RDFLib. It's a bit ironic in doing so, since the 4Suite repository framework has essentially been using GRDDL-like Content Management Systems mechanisms since its inception (sometime in 2001).
However, I thought doing so would be the perfect oppurtunity to:

  • Demonstrate how 4Suite can be used with RDFLib (as a prep to the pending deprecation of 4Suite RDF for RDFLib)
  • Build a framework to compose illustrative test cases for the Working Group (of which I'm a member)
  • As a way to completely familiarize myself with the various GRDDL mechanisms

I posted this to the Working Group mailing list and plan to continue working on it. In particular, the nice thing about the convergence of these two projects of mine is that I've been able to think a bit about how both GRDDL and Fuxi could be used to implement efficient, opportunistic programs that extract RDF via GRDDL and explicit links (rdf:seeAlso, owl:imports, rdfs:isDefinedBy) and perform incremental web closures by adding the triples discovered in this way one at a time to a live RETE network.

The RETE algorithm is tailored specifically to behave as a black box to changes to a production system and so crawling the web, extracting RDF (a triple at a time) and reasoning over it in this way (the ultimate semantic web scenario) becomes a very real scenario with sub-second response times to boot. At the very least, reasoning should cease to be much of a bottleneck compared to the actual dereferencing and parsing of RDF from distributed locations. Very exciting prospects. Watch this space..

Chimezie Ogbuji

via Copia