Well, I was hoping to hold off on the ongoing work I've doing with FuXi, until I could get a decent test suite working, but I've been engaged in several threads (older) that left wanting to elaborate a bit.
There is already a well established precedent with Python/N3/RDF reasoners (Euler, CWM, and Pychinko). FuXi used to rely on Pychinko, but I decided to write a Rete implementation for N3/RDF from scratch - trying to leverage the host language idioms (hashing, mappings, containers, etc..) as much as possible for areas where it could make a difference in rule evaluation and compilation.
What I have so far is more Rete-based than a pure Rete implementation, but the difference comes mostly from the impedance between the representation components in the original algorithm (which are very influenced by FOL and Knowledge Representation in general) and those in the semantic web technology stack.
Often with RDF/OWL, there is more talk than neccessary, so I'll get right to the meat of the semantic mapping I've been using. This assumes some familiarity with the original algorithm. Some references:
- Jess, the Java Expert System Shell - The Rete Algorithm
- The Road hits the rubber: Explaining RETE - same as above
- Combining Left and Right Unlinking for Matching a Large Number of Learned Rules
- Comparison of the Rete and Treat Production Matchers for Soar (A Summary)
- Citations of original paper
- Optimising Description Logic Subsumption
The working memory of the network is fed by an N3 graph. Tokens represent the propagation of RDF triples (no variables or formulae identifiers) from the source graph through the rule network. These can represent token addition (where the triples are added to the graph - in which case the live network can be associated with a live RDF graph) and token removals (where triples are removed from the source graph). When tokens pass an alpha nodes intra-element test (see below) the tokens are passed on with a subtitution / mapping of variables in the pattern to the corresponding terms in the triples. This variable substitution is used to check for consistent variable bindings across beta nodes.
ObjectType Nodes and Working Memory
ObjectType nodes can be considered equivalent to the test for concept subsumption (in Description Logics). and therefore equivalent to the alpha node RDF pattern:
?member rdf:type ?klass.
'Classic' Alpha node patterns (the one below is taken directly from the original paper) map to multiple RDF-triple alpha node patterns:
(Expression ^Op X ^Arg2 Y)
Would be equivalent to the following triples patterns (the multiplicative factor is that RDF assertions are limited to binary predicates):
- ?Obj rdf:type Expression
- ?Obj Op ?X
- ?Obj Arg2 ?Y
Alpha nodes correspond to patterns in rules, which can be
- Triple patterns in N3 rules
- N-array functions.
Alpha node intra-element tests have a 'default' mechanism for matching triple patterns or they exhibit the behavior associated with a registered set of N-array functions - the core set coincide with those used by CWM/Euler/Pychinko (and often called N3 built-ins). Fuxi will support an extension mechanism for registering additional N-array N3 functions by name which associate them with a Python function that implements the constraint in a similar fashion to SPARQL extension functions. N-aray functions are automatically propagated through the network so they can participate in Beta Node activation (inter-element testing in Rete parlance) with regular triple patterns, using the bindings to determine the arguments for the functions.
The default mechanism is for equality of non-variable terms (URIs).
Beta nodes are pretty much verbatim Rete, in that they check for consistent variable substitution between their left and right memory.This can be considered similar to the unification routine common to both forward and backward chaining algorithms which make use of the Generalized Modus Ponens rule. The difference is that the sentences aren't being make to look the same but the existing variable substitutions are checked for consistency. Perhaps there is some merit in this similarity that would make using a Rete network to faciliate backward-chaining and proof generation an interesting possiblity, but that has yet to be seen.
These correspond with the end of the LHS of a N3 rule, and is associated with the RHS and when 'activated' they 'fire' the rules, apply the propaged variable substitution, and add the newly inferred triples to the network and to the source graph.
Testing and Visualizing RDF/N3 Rete Networks
I've been able to adequately test the compilation process (the first of two parts in the original algorithm), using a visual aid. I've been developing a library for generating Boost Graph Library DiGraphs from RDFLib graphs - called Kaleidos. The value being in generating GraphViz diagrams, as well as access to a whole slew of graph heuristics / algorithms that could be infinitely useful for RDF graph analysis and N3 rule network analysis:
- Breadth First Search
- Depth First Search
- Uniform Cost Search
- Dijkstra's Shortest Paths
- Bellman-Ford Shortest Paths
- Johnson's All-Pairs Shortest Paths
- Kruskal's Minimum Spanning Tree
- Prim's Minimum Spanning Tree
- Connected Components
- Strongly Connected Components
- Dynamic Connected Components (using Disjoint Sets)
- Topological Sort
- Reverse Cuthill Mckee Ordering
- Smallest Last Vertex Ordering
- Sequential Vertex Coloring
Using Kaleidos, I'm able to generate visual diagrams of Rete networks compiled from RDF/OWL/N3 rule sets.
However, the heavy cost with using BGL is the compilation process of BGL and BGL python, which is involved if doing so from source.