FuXi - Versa / N3 / Rete Expert System

Pychinko is a Python implementation of the classic Rete algorithm, which provides the inferencing capabilities needed by an expert system. Part of Pychinko works on top of cwm / afon out of the box. However, its Interpreter relies only on rdflib to formally represent the terms of an RDF statement.
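
For example, the three terms of a single statement are plain rdflib objects (a minimal sketch; Pychinko's own Fact wrapper is omitted here):

from rdflib import Literal, URIRef

# Terms of the statement <http://foo/bar#chimezie> <#is> <#snoring>:
subject   = URIRef("http://foo/bar#chimezie")
predicate = URIRef("http://foo/bar#is")
obj       = URIRef("http://foo/bar#snoring")
obj_text  = Literal("snoring")  # literal-valued objects use Literal instead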

FuXi relies only on Pychinko itself, the N3 deserializer for persisting N3 rules, and rdflib's Literal and URIRef to formally represent the corresponding terms of a Pychinko Fact. FuXi consists of three components (in addition to a 4RDF model for Versa queries):

I. FtRdfReteReasoner

Uses Pychinko and N3RuleExtractor to reason over a scoped 4RDF model.

II. N3RuleExtractor

Extracts Pychinko rules from a scoped model with statements deserialized from an N3 rule document.

III. 4RDF N3 Deserializer

see: N3 Deserializer

The rule extractor reverses the reification of statements contained in formulae/contexts as performed by the N3 processor. It uses three Versa queries for this.

Using the namespace mappings for the log: and n3r: prefixes:

Extract antecedent statements of logical implications

distribute(
  all() |- log:implies -> *,
  '.',
  '. - n3r:statement -> *'
)

Extract implied / consequent statements of logical implications

distribute(
  all() - log:implies -> *,
  '.',
  '. - n3r:statement -> *'
)

Extract the terms of an N3 reified statement

distribute(
  <statement>,
  '. - n3r:subject -> *',
  '. - n3r:predicate -> *',
  '. - n3r:object -> *'
)
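
For readers more at home in rdflib than Versa, here is a rough equivalent of those three queries as plain triple matching. This is a sketch only: the n3r namespace URI below is a placeholder, and 4RDF's actual API is not used.

from rdflib import Graph, Namespace

LOG = Namespace("http://www.w3.org/2000/10/swap/log#")
N3R = Namespace("http://example.org/n3r#")  # placeholder URI for the n3r: prefix

def extract_implications(graph):
    # Mirror the first two Versa queries: walk each log:implies arc and
    # collect the reified statements on its antecedent and consequent.
    for antecedent, consequent in graph.subject_objects(LOG.implies):
        lhs = [deref(graph, st) for st in graph.objects(antecedent, N3R.statement)]
        rhs = [deref(graph, st) for st in graph.objects(consequent, N3R.statement)]
        yield lhs, rhs

def deref(graph, stmt):
    # Mirror the third query: recover the terms of an N3-reified statement.
    return (graph.value(stmt, N3R.subject),
            graph.value(stmt, N3R.predicate),
            graph.value(stmt, N3R["object"]))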

The FuXi class provides methods for performing scoped Versa queries on a model extended via inference, or on just the inferred statements.

For example, take the following fact document deserialized into a model:

@prefix : <http://foo/bar#> .
:chimezie :is :snoring .

Now consider the following rule:

@prefix ex: <http://foo/bar#> .
{?x ex:is ex:snoring} => {?x a ex:SleepingPerson} .

FuXi can then perform the Versa query “type(ex:SleepingPerson)” on a model extended by inference using the above rule; since the rule infers :chimezie a ex:SleepingPerson from the fact above, the query returns :chimezie.
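
A hand-rolled rdflib illustration of that inference (not FuXi's or Pychinko's actual API):

from rdflib import Graph, Namespace, RDF

EX = Namespace("http://foo/bar#")

graph = Graph()
graph.add((EX.chimezie, EX["is"], EX.snoring))  # the fact document above

# The rule {?x ex:is ex:snoring} => {?x a ex:SleepingPerson}, applied by hand:
for x in graph.subjects(EX["is"], EX.snoring):
    graph.add((x, RDF.type, EX.SleepingPerson))

# type(ex:SleepingPerson) now matches :chimezie
print(list(graph.subjects(RDF.type, EX.SleepingPerson)))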

Who was FuXi? The author of the predecessor to the King Wen Sequence.

Chimezie Ogbuji

via Copia
10 responses
Yay, someone's using Pychinko! :>



Now, I gotta figure out what you guys are doing with it... Hmm, it seems you guys are using it to apply rules to a factbase (a 4RDF model) from within Versa. I can't recall what Versa's distribute() thingie does, but I guess I understand this tolerably well.



Actually, we (Mindlab people) are thinking of using it for the same purpose, only in a SPARQL implementation rather than Versa. So this work is especially relevant. At some point basic conjunctive query over an rdflib model was added to Pychinko, but I can't remember what happened to that code or where it is.



Good stuff, Chimezie.

I'm glad you're interested; Pychinko was especially instrumental for me. My only prior option was CWM (which is an impressive body of work, but quite large and almost unmanageable). BTW, the distribute function takes a list of resources and applies a list of expressions against it.
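
A rough Python analogue of that contract (illustrative only, not Versa's implementation):

# Apply every expression to each resource, yielding one result list per resource.
def distribute(resources, *expressions):
    return [[expr(resource) for expr in expressions] for resource in resources]

# distribute([1, 2, 3], lambda r: r, lambda r: r * r) -> [[1, 1], [2, 4], [3, 9]]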



If it is your intention to do the same thing, then I'm sure you'll find it as easy as I did to plug it into a well-architected RDF DB API (which rdflib is). I eventually hope to be able to expose this prospective querying capability as a Versa function so it's transparent; right now it has to be used programmatically.


The speed at which Pychinko does full RDFS/OWL closures is quite impressive and I've been losing sleep over the million opportunities it opens up with the ontology management automation work I've been doing recently. 



One desire that came out of this effort is to be able to have Pychinko 'consume' triples matched by rule antecedents, which would basically allow me to do 'semantic compression', i.e. compress a group of statements that express a single concise concept, to combat the scalability problems of RDF stores (my primary motivation for exploring RDF inferencing).
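
A toy rdflib sketch of the idea, reusing the snoring example (hypothetical; Pychinko does not expose a 'consume' switch as shown here):

from rdflib import Graph, Namespace, RDF

EX = Namespace("http://foo/bar#")
graph = Graph()
graph.add((EX.chimezie, EX["is"], EX.snoring))

# Fire the rule, then 'consume' the matched antecedent triple so the single
# inferred statement stands in for the statements it summarizes:
for x in list(graph.subjects(EX["is"], EX.snoring)):
    graph.add((x, RDF.type, EX.SleepingPerson))
    graph.remove((x, EX["is"], EX.snoring))
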
I was mistaken: Pychinko's interpreter does rely on CWM. Specifically:



cwm_math.py

cwm_os.py

cwm_string.py



are needed for the implementation of the log, math, and os constructs used within rules.



see:



http://dev.w3.org/cvsweb/~checkout~/2000/10/swap/log.n3?content-type=text/plain

http://dev.w3.org/cvsweb/~checkout~/2000/10/swap/math.n3?content-type=text/plain

http://dev.w3.org/cvsweb/~checkout~/2000/10/swap/os.n3?content-type=text/plain



The suggested CWM tarball can be downloaded from:

http://infomesh.net/2001/cwm/cwm1.82.tar.gz

Hey,



  I second Kendall's enthusiasm -- it's great that you're using Pychinko! I am looking forward to digging more deeply into your usage of it.



Just a quick comment: technically, as a package Pychinko does depend on the CWM modules listed. The dependency is due to the fact that Pychinko can process some of the builtins in those modules, and there is no sense in duplicating CWM's code, so the modules are required. There are also several test cases that check whether CWM and Pychinko get the same results. We figured there wouldn't be many people who use Pychinko (or would like to) but are opposed to installing (or haven't already installed) CWM.



This being said, conceptually the parts are separate. If you do not want to use CWM builtins, you can still use Pychinko; it's software-engineering slacking on our part that the package is not more modular. But now that somebody is using this, we have motivation to change it!



  --Yarden

Great work, everyone!



How scalable and fast is FuXi/Pychinko? I'm interested in using it to store and manipulate the large common-sense databases ConceptNet and WordNet.  http://web.media.mit.edu/~hugo/conceptnet/



Thanks for any help. -Huu

Hey Huu, FuXi's speed is bound by:



1) 4RDF's query response speed - I've found 4RDF (when used with a MySQL backend) to be incredibly quick in its responses

2) Pychinko's inference speed - which is unbelievably quick in its own right. For example, my primary use for FuXi is inferring relationships within a rather large patient record ontology for cardiothoracic surgical events that we use at the Cleveland Clinic. Below is an excerpt from a run over this ontology (using FuXi) which 1) extracts the rules to use, 2) extracts the facts from the ontology, 3) infers extra statements from the ontology using the rules, 4) adds the newly inferred statements to the model, and 5) executes a 'prospective' query against the combination of the original facts and the newly inferred statements.



The query time is significantly faster when I use a 4RDF Model with MySQL instead of the Memory driver (which is what I use here), but notice how fast Pychinko infers 1444 statements out of 3153 facts!

extracted 7 rules from urn:SemanticConcisenessRules
extracted 3153 facts from source model into interpreter. Executing...
inferred 1444 statements in 1.44343113899 seconds
time to add inferred statements 0.0433931350708
compiled prospective query, executing
time to execute prospective query: 126.726189852
time to remove inferred statements: 22.5435330868
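
For comparison only, here is a naive (non-Rete) rdflib sketch of that loop of steps, with rough timings. It is a stand-in to make the steps concrete, not FuXi's pipeline, which drives Pychinko's Rete network instead:

import time
from rdflib import Graph, Namespace, RDF

EX = Namespace("http://foo/bar#")

# One rule as (antecedent pattern, consequent template), single-pass only:
RULES = [((None, EX["is"], EX.snoring), (RDF.type, EX.SleepingPerson))]

facts = Graph()
facts.add((EX.chimezie, EX["is"], EX.snoring))  # stand-in for the extracted facts

start = time.time()
inferred = set()
for (_, p, o), (new_p, new_o) in RULES:
    for subj in facts.subjects(p, o):  # match the rule's antecedent
        inferred.add((subj, new_p, new_o))
print("inferred %d statements in %s seconds" % (len(inferred), time.time() - start))

start = time.time()
for triple in inferred:  # add the newly inferred statements to the model
    facts.add(triple)
print("time to add inferred statements %s" % (time.time() - start))

# 'prospective' query over the original facts plus the inferred statements:
print(list(facts.subjects(RDF.type, EX.SleepingPerson)))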





4RDF is about as scalable as the data store used, and in the case of MySQL (which I am most familiar with) I've found it to be just about as scalable as any other MySQL-based RDF store (Jena, Redland, rdflib). The issue of RDF scalability is specifically the mismatch between its data model and that of the data stores it is commonly persisted in (mostly relational databases). So the scalability ceiling is the same for every RDF DB that uses the same underlying data store. For MySQL I've found this ceiling (beyond which query response becomes unreasonable) to be between 4 and 10 million triples. I think other studies on Jena/Redland concur with these numbers.

Hi! Have you had much progress on your "semantic compression" yet? Do you have any more details? I have started thinking about it again in the mobile context. I recently met someone who wants "low-bandwidth Virtual Learning Environments" for a Kenyan mobile/cell phone project. My idea, though, was to have user models at the client end to help with personalisation/semantic expansion... I'd appreciate any insights you've gained so far. Cheers, Tim