Python APIs for the Upper Portions of the SW Layer Cake

[by Chimezie Ogbuji]

So, I haven't yet written about some recent capabilities I've been developing in FuXi. Ultimately, I would like it to be part of a powerful open-source library for Semantic Web middleware development. The current incarnation of this is python-dlp.

The core inference engine (a RETE-based production system) has been stable for some time now. FuXi is composed of the following major modules:

  • DLP (An implementation of the DLP transformation)
  • Horn (an API for RIF Basic Logic Dialect)
  • Rete (the inference engine)
  • Syntax (Idiomatic APIs for OWL - InfixOWL)

I've written about all but the second item (more on this in another article) and the last in the above list. I've come full circle on APIs for RDF several times, but I now believe that a pure object RDF model (ORM) approach will never be as useful as an object RDF vocabulary model, i.e., an API built for a specific RDF vocabulary. The Manchester OWL syntax has quickly become my syntax of choice for OWL, and (at the time) I had been toying with the idea of turning it into an API.

Then one day, I had an exchange with Sandro Hawke on IRC about what a Python API for the OWL Abstract Syntax would look like. I had never taken a close look at the abstract syntax until then and immediately came away thinking something more lightweight and idiomatic would be preferable.

I came across a powerful infix operator recipe for Python:

Infix operators
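The heart of that recipe (sketched from memory below, so it may differ in detail from the published version) is a tiny wrapper class whose __ror__ and __or__ methods let a two-argument function be written between its operands; InfixOWL's |some| and |only| operators build on the same trick, except that the wrapped functions construct OWL restrictions rather than tuples:

class Infix(object):
    """Sketch of the infix-operator recipe: x |op| y evaluates op's function on (x, y)."""
    def __init__(self, function):
        self.function = function
    def __ror__(self, left):
        # 'x | op' captures the left operand and returns a partially applied Infix
        return Infix(lambda right: self.function(left, right))
    def __or__(self, right):
        # '... | y' supplies the right operand and invokes the wrapped function
        return self.function(right)

# Illustration only: a toy 'some' operator that just pairs up its operands
some = Infix(lambda prop, cls: ('someValuesFrom', prop, cls))
print('hasPart' |some| 'Wheel')   # ('someValuesFrom', 'hasPart', 'Wheel')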

I wrote an initial, standalone module called InfixOWL and put up a wiki which still serves as decent initial documentation for the syntax. I've since moved it into a proper module in FuXi, fixed some bugs and very recently added even more idiomatic APIs.

The module defines the following top-level Python classes:

  • Individual - Common class for 'type' descriptor
  • AnnotatibleTerm(Individual) - Common class for 'label' and 'comment' descriptors
  • Ontology(AnnotatibleTerm) - OWL ontology
  • Class(AnnotatibleTerm) - OWL Class
  • OWLRDFListProxy - Common class for class descriptions composed from boolean operators
  • EnumeratedClass(Class) - Class descriptions consisting of owl:oneOf
  • BooleanClass(Class,OWLRDFListProxy) - Common class for owl:intersectionOf / owl:unionOf descriptions
  • Restriction(Class) - OWL restriction
  • Property(AnnotatibleTerm) - OWL property

Example code speaks much louder than words, so below is a snippet of InfixOWL code which I used to compose (programmatically) the CPR ontology:

# Imports assumed for this snippet (exact module paths may vary with the rdflib
# and FuXi versions in use); namespace_manager is a pre-configured rdflib
# NamespaceManager, defined elsewhere.
from rdflib import Namespace, Graph, URIRef, Literal
from FuXi.Syntax.InfixOWL import Class, Property, Ontology, some, only

CPR   = Namespace('http://purl.org/cpr/0.75#')
INF   = Namespace('http://www.loa-cnr.it/ontologies/InformationObjects.owl#')
EDNS  = Namespace('http://www.loa-cnr.it/ontologies/ExtendedDnS.owl#')
DOLCE = Namespace('http://www.loa-cnr.it/ontologies/DOLCE-Lite.owl#')
OBI   = Namespace('http://obi.sourceforge.net/ontology/OBI.owl#')
SNAP  = Namespace('http://www.ifomis.org/bfo/1.0/snap#')
SPAN  = Namespace('http://www.ifomis.org/bfo/1.0/span#')
REL   = Namespace('http://www.geneontology.org/owl#')
GALEN = Namespace('http://www.co-ode.org/ontologies/galen#')
TIME  = Namespace('http://www.w3.org/2006/time#')
CYC   = Namespace('http://sw.cyc.com/2006/07/27/cyc/')
XML   = Namespace('http://www.w3.org/2001/04/infoset#')
g = Graph()    
g.namespace_manager = namespace_manager
Class.factoryGraph = g
Property.factoryGraph = g
Ontology.factoryGraph = g

cprOntology = Ontology("http://purl.org/cpr/owl")
cprOntology.imports = ["http://obo.sourceforge.net/relationship/relationship.owl",
                       DOLCE,
                       #EDNS,
                       URIRef("http://obi.svn.sourceforge.net/viewvc/*checkout*/obi/ontology/trunk/OBI.owl"),
                       "http://www.w3.org/2006/time#"]
representationOf = Property(CPR['representation-of'],
                            inverseOf=Property(CPR['represented-by']),
                            domain=[Class(CPR['representational-artifact'])],
                            comment=[Literal("...")])
... snip ...
person = Class(CPR.person,
               subClassOf=[Class(SNAP.Object)])
... snip ...
clinician = Class(CPR.clinician)
clinician.comment = Literal("A person who plays the clinician role (typically Nurse, Physician / Doctor, etc.)")
#This expression is equivalent to cpr:clinician rdfs:subClassOf cpr:person
person += clinician
... snip ...
patientRecord = Class(CPR['patient-record'])
patientRecord.comment=Literal("an electronic document (a representational artifact [REFTERM])  "+
                               "which captures clinically relevant data about a specific patient and "+
                               " is primarily comprised of one or more cpr:clinical-descriptions.")
patientRecord.seeAlso = URIRef("http://ontology.buffalo.edu/bfo/Terminology_for_Ontologies.pdf")
patientRecord.subClassOf = \
    [bytes,
     #Class(CYC.InformationBearingThing),
     CPR['representation-of'] |only| patient,
     REL.OBO_REL_has_proper_part |some| Class(CPR['clinical-description'])]
... snip ...
problemDisj=Class(CPR['pathological-state']) | organismalOccurrent | extraOrganismalOccurrent
problem = Class(CPR['medical-problem'],
                subClassOf=[problemDisj,
                            realizedBy|only|Class(CPR['screening-act']),                                
                            DOLCE['has-quality'] |some| owlTimeQuality])

After the OWL graph is composed, it can be serialized by simply invoking:

g.serialize(format='pretty-xml')

[Chimezie Ogbuji]

via Copia

Proving a URI is a Document (Formally)

I'm about ready to check in some updates for proof generation in FuXi, and I was having some fun with httpRange-14 that I wanted to share. Some time ago, I wrote Emeka, a phenny-based IRC bot, to experiment with how intelligent agents might best be used over IRC. I've recently been playing around with using regex patterns to match and invoke "Semantic Web" services (none seem generally useful, so far), but I recently added a "hook" to generate proofs of RDF assertions from rulesets and ontologies. Below is how Emeka and FuXi deduce that http://en.wikipedia.org/wiki/DBpedia is a "document" as defined by the FOAF ontology:

<chimezie> .varBind proofSrc list('http://dbpedia.org/data/DBpedia','http://xmlns.com/foaf/spec/')
<chimezie> Emeka, proove "@prefix foaf: <http://xmlns.com/foaf/0.1/>.  <http://en.wikipedia.org/wiki/DBpedia> a foaf:Document ."  from $proofSrc
<Emeka>  Calculated closure in 514.962911606 milli seconds
<Emeka>  reached Q.E.D. See: http://rafb.net/p/5PvAhz98.html

The first line uses a Versa expression to bind a variable to a list of resources, which can then be referenced within subsequent Versa expressions (these are the ontologies that should be used with "graph links"). The second line uses a combination of Notation 3 (to express the "goal") and Versa (for the list of information resources). I'm still considering whether having an IRC bot automatically paste content to a pastebin is rude.

The contents of the pastebin comprise a (somewhat) human-readable, linear proof.

Human-readable proof:
Building nodeset around antecedent goal (justified by a direct assertion): foaf:page(dbpedia:DBpedia wiki:DBpedia)
Building inference step for Proof step for foaf:topic(wiki:DBpedia dbpedia:DBpedia)
Inferred from RETE node via foaf:topic(?X ?sgflxxAo413) :- foaf:page(?sgflxxAo413 ?X)
Bindings: {?X: rdflib.URIRef('http://en.wikipedia.org/wiki/DBpedia'), ?sgflxxAo413: rdflib.URIRef('http://dbpedia.org/resource/DBpedia')}
Building nodeset around antecedent goal: foaf:topic(wiki:DBpedia dbpedia:DBpedia)
Building inference step for Proof step for foaf:Document(wiki:DBpedia)
Inferred from RETE node via foaf:Document(?X) :- foaf:topic(?X ?sgflxxAo1286)
Bindings: {?X: rdflib.URIRef('http://en.wikipedia.org/wiki/DBpedia'), ?sgflxxAo1286: rdflib.URIRef('http://dbpedia.org/resource/DBpedia')}
Building proof around goal: foaf:Document(wiki:DBpedia)
## Generated by Emeka at 2007-09-12T20:52:53.728814 in 6.64019584656 milli seconds

Note that the proof doesn't rely on what the transport protocol says about http://en.wikipedia.org/wiki/DBpedia.
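To make those two inference steps concrete, here is a naive (and decidedly non-RETE) sketch that applies the same two rules with plain rdflib. FuXi derives the rules from the FOAF ontology via the DLP transformation; this loop only illustrates what the network concludes, not how it computes it:

from rdflib import Graph, Namespace, URIRef, RDF

FOAF = Namespace('http://xmlns.com/foaf/0.1/')
wiki_page = URIRef('http://en.wikipedia.org/wiki/DBpedia')
dbpedia   = URIRef('http://dbpedia.org/resource/DBpedia')

facts = Graph()
facts.add((dbpedia, FOAF.page, wiki_page))   # the direct assertion the proof starts from

# Rule 1: foaf:topic(?X ?Y) :- foaf:page(?Y ?X)   (foaf:topic is the inverse of foaf:page)
for s, p, o in list(facts.triples((None, FOAF.page, None))):
    facts.add((o, FOAF.topic, s))

# Rule 2: foaf:Document(?X) :- foaf:topic(?X ?Y)   (foaf:Document is the domain of foaf:topic)
for s, p, o in list(facts.triples((None, FOAF.topic, None))):
    facts.add((s, RDF.type, FOAF.Document))

print((wiki_page, RDF.type, FOAF.Document) in facts)   # True: the goal is reached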

I had some problems getting the Stanford Inference Web proof browser to render FuXi's RDF/XML serialization; I wonder if it is because of my assumption (in the serialization) that the 'entailment' semantics of a RETE network are equivalent to Generalized Modus Ponens. Anyway, the diagram FuXi generates is below:

[Diagram: proof about DBpedia]

This was generated via:

Fuxi --output=dot --ns=wiki=http://en.wikipedia.org/wiki/  --ns=dbpedia=http://dbpedia.org/resource/  --ns=foaf=http://xmlns.com/foaf/0.1/ --proove="@prefix foaf: <http://xmlns.com/foaf/0.1/>.  <http://en.wikipedia.org/wiki/DBpedia> a foaf:Document ." --dlp http://xmlns.com/foaf/spec/ http://dbpedia.org/data/DBpedia

I just need to wrap up the doctests and unit tests and push the latest build to the Cheese Shop and Google Code.

Chimezie Ogbuji

via Copia

What Do Closed Systems Have to Gain From SW Technologies?

Aaron Swartz asked that I elaborate on a topic that is dear to me and I didn't think a blog comment would do it justice, so here we are :)

The question is: what do single-purpose (closed) databases have to gain from SW technologies? I think the most important misconception to clear up first is the idea that XML and Semantic Web technologies are mutually exclusive.
They are most certainly not.

It's not that I think Aaron shares this misconception, but I think the main reason the alternative approach to applying SW technologies that he suggests isn't well spoken for is that quite a few people on the opposing sides of the issue assume that XML (and its whole strata of protocols and standards) and RDF/OWL (the traditionally celebrated components of the SW) are mutually exclusive. There are other misconceptions that hamper this discussion, such as the assumption that the SW is an all-or-nothing proposition, but that is a whole other thread :)

As we evolve towards a civilization where the value of information and its synthesis is of increasing importance, 'traditional' data mining, expressiveness of representation, and portability become more important for most databases (single-purpose or not).

These are the areas these technologies are meant to address, precisely because “standard database” software and technologies are simply not well suited to these specific requirements. Not all databases are alike, so it follows that not all databases will have these requirements: consider databases whose primary purpose is the management of financial transactions.

Money is money, arithmetic is arithmetic, and the domain of money exchange and management is for the most part static; traditional / standard database technologies will suffice. Sure, it may be useful to be able to export a bank statement in a portable (perhaps XML-based) format, but ultimately the value of using SW-related technologies there is minimal.

Of course, you could argue that online banking systems have a lot to gain from these technologies, but the example was of pure transaction management; the portal that manages the social aspects of money management is a layer on top.

However, where there is a need to leverage:

  • More expressive mechanisms for data collection (think XForms)
  • (Somewhat) unambiguous interpretation of content (think FOL and DL)
  • Expressive data mining (think RDF querying languages)
  • Portable message / document formats (think XML)
  • Data manipulation (think XSLT)
  • Consistent addressing of distributed resources (think URLs)
  • General automation of data management (think Document Definitions and GRDDL)

These technologies will have an impact on how things are done. It's worth noting that these needs aren't restricted to distributed databases (the other common assumption about the Semantic Web being that it only applies within the context of the 'Web'). Consider Wikis, and the advantages that Semantic Wikis have over them:

  • Much improved possibility of data mining, thanks to a more formal representation of content
  • 'Out-of-the-box' interoperability with tools that speak SW dialects
  • The possibility of a certain amount of automation, from the capabilities that interpretation brings

It's also worth noting that the Semantic Wiki project recently introduced mechanisms for using other vocabularies for 'marking up' content (FOAF being the primary vocabulary highlighted).

This is doubly important in that 1) it demonstrates the value of incorporating well-established vocabularies with relative ease, and 2) the policed way in which these additional vocabularies can be used demonstrates precisely the middle ground between the very liberal, open-world-assumption approach to distributed data in the SW and the controlled, closed (single-purpose) systems approach.

Such constraints can allow for some level of uniformity that can have very important consequences in very different areas: XML as a messaging interlingua and extraction of RDF.

Consider the value of developing a closed vocabulary with its semantics spelled out unambiguously in RDF/RDFS/OWL, together with a uniform XML representation of its instances and an accompanying XSLT transform (something the AtomOWL project is attempting to achieve).

What do you gain? For one thing, XForms-based data entry for the uniform XML instances and a direct, (relatively) unambiguous mapping to a more formal representation model – each of which has its own very long list of advantages on its own, much less in tandem!
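As a toy sketch of that pairing (the record vocabulary, the ex: ontology namespace, and the stylesheet below are all hypothetical), a uniform XML instance can be lifted into RDF/XML with a single XSLT transform, run here with lxml:

from lxml import etree

# A hypothetical instance of a closed, uniform XML vocabulary
instance = etree.XML("""
<record xmlns="http://example.org/ns/record">
  <id>42</id>
  <subject>hypertension</subject>
</record>""")

# A hypothetical GRDDL-style stylesheet mapping the instance into RDF/XML
stylesheet = etree.XML("""
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rec="http://example.org/ns/record"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ex="http://example.org/ont#">
  <xsl:template match="/rec:record">
    <rdf:RDF>
      <ex:Record rdf:about="http://example.org/record/{rec:id}">
        <ex:subject><xsl:value-of select="rec:subject"/></ex:subject>
      </ex:Record>
    </rdf:RDF>
  </xsl:template>
</xsl:stylesheet>""")

transform = etree.XSLT(stylesheet)
print(str(transform(instance)))   # RDF/XML describing the record

The XML instances remain amenable to XForms-based entry and ordinary XML tooling, while the transform's output can be loaded directly into an RDF store for querying and reasoning.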

Stand-alone databases (where their needs intersect with the value in SW technologies) stand to gain: portable, declarative data-entry mechanisms; interoperability; much improved capabilities for interpretation and synthesis of existing information; increased automation of data management (by closing the system, certain operations become much more predictable); and additional possibilities for alternative reasoning heuristics that take advantage of closed-world assumptions.

Chimezie Ogbuji

via Copia

What Are You Doing, Dave?

I just updated the 4Suite Repository Ontology (as an OWL instance). Specifically, I added appropriate documentation for most of the major components and added rdfs:subPropertyOf / rdfs:subClassOf / rdfs:seeAlso relationships with appropriate, related vocabularies (WordNet / FOAF / Dublin Core / Wikipedia). In addition, where appropriate, I've added links to 4Suite literature (currently scattered between IBM developerWorks articles/tutorials and Uche's Akara sections).

There are some benefits:

  • It can serve as a framework for documenting the 4Suite repository (augmenting the very sparse documentation that does exist)
  • It provides a formal model for the underlying RDF graph that 'drives' the repository

This latter benefit might not be so obvious, but imagine being able to provide rules whose implications identify certain repository containers as RSS channels (and their child XML / RDF documents as the corresponding RSS items) and associate FOAF metadata with repository users.
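As a rough sketch of the kind of rule implied here (fchema:container and fchema:contains are placeholder names of my own, not actual 4Suite vocabulary terms, and the namespace URI is invented), a naive pass over the system graph with rdflib might look like this:

from rdflib import Graph, Namespace, RDF

# Placeholder namespaces; the real 4Suite repository vocabulary may differ
FCHEMA = Namespace('http://example.org/4suite/fchema#')
RSS    = Namespace('http://purl.org/rss/1.0/')

def classify_rss_channels(graph):
    """Naive rule: a repository container with child documents is also an
    rss:channel, and each of its child documents an rss:item."""
    for container in list(graph.subjects(RDF.type, FCHEMA.container)):
        children = list(graph.objects(container, FCHEMA.contains))
        if children:
            graph.add((container, RDF.type, RSS.channel))
            for child in children:
                graph.add((child, RDF.type, RSS.item))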

Some of the more powerful hooks into the System RDF graph (of which the above ontology is a model) - such as the starting/stopping of servers (currently triggered by the fchema:server.running property on fchema:server instances), the purging of resources marked as temporary (by the fchema:time_to_live property), and the triggering of an XSLT transform (by the fchema:run_on_strobe property) - can be further augmented by other associations in the graph, resulting in an almost 'sentient' content/application server. A little far-fetched?

[Uche Ogbuji]

via Copia