4Suite and Rdflib - Advancing the State of The Art in Python / RDF

A few days ago I checked in a 4RDF Driver that wraps the rdflib persistence API. 4RDF has a standard API for abstracting the persistence of RDF; it sits directly below the Model interface and lets an author implement a mechanism for persisting an RDF graph in whatever database, filesystem, etc. they choose. rdflib has a similar separation, but the actual interfaces differ. Daniel Krech and I have been working rather diligently on formalizing a universal persistence API that allows an implementation to support a graduated set of features:

  • Core RDF Store
  • Context-aware RDF Store
  • Notation 3 RDF Store (Formula-aware RDF Store)

It's still a work in progress (mostly with regard to Notation 3 persistence), but at least the interfaces/method signatures needed by a context-aware RDF store are well spec'd out.
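One way to picture the graduated feature set is as a small class hierarchy in which each level adds to the one below it. This is only a sketch under my own assumptions: the class and method names (`CoreStore`, `ContextAwareStore`, `FormulaAwareStore`, `add`, `triples`, `contexts`) are hypothetical and are not the signatures from the rdflib or 4RDF work.

```python
# Hypothetical sketch of a graduated store API; none of these names are
# taken from rdflib or 4RDF.

class CoreStore:
    """Level 1: a bag of (subject, predicate, object) statements."""

    def __init__(self):
        # Each entry is (s, p, o, context); a core store always uses None.
        self._statements = set()

    def add(self, triple):
        self._statements.add(tuple(triple) + (None,))

    def triples(self, pattern=(None, None, None)):
        """Return the set of triples matching pattern; None is a wildcard."""
        return {
            stmt[:3]
            for stmt in self._statements
            if all(want is None or want == got
                   for want, got in zip(pattern, stmt))
        }


class ContextAwareStore(CoreStore):
    """Level 2: also records the named context each statement lives in."""

    def add(self, triple, context=None):
        self._statements.add(tuple(triple) + (context,))

    def contexts(self, triple=None):
        """Return context names, optionally only those holding triple."""
        return {
            stmt[3]
            for stmt in self._statements
            if stmt[3] is not None
            and (triple is None or stmt[:3] == tuple(triple))
        }


class FormulaAwareStore(ContextAwareStore):
    """Level 3: would add Notation 3 formulae (quoted contexts).

    Left unimplemented here, since that part is still being worked out.
    """


store = ContextAwareStore()
store.add(("urn:ex:a", "urn:ex:knows", "urn:ex:b"), context="urn:ex:graph1")
store.add(("urn:ex:a", "urn:ex:knows", "urn:ex:c"))
known = store.triples(("urn:ex:a", None, None))
```

The point of the layering is that code written against the core level keeps working when it is handed a context-aware store.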

I won't bore you with the details (and there are plenty, as we've covered a lot of ground) but you can dive in here. This driver, which allows 4Suite to use rdflib for persistence of its RDF data, is the first step in an agreed migration that will eventually phase out 4RDF and replace it with rdflib. This module, at the very least:

  • allows for the dispatching of Versa queries on an rdflib Graph,
  • is the first step in allowing a 4Suite repository to store its RDF graphs in an rdflib store (I think there is a lot of synthesis worth exploring with redfoot),
  • provides rdflib with access to a rather voluminous 4RDF test suite, and
  • demonstrates how existing applications that use 4RDF could be ported to use rdflib instead.

Outstanding / Possible Issues:

1) The current rdflib Graph interfaces do not account for RDF reification. This is somewhat covered by support for Notation 3 quoted/hypothetical contexts. The only visible difference is in the test cases that match by statementUri.

2) This driver has only been tested against the MySQL implementation of an rdflib store. This is mainly because it's currently the only rdflib store implementation that supports matching arbitrary triple / statement terms by REGEX patterns and/or the production of quads instead of triples (i.e. each resulting statement also carries the name of its context). This is only an issue for RDF stores that are at least context-aware, and it is an interface mismatch at most.
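To make the second issue concrete, here is a minimal sketch of the two behaviours the driver depends on, using a plain in-memory list rather than a real store: pattern terms may be regular expressions, and each result is a quad that carries its context name. The function name `match_quads` and the sample data are hypothetical; this is not the MySQL store's interface.

```python
import re

def match_quads(quads, subject=None, predicate=None, obj=None):
    """Yield (s, p, o, context) quads whose terms match the pattern.

    A pattern term may be None (wildcard), a plain string (exact match)
    or a compiled regular expression (searched against the term).
    """
    def term_matches(want, got):
        if want is None:
            return True
        if hasattr(want, "search"):        # compiled regular expression
            return want.search(got) is not None
        return want == got

    for s, p, o, ctx in quads:
        if (term_matches(subject, s) and term_matches(predicate, p)
                and term_matches(obj, o)):
            yield (s, p, o, ctx)


# Hypothetical sample data: the fourth element names the context.
quads = [
    ("urn:ex:doc1", "urn:ex:subject", "rdf python", "urn:ex:graphA"),
    ("urn:ex:doc2", "urn:ex:subject", "xml xslt", "urn:ex:graphA"),
    ("urn:ex:doc3", "urn:ex:subject", "owl rdf", "urn:ex:graphB"),
]

hits = list(match_quads(quads, obj=re.compile(r"rdf")))
```

A store that can only do exact matching, or that only returns triples, would need this filtering and context bookkeeping done on top of it.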

I plan to do some more experimentation on the possibilities that this synthesis provides (surprise, surprise). The timing is rather appropriate given the on-going development on the next generation Versa query specification, the concurrent effort to graduate the 4Suite code base to 1.0, and my recent pleasant surprise regarding Versa, Sparta, and rdflib.

If you are interested in helping or learning more about the roadmap, you can pay #redfoot (on irc.freenode.net) a visit. That's where Daniel Krech and the other rdflib folks have been burning braincells as of late.

Chimezie Ogbuji

via Copia

Redfoot: Updated Documentation

Daniel Krech recently updated the Redfoot homepage with some additional documentation on what Redfoot is. It's a very interesting concept for leveraging Python (or any other scriptable language) and RDF as a distributed framework for applications.

Beyond the known advantages of modelling distributed components on an RDF Graph, with well-defined semantics for how you retrieve programs and execute them, it also relies on a hybrid of XML and Python called Kid to facilitate templating of HTML.

The advantages of using a flexible programming language (such as Python) for manipulating XML are well documented (sift through the Copia archives, you'll find plenty). Couple that with a well-modelled framework for including and executing remote modules, as well as programmatic access (using a similar idiom) to an underlying RDF Graph, and you have yourself a very flexible foundation.

For example, below is the Kid template used to render the contributors page on rdflib.net:

<div xmlns="http://www.w3.org/1999/xhtml"
 xmlns:kid="http://purl.org/kid/ns#">
    <?python
    FOAF = redfoot.namespace("http://xmlns.com/foaf/0.1/")
    DOAP = redfoot.namespace("http://usefulinc.com/ns/doap#")
    project = URIRef("%s#" % request.host)
    people = []
    seen = set()
    for property in [DOAP.maintainer, DOAP.developer, DOAP.documenter,
                     DOAP.translator, DOAP.tester, DOAP.helper]:
        for person in redfoot.objects(project, property):
            if person not in seen:
                seen.add(person)
                label = redfoot.label(person) or person
                relationships = set()
                for relationship in redfoot.predicates(project, person):
                    relationships.add(redfoot.label(relationship))
                people.append((label, person, relationships))

    people.sort()
    ?>

    <ul>
      <li kid:for="label, person, relationships in people">
        ${label},
        ${redfoot.value(person, FOAF.nick)},
        (${", ".join(relationships)})
      </li>
    </ul>
</div>

Redfoot feels like a hybrid of Narval and the 4Suite repository and represents what is common between the tangential goals of those two projects.

rdflib.net and redfoot.net (as well as some other sites) are examples of applications that run on a Redfoot instance.

[Uche Ogbuji]

via Copia

Itinerant Binds - Better Software Documentation

It was brought to my attention that my recent entry about Sparta/Versa/rdflib possibilities was a little vague/unclear. This tends to happen when I get caught up in an interest. Anyway, I renamed the module to Itinerant Binds (I liked the term) and created a page on Metacognition for the recent rdflib/4Suite RDF work I've been doing, with some more details on how the components work. I also added an example that better demonstrates isolating RDF resources through targeted Versa queries and using the bound python result objects to modify / extend the underlying graph.

Chimezie Ogbuji

via Copia

RDF-API: Reconciling the redundancy in pythonic RDF store implementations

I just wrapped up the second of two rdflib-related libraries I wrote with the aim of bridging the gap between rdflib and 4Suite RDF. The latter (BoundVersaResult.py) is a little more interesting than the former in that it uses Sparta to allow the distinct components of a Versa query result to each be bound to appropriate python objects. 4Suite RDF's Versa implementation already provides such a binding:

  • String -> Python unicode
  • Number -> Python float
  • Boolean -> Python boolean
  • List -> Python list
  • Set -> Python Sets
  • Resource/BlankNodes -> Python unicode

The bindings for all the datatypes except Resource/BlankNodes are straightforward. This library extends the datatype binding to include the ability to bind Sparta Things to Versa Resources and BlankNodes. Since Sparta only works with rdflib Graphs, the FtRdfBackend.py module is used to wrap an rdflib.Graph around a 4Suite Model.

Sparta takes an RDF Graph and a defining Ontology which dictates the cardinality of properties bound to resource objects (Things). It allows an RDF Graph to be traversed (and extended) via pythonic idiom. Being able to isolate resources by Versa query (or eventually by SPARQL query, as soon as the ongoing rdflib effort in that regard is completed) and bind them to python objects whose properties reflect the properties of the underlying RDF resources is very cool, IMHO. The ability to provide an implementation-agnostic way to modify an RDF graph, using a host language as expressive as Python, is the icing on the cake. For example, check out the following code snippet demonstrating the use of this library:

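#Imports (Python 2 / 4Suite); the BoundVersaResult import path is assumed from the module name above
import urllib2
from pprint import pprint
from Ft.Rdf import Model
from Ft.Rdf.Drivers import Memory
from Ft.Rdf.Serializers import Dom
from Ft.Xml import Domlette
from BoundVersaResult import VersaThingGenerator

#Setup FtRDF Model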
Memory.InitializeModule()   
db = Memory.GetDb('', '')
db.begin()
model = Model.Model(db)

#Parse my del.icio.us rss feed
szr = Dom.Serializer()
delUri="http://del.icio.us/rss/chimezie/academic+rdf"
domStr=urllib2.urlopen(delUri).read()        
dom = Domlette.NonvalidatingReader.parseString(domStr,'http://del.icio.us/rss/chimezie')
szr.deserialize(model,dom,scope=delUri)

#Bind Versa query results to Sparta Things (the 4Suite Model is wrapped in an rdflib.Graph via the FtRdf driver)
generator=VersaThingGenerator(model)
for item in generator.query("type(rss:item)"):
    #rss_link is an iterator over every value of rss:link
    for link in item.rss_link:
        pprint(link)
    print generator.query("distribute(@'%s','.-rss:title->*','.-dc:subject->*')"%item._id)[0]

Note that (within the loop over the rss:items in the graph), the rss:link property returns an iterator over the possible values (since there is no defining ontology that could have specified that the rss:link property has a cardinality of 1, or is an inverse functional property - which would have caused Sparta to bind the rss_link property to a single object instead of an iterator).

The result of running this code:

u'http://lists.w3.org/Archives/Public/public-rdf-dawg/2004JulSep/0069'
[[u'More on additional semantic information from Enrico Franconi on 2004-07-12 (public-rdf-dawg@w3.org from 
July to September 2004)'], [u'academic architecture archive community dawg email logic query rdf reference 
semantic']]
u'http://www.w3.org/TR/swbp-specified-values/'
[[u'Representing Specified Values in OWL: "value partitions" and "value sets"'], [u'academic datatypes ontology owl 
rdf semantic standard w3c']]
u'http://lists.w3.org/Archives/Public/public-rdf-dawg/2005JulSep/0386.html'
[[u'boolean operators and type errors from Jeen Broekstra on 2005-09-07 (public-rdf-dawg@w3.org from July to 
September 2005)'], [u'academic architecture archive community dawg email logic rdf reference semantic w3c']]
u'http://www.w3.org/DesignIssues/Diff'
[[u'RDF Diff, Patch, Update, and Sync -- Design Issues'], [u'academic paper rdf semantic standards tbl w3c']]
u'http://www.w3.org/TR/rdf-dawg-uc/'
[[u'RDF Data Access Use Cases and Requirements'], [u'academic architecture framework query rdf reference semantic 
specification standard w3c']]
u'http://www.w3.org/DesignIssues/RDB-RDF'
[[u'Relational Databases and the Semantic Web (in Design Issues)'], [u'academic architecture framework rdb rdf 
reference semantic tbl w3c']]
u'http://www.w3.org/TR/swbp-n-aryRelations/'
[[u'Defining N-ary Relations on the Semantic Web: Use With Individuals'], [u'academic logic ontology owl predicate 
rdf reference relationships semantic standard w3c']]
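The cardinality behaviour noted above (an iterator when nothing says otherwise, a single object when the ontology says the property has one value) can be sketched in plain Python. This is not Sparta's implementation or API: the `Thing` class here, its `functional` argument and the sample triples are all hypothetical stand-ins.

```python
class Thing:
    """Hypothetical stand-in for a Sparta-style resource wrapper."""

    def __init__(self, triples, identifier, functional=()):
        self._triples = triples             # list of (s, p, o) tuples
        self._id = identifier
        # Properties an ontology has declared to have cardinality 1.
        self._functional = set(functional)

    def __getattr__(self, name):
        # Only reached for names not found normally, i.e. RDF properties.
        if name.startswith("_"):
            raise AttributeError(name)
        values = (o for s, p, o in self._triples
                  if s == self._id and p == name)
        if name in self._functional:
            return next(values, None)       # a single object (or None)
        return values                       # an iterator over all values


triples = [
    ("item1", "rss_link", "http://example.org/a"),
    ("item1", "rss_link", "http://example.org/b"),
    ("item1", "rss_title", "An example item"),
]

no_ontology = Thing(triples, "item1")
with_ontology = Thing(triples, "item1", functional=["rss_title"])

links = list(no_ontology.rss_link)          # every value, via an iterator
title = with_ontology.rss_title             # one value, bound directly
```

Without the `functional` hint, even `rss_title` comes back as an iterator, which is the situation in the del.icio.us example.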

Chimezie Ogbuji

via Copia

Wrapping rdflib's Graph around a 4RDF Model

Well, for some time I had pondered what it would take to provide SPARQL support in 4Suite RDF. I came across sparql-p earlier and noticed it was essentially a SPARQL query processor without a parser to drive it. It works over a deprecated rdflib interface: TripleStore. The newly suggested interface is Graph, which is as solid a suggestion for a generic RDF:API as any. So I wrote a 4Suite RDF model backend for rdflib that allows the wrapping of Graph around a live 4Suite RDF model. Finally, I used this backend to execute a sparql-p query over http://del.icio.us/rss/chimezie:

SELECT
  ?title
WHERE {
  ?item rdf:type rss:item;
        dc:subject ?subj;
        rss:title ?title.
        FILTER (REGEX(?subj,".*rdf")).
}

The corresponding python code:

#Setup FtRDF Model
Memory.InitializeModule()   
db = Memory.GetDb('rules', 'test')
db.begin()
model = Model.Model(db)

#Parse my del.icio.us rss feed
szr = Dom.Serializer()
domStr=urllib2.urlopen('http://del.icio.us/rss/chimezie').read()        
dom = Domlette.NonvalidatingReader.parseString(domStr,'http://del.icio.us/rss/chimezie')
szr.deserialize(model,dom,scope='http://del.icio.us/rss/chimezie')

#Setup rdflib.Graph with FtRDF Model as Backend, using FtRdf driver
g=Graph(FtRdf(model))

#Setup sparql-p query processor engine
select = ("?title")

#Setup term
copia = URIRef('http://del.icio.us/chimezie')
rssTitle = URIRef('http://purl.org/rss/1.0/title')
versaWiki = URIRef('http://en.wikipedia.org/wiki/Versa')
dc_subject=URIRef("http://purl.org/dc/elements/1.1/subject")

#Filter on objects of statements (dc:subject values) - keep only those containing the string 'rdf'
def rdfSubFilter(subj,pred,obj):
    return 'rdf' in obj

#Execute query
where = GraphPattern([("?item",rdf_type,URIRef('http://purl.org/rss/1.0/item')),
                       ("?item",dc_subject,"?subj",rdfSubFilter),
                       ("?item",rssTitle,"?title")])    
tStore = myTripleStore(FtRdf(model))
result = tStore.query(select,where)
pprint(result)

The result (which will change daily as my links shift through my del.icio.us channel queue):

[chimezie@Zion RDF-API]$ python FtRdfBackend.py
[u'rdflibUtils',
 u'Representing Specified Values in OWL: "value partitions" and "value sets"',
 u'Sparta',
 u'planner-rdf',
 u'RDF Template Language 1.0',
 u'SIOC Vocabulary Specification',
 u'SPARQL in RDFLib',
 u'MeetingRecords - ESW Wiki',
 u'Enumerated datatypes (OWL)',
 u'Defining N-ary Relations on the Semantic Web: Use With Individuals']
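For readers without sparql-p at hand, the way a graph pattern with a per-pattern filter gets evaluated can be sketched in plain Python over a list of triples. This is not sparql-p's implementation: the `solve` function, the prefixed-name strings and the sample data are hypothetical, and the filter simply mirrors rdfSubFilter above.

```python
def solve(triples, patterns, bindings=None):
    """Yield variable bindings (dicts) that satisfy every pattern.

    Each pattern is (s, p, o) or (s, p, o, filter_func); terms starting
    with "?" are variables. filter_func receives the matched triple's
    subject, predicate and object and returns True to keep the match.
    """
    bindings = bindings or {}
    if not patterns:
        yield dict(bindings)
        return
    first, rest = patterns[0], patterns[1:]
    terms = first[:3]
    check = first[3] if len(first) > 3 else None
    for triple in triples:
        candidate = dict(bindings)
        matched = True
        for term, value in zip(terms, triple):
            if term.startswith("?"):
                # Bind the variable, or verify an earlier binding agrees.
                if candidate.setdefault(term, value) != value:
                    matched = False
                    break
            elif term != value:
                matched = False
                break
        if matched and (check is None or check(*triple)):
            yield from solve(triples, rest, candidate)


# Hypothetical sample data standing in for the parsed RSS feed.
triples = [
    ("i1", "rdf:type", "rss:item"),
    ("i1", "dc:subject", "python rdf"),
    ("i1", "rss:title", "Sparta"),
    ("i2", "rdf:type", "rss:item"),
    ("i2", "dc:subject", "xml xslt"),
    ("i2", "rss:title", "XSLT notes"),
]

where = [
    ("?item", "rdf:type", "rss:item"),
    ("?item", "dc:subject", "?subj", lambda s, p, o: "rdf" in o),
    ("?item", "rss:title", "?title"),
]

titles = [b["?title"] for b in solve(triples, where)]
```

Each pattern narrows the candidate bindings produced by the one before it, so the filter on dc:subject prunes items before their titles are looked up.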

Chimezie Ogbuji

via Copia

Pythonic SPARQL API over rdflib

I've recently been investigating the possibility of adapting an existing SPARQL parser/query engine on top of 4RDF, mostly for the eventual purpose of implementing a sparql-eval Versa extension function, and was pleased to see there has already been some similar work done:

Although this isn't exactly what I had in mind (the more robust option would be to write an adaptor for Redland's model API and execute SPARQL queries via rasqal), it provides an interesting pythonic analog to querying RDF.

Chimezie Ogbuji

via Copia