4Suite and Rdflib - Advancing the State of The Art in Python / RDF

A few days ago I checked in a 4RDF Driver that wraps the rdflib persistence API. 4RDF has a standard API for abstracting the persistence of RDF that sits directly below the Model interface and allows an author to implement a mechanism for persisting an RDF graph in whatever database, filesystem, etc. they choose. rdflib has a similar separation, but the actual interfaces differ. Daniel Krech and I have been working rather diligently on formalizing a universal persistence API that allows an implementation to support a graduated set of features:

  • Core RDF Store
  • Context-aware RDF Store
  • Notation 3 RDF Store (Formula-aware RDF Store)

It's still a work in progress (with regards to Notation 3 persistence, mostly) but at least those interfaces/method signatures needed by a context-aware RDF store are well spec'd out.
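To make the idea of a graduated feature set concrete, here is a minimal sketch in plain Python. It is not the API Daniel and I are specifying: the class names, method names and the in-memory set are all assumptions made for illustration. Each level only adds to the one below it, so code written against the core level keeps working on a richer store.

```python
class CoreStore:
    """Level 1 (hypothetical): a bag of triples; None in a pattern is a wildcard."""
    context_aware = False
    formula_aware = False

    def __init__(self):
        self._quads = set()

    def add(self, triple, context=None):
        # A core store has no notion of context, so it is discarded.
        self._quads.add(tuple(triple) + (None,))

    def triples(self, pattern=(None, None, None), context=None):
        for s, p, o, _ in self._quads:
            if self._match((s, p, o), pattern):
                yield (s, p, o)

    @staticmethod
    def _match(triple, pattern):
        return all(want is None or want == got
                   for got, want in zip(triple, pattern))


class ContextAwareStore(CoreStore):
    """Level 2 (hypothetical): remembers the named context of each statement."""
    context_aware = True

    def add(self, triple, context=None):
        self._quads.add(tuple(triple) + (context,))

    def triples(self, pattern=(None, None, None), context=None):
        for s, p, o, c in self._quads:
            if context is not None and c != context:
                continue
            if self._match((s, p, o), pattern):
                yield (s, p, o)

    def contexts(self):
        return {c for _, _, _, c in self._quads if c is not None}


class FormulaAwareStore(ContextAwareStore):
    """Level 3 (hypothetical): also tracks which contexts are quoted N3 formulae."""
    formula_aware = True

    def __init__(self):
        super().__init__()
        self._formulae = set()

    def add(self, triple, context=None, quoted=False):
        super().add(triple, context)
        if quoted:
            self._formulae.add(context)

    def is_formula(self, context):
        return context in self._formulae
```

A caller can inspect the context_aware and formula_aware flags to decide which features it may rely on.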

I won't bore you with the details (and there are plenty, as we've covered a lot of ground) but you can dive in here. This driver, which allows 4Suite to use rdflib for persistence of its RDF data, is the first step in an agreed migration that will eventually phase out 4RDF and replace it with rdflib. This module, at the very least, allows for the dispatching of Versa queries on an rdflib Graph; is the first step in allowing a 4Suite repository to persist its RDF graphs in an rdflib store (I think there is a lot of synthesis worth exploring with redfoot); provides rdflib with access to a rather voluminous 4RDF test suite; and demonstrates how existing applications that use 4RDF could be ported to use rdflib instead.

Outstanding / Possible Issues:

1) The current rdflib Graph interfaces do not account for RDF reification. This is somewhat covered by support for Notation 3 quoted/hypothetical contexts. The only visible difference is in the test cases that match by statementUri.

2) This driver has only been tested against the MySQL implementation of an rdflib store. This is mainly because it's currently the only rdflib store implementation that supports matching arbitrary triple / statement terms by REGEX patterns and/or the production of quads instead of triples (i.e. each resulting statement also carries the name of its context). This is only an issue for RDF stores that are at least context-aware, and an interface mismatch at most.
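The two capabilities mentioned in the second issue, matching terms by regular expression and returning quads rather than triples, can be sketched in a few lines. This is an illustration over a plain list of tuples, not the MySQL store's code; the function name match_quads and the sample URIs are made up.

```python
import re


def match_quads(quads, subject=None, predicate=None, obj=None):
    """Yield (s, p, o, context) for every statement whose terms match.

    Each criterion is None (wildcard), a string (exact match) or a
    compiled regular expression (searched within the term).
    """
    def term_matches(term, want):
        if want is None:
            return True
        if isinstance(want, re.Pattern):
            return want.search(term) is not None
        return term == want

    for s, p, o, context in quads:
        if (term_matches(s, subject)
                and term_matches(p, predicate)
                and term_matches(o, obj)):
            yield (s, p, o, context)


# Hypothetical sample data: two statements in two named contexts.
quads = [
    ("http://example.org/a", "http://purl.org/dc/elements/1.1/creator",
     "Chimezie", "http://example.org/graph1"),
    ("http://example.com/b", "http://purl.org/dc/elements/1.1/creator",
     "Uche", "http://example.org/graph2"),
]
hits = list(match_quads(quads, subject=re.compile(r"^http://example\.org/")))
```

Each hit carries its context as the fourth element, which is what the driver needs in order to report the scope of a statement.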

I plan to do some more experimentation on the possibilities that this synthesis provides (surprise, surprise). The timing is rather appropriate given the on-going development on the next generation Versa query specification, the concurrent effort to graduate the 4Suite code base to 1.0, and my recent pleasant surprise regarding Versa, Sparta, and rdflib.

If you are interested in helping or learning more about the roadmap, you can pay #redfoot (on irc.freenode.net) a visit. That's where Daniel Krech and the other rdflib folks have been burning braincells as of late.

Chimezie Ogbuji

via Copia

Today's XML WTF: UTF-8 BOM madness in Windows browsers

Earlier this week I had to add an option for 4Suite's XSLT processor to emit a UTF-8 BOM (or ZERO WIDTH NO-BREAK SPACE as I prefer, given the annoying situation). See details of the 4Suite extension XSLT attribute below. I did so upon user request, even though this need seems to come from a case of sheer lunacy in Windows browsers, and especially MSIE. Mike Brown, Jeremy Kloth and I originally wondered why on earth anyone would need a UTF-8 BOM. We figured that by serving his files with the right HTTP Content-Type header or using meta/http-equiv he could signal UTF-8 without needing the BOM (after all, there's no byte order to mark). Apparently, the problem scenario really kicks in when users have set their browser's encoding auto-detect to a particular encoding. In this case most of the user's clients would have it set to Russian. As Mike Brown said in the IRC conversation:

it is my understanding that in Russia as well as the Far East it is very typical to leave your browser set to ignore declared encodings and just use whatever is common for your region

The problem is that this user wants to send UTF-8, and it seems to be hard to get browsers on Windows to believe a file is UTF-8 without using a BOM.

Actually, when I researched the situation, it seems that it's merely hard in Mozilla/Windows (which does pay attention to the HTTP headers, if not HTML meta/http-equiv). With MSIE it's apparently impossible. See "On the 'charset' parameter of the 'Content-Type' header", by Anne van Kesteren. His entry itself is tangentially interesting, but see the comments for the issue at hand, in particular comment #10 by Lachlan Hunt.

I think Zack may be correct, IE does ignore the Content-Type header in some circumstances. I set up a test case serving the document with Content-Type: text/html; charset=iso-8859-1 but also starting with a UTF-8 BOM. IE incorrectly parses the file as UTF-8, while Firefox and Opera correctly obeyed the Content-Type header. I know the test isn't exactly what he described with Urdu text detected as Turkish, but I don't know those languages, nor whether the file he was talking about was correctly encoded as UTF-8. This test does, however, show that Internet Explorer breaks the rules yet again.

This is what our user was finding as well.
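The behavior Lachlan describes as correct (Firefox and Opera obeying the header even when a BOM is present) amounts to a simple precedence rule. Here is a rough Python sketch of that rule, not of any browser's actual code: the function names are made up, and the final fallback to UTF-8 is an assumption borrowed from XML's default.

```python
import codecs
from email.message import Message


def declared_charset(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def choose_encoding(content_type, body):
    """Pick a decoding for body: the transport header wins over the BOM."""
    charset = declared_charset(content_type)
    if charset:
        return charset
    if body.startswith(codecs.BOM_UTF8):
        # "utf-8-sig" decodes UTF-8 and drops the leading BOM.
        return "utf-8-sig"
    # Assumption: fall back to UTF-8 when nothing else is declared.
    return "utf-8"


# Lachlan's test case: header says iso-8859-1, body starts with a UTF-8 BOM.
body = codecs.BOM_UTF8 + b"<p>test</p>"
obeyed = choose_encoding("text/html; charset=iso-8859-1", body)
sniffed = choose_encoding("text/html", body)
```

MSIE, by the account above, effectively behaves as if the BOM check came first.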

Of course a UTF-8 BOM is never illegal, but that doesn't mean it's not supremely stupid to make such an optional marker the only way to identify UTF-8, despite the availability of multiple alternative mechanisms in standards. Sure, ultimate blame for this goes to all the browser vendors and Web designers over the years who have turned the Web into encoding soup as well as tag soup, but now that the browser standards have all sorts of character encoding markers available, users should no longer find themselves in such a quandary when dealing with Web publishers who are willing to play by the rules.

This is an XML WTF, rather than a general Web WTF, because our user was trying to produce XHTML, which means that he should have had recourse to the XML declaration encoding pseudo-attribute. Instead, we have a Web browser flouting the explicit words of the XML 1.0 spec:

In the absence of information provided by an external transport protocol (e.g. HTTP or MIME), it is a fatal error for an entity including an encoding declaration to be presented to the XML processor in an encoding other than that named in the declaration, or for an entity which begins with neither a Byte Order Mark nor an encoding declaration to use an encoding other than UTF-8.

I started out trying to make sense of the situation by warning the user it's never really a good idea to serve XHTML as text/html (regardless of compatibility guidelines: see Anne again for an example of a good argument as to why). I was amazed to find that in the case of MSIE you couldn't make things right even by using the proper application/xhtml+xml. There was nothing I could do but shut my smacked gob, and then, clenching my teeth all the while, implement the requested extension.

And coming to that, in 4Suite latest CVS you can use XSLT such as the following:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:ft="http://xmlns.4suite.org/ext"
>

<xsl:output ft:utf-bom="yes"/>

<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>
</xsl:stylesheet>

The xsl:output/@ft:utf-bom attribute is a flag to force the BOM to be manually emitted. I decided not to do much idiot-proofing as users who tack on this option had better know what they're doing. In general, if you use this flag, you'd best ensure your output encoding is UTF-8 (or UTF-7, if anyone is still using that). The above listing, in effect, is a variant of the identity transform that tacks on the UTF-8 BOM. This extension is also available on our exsl:document implementation.
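For anyone unsure what "tacking on the BOM" means at the byte level, the following short Python sketch shows it using only the standard library. It does not use 4Suite; the helper name with_utf8_bom and the sample markup are made up for the example.

```python
import codecs


def with_utf8_bom(text):
    """Encode text as UTF-8 and prefix the three-byte UTF-8 BOM."""
    return codecs.BOM_UTF8 + text.encode("utf-8")


payload = with_utf8_bom("<p>привет</p>")

# The BOM is just U+FEFF (ZERO WIDTH NO-BREAK SPACE) encoded as UTF-8.
bom_bytes = "\ufeff".encode("utf-8")

# A BOM-aware consumer strips it; a naive UTF-8 decode keeps it as U+FEFF.
stripped = payload.decode("utf-8-sig")
naive = payload.decode("utf-8")
```

Here payload begins with the bytes EF BB BF, stripped is the original markup, and naive still carries U+FEFF as its first character.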

For more on XHTML browser madness, see "Today's XML WTF: Internal entites in browsers". For more on XHTML overall, see "XHTML, step-by-step".

[Uche Ogbuji]

via Copia

Quotīdiē

The only reason I'm influential is that I say what's on my mind.... Think about it. I make [movies]. You couldn't possibly be worrying about a film career and sit up and be saying the stuff I say on radio. You'd be like: Oh wow, what if they don't put me on the next movie... I don't care. I don't care about these records, these movies. I don't care. I have to live my life from point A to point B and saying what's on my mind. That's what I care about: being in tune with myself, and people respect that. That's why I'm on a college lecture tour. I'm on my way to Harvard or some place, to teach people how to be real. Isn't that stupid?

—Ice T—Fresh Air interview, originally 1992, re-aired on hip-hop week 2005

Ice-T, in that quote, provides one of the best definitions of hip-hop I've heard. Overall, this is a great interview for anyone who wants to break through all the media circus that has ever surrounded Ice-T and get a sense of what the man is really about. In fact, for anyone who really wants to get a sense of Hip-Hop's essence, whether you're down like Foxy Brown, or one of the many who has no earthly idea how a genre as horrible as Rap has flourished for three decades, you need to check out Fresh Air's Hip-Hop week. (Click the "Listen" then "Next Story" links until you've heard Monday through Friday. One hour each, all trimmed to the choicest segments). I do wish she'd had more on graffiti, break-dancing and even beat-boxing, which are among the core elements of Hip-Hop culture, but MCing and DJing are well covered.

I like the bit at the end about Ice-T taking it to college campuses to teach some realness. From what I gather, US Academia needs a heavy dose of realness. There are attacks on academic freedom coming from all directions, with sanctions and threats taking the place of debate. Whether they're fundamentalist Christian/Muslim/Jewish, atheist, gay, straight, white supremacist, black revolutionary, Communist, Neocon, Democrat, Republican, Libertarian, or whatever, it's all about students and faculty choosing to be conventional, surprising, or even shocking in their ideas. Universities only thrive under Hip-Hop's first principles: "speak your clout"; "show and prove". Sad that it takes a sometime controversial rapper to put it down like that.

[Uche Ogbuji]

via Copia

Redfoot: Updated Documentation

Daniel Krech recently updated the Redfoot homepage with some additional documentation on what Redfoot is. It's a very interesting concept for leveraging Python (or any other scriptable language) and RDF as a distributed framework for applications.

Beyond the known advantages of modelling distributed components on an RDF Graph, with well-defined semantics for how you retrieve programs and execute them, it also relies on a hybrid of XML and Python called Kid to facilitate templating of HTML.

The advantages of using a flexible programming language (such as Python) for manipulating XML are well documented (sift through the Copia archives, you'll find plenty). Couple that with a well-modelled framework for including and executing remote modules, as well as programmatic access (using a similar idiom) to an underlying RDF Graph, and you have yourself a very flexible foundation.

For example, below is the Kid template used to render the contributors page on rdflib.net:

<div xmlns="http://www.w3.org/1999/xhtml"
 xmlns:kid="http://purl.org/kid/ns#">
    <?python
    FOAF = redfoot.namespace("http://xmlns.com/foaf/0.1/")
    DOAP = redfoot.namespace("http://usefulinc.com/ns/doap#")
    project = URIRef("%s#" % request.host)
    people = []
    seen = set()
    for property in [DOAP.maintainer, DOAP.developer, DOAP.documenter, DOAP.translator, DOAP.tester, DOAP.helper]:
        for person in redfoot.objects(project, property):
            if person not in seen:
                seen.add(person)
                label = redfoot.label(person) or person
                relationships = set()
                for relationship in redfoot.predicates(project, person):
                    relationships.add(redfoot.label(relationship))
                people.append((label, person, relationships))

    people.sort()
    ?>

    <ul>
      <li kid:for="label, person, relationships in people">
        ${label},
        ${redfoot.value(person, FOAF.nick)},
        (${", ".join(relationships)})
      </li>
    </ul>
</div>

Redfoot feels like a hybrid of Narval and the 4Suite repository and represents what is common between the tangential goals of those two projects.

rdflib.net and redfoot.net (as well as some other sites) are examples of applications that run on a Redfoot instance.

[Uche Ogbuji]

via Copia

XForms Submission to Copia (Mozilla / FormsPlayer)

Uche recently set up Copia to accept HTTP PUT submission of content as Atom entry instances. I wrote 3 XForms documents which collect the data from a form and submit it to the service (each for a separate XForms implementation):

This post was submitted using the Mozilla XForms implementation

Forms Player is the most compliant of the 3 (it supports full XForms 1.0 and some aspects of XForms 1.1) but functions as an Internet Explorer plugin.

Mozilla XForms is an up and coming effort to build native XForms support into Mozilla. The supported feature set has now reached a point where a majority of the useful capabilities are supported.

FormsFaces is a JavaScript library that attempts to implement XForms functionality completely independent of the browser. Unfortunately I wasn't able to get the submission action to fire properly in order to submit new content from a FormsFaces XForm.
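For readers who want to see what such a submission looks like outside of XForms, here is a rough Python equivalent that builds an Atom entry and prepares (but does not send) an HTTP PUT. Everything specific in it is an assumption: the Atom namespace URI, the example.org URL, and the choice of fields, which I have guessed from the author, title, category and content inputs in the forms.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: the Atom 1.0 namespace; the service may expect another.
ATOM_NS = "http://www.w3.org/2005/Atom"


def build_entry(title, author, category, content):
    """Serialize a minimal Atom entry as UTF-8 bytes."""
    ET.register_namespace("", ATOM_NS)
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    author_el = ET.SubElement(entry, f"{{{ATOM_NS}}}author")
    ET.SubElement(author_el, f"{{{ATOM_NS}}}name").text = author
    ET.SubElement(entry, f"{{{ATOM_NS}}}category", {"term": category})
    ET.SubElement(entry, f"{{{ATOM_NS}}}content", {"type": "text"}).text = content
    return ET.tostring(entry, encoding="utf-8")


def build_put_request(url, body):
    """Prepare a PUT request carrying the entry; nothing is sent here."""
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/atom+xml"},
    )


body = build_entry("Hello", "chimezie", "xforms", "Posted without a browser.")
# Hypothetical endpoint, for illustration only.
request = build_put_request("http://example.org/copia/new-entry", body)
```

Passing request to urllib.request.urlopen would perform the actual submission.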

The FormsPlayer implementation is available here and the Mozilla implementation is available here. The primary difference is styling (specifically the CSS necessary to style forms individually) and the mechanism for invoking the XForms processor.

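With FormsPlayer, the following bits are needed: an object element with the id FormsPlayer that loads the plugin (with "FormsPlayer has failed to load!" as its fallback content), and this import instruction that binds the xf prefix to it:

<?import namespace="xf" implementation="#FormsPlayer" ?>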

With the Mozilla implementation, nothing is needed (since the support is native to the browser).

With FormsFaces, the following would be needed to include the JavaScript library that facilitates XForms support:

<script type="text/javascript" src="/path/to/formfaces.js"></script>

The current features of XForms supported by FormsFaces are listed here

Below are the two stylesheets used to style the FormsPlayer XForms and the Mozilla XForm followed by screenshots of the rendered XForms in IE, and Firefox 1.5b2.

FormsPlayer XForms CSS

* {
            margin:0;
            padding:0;
        }

        .title {
            text-align: center;
        }

        xf\:input,xf\:switch {
            display: block;
        }
        .author_input .value {                
            width: 7em;
        }

        .title_input .value {                
            width: 30em;
        }

        xf\:input xf\:label {
            font-weight: bold;
            padding-right: 5px;
            width: 100px;
            float: left;
        }

        xf\:textarea xf\:label {
            font-weight: bold;
            padding-right: 5px;
            width: 100px;
            float: left;
        }

        xf\:secret xf\:label {
            font-weight: bold;
            padding-right: 5px;
            width: 100px;
            float: left;
        }            

        .textarea-value {
            width: 50em;
            height: 30em;            
        }

        .leftPadded {
            padding-left: 100px;
        }            

        .category_input .value {
            width: 20em;
        }

Mozilla XForms CSS

@namespace xf url("http://www.w3.org/2002/xforms");
        * {
            margin:0;
            padding:0;
        }

        .title {
            text-align: center;
        }                        

        xf|secret.author_input {
            display: table-row;                
        }

        xf|secret.author_input secret {                
            width: 7em;            
        }

        xf|secret.author_input > xf|label span {
            display: table-cell;
            width:100px;
            font-weight: bold;               
        }

        xf|input.category_input {
            display: table-row;
        }

        xf|input.category_input > xf|label span {
            display: table-cell;
            width: 100px;                
            font-weight: bold;               
        }            

        xf|input.category_input input {
            width: 20em;
        }

        xf|textarea.content_input {
            display: table-row;
        }

        xf|textarea.content_input > xf|label span {
            display: table-cell;
            width: 100px;                
            font-weight: bold;
            vertical-align: top;                
        }            

        xf|textarea.content_input textarea {
            width: 50em;
            height: 30em;
        }            

        #show_content xf|trigger {
            display: block;
            padding-left: 200px;                
        }

        xf|input.title_input {
            display: table-row;
        }

        xf|input.title_input > xf|label span {
            display: table-cell;
            width: 100px;                
            font-weight: bold;
        }            

        xf|input.title_input input {
            width: 30em;
        }


        .leftPadded {
            padding-left: 200px;                
        }

FormsPlayer Screenshot

Copia Entry Submission FormsPlayer XForms

Mozilla Screenshot

Copia Entry Submission Mozilla XForms

The Mozilla implementation's support for XForms CSS styling is discussed here (briefly)

Chimezie Ogbuji

via Copia

More on the PyBlosxom del.icio.us plug-in, and introducing task_control.py, a pseudo-cron plug-in for PyBlosxom

Micah put my del.icio.us daily links tool to immediate use on his blog. He uncovered a bug in the character handling, which is now fixed in the posted amara_delicious.py file.

I usually invoke the script from cron, but Micah asked if there was an alternative. I've been meaning to hack up a poor man's cron for PyBlosxom and this gave me an additional push. The result is task_control.py.

A sort of poor man's cron for PyBlosxom, this plug-in allows you to specify tasks (as Python scripts) to be run only at certain intervals. Each time the plug-in is invoked it checks a set of tasks and the last time they were run. It runs only those that have not been run within the specified interval.
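The core check is simple enough to sketch. What follows is an illustration of the interval logic only, not the code in task_control.py: the function name due_tasks and the shape of the last_run mapping are assumptions.

```python
import time


def due_tasks(tasks, last_run, now=None):
    """Return the scripts whose interval has elapsed since their last run.

    tasks maps a script path to an interval in seconds; last_run maps a
    script path to the timestamp of its previous run (absent if never run).
    """
    now = time.time() if now is None else now
    due = []
    for script, interval in tasks.items():
        previous = last_run.get(script)
        if previous is None or now - previous >= interval:
            due.append(script)
    return due


tasks = {"/usr/local/bin/amara_delicious.py": 24 * 60 * 60}
# Never run before, so it is due.
first = due_tasks(tasks, {}, now=1_000_000)
# Ran an hour ago, so it is skipped.
again = due_tasks(tasks, {"/usr/local/bin/amara_delicious.py": 996_400},
                  now=1_000_000)
```

After running the due scripts, the caller would record the current time for each of them so the next invocation can skip them.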

To run the Amara del.icio.us daily links script once a day, you would add the following to your config file:

py["tasks"] = {"/usr/local/bin/amara_delicious.py": 24*60*60}
py["task_control_file"] = py['datadir'] + "/task_control.dat"

You could of course have multiple script/interval mappings in the "tasks" dict. The scripts are run with variables request and config set, so, for example, if running from task_control.py, you could change the line of amara_delicious.py from

BASEDIR = '/srv/www/ogbuji.net/copia/pyblosxom/datadir'

to

BASEDIR = config['datadir']

[Uche Ogbuji]

via Copia

Election district Google/Yahoo/whatever maps mashup?

I was looking for a mapping resource for U.S. electoral districts recently, a resource that would provide maps or map overlays for congressional or state assembly election districts. I could find nothing like it. Out of curiosity I checked into how hard it is to find maps of my own districts. It turned out to be quite difficult. I did find some fuzzy maps at the Boulder Clerk and Recorder Elections site, but you would really have to know your county's geography like the back of your hand to get a lot from those. Also, it seemed difficult to raise that site by going through any of the major search engines using likely search terms. Finally, I assumed that knowing how to get the district maps from Boulder would be useless for other counties, and I tested that assumption by visiting a few neighboring counties such as Weld. I could always find the maps, but it took very different site navigation, and the resulting maps differed hugely in format (embedded image vs PDF download) and detail.

With all the talk of Web mash-ups, I wonder whether anyone has any sort of site or tool for overlaying election district information over mapping services. I suppose one big problem is that there isn't much commercial prospect for such a service, but surely this would be a prime candidate for civic service mashups, funded by government or philanthropists. Another question is whether districting information is available in computer-readable form regular enough for inexpensive implementation of such overlays.

I still don't know why the U.S. insists on complicating its nation/state/county/municipality breakdown with a Klee-canvas of congressional, state assembly (and sometimes even educational) districts. Why aren't town or county the basic units? If we want more house reps than there are counties, why not have multiple per overall county, much as we have two senators per state? Wouldn't it reduce gerrymandering and save resources to not invent temporary bantustans every ten years as electoral units? Anyway, these last naïve thoughts are topic for another entry another day.

[Uche Ogbuji]

via Copia

Corrections to RDF Query Language Comparison

I recently came upon this dated comparison of features in RDF querying languages. It predates SPARQL and as a result prompted Dan Connolly to attempt to demonstrate how SPARQL fares in this matrix. In reading it, I realized some of the features marked as 'No' under Versa are incorrect. So here is my attempt to demonstrate how the current Versa specification would implement these requirements:

5 Quantification: Return the persons who are authors of all publications

The section name is misleading as it suggests the use of FOPL semantics to resolve a pattern that can be solved without FOPL semantics:

filter(
        'all()',
        'eq(length(difference(all()-dc:creator->*. - dc:creator->*)),0)'
    )
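To illustrate why no first-order quantifier machinery is needed, here is one reading of the same requirement in plain Python, where "author of all publications" reduces to a set intersection. This is not Versa and not a translation of the query above; the function name and the toy data are invented for the example.

```python
def authors_of_all(creators_by_publication):
    """Return the people listed as a creator of every publication.

    creators_by_publication maps a publication id to the set of its
    creators (the objects of its dc:creator statements).
    """
    if not creators_by_publication:
        return set()
    return set.intersection(*creators_by_publication.values())


# Hypothetical data: three publications and their creators.
creators_by_publication = {
    "#pub1": {"#alice", "#bob"},
    "#pub2": {"#alice"},
    "#pub3": {"#alice", "#carol"},
}
everywhere = authors_of_all(creators_by_publication)
```

With this data only #alice appears in every set, so she alone is returned.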

10 Namespace: Return all resources whose namespace starts with "http://www.aifb.uni-karlsruhe.de/".

filter(
        'all()',
        'starts-with("http://www.aifb.uni-karlsruhe.de/")'
    )

14 Entailment: Return all instances that are members of the class Publication.

The current specification says (about rdf:type entailment):

Returns a list of all resources of a specified type, as defined by RDFS and optionally DAML schema specifications ...

So, the implementation has the option to account for entailment rules (as the 4Suite implementation does):

type(resource('#Publication'))
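What "accounting for entailment" means here can be shown with a small Python sketch: an instance of a subclass counts as an instance of the superclass. This is an illustration of rdfs:subClassOf reasoning over toy dictionaries, not the 4Suite implementation; the function names and the #Article class are hypothetical.

```python
def all_superclasses(cls, subclass_of):
    """Return every class reachable from cls by following subclass links."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def instances_of(cls, asserted_types, subclass_of):
    """Return resources typed as cls directly or through a subclass."""
    result = set()
    for resource, classes in asserted_types.items():
        for asserted in classes:
            if asserted == cls or cls in all_superclasses(asserted, subclass_of):
                result.add(resource)
                break
    return result


# Hypothetical schema and data.
subclass_of = {"#Article": {"#Publication"}}
asserted_types = {
    "#p1": {"#Publication"},
    "#a1": {"#Article"},
    "#x1": {"#Person"},
}
publications = instances_of("#Publication", asserted_types, subclass_of)
```

Without the subclass step, #a1 would be missed, which is the difference between a store that merely matches rdf:type statements and one that applies the RDFS rules.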

Chimezie Ogbuji

via Copia