Copia

Cannavaro, punked

Good grief! The man everyone hailed as the true best player of the World Cup certainly looked like a scrub in Lyon's domination of Real Madrid this week. It's something close to criminal that Lyon didn't convert their complete domination of the game into a 6-0 or suchlike scoreline rather than the 2-0 the board showed at the final whistle. Cannavaro especially stank up the pitch. He looked worse than Carragher did during his own embarrassing turn handing out gifts on the blue end of Merseyside. Cannavaro completely lost Juninho's looping ball to lay a red carpet for Fred's goal, and he committed no end of other shocking blunders that were less severely punished. Speaking of Juninho, I think I'm more in awe of his dead-ball prowess than I am of Roberto Carlos's (BTW, RC was definitely the goat on the other Lyon goal, by Tiago). He dealt Casillas some bitter punishment, and it's amazing none of his blasts ripped the net open. Six-time champions? Any other opinion? Champions of Europe? Maybe, old man, maybe.

[Uche Ogbuji]

via Copia

del.icio.us bookmarks for 2006-09-10

"CSS Galleries: aggregating web design inspiration": A Weblog aggregating several galleries of exemplary Web design. (from uche)
"Wrapping command-line programs [Andrew Dalke]": A useful HOWTO for wrapping command line programs as Python libraries. (from uche)

[Uche Ogbuji]

Go go transfer glitter

What a week it's been! It's been a laden week for me at work, so it's only now, as I settle in to watch this week's EPL fixtures (update: and realise there are none), that I learn of all the madness that's been going on in the transfer market. And what an eyeful! Tevez and Mascherano to the Hammers? Are you kidding me? The man (Pardew) is a complete genius. His team was already a joy to watch. As a motivator he seems to get the sort of commitment from his team that only the greatest coaches manage. And now he's pulled off the off-pitch masterstroke. Tevez will be able to take on all four of most Premiership back lines on his own. Combined with the power of Zamora and Harewood, West Ham should produce even more goals than they did last season. And what better bodyguard for West Ham's own back line could they find than Mascherano?

But what am I talking about West Ham for? It's all about The Gunners. I've been wanting them to get rid of pouty Reyes and Cole for ages. This is The Arsenal. If you're so keen to play somewhere else, then off with you. Truth be told, though, the departure of Vieira and the general absence (and now full departure) of Campbell has really left Arsenal without much of a backbone lately. So what do we have this week? Reyes gets his eager wish to be a Galactico, and we get "The Beast", Julio Baptista. Not quite as skillful as Vieira, but much more imposing. Oh, you want more? Gallas and 5 million quid come Arsenal's way in exchange for Cole. Wow. I would have thought Cole/Gallas was fair as a straight swap, so I think Arsenal got the sweet end here. Gallas won't provide as clever a supporting attack as Cole did, but we have an excess of attack right now, and Gallas is just the defensive rock we're lacking. Oh, you want more? Wenger negotiated a very modest transfer fee to secure the dazzling Denilson. Denilson has almost never in his career played up to his amazing potential (neither for country nor for any club outside Brazil), but I think he's never played with a club that makes such good use of skilled midfielders.

It's been an uninspiring August for The Gunners (to put it politely), but it looks as if, as always, Wenger is finding spectacular ways to bring new life to the team.

In former Gunner news, what a week for Kanu at Pompey. Check out the way he literally (yes, literally, literally) held back the Middlesbrough defence like a one-man levee to score his second goal at Teesside. It's great to see him doing so well; after watching his strong performance for Nigeria in the African Cup of Nations, I figured he still had it in him. And Anelka to Bolton? Wow. £8M seems steep, but if anyone can get the best out of Anelka, who's been disappointing since leaving London, it's Big Sam, the master at marshalling maligned players.

[Uche Ogbuji]

Amara in Spanish

My focus, in the time I have available for open-source development, has been on pushing 4Suite XML to 1.0 (and we're on the very final leg of that journey). I'm still putting a bit of time into Amara, but I should have even more time for it soon, and I have many ideas for what to do with that time.

Others have been up to fun stuff with Amara as well, and none more so, it seems, than Spanish speakers. Luis Miguel Morillas has been putting Amara through its paces in his LivingPyXML project. César Cárdenas Desales has contributed a nice intro, "Procesamiento fácil de XML con Python y Amara" ("Easy XML processing with Python and Amara"):

Although the Python standard library has tools and modules for processing XML with SAX and DOM, many programmers have thought there could be simpler ways to work with XML. Amara is a set of tools that makes XML processing with Python easier. This manual gives a brief introduction to using Amara for those tasks.

Yep. That was pretty much the entire idea.

Original link (not as up-to-date): "Procesamiento fácil de XML con Python y Amara"

[Uche Ogbuji]

A Relational Model for FOL Persistence

A short while ago I was rather engaged in investigating the most efficient way to persist RDF on Relational Database Management Systems. One outcome of this effort that I have yet to write about is a relational model for Notation 3 abstract syntax, along with a fully functioning implementation that is now part of RDFLib's MySQL drivers.

The write-up is done with Soft4Science's SciWriter and seems to render natively only in Firefox (I haven't tried any other browser).

Originally, I kept coming at it from a pure Computer Science approach (programming and data structures), but eventually I had to roll up my sleeves and get down to the formal logic level (i.e., the Deconstructionist, Computer Engineer approach).

Partitioning the KR Space

The first method, and the one with the most impact, was separating Assertional Box statements (statements of class membership) from the rest of the Knowledge Base. By Knowledge Base, I mean a 'named' aggregation of all the named graphs in an RDF database. Partitioning the table space universally shortens indices and reduces the average number of rows a SQL optimizer needs to scan, even in the worst-case scenario. The nature of RDF data (at the syntactic level) is a major factor: RDF is a Description Logics-oriented representation and thus relies heavily on statements of class membership.

The relational model is all about representing everything as specific relations, and the 'instantiation' relationship is a perfect candidate for a database table.
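To make the idea of an ABox partition concrete, here is a minimal sketch using SQLite in place of MySQL. The table and column names (`type_statements`, `other_statements`, `class_term`) are hypothetical and are not the names RDFLib's MySQL driver uses; the point is only that class-membership statements get routed to their own, smaller table with its own indices.

```python
import sqlite3

# Real rdf:type URI; everything else (table/column names) is hypothetical.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def create_partitions(conn: sqlite3.Connection) -> None:
    """Create one table for class-membership (ABox) statements and one for the rest."""
    conn.execute("CREATE TABLE type_statements (member TEXT, class_term TEXT, context TEXT)")
    conn.execute(
        "CREATE TABLE other_statements (subject TEXT, predicate TEXT, object TEXT, context TEXT)"
    )
    # Each partition gets its own, shorter indices.
    conn.execute("CREATE INDEX type_by_class ON type_statements (class_term)")
    conn.execute("CREATE INDEX other_by_predicate ON other_statements (predicate)")


def add_statement(conn: sqlite3.Connection, s: str, p: str, o: str, context: str) -> None:
    """Route a statement to the partition it belongs to."""
    if p == RDF_TYPE:
        # The predicate is implied by the table, so it is not stored at all.
        conn.execute("INSERT INTO type_statements VALUES (?, ?, ?)", (s, o, context))
    else:
        conn.execute("INSERT INTO other_statements VALUES (?, ?, ?, ?)", (s, p, o, context))


conn = sqlite3.connect(":memory:")
create_partitions(conn)
add_statement(conn, "ex:ikenna", RDF_TYPE, "foaf:Person", "ex:graph1")
add_statement(conn, "ex:ikenna", "foaf:name", "Ikenna", "ex:graph1")

# A class-membership query only ever touches the type partition.
members = conn.execute(
    "SELECT member FROM type_statements WHERE class_term = ?", ("foaf:Person",)
).fetchall()
```

Note that the ABox table drops the predicate column entirely, since every row in it is an instantiation.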

Eventually, it made sense to create additional table partitions for:

RDF statements between resources (where the object is not an RDF Literal).
RDF's equivalent to EAV statements (where the object is a value or RDF Literal).

Matching triple patterns against these partitions can be expressed as a decision tree that accommodates every combination of RDF terms. For example, the triple pattern:

?entity foaf:name "Ikenna"

would only require a scan through the indices for the EAV-type RDF statements (or through the whole table if necessary, but that decision is up to the underlying SQL optimizer).
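The decision tree described above could look roughly like the following sketch. It assumes three hypothetical partitions (`type_statements` for the ABox, `relations` for resource-to-resource statements, `literal_properties` for EAV-type statements), represents variables as strings starting with `?`, and uses CURIE strings for URIs; it is an illustration of the idea, not RDFLib's actual routing code.

```python
from dataclasses import dataclass
from typing import List, Union

RDF_TYPE = "rdf:type"


@dataclass(frozen=True)
class Literal:
    """Stand-in for an RDF literal in a triple pattern."""
    value: str


# A term is a URI/CURIE string, a variable ("?name"), or a Literal.
Term = Union[str, Literal]


def is_variable(term: Term) -> bool:
    return isinstance(term, str) and term.startswith("?")


def partitions_for(subject: Term, predicate: Term, obj: Term) -> List[str]:
    """Return the (hypothetical) partitions a triple pattern must be matched against.

    The subject never narrows the choice, since every partition has a subject column.
    """
    if isinstance(obj, Literal):
        # Only EAV-type statements have literal objects.
        return ["literal_properties"]
    if predicate == RDF_TYPE:
        # Class membership lives in its own (ABox) partition.
        return ["type_statements"]
    if is_variable(predicate):
        if is_variable(obj):
            return ["type_statements", "relations", "literal_properties"]
        # Object is a resource: it may be a class (rdf:type) or an ordinary relation.
        return ["type_statements", "relations"]
    if is_variable(obj):
        # Named predicate other than rdf:type, unknown kind of object.
        return ["relations", "literal_properties"]
    return ["relations"]
```

For the `?entity foaf:name "Ikenna"` pattern, `partitions_for("?entity", "foaf:name", Literal("Ikenna"))` selects only the EAV-type partition, while a fully unbound `?s ?p ?o` has to fall back to all three.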

Using Term Type Enumerations

The second method adds the term type, drawn from an enumeration of all term types, as an additional column whose indices are also available to a SQL query optimizer. That is:

ANY_TERM = ['U','B','F','V','L']

The terms can be partitioned into the exact allowable set for certain kinds of RDF terms:

ANY_TERM = ['U','B','F','V','L']
CONTEXT_TERMS   = ['U','B','F']
IDENTIFIER_TERMS   = ['U','B']
GROUND_IDENTIFIERS = ['U']
NON_LITERALS = ['U','B','F','V']
CLASS_TERMS = ['U','B','V']
PREDICATE_NAMES = ['U','V']

NAMED_BINARY_RELATION_PREDICATES = GROUND_IDENTIFIERS
NAMED_BINARY_RELATION_OBJECTS    = ['U','B','L']

NAMED_LITERAL_PREDICATES = GROUND_IDENTIFIERS
NAMED_LITERAL_OBJECTS    = ['L']

ASSOCIATIVE_BOX_CLASSES    = GROUND_IDENTIFIERS

For example, the object term of an EAV-type RDF statement doesn't need an associated column for its term type, since that relation is explicitly defined as those RDF statements whose object is a Literal (L).
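A sketch of how the enumerations above might feed the SQL optimizer: given a partition's allowed term types and the types a query can accept in the object position, emit a condition on a term-type column, or nothing at all when the partition allows only one type. The column name `object_term` and the function are hypothetical; the letter meanings (U = URI, B = blank node, L = literal) are assumed rather than stated in the post.

```python
from typing import List, Optional

# Copied from the enumerations above (only the ones this sketch uses).
NAMED_BINARY_RELATION_OBJECTS = ['U', 'B', 'L']
NAMED_LITERAL_OBJECTS = ['L']


def object_type_condition(
    allowed: List[str], requested: List[str], column: str = "object_term"
) -> Optional[str]:
    """Build a SQL condition on a term-type column for one partition.

    Returns None when the partition can never match, "" when no condition is
    needed, and otherwise an IN (...) clause the optimizer can use an index for.
    """
    usable = sorted(set(allowed) & set(requested))
    if not usable:
        return None  # skip this partition entirely
    if len(allowed) == 1:
        return ""    # single-type partition (e.g. literals): no column required
    if set(usable) == set(allowed):
        return ""    # the request is no narrower than the partition itself
    quoted = ", ".join(f"'{t}'" for t in usable)
    return f"{column} IN ({quoted})"
```

For instance, asking the resource-relation partition for non-literal objects yields `object_term IN ('B', 'U')`, while the literal partition never needs the column.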

Efficient Skolemization with Hashing

Finally, thanks to Benjamin Nowack's related efforts with ARC (a PHP-based implementation of an RDF/SPARQL storage system), Mark Nottingham's suggestion, and an earlier paper by Stephen Harris and Nicholas Gibbins (3store: Efficient Bulk RDF Storage), a final method was employed: the 'statement' tables store a half-hash (half of an MD5 hash) of each RDF identifier instead of the identifier itself. Each statement table uses an unsigned MySQL BIGINT to encode the half-hash in base 10, and these values serve as foreign keys into two separate tables:

A table for identifiers (with a column that enumerated the kind of identifier it was)
A table for literal values

The key to both tables is the 8-byte (64-bit) unsigned integer that represents the half-hash (half of MD5's 16-byte digest).
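A minimal sketch of such a half-hash key using Python's `hashlib`. Which half of the digest is kept, the byte order, the UTF-8 encoding, and prefixing the one-letter term type are all assumptions for illustration; the post only says that the identifier is hashed together with its term type.

```python
import hashlib


def half_hash(term_type: str, lexical_form: str) -> int:
    """Return the first 64 bits of MD5(term_type + lexical_form) as an unsigned int.

    term_type is a one-letter code such as 'U' or 'L', so prefixing it cannot
    make two different (type, lexical form) pairs produce the same hash input.
    """
    digest = hashlib.md5((term_type + lexical_form).encode("utf-8")).digest()  # 16 bytes
    # Keep 8 bytes: the result fits an unsigned MySQL BIGINT (0 .. 2**64 - 1).
    return int.from_bytes(digest[:8], "big")


# The same lexical form yields different keys as a URI and as a literal.
uri_key = half_hash("U", "http://example.org/")
literal_key = half_hash("L", "http://example.org/")
print(uri_key, literal_key)  # base-10 values, as stored in the statement tables
```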

This, of course, introduces a possibility of collision (due to the reduced hash size), but hashing the identifier together with its term type further dilutes the lexical space and reduces this collision risk. This latter part is still a theory I haven't formally proven (or disproven) but hope to. At the maximum volume I've tested (around 20 million RDF assertions), I can resolve a single triple pattern in 8 seconds on an SGI machine, with no collisions; the implementation includes a collision detection mechanism (disabled by default).
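To put the collision risk in rough numbers, here is the standard birthday-bound estimate, assuming the 64-bit half-hash behaves like a uniformly random value. The count n is an assumption: the post gives 20 million assertions, not the number of distinct terms, so n = 2×10^7 distinct terms is used purely for illustration.

```latex
p(n) \approx 1 - \exp\!\left(-\frac{n(n-1)}{2N}\right) \approx \frac{n^{2}}{2N},
\qquad N = 2^{64}

p(2\times 10^{7}) \approx \frac{(2\times 10^{7})^{2}}{2^{65}}
  = \frac{4\times 10^{14}}{3.69\times 10^{19}} \approx 1.1\times 10^{-5}
```

Under this idealized model, mixing the term type into the hashed input does not by itself lower the estimate; its concrete benefit is that a URI and a literal with the same lexical form no longer map to the same key.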

The implementation includes all the magic needed to generate SQL statements to create, query, and manage indices for the tables in the relational model. It does this from a Python model that encapsulates the relational model and methods to carry out the various SQL-level actions needed by the underlying DBMS.
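The general shape of such a Python model might be sketched as below: a table description that can emit the SQL to create itself and to create or drop its indices. The `Table` class, its methods, and the example column layout are hypothetical and do not reproduce RDFLib's FOPL relational model classes; the DDL strings use MySQL syntax.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Table:
    """Hypothetical model of one table in the relational model."""
    name: str
    columns: List[Tuple[str, str]]                   # (column name, SQL type)
    indices: List[Tuple[str, ...]] = field(default_factory=list)

    def index_names(self) -> List[str]:
        return [f"{self.name}_{'_'.join(cols)}" for cols in self.indices]

    def create_sql(self) -> str:
        cols = ", ".join(f"{col} {sqltype}" for col, sqltype in self.columns)
        return f"CREATE TABLE {self.name} ({cols})"

    def create_indices_sql(self) -> List[str]:
        return [
            f"CREATE INDEX {idx} ON {self.name} ({', '.join(cols)})"
            for idx, cols in zip(self.index_names(), self.indices)
        ]

    def drop_indices_sql(self) -> List[str]:
        # MySQL form: DROP INDEX index_name ON table_name
        return [f"DROP INDEX {idx} ON {self.name}" for idx in self.index_names()]


# Example: an EAV-type partition keyed entirely by 64-bit half-hashes.
literal_properties = Table(
    name="literal_properties",
    columns=[("subject", "BIGINT UNSIGNED"), ("predicate", "BIGINT UNSIGNED"),
             ("object", "BIGINT UNSIGNED"), ("context", "BIGINT UNSIGNED")],
    indices=[("predicate",), ("subject", "predicate")],
)
```

Keeping the SQL generation in one place like this means index management (for example, dropping indices before a bulk load and recreating them afterwards) falls out of the same table description.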

For me, it has satisfied my need for an open-source, maximally efficient RDBMS on which large volumes of RDF can be persisted within named graphs, with the ability to persist Notation 3 formulae separately (consistent with Notation 3 semantics).

I called the Python module FOPLRelationModel because although it is specifically a relational model for Notation 3 syntax it covers much of the requirements for the syntactic representation of First Order Logic in general.

Chimezie Ogbuji

Parsing RDF from XSLT Prospectively

4Suite repository Document Definitions can now support both XML and text-based serialization of RDF. Document Definitions essentially facilitate database replication of XML to RDF (within a content management system that persists both). The mechanism is similar to transactional data replication in database management systems, where modifications to a table trigger the replication. Previously, Document Definitions were only expected to output RDF/XML, which has well-known issues.

Now, the repository persistence driver attempts to parse the resulting RDF based on the XSLT output method. This heuristic lets it prospectively accommodate non-XML syntax (such as Notation 3, the only substantial text-based RDF syntax) as well as RDF/XML (and even TriX).

The main advantages of these syntax alternatives are faster, more efficient parsing and more human-readable syntax (especially for data that was meant to be expressed that way). Dispatching on the xsl:output method in this way is analogous to dispatching on the HTTP Content-Type header for remote RDF graphs (where parsing is also a bottleneck).
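The dispatch could be sketched like this: take the stylesheet's output method and the serialized result, and pick a parser. The function is hypothetical (it is not 4Suite's persistence driver), and the returned names ('n3', 'xml', 'trix') are RDFLib-style format names used only as labels. Telling TriX from RDF/XML by the document element's local name is an extra assumption, since both are XML output.

```python
import xml.etree.ElementTree as ET


def guess_rdf_syntax(output_method: str, result: str) -> str:
    """Heuristically choose an RDF parser for an XSLT result.

    output_method is the value of xsl:output/@method ('xml', 'text', ...).
    """
    if output_method.strip().lower() == "text":
        # Non-XML output: assume the text-based syntax, Notation 3.
        return "n3"
    # XML output: inspect the document element to tell TriX from RDF/XML.
    root = ET.fromstring(result)
    local_name = root.tag.rsplit("}", 1)[-1]  # strip any "{namespace}" prefix
    return "trix" if local_name == "TriX" else "xml"
```

As with HTTP content negotiation, the declared output method is only a hint; a more defensive driver would fall back to another parser if the first one fails.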

Of course, 4Suite's aging RDF library doesn't properly persist N3 formulae (logical syntactic sugar specific to Notation 3) coming from the parser it uses.

Imagine using a Document Definition to, say, replicate SWRL's XML syntax into Notation 3's implication syntax for a logic programming database:

if x1 hasParent x2, x2 hasSibling x3, and x3 hasSex male, then x1 hasUncle x3

SWRL Rule

<ruleml:imp> 
  <ruleml:_rlab ruleml:href="#example1"/>
  <ruleml:_body> 
    <swrlx:individualPropertyAtom  swrlx:property="hasParent"> 
      <ruleml:var>x1</ruleml:var>
      <ruleml:var>x2</ruleml:var>
    </swrlx:individualPropertyAtom> 
    <swrlx:individualPropertyAtom  swrlx:property="hasBrother"> 
      <ruleml:var>x2</ruleml:var>
      <ruleml:var>x3</ruleml:var>
    </swrlx:individualPropertyAtom> 
  </ruleml:_body> 
  <ruleml:_head> 
    <swrlx:individualPropertyAtom  swrlx:property="hasUncle"> 
      <ruleml:var>x1</ruleml:var>
      <ruleml:var>x3</ruleml:var>
    </swrlx:individualPropertyAtom> 
  </ruleml:_head> 
</ruleml:imp>

Notation 3 Rule

{ ?x1 :hasParent ?x2 . ?x2 :hasSibling ?x3 . ?x3 :hasSex :Male } => { ?x1 :hasUncle ?x3 } .
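The post imagines doing this translation with a Document Definition (i.e. XSLT); purely as an illustration of the mapping, here is the same idea with Python's standard `xml.etree.ElementTree` instead. The namespace URIs are the ones used by the SWRL member submission (the excerpt above omits its declarations), and the empty-prefix `:` for properties is an assumption. Translated mechanically, the SWRL rule above produces `:hasBrother` rather than the `:hasSibling`/`:hasSex` pair used in the hand-written Notation 3 rule.

```python
import xml.etree.ElementTree as ET

RULEML = "http://www.w3.org/2003/11/ruleml"
SWRLX = "http://www.w3.org/2003/11/swrlx"


def atom_to_n3(atom: ET.Element) -> str:
    """Turn one swrlx:individualPropertyAtom into an N3 triple pattern."""
    prop = atom.get(f"{{{SWRLX}}}property")
    subj, obj = (f"?{var.text.strip()}" for var in atom.findall(f"{{{RULEML}}}var"))
    return f"{subj} :{prop} {obj}"


def swrl_to_n3(xml_text: str) -> str:
    """Translate a ruleml:imp with individual property atoms into an N3 implication."""
    imp = ET.fromstring(xml_text)

    def conjunction(section: str) -> str:
        atoms = imp.find(f"{{{RULEML}}}{section}").findall(f"{{{SWRLX}}}individualPropertyAtom")
        return " . ".join(atom_to_n3(a) for a in atoms)

    return f"{{ {conjunction('_body')} }} => {{ {conjunction('_head')} }} ."


SAMPLE_RULE = f"""<ruleml:imp xmlns:ruleml="{RULEML}" xmlns:swrlx="{SWRLX}">
  <ruleml:_body>
    <swrlx:individualPropertyAtom swrlx:property="hasParent">
      <ruleml:var>x1</ruleml:var><ruleml:var>x2</ruleml:var>
    </swrlx:individualPropertyAtom>
    <swrlx:individualPropertyAtom swrlx:property="hasBrother">
      <ruleml:var>x2</ruleml:var><ruleml:var>x3</ruleml:var>
    </swrlx:individualPropertyAtom>
  </ruleml:_body>
  <ruleml:_head>
    <swrlx:individualPropertyAtom swrlx:property="hasUncle">
      <ruleml:var>x1</ruleml:var><ruleml:var>x3</ruleml:var>
    </swrlx:individualPropertyAtom>
  </ruleml:_head>
</ruleml:imp>"""

print(swrl_to_n3(SAMPLE_RULE))
```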

Now imagine using GRDDL to publish a common set of rules as SWRL, with a profile to transform them to Notation 3 for scutters that understand it.

Chimezie Ogbuji

LazyWeb Ho! Detecting whether a browser supports XML+XSLT

I'm wrapping up applyxslt, a WSGI middleware module to serve separate XML and XSLT to browsers that can handle them (using the stylesheet PI). For browsers that can't, it would intercept the response, perform the XSLT transform on the browser's behalf, and send on the result. BTW, for more on WSGI middleware, see “Mix and match Web components with Python WSGI”.
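The control flow of such middleware might look like the sketch below. It is not applyxslt's code: `supports_xslt(user_agent)` and `transform(xml_bytes)` are hypothetical callables the caller supplies, and always answering the transformed result as `text/html` is an assumption.

```python
def applyxslt_sketch(app, supports_xslt, transform):
    """Wrap a WSGI app whose responses are XML carrying an xml-stylesheet PI."""

    def middleware(environ, start_response):
        if supports_xslt(environ.get("HTTP_USER_AGENT", "")):
            # Capable browser: pass the XML (and its stylesheet PI) straight through.
            return app(environ, start_response)

        captured = {}
        chunks = []

        def capture(status, headers, exc_info=None):
            captured["status"], captured["headers"] = status, headers
            return chunks.append  # collect anything the app sends via write()

        iterable = app(environ, capture)
        try:
            chunks.extend(iterable)
        finally:
            if hasattr(iterable, "close"):
                iterable.close()  # WSGI requires middleware to pass close() along

        # Incapable browser: run the transform server-side and send the result.
        result = transform(b"".join(chunks))
        headers = [(k, v) for k, v in captured["headers"]
                   if k.lower() not in ("content-type", "content-length")]
        headers += [("Content-Type", "text/html"), ("Content-Length", str(len(result)))]
        start_response(captured["status"], headers)
        return [result]

    return middleware


def xml_app(environ, start_response):
    """Tiny demo app returning an XML document."""
    start_response("200 OK", [("Content-Type", "application/xml")])
    return [b"<doc/>"]
```

Buffering the whole response is the simple choice here, since an XSLT processor generally needs the complete document before it can transform it.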

My biggest uncertainty is the best way to determine whether a browser can handle XML+XSLT. I doubt anything in the Accept header would help, so I'm left having to list all User-Agent strings for browsers that I know can handle this (basically Firefox, Opera, and recent Mozilla, Safari and MSIE).

So far I'm deriving my User-Agent list from several sources, including:

Wikipedia (the daddy of all User-Agent lists I've seen)
"Masquerading Your Browser", by Eric Giguere (the "Common User-Agent Values" section) and
"Understanding user-agent strings"

Really the Wikipedia list is all I needed, but I found and worked with the other ones first.

Based on these, here is the list of User-Agent string patterns I am treating as evidence that the browser understands XML+XSLT (Python/Perl regex):

.*MSIE 5\.5.*
.*MSIE 6\.0.*
.*MSIE 7\.0.*
.*Gecko/2005.*
.*Gecko/2006.*
.*Opera/9.*
.*AppleWebKit/31.*
.*AppleWebKit/4.*

Note: this hoovers up a few browser versions I'm not entirely sure of: Minimo, AOL Explorer and OmniWeb. I'm fine with some such uncertainty, but if anyone has any suggestions for further refinement of this list, let me know. I'd like to keep it updated.
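Here is one way to apply the patterns listed above in Python, as a sketch. The literal dots are escaped, and because `re.search` looks anywhere in the string, the leading and trailing `.*` are unnecessary. The names `XSLT_CAPABLE_UA_PATTERNS` and `browser_supports_xslt` are hypothetical.

```python
import re

# The patterns from the list above, combined into one alternation.
XSLT_CAPABLE_UA_PATTERNS = [
    r"MSIE 5\.5", r"MSIE 6\.0", r"MSIE 7\.0",
    r"Gecko/2005", r"Gecko/2006",
    r"Opera/9",
    r"AppleWebKit/31", r"AppleWebKit/4",
]
_XSLT_CAPABLE_UA = re.compile("|".join(XSLT_CAPABLE_UA_PATTERNS))


def browser_supports_xslt(user_agent: str) -> bool:
    """True if the User-Agent matches any pattern believed to handle XML+XSLT."""
    return _XSLT_CAPABLE_UA.search(user_agent) is not None
```

Keeping the patterns in a plain list makes it easy to update as refinements come in, while compiling them once keeps the per-request check cheap.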

[Uche Ogbuji]

What does GRDDL have to do with Intelligent Agents?

GRDDL. What is it? Why the long name? It does something very specific that requires a long name to describe it; the etymology of biological names offers examples of the same phenomenon in a different discipline. I started writing on this weblog mainly as a way to regularly exercise my literary expression, so (to that end) I'm going to try to explain GRDDL in as few words as I can while simultaneously embellishing.

It is a language (or dialect) translator. It Gleans (gathers or harvests) Resource Descriptions. Resource Descriptions can be thought to refer to the use of constructs in Knowledge Representation. These constructs are often used to make assertions about things in sentence form, from which additional knowledge can be inferred. However, it is also the 'Resource Description' in RDF (no coincidence there). RDF is the target dialect. GRDDL acts as an intelligent agent (more on this later) that performs translations from specific (XML) vocabularies, or Dialects of Languages, to abstract RDF syntax.

Various languages can be used but there is a natural emphasis on a language (XSLT) with a native ability to process XML.
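The first step an agent takes is finding which transformations a document asks for. This sketch handles only one of GRDDL's discovery routes, the `grddl:transformation` attribute on the document element (the namespace- and profile-based routes are not shown); it returns the transformation URIs resolved against the document's base URI and leaves applying them (typically with an XSLT processor) out of scope. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urljoin

GRDDL_NS = "http://www.w3.org/2003/g/data-view#"


def grddl_transformations(xml_text: str, base_uri: str) -> List[str]:
    """Return the absolute URIs listed in the root's grddl:transformation attribute."""
    root = ET.fromstring(xml_text)
    value = root.get(f"{{{GRDDL_NS}}}transformation", "")
    # The attribute holds a whitespace-separated list of URI references.
    return [urljoin(base_uri, ref) for ref in value.split()]


doc = ('<doc xmlns:grddl="http://www.w3.org/2003/g/data-view#" '
       'grddl:transformation="toRDF.xsl http://example.org/other.xsl"/>')
print(grddl_transformations(doc, "http://example.org/data/doc.xml"))
```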

GRDDL is an XML & RDF formalism in what I think is a hidden pearl of web architecture: a well-engineered environment for distributed processing by intelligent agents. It's primarily the well-engineered nature of web architecture that lends the necessary autonomy that intelligent agents require. Though hidden, it has much in common with contemporaries, predecessors, and distant cousins:

It earns its keep mostly with small, well-designed XML formats. As a host language for XSLT it sets out to be (perhaps) a bridge across the great blue and red divide of XML & RDF. To quote a common parlance: watch this space.

Chimezie Ogbuji

The BDFL's boundary

I think my response to the recent news that Guido had "pronounced" Django as the Python Web framework was "so wot's 'e think 'e's doing, anyway?"

First of all, I don't think it's a big deal. Every few months something happens to get the Python world all abuzz about the number of Python Web frameworks. It might be the announcement of a new one (Django, TurboGears, web.py, Clever Harold, etc.). It may be some bit of Ruby on Rails news, which for some reason seems to strike fear into the depths of Pythoneers' consciousness. (I think O'Reilly amuses themselves by publishing book sales figures just to see how high they can get some in the Python community to jump). Whatever the stimulus, people blog back and forth about it, a handful of folks switch frameworks, just to be safe, but in the end it all dies down with the status quo still pretty much as it is. When Python Web frameworks die (which seems to be a rare event), they die with a whimper, and not a bang.

Some might think that a BDFL "pronouncement" is no ordinary stimulus, but I'd bet good money that this case is indeed quite ordinary. You see, that's where we have these lovely things called boundaries and it seems one has unmistakably been exceeded in this case. When the BDFL pronounces on a matter, it has always been with the intention, and effect, of settling a troubling argument once and for all. That's not possible in this case. It's not as with Python decorators where Guido's decision to include them, and his choice of syntax, became part of the language proper, and the only thing you could do if you disagreed was take the enormous step of forking the language. In this case, if you disagree with Guido, what do you do? You ignore him and keep using (or developing) your favorite framework. Nothing in the evolution of Python impedes you the slightest bit. In the end, I think this is what each individual developer will end up doing. A few will indeed give Django a new look, and some will convert, but in the main, all will be back to normal in a few weeks.

For my part, every sniff I've had of Django makes me think it's way too large and monolithic for my taste. In other words, it's far too much like Zope. (Speaking of Zope, if it were ever appropriate to declare one Python framework to rule them all, Zope is the one with the popularity and maturity for such a role, which is why I'm glad such a declaration is not appropriate.) I understand they're getting rid of a lot of the magic, which is another thing that gave me flashbacks of Zope, but I doubt that would be nearly enough for me. For now, I'll continue to work with my favorite three frameworks: CherryPy, RhubarbTart and Pylons, although I do carry some hope that the latter two will merge, and leave me with just two favorites. I'll also focus as much of my Web development work as I can on creating WSGI middleware, which provides the greatest flexibility.

Discussion group for Atom protocol implementations in Python

I've had discussions about implementing Atom protocol in Python with many colleagues, and I decided to create a proper forum for discussion, and so the Google group atom-protocol-python was born.

A group dedicated to discussion among developers and users of Python libraries and tools for processing the Atom protocol, either as client or server.

Honestly, the idea of an Atom store, and of an Atom client is so broad that I expect there to be several implementations in Python. This group is to be very open, and I'd love for even folks working on competing implementations to join up, so we can at least discuss interoperability.

And don't forget there is also an Atom IRC channel where we can discuss Atom syntax and Atom protocol. And while I'm plugging stuff, I shan't forget Planet Atom.

[Uche Ogbuji]
