Amara API quick reference, and Windows packages

I forgot to mention in the Amara 1.1.6 announcement that I drafted an API quick reference. I've put a link to it on the Amara home page.

I've also added a Windows installer created by Sylvain Hellegouarch, with some help from Jeremy Kloth. It's an installer for Amara "allinone", so all you need is to have installed Python 2.4 for Windows, then you run this installer, and you should be all set.

[Uche Ogbuji]

via Copia

Amara 1.1.6

I released Amara 1.1.6 last week (see the announcement). This version requires 4Suite XML 1.0b2. As usual, though, I have prepared an "allinone" package so that you do not need to install 4Suite separately to use Amara.

The biggest improvements in ths release are to performance and to the API. Amara takes advantage of a lot of the great performance work that has gone into 4Suite (e.g. Saxlette). There is also a much easier API on-ramp that I expect most users will appreciate. Rather than having to parse using:

from amara import binderytools as bt
doc = bt.bind_string(XML) #or bt.bind_uri or bt.bind_file or bt.bind_stream

You can use

import amara
amara.parse(XML) #Whether XML is string, file-like object, URI or local file path

There are several other such simplifications. There is also the xml_append_template facility, which is very handy for generating XML (see how Sylvain uses it to simplify atomixlib).

Thanks to all the folks who helped with suggestions, patches, review, etc.

[Uche Ogbuji]

via Copia

Agile Web #1: "Google Sitemaps"

"Google Sitemaps"

Uche Ogbuji's new XML.com column, "Agile Web," explores the intersection of agile programming languages and Web 2.0. In this first installment he examines Google's Sitemaps schema, as well as Python and XSLT code to generate site maps. [Oct. 26, 2005]

And with this article the "Python and XML" column has been replaced by a new one titled "Agile Web".

I wrote the Python-XML column for three years, discussing the combination of an agile programming language with an agile data format. It's time to pull the lens back a bit to take in other such technologies. This new column, "Agile Web," will cover the intersection of dynamic programming languages and web technologies, particularly the sorts of dynamic developments on the web for which some use the moniker, "Web 2.0." The primary language focus will still be Python, with some ECMAScript. Occasionally there will be some coverage of other dynamic languages as well.

In this first article I introduce the Google SiteMaps program, XML format and Python tools.

[Uche Ogbuji]

via Copia

Cop it while it's hot: 4Suite XML 1.0b2

Updated with working link for the manual

We've announced 4Suite XML 1.0b2 We're a big step towards a 1.0 release, even bigger than most of our releases because what we've done with this is trim the overall 4Suite package to a sensible size and scope. This release contains only the XML processing core and some support libraries. It does not contain the RDF libraries and the repository. This does not mean those components are stranded (see, for example, the rdflib/4RDF merger effort for a sense of the new juice being fed into 4Suite/RDF). It's just that the core XML libraries are so much more mature than the other components, and so much more widely used, that it made no sense not to set it free and let it quickly march to 1.0 on its own. This release features some serious performance improvements, some simplified APIs for a gentler user learning curve, and a lot of fixes and other improvements (see the announcement for the full, long list).

In fact, the code we released is just about 1.0 in itself, as far as the XML component goes. A code freeze is in place, and we'll focus on fixing bugs and wrapping up the user manual effort. (BTW, if you'd like to help chip into the manual, please say so on the 4Suite-dev mailing list; there is a lot of material in place, and what we need is mostly in the way of editing and improving details). Our plan is to get the XML core to 1.0 more quickly than we would have been able before breaking 4Suite into components, and then we can focus on RDF and the repository. 4Suite/RDF will probably disappear into the new rdflib, and the repository will probably go through heavy refactoring and simplification.

Today, after some day-job tasks, my priority will be getting Amara 1.1.5 out. It's been largely ready and waiting for the 4Suite release. Some really sweet improvements and additions in this Amara release (though I do say so myself). More on that later.

[Uche Ogbuji]

via Copia

Processing "Web 2.0" using XSLT document() variants? No thanks.

Mark Nottingham has written an intriguing piece "XSLT for the Rest of the Web". It's drummed up some interest, some of which has even leaked into the 4Suite mailing list thanks to the energetic Sylvain Hellegouarch. Mark says:

I’ve raved before about how useful the XSLT document() function is, once you get used to it. However, the stars have to be aligned just so to use it; the Web site can’t use cookies for anything important, and the content you’re interested in has to be available in well-formed XML.

He goes on to present a set of extension functions he's created for libxslt. They are basically smarter document() functions that can do fancy Web things, including HTTP POST, and using HTML Tidy to grab tag soup HTML as XHTML.

As I read through it, I must say my strong impression was "been there, done that, probably never looking back". Certainly no diss of Mark intended there. He's one of the sharper hackers I know. I guess we're just at different points in our thinking of where XSLT fits into the Web-savvy apps toolkit.

First of all, I think the Web has more dragons than you could easily tame with even the mightiest XSLT extension hackery. I think you need general-purpose programming language to wrangle "Web 2.0" without drowning in tears.

More importantly, if I ever needed XSLT's document() function to process anything more than it's spec'ed to, I would consider that a pretty strong indicator that it's time to rethink part of my application architecture.

You see, I used to be a devotee of XSLT all over the place, and XSLT extensions for just about every limitation of the language. Heck, I wrote a whole framework of such things into 4Suite Repository. I've since reformed. These days I take the pipeline approach to such processing, and I keep XSLT firmly in the narrow niche for which it was designed. I have more on this evolution of thinking in "Lifting XSLT into application domain with extension functions?".

But back to Mark's idea. I actually implemented 4Suite XSLT extensions to use HTTP POST and to tidy tag soup HTML into XHTML, but I wouldn't dream of using these extensions any more. Nowadays, I use Python to gather and prepare data into a model representation that I then hand over to XSLT for pure presentation processing. Complex logical tasks such as accessing Web data beyond trivially fetched XML are matters for the model layer, and not the presentation logic. For example, if I need to tidy something, I tidy it at the Python level and put what I need of the resulting XHTML into the model XML before passing it to XSLT. I use Amara XML Toolkit with John Cowan's TagSoup for my tidying needs. I prefer TagSoup rather than tidy because I find it's faster and more robust.

Even if you use the libxml2 family of tools, I still think it's better to use libxml, and perhaps the libxml HTML parser to do the model processing and hand over resulting XML to libxslt in a separate step.

XSLT is pretty cool, but these days rather than reproduce all of Python's dozens of Web processing libraries therein, I plump for Python itself.

[Uche Ogbuji]

via Copia

4Suite and Rdflib - Advancing the State of The Art in Python / RDF

A few days ago I checked in a 4RDF Driver that wraps the rdflib persistence API. 4RDF has a standard API for abstracting the persistence of RDF that sits directly below the Model interface and allows an author to implement a mechanism for persisting an RDF graph in whatever database, filesystem, etc.. they choose. rdflib has a similar seperation but the actual interfaces differ. Daniel Krech and I have been working rather diligently on formalizing a universal persistence API that allows an implementation to support a graduated set of features:

  • Core RDF Store
  • Context-aware RDF Store
  • Notation 3 RDF Store (Formula-aware RDF Store)

It's still a work in progress (with regards to Notation 3 persistence, mostly) but at least those interfaces/method signatures needed by a context-aware RDF store are well spec'd out.

I won't bore you with the details (and there are plenty - as we've covered alot of ground) but you can dive in here. This driver, which allows 4Suite to use rdflib for persistence of it's RDF data, is the first step an an agreed migration that will phase out 4RDF and replace it with rdflib, eventually. This module, at the very least, allows for the dispatching of Versa queries on an rdflib Graph, is the first step in allowing a 4Suite repository to it's RDF graphs in an rdflib store - I think there is alot of synthesis worth exploring with redfoot, provides rdflib with access to a rather voluminous 4RDF test suite, and demonstrates how existing applications that use 4RDF could be ported to use rdflib instead.

Outstanding / Possible Issues:

1) the current rdflib Graph interfaces do not account for RDF reification. These are somewhat covered by support for Notation 3 quoted/hypothetical contexts. The only visible difference is in the test cases that match by statementUri

2) This driver has only been tested against the MySQL implementation of an rdflib store. This is mainly because it's currently the only rdflib store implementation that supports matching arbitrary triple / statement terms by REGEX patterns and/or the production of quads instead of triples (i.e. the name of the context of each resulting statement in addition). This is only an issue for RDF stores that are at least context-aware, but an interface mismatch at most.

I plan to do some more experimentation on the possiblities that this synthesis provides (surprise, surprise). The timing is rather appropriate given the on-going development on the next generation Versa query specification, the concurrent effort to graduate the 4Suite code base to 1.0, and my recent pleasant surprise regarding Versa, Sparta, and rdflib.

If you are interested in helping or learning more about the roadmap, you can pay #redfoot (on irc.freenode.net) a visit. That's where Daniel Kreck and the other rdflib folks have been burning braincells as of late.

Chimezie Ogbuji

via Copia

Redfoot: Updated Documentation

Daniel Krech, recently updated the Redfoot homepage with some additional documentation on what Redfoot is. It's a very interesting concept for leveraging Python (or any other scriptable language) and RDF as a distributed framework for applications.

Beyond the known advantages of modelling distributed components on an RDF Graph with well defined semantics for how you retrieve programs and execute them it also relies on a hybrid of XML and Python called Kid to facilitate templating of HTML.

The advantages of using a flexible programming language (such as Python) for manipulating XML is well written about (sift through the Copia archives, you'll find plenty). Couple that with a well modelled framework for including and executing remote modules as well as a programmatic access (using a similar idiom) to an underlying RDF Graph and you have yourself a very flexible foundation.

For example. Below is the Kid template used to render the contributers page on rdflib.net:

<div xmlns="http://www.w3.org/1999/xhtml"
 xmlns:kid="http://purl.org/kid/ns#">
    <?python
    FOAF = redfoot.namespace("http://xmlns.com/foaf/0.1/")
    DOAP = redfoot.namespace("http://usefulinc.com/ns/doap#")
    project = URIRef("%s#" % request.host)
    people = []
    seen = set()
    for property in [DOAP.maintainer, DOAP.developer, DOAP.documenter,    DOAP.translator, DOAP.tester,DOAP.helper]:
        for person in redfoot.objects(project, property):
            if person not in seen:
                seen.add(person)
                label = redfoot.label(person) or person
                relationships = set()
                for relationship in redfoot.predicates(project, person):
                    relationships.add(redfoot.label(relationship))
                people.append((label, person, relationships))

    people.sort()
    ?>

    <ul>
      <li kid:for="label, person, relationships in people">
        ${label},
    ${redfoot.value(person, FOAF.nick)},
    (${", ".join(relationships)})
      </li>
    </ul>
</div>

Redfoot feels like a hybrid of Narval and the 4Suite repository and represents what is common between the tangential goals of those two projects.

rdflib.net and redfoot.net (as well as some other sites) are examples of applications that run on a Redfoot instance.

[Uche Ogbuji]

via Copia

More on the PyBlosxom del.icio.us plug-in, and introducing task_control.py, a a pseudo cron plug-in for PyBlosxom

Micah put my del.icio.us daily links tool to immediate use on his blog. He uncovered a bug in the character handling, which is now fixed in the posted amara_delicious.py file.

I usually invoke the script from cron, but Micah asked if there was an alternative. I've been meaning to hack up a poor man's cron for PyBlosxom and this gave me an additional push. The result is task_control.py.

A sort of poor man's cron for PyBlosxom, this plug-in allows you to specify tasks (as Python scripts) to be run only at certain intervals Each time the plug-in is invoked it checks a set of tasks and the last time they were run. It runs only those that have not been run within the specified interval.

To run the Amara del.icio.us daily links script once a day, you would add the following to your config file:

py["tasks"] = {"/usr/local/bin/amara_delicious.py": 24*60*60}
py["task_control_file"] = py['datadir'] + "/task_control.dat"

You could of course have multiple script/interval mappings in the "tasks" dict. The scripts are run with variables request and config set, so, for example, if running from task_control.py, you could change the line of amara_delicious.py from

BASEDIR = '/srv/www/ogbuji.net/copia/pyblosxom/datadir'

to

BASEDIR = config['datadir']

[Uche Ogbuji]

via Copia