Planet chuffed

I'm still stumbling along a bit in my weblogging journey (just over one month now), but I've been gratified by the great feedback, and Copia has garnered a burst of attention in the past few days.

First of all, Copia is now a province on three planets that I know of:

The first two are topic-specific feeds, which is as I think it should be. I write on a very broad range of topics, and I have plenty on Python and XML alone, so there's no need to bombard the planets with masses of off-topic posts. I'm in great company on all three feeds.

But perhaps most pleasing has been a very kind comment from Bill de hÓra, someone whose thinking and writing I respect a lot.

I think I've figured out a time management scheme that allows for posting to Copia without sacrificing the time I used to spend on so much other work that continues to pile up.

Now all I have to do is convince Chime to post more (he's never been as talkative as I am, to his credit).

[Uche Ogbuji]

via Copia

XML recursive directory listing, part 2

In part 1 I started to talk about dueling iterations for the use-case of using Python's os.walk() to emit a nested XML representation of a directory listing. I presented a working, but unsatisfactory approach and left off until part 2. Eric Gaumer wasted no time covering one of the key angles, so go read his follow-up.

It's the classic approach of turning recursion into iteration by managing one's own stack, which adds a lot more flexibility at the expense of a bit more opaque code. In this case it's not so bad because there is the old os.path.walk() standby that subsumes the recursive call-back. Eric uses a closure, though he doesn't need to (it's a good choice, though, if just for modularity).
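For readers who want to see the shape of that technique, here is a minimal sketch of the explicit-stack approach. It is not Eric's code: it assumes a current Python 3 and uses the standard library's XMLGenerator in place of 4Suite's MarkupWriter, and the function name write_listing is made up for illustration.

```python
import os
from xml.sax.saxutils import XMLGenerator

def write_listing(root, out):
    """Emit nested <directory>/<file> elements without recursion."""
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()
    # Each stack entry is either a directory path still to be opened,
    # or None, a marker meaning "close the most recently opened directory".
    stack = [root]
    while stack:
        path = stack.pop()
        if path is None:
            gen.endElement("directory")
            continue
        gen.startElement("directory", {"name": path})
        stack.append(None)  # popped only after all children are done
        subdirs = []
        for entry in sorted(os.listdir(path)):
            full = os.path.join(path, entry)
            if os.path.isdir(full):
                subdirs.append(full)
            else:
                gen.startElement("file", {"name": entry})
                gen.endElement("file")
        # Push in reverse so subdirectories are popped in sorted order.
        stack.extend(reversed(subdirs))
    gen.endDocument()
```

Calling write_listing("foo", sys.stdout) streams the listing in document order. The close marker on the stack is what stands in for the return from a recursive call. Note that os.path.isdir() follows symbolic links, so a link cycle would need extra care.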

Another place to turn for a bit of assistance is the XML API. 4Suite's MarkupWriter is a streaming output API, and so you pretty much have to process the files in the order in which you'll write their output. It would be neat if it supported modes or bookmarks, where you could move a "cursor" around to produce different sections of output. I know some tools in other languages have such facilities, and I've often considered adding these to MarkupWriter, using the power of Python's generators. Maybe this discussion will spur me on to doing so.

But there is also the fall-back of a node-based output API. I discussed the contrast between stream and node-based XML writers in "Proper XML output with new APIs in 4Suite and Amara". The following is equivalent code using Amara:

import os
import sys
from amara import binderytools

root = sys.argv[1]

doc = binderytools.create_document()
name = unicode(root)
doc.xml_append(
    doc.xml_element(u'directory', attributes={u'name': name})
)
dirs = {root: doc.directory}

for cdir, subdirs, files in os.walk(root):
    cdir_elem = dirs[cdir]
    name = unicode(cdir)
    for f in files:
        name = unicode(f)
        cdir_elem.xml_append(
            doc.xml_element(u'file', attributes={u'name': name})
            )
    for subdir in subdirs:
        # Join with cdir, not root, so the key matches what os.walk yields later
        full_subdir = os.path.join(cdir, subdir)
        name = unicode(full_subdir)
        subdir_elem = doc.xml_element(u'directory',
                                      attributes={u'name': name})
        cdir_elem.xml_append(subdir_elem)
        dirs[full_subdir] = subdir_elem

print doc.xml(indent=u"yes")  #Print it

It's not actually as much of a simplification as I'd thought it would be while working it out in my head. It's certainly more linear, but the need to track the mapping from directory name to directory element node adds back the cognitive load saved by eliminating the recursion. Ah well, it's another example.
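The same bookkeeping can be sketched with nothing but the standard library, which may help if Amara is not to hand. This is an illustration under assumptions, not the Amara code above: it targets a current Python 3, uses xml.etree.ElementTree, and the function name build_listing is hypothetical.

```python
import os
import xml.etree.ElementTree as ET

def build_listing(root):
    """Build a nested <directory>/<file> tree for root and return its top element."""
    root_elem = ET.Element("directory", {"name": root})
    # Map each directory path to the element that represents it,
    # so os.walk's flat sequence can be attached at the right depth.
    dirs = {root: root_elem}
    for cdir, subdirs, files in os.walk(root):
        cdir_elem = dirs[cdir]
        for f in sorted(files):
            ET.SubElement(cdir_elem, "file", {"name": f})
        subdirs.sort()  # in place, so os.walk also descends in this order
        for subdir in subdirs:
            # Key on the path os.walk will report when it reaches this directory.
            full_subdir = os.path.join(cdir, subdir)
            dirs[full_subdir] = ET.SubElement(
                cdir_elem, "directory", {"name": full_subdir})
    return root_elem
```

ET.tostring(build_listing("foo"), encoding="unicode") then gives the serialized result. The dictionary is the price of the linear loop: every directory has to be registered before os.walk gets around to visiting it.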

Meanwhile, Dave Pawson had taken off with the example from yesterday and turned it into a full-fledged command-line utility, dirlist.py. It's long, so I posted it for download rather than in-line. Dave Pawson has more on his blog. Interesting journey, but thanks to Python, he was happy with the result.

[Uche Ogbuji]

via Copia

4Suite 1.0b1 via yum?

Dave Pawson was asking how to grab 4Suite using yum. I have yet to post a follow-up based on Dave's earlier question (thanks to Eric Gaumer for carrying on the thread in some of the direction I'd planned), and I'll try to get back to that topic today. Anyway, Dave and I weren't really successful getting 4Suite 1.0b1 via yum. I'm posting here for reference to our journey, and in the hopes that someone can help.

I use apt rather than yum, so I had to remember the right yum mojo again, but I started by looking at what I had on my system:

# rpm -q 4Suite
4Suite-1.0-3

OK. That's odd. 4Suite 1.0 is still in beta, so that's a strange version number. So I found out the real version number:

# rpm -ql 4Suite | grep "Xml/__packageInfo__.py$" | xargs grep "^version"
version = '1.0a3'

Ah. I see now. They omitted the "a" part. Well, it's one 4Suite release behind—not bad, but there are so many improvements in 4Suite 1.0b1 that you should really get the latest.

I went looking on Google and found a promising candidate, 4Suite-1.0-8.b1.i386. This looks like it's in fedora-devel, so I tried looking at how to add that repository. I found help on aaltonen.us, where you can find the following yum repo spec:

[development] 
name=Fedora Core $releasever - Development Tree
#baseurl=http://download.fedora.redhat.com/pub/fedora/linux/core/development/$basearch/
mirrorlist=http://fedora.redhat.com/download/mirrors/fedora-core-rawhide
enabled=1
gpgcheck=1

I handed this off to Dave to try out (turned out the magic incantation is yum install 4Suite.i386). But the resulting chain of dependencies was way too far out on the bleeding edge. Dave was seeing updates to the likes of "perl, python, libxml, mysql kde, gnome, k3b the list goes on!":

I can't see that this is a true dependency from 4suite Uche?
Error: Missing Dependency: libdb_cxx-4.2.so is needed by package openoffice.org-libs
Error: Missing Dependency: libedataserver.so.3 is needed by package openoffice.org
Error: Missing Dependency: libebook.so.8 is needed by package openoffice.org
Error: Missing Dependency: gcc = 3.4.3-22.fc3 is needed by package gcc-g77

Oops. Ouch. The problem with the RPMs seems to be that Fedora Core is still testing the transition from 4Suite 1.0a3 to 1.0b1, and that's quite understandable. I look forward to seeing the more recent version in Fedora Core base.

At this point I advised David to ditch yum, just use the .src.rpm from the official 4Suite download and use rpmbuild to make himself a package. That also turned out to be a dead end: the spec file in the 1.0b1 release appears to be borked. Our fault. Ay ay ay. One of those days. I'll make sure it's fixed before the next release.

In the end Dave installed 4Suite from source, using "setup.py install", and all was well. I should have just told him to do that from the start.

Meanwhile, some notes from the fedora-devel 4Suite-1.0-8.b1 RPM.

The description is way out of date. I think it's 2 years old or more. For one thing 4Suite hasn't included 4DOM in aeons. I suggest the Fedora maintainers take the description from 4Suite.org.

Also, it requires "PyXML >= 0.7", but we dropped that requirement in the 4Suite 1.0b1 release.

Finally, it says "python-abi=2.4" is required. I suppose that might be FC3 maintainer preference, but I did want to mention that Python 2.2.3 is sufficient (though we do recommend 2.3.5).

[Uche Ogbuji]

via Copia

Python/XML community:

xmldiff 0.6.7
Picket

Xmldiff is a utility for extracting differences between two xml files. It returns a set of primitives to apply on source tree to obtain the destination tree.

LogiLab's Xmldiff is interesting for several reasons, including the fact that it uses XUpdate to represent the XML differences. You can then use 4Suite's command-line XUpdate tool (or any other tool you like) to "patch" XML files with the diff. See Sylvain Thénault's announcement.

Picket is a CherryPy XSLT filter developed by Sylvain Hellegouarch.

The Picket filter is a simple CherryPy filter for processing XSLT as a template language. It uses 4Suite to do the job.

Nice. Preliminary inspection seems to recommend it as a good example of 4XSLT in server architecture in general. It makes good use of the API, and even implements processor object pooling (helps performance). As the CherryPy tutorial says,

A filter is an object that has a chance to work on a request as it goes through the usual CherryPy processing chain.

[Uche Ogbuji]

via Copia

XML recursive directory listing, part 1

Dave Pawson asked for help with using Python's os.walk() to emit a nested XML representation of a directory listing. The semantics of os.walk make this a bit awkward, and I have a good deal to say on the matter, but I first wanted to post some code for David and others with such a need before diving into fuller discussion of the matter. Here's the code.

import os
import sys

root = sys.argv[1]

from Ft.Xml import MarkupWriter
writer = MarkupWriter(indent=u"yes")

def recurse_dir(path):
    for cdir, subdirs, files in os.walk(path):
        writer.startElement(u'directory', attributes={u'name': unicode(cdir)})
        for f in files:
            writer.simpleElement(u'file', attributes={u'name': unicode(f)})
        for subdir in subdirs:
            recurse_dir(os.path.join(cdir, subdir))
        writer.endElement(u'directory')
        break

writer.startDocument()
recurse_dir(root)
writer.endDocument()

Save it as dirwalker.py or whatever. The following is sample usage (in UNIXese):

$ mkdir foo
$ mkdir foo/bar
$ touch foo/a.txt
$ touch foo/b.txt
$ touch foo/bar/c.txt
$ touch foo/bar/d.txt
$ python dirwalker.py foo/
<?xml version="1.0" encoding="UTF-8"?>
<directory name="foo/">
  <file name="a.txt"/>
  <file name="b.txt"/>
  <directory name="foo/bar">
    <file name="c.txt"/>
    <file name="d.txt"/>
  </directory>
</directory>
$ rm -rf foo
$

Notice that the code is really preempting the recursiveness of os.walk in order to impose its own recursion. This is the touchy issue I want to expand on. Check in later on today...
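To make the awkwardness concrete: os.walk() already descends into every subdirectory by itself and hands back a flat sequence of (directory, subdirectories, files) tuples, with no signal for when a directory's subtree is finished. The small sketch below just collects that sequence so you can look at it. It assumes a current Python 3, and the helper name flat_walk is made up for illustration.

```python
import os

def flat_walk(root):
    """Return what os.walk() yields for root, as a list, with sorted names."""
    result = []
    for cdir, subdirs, files in os.walk(root):
        subdirs.sort()  # in place, which also fixes the order of descent
        result.append((cdir, list(subdirs), sorted(files)))
    return result
```

For the foo example above this gives one tuple for foo and a second for foo/bar, side by side rather than nested, which is why the code breaks out after the first tuple and recurses by hand.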

[Uche Ogbuji]

via Copia

To die for every day

The other day a friend who happened to check out Copia told me "I really liked that quote-to-die bit". It took me a moment to realize she really meant "Quotidie". Her pronunciation had never even occurred to me (I guess I lack imagination).

"Quotidie" is, of course, Latin for "daily". There is meant to be a pun on the fact that a quote heads each entry ("Quote-a-day"), but this is a bit incidental. I certainly pronounce the o as in "ought", the "tid" as in "teed off", and the ending almost to rhyme with "dee-jay" (much less emphasis on the "dee"). I'm too lazy to bust out the proper IPA for it.

Anyway, in hopes that it will prevent any misunderstanding, I'll use the syllable length markers in the title from now, with macrons over the first i and the e ("Quotīdiē").

I guess as Michael Kaplan would say:

This post brought to you by the letters "ī" (U+012B, a.k.a. LATIN SMALL LETTER I WITH MACRON) and "ē" (U+0113, a.k.a. LATIN SMALL LETTER E WITH MACRON)

[Uche Ogbuji]

via Copia

A couple of Amara/CherryPy Demos

As I've mentioned, I've been playing with Amara/4Suite and CherryPy. Luis Miguel Morillas has been as well. We're both taking things slowly, pursuing it from different angles.

Luis has a "Web-based docbook browser and processor using CherryPy and Amara". It's a very simple script for rendering as Web content an index and chapters of Mark Pilgrim's Dive into Python book as XML and XML+CSS (which seems to be creeping into the mainstream?).

I also have a demo as part of Amara, cherrypy-xml-inspector.py, which allows you to "inspect" an XML document through a Web form using CherryPy and Amara. You can load any document off the Web and then enter an Amara expression, such as "doc.html.head.title", and get the result.

[Uche Ogbuji]

via Copia

Quotīdiē

I pitched my day's leazings in Crimmercrock Lane,
To tie up my garter and jog on again,
When a dear dark-eyed gentleman passed there and said,
In a way that made all o' me colour rose-red,
   "What do I see—
   O pretty knee!"
And he came and he tied up my garter for me.

'Twixt sunset and moonrise it was, I can mind:
Ah, 'tis easy to lose what we nevermore find!—
Of the dear stranger's home, of his name, I knew nought,
But I soon knew his nature and all that it brought.
   Then bitterly
   Sobbed I that he
Should ever have tied up my garter for me!

Yet now I've beside me a fine lissom lad,
And my slip's nigh forgot, and my days are not sad;
My own dearest joy is he, comrade, and friend,
He it is who safe-guards me, on him I depend;
   No sorrow brings he,
   And thankful I be
That his daddy once tied up my garter for me!

Thomas Hardy—"the Dark-Eyed Gentleman", Time's Laughingstocks and Other Verses

I've since finished Expansive Poetry ("Essays on the New Narrative and the New Formalism"), but as I mentioned before, I would like to get back to some of the attacks on Ezra Pound and T.S. Eliot in that volume.

Richard Moore, who is no lightweight authority, uses Hardy's wonderful "Dark-Eyed Gentleman" to set up an assault on the acknowledged pillars of modern poetry. He shows how Hardy's work was couched in a tradition while shrewdly undermining the ugliest aspects of that tradition. Moore points out that Hardy, rather than rail on in the poem about the near-misogynistic standards of Victorian mores, chose to create a vivid character who expressed the problem using a frank, affecting voice with just the right amount of irony. So the point is that Hardy is a poet of extraordinary skill and sensitivity? As a devotee of Hardy, I wholeheartedly agree. After working this point about quiet revolution through a wandering journey in Euripides, Shakespeare and others, Moore arrives at his main task.

Moore starts by blasting Pound's "In a station of the metro". The entire poem:

The apparition of these faces in the crowd:
Petals on a wet, black bough.

Moore scoffs at the slackness of the iambic hexameter of the first line and at the second line, which he says "makes no metric sense at all". He then goes on for almost two pages about how the poem shows contempt for the conventions of English poetry by merely "alluding" to iambic pentameter, rather than properly using it.

At this point I'm bewildered. Can this really be Richard Moore writing? It seems obvious that Pound here is translating Chinese and Japanese prosodic conventions into English accentual verse. But never mind the eastern motor within this poem. One needn't know the first thing about haiku in order to feel the power of Pound's 4 stresses per line.

The apparition of these faces in the crowd:
Petals on a wet, black bough.

Coleridge used the same non-syllabic line of 4 stresses in Christabel. You need look no further than the famous opening lines:

'T is the middle of night by the castle clock
And the owls have awakened the crowing cock;

I can imagine Moore praising Coleridge as a metrical genius for the way he mirrors the lines as anapestic tetrameters with iambic final feet, but then how would he explain the "Tu–whit !— — Tu–whoo !" of the next line? What are those, anyway? Two iambs (surely no one would stoop to such scansion)? Two spondees? What of the clue in the em dashes? (Christabel was a favorite of an old girlfriend of mine, and I remember her reading it aloud. She was no prosodist, but I remember that she nailed the crucial caesura in that line.) How would Moore go on to explain the rest of Christabel? (Forget for a moment that even Coleridge can't adequately explain "How drowsily it crew")? Of course a discerning critic such as Moore would appreciate the accentual meter for what it is. Why can't he see Pound's poem the same way?

Clearly "In a station of the metro" is accentual, rather than accentual-syllabic. It would take a truly tin ear to want to stress "of" and "in" in the first line, just because it made them iambic, as it would to add "like" to the beginning of the second, because it makes it approximately iambic. Pound uses the copious unstressed syllables in the first line to emphasize that he's starting from a very quotidian encounter ("quotidian" in the modern sense, as borrowed by English from medieval Latin, rather than the classical Latin adverbial expression, as used in the title of this article). The second line explodes into the image as even Moore acknowledges. The lack of unstressed syllables cuts the image into the line as surely as a laser etching. Even the unstressed syllables in the second line have a purpose. It occurs to me to think of "on a" as saying "if you mistook that last word for a cheap-trick trochee, just you watch what comes next".

I have always counted this poem as one of the triumphs of "free verse". De gustibus non disputandum and all that. I can understand a critic's not liking it, but I cannot understand a critic's insistence on setting up such a blatant straw man of false prosody.

Moore then goes on to work on Eliot (I can just hear Robert Graves goading him: "These be thy Gods, O Israel").

Let us go then, you and I,
When the evening is spread out against the sky
Like a patient etherised upon a table;

(from "The Love Song of J. Alfred Prufrock")

Moore goes on about how this simile just does not appeal to him. Again this is a matter of taste, and there's not much material for argument. But he also claims that Eliot is being merely capricious and manipulative by (ab)using such an outlandish trope. Surely Moore has read the numerous critics, such as I.A. Richards, who recognized Eliot's hearkening back to the metaphysical conceit. Surely he knows that Eliot himself wrote about his debt to the Metaphysicals. Throughout Eliot's work, his exploration of the conceit has always been his most personal application of the ideas he expounded in "Tradition and the Individual Talent". No news there. Surely Moore would not dare castigate John Donne for his extravagance in trope. Should we invoke the Dean of St. Paul's in Eliot's defence? Alas. It seems that Donne can only commiserate:

Who would not laugh at mee, if I should say
I saw a flaske of powder burn a day

(from "The Broken Heart") (Note: the italics are original (I think), not mine. I'm not sure why the Luminarium doesn't italicize that phrase. All my dead tree texts do.)

Ah well. Moore's attack on "Prufrock" again reads to me as from a critic who probably knows better when he chooses to ignore the precedents on which the poet is building. The critic doesn't want to admit any mitigation of his attack on the poet's supposed sullying of tradition.

Many of the free verse fundamentalist followers of Pound and Eliot seemed truly confused about the craft of those two. They thought that Pound and Eliot helped free them from the supposed tyranny of form. They clearly couldn't appreciate what great examples Pound and Eliot were of the fact that free verse ain't no free lunch. It takes even more craft to execute free verse well than it does to write in form, and that craft right now must be learned rigorously from the metrical tradition. As I've said, I agree with Expansive Poetry in its denunciation of the movement that marginalized form in the middle 20th century, but I find it unfortunate when a critic such as Moore makes the same mistake as the free verse fundamentalists, even in pursuit of the opposite argument.

[Uche Ogbuji]

via Copia

Some 4Suite repository extensions

If you just want to try out some handy XSLT extension modules for 4Suite's repository and skip all the blather, just scroll to the bottom of this item...

Akara is an extensible information gathering and presentation framework implemented in 4Suite.

As I describe it on the site:

In simple terms, you put notes into Akara (like a notepad). You put FAQ entries in (like a FAQ wizard). You put links and comments on those links (like a Web log or bookmark manager). You put discussion logs in (like mailing list archives and instant messaging logs). You put code examples, articles, proposals, specifications, stories and reviews in (like a content manager). You put it all where it's convenient for the moment (like a Wiki). You can later on reorganize things relatively easily (like, ummm... like what?). You can see an example of Akara in action on my Akara site on XML processing in Python.

I never really got it mature enough for release, in part because it's the project that finally left me gob-smacked with the sense that although 4Suite's core libraries are super-useful, the server framework is rather rickety and could do with a lot less wheel reinvention (I've discussed this matter with regard to my recent advocacy of CherryPy as a protocol server backbone for 4Suite after 1.0).

Anyways I'm rebuilding Akara to be a proof of concept of 4Suite repository/CherryPy integration. It's going slowly due to workload, and since many of the Akara XSLT extension modules are useful independently from Akara, I'm posting them here for now. They are:

  • cachetool.py—an extension for caching results of common and slow XSLT templates. I use this heavily to cache the XML results of Versa queries in 4Suite. It stores and manages the caches as XML resources in the repository, with a given time-to-live. There is also a method to invalidate a cached value.
  • calwidget.py—a widget that inserts an XHTML calendar into the XSLT output
  • emailftext.py—a widget for reading UNIX mailboxes and using XSLT dispatch to process the items, and to send messages. Not vetted for security
  • feedtools.py—an extension for RSS aggregation. Uses Mark Pilgrim's Universal Feed Parser to read a list of feeds given by URL and then write the result to the XSLT output as a consolidated RSS 1.0 feed. You probably want to use this together with cachetool.py so it's not retrieving feeds on every request.
  • akaraftext.py—parses Akara markup (a wiki-like language) and inserts XHTML into the output stream
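To give a feel for the time-to-live idea behind cachetool.py, here is a minimal in-memory sketch. It is not the cachetool.py implementation, which keeps its caches as XML resources in the 4Suite repository and is driven from XSLT. The class name TTLCache, its methods and the injectable clock are all assumptions made for illustration, written for a current Python 3.

```python
import time

class TTLCache:
    """Cache computed values for a fixed number of seconds (illustrative only)."""

    def __init__(self, ttl_seconds, clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store = {}    # key -> (expiry time, value)

    def get(self, key, compute):
        """Return the cached value for key, calling compute() if missing or expired."""
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = compute()
        self._store[key] = (now + self.ttl, value)
        return value

    def invalidate(self, key):
        """Drop the cached value for key, if there is one."""
        self._store.pop(key, None)
```

A slow query or an aggregated feed would be wrapped as cache.get("some-key", run_query), so that it is only recomputed after the time-to-live has passed or after an explicit invalidate().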

[Uche Ogbuji]

via Copia