Planet chuffed

I'm still stumbling along a bit in my weblogging journey (just over one month now), but I've been gratified by the great feedback, and Copia has garnered a burst of attention in the past few days.

First of all, Copia is now a province on three planets that I know of:

The first two are topic-specific feeds, which is as I think it should be. I write on a very broad range of topics, and I have plenty on Python and XML alone, so there's no need to bombard the planets with masses of off-topic posts. I'm in great company on all three feeds.

But perhaps most pleasing has been a very kind comment from Bill de hÓra, someone whose thinking and writing I respect a lot.

I think I've figured out a time management scheme that allows for posting to Copia without sacrificing the time I used to spend on so much other work that continues to pile up.

Now all I have to do is convince Chime to post more (he's never been as talkative as I am, to his credit).

[Uche Ogbuji]

via Copia

XML recursive directory listing, part 2

In part 1 I started to talk about dueling iterations for the use-case of using Python's os.walk() to emit a nested XML representation of a directory listing. I presented a working, but unsatisfactory approach and left off until part 2. Eric Gaumer wasted no time covering one of the key angles, so go read his follow-up.

It's the classic approach of turning recursion into iteration by managing one's own stack, which adds a lot more flexibility at the expense of a bit more opaque code. In this case it's not so bad because there is the old os.path.walk() standby that subsumes the recursive call-back. Eric uses a closure, though he doesn't need to (it's a good choice, though, if just for modularity).
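For readers who want to see the shape of that technique, here is a minimal sketch of the explicit-stack approach. It is not Eric's code: it assumes Python 3 and uses the standard library's xml.sax.saxutils.XMLGenerator as a stand-in for a streaming writer, and the function name write_listing and the "open"/"close" markers are made up for illustration.

```python
import os
from xml.sax.saxutils import XMLGenerator


def write_listing(root, out):
    """Stream a nested directory listing to `out` without recursion."""
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()
    # Each stack entry either opens a directory element or closes one.
    stack = [("open", root)]
    while stack:
        action, path = stack.pop()
        if action == "close":
            gen.endElement("directory")
            continue
        gen.startElement("directory", {"name": path})
        entries = sorted(os.listdir(path))
        subdirs = [e for e in entries if os.path.isdir(os.path.join(path, e))]
        files = [e for e in entries if e not in subdirs]
        for f in files:
            gen.startElement("file", {"name": f})
            gen.endElement("file")
        # The close marker goes on first so it is popped only after
        # every subdirectory pushed above it has been fully written.
        stack.append(("close", path))
        for d in reversed(subdirs):
            stack.append(("open", os.path.join(path, d)))
    gen.endDocument()
```

Calling write_listing("foo", sys.stdout) would stream the listing for foo. The close markers are what keep the output well-formed in a streaming writer; the sketch makes no attempt to guard against symlink loops.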

Another place to turn for a bit of assistance is the XML API. 4Suite's MarkupWriter is a streaming output API, and so you pretty much have to process the files in the order in which you'll write their output. It would be neat if it supported modes or bookmarks, where you could move a "cursor" around to produce different sections of output. I know some tools in other languages have such facilities, and I've often considered adding these to MarkupWriter, using the power of Python's generators. Maybe this discussion will spur me on to doing so.

But there is also the fall-back of a node-based output API. I discussed the contrast between stream and node-based XML writers in "Proper XML output with new APIs in 4Suite and Amara". The following is equivalent code using Amara:

import os
import sys
from amara import binderytools

root = sys.argv[1]

doc = binderytools.create_document()
name = unicode(root)
doc.xml_append(
    doc.xml_element(u'directory', attributes={u'name': name})
)
dirs = {root: doc.directory}

for cdir, subdirs, files in os.walk(root):
    # Look up the element already created for this directory
    cdir_elem = dirs[cdir]
    for f in files:
        name = unicode(f)
        cdir_elem.xml_append(
            doc.xml_element(u'file', attributes={u'name': name})
            )
    for subdir in subdirs:
        # Join against the current directory, not root, so the key
        # matches the path os.walk reports at deeper levels
        full_subdir = os.path.join(cdir, subdir)
        name = unicode(full_subdir)
        subdir_elem = doc.xml_element(u'directory',
                                      attributes={u'name': name})
        cdir_elem.xml_append(subdir_elem)
        dirs[full_subdir] = subdir_elem

print doc.xml(indent=u"yes")  #Print it

It's not actually as much of a simplification as I'd thought it would be while working it out in my head. It's certainly more linear, but the need to track the mapping from directory name to directory element node adds back the cognitive load saved by eliminating the recursion. Ah well, it's another example.
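The same bookkeeping can be sketched without Amara. What follows is only an approximation of the approach above, assuming Python 3 and the standard library's xml.etree.ElementTree as the node-based API; the function name build_listing is invented for the example.

```python
import os
import xml.etree.ElementTree as ET


def build_listing(root):
    """Build a nested <directory>/<file> tree from a flat os.walk pass."""
    top = ET.Element("directory", {"name": root})
    # Map each directory path to the element that represents it.
    dirs = {root: top}
    for cdir, subdirs, files in os.walk(root):
        cdir_elem = dirs[cdir]
        for f in sorted(files):
            ET.SubElement(cdir_elem, "file", {"name": f})
        subdirs.sort()  # in place, so os.walk descends in sorted order too
        for subdir in subdirs:
            # Key by the path os.walk will report when it reaches this
            # subdirectory, that is, joined to the current directory.
            full_subdir = os.path.join(cdir, subdir)
            dirs[full_subdir] = ET.SubElement(
                cdir_elem, "directory", {"name": full_subdir}
            )
    return top
```

ET.tostring(build_listing("foo"), encoding="unicode") would give the serialized result. The dictionary is the price of staying linear: every directory element has to be registered before os.walk gets around to visiting it.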

Meanwhile, Dave Pawson had taken off with the example from yesterday and turned it into a full-fledged command-line utility, dirlist.py. It's long, so I posted it for download rather than in-line. Dave has more on his blog. Interesting journey, but thanks to Python, he was happy with the result.

[Uche Ogbuji]

via Copia

4Suite 1.0b1 via yum?

Dave Pawson was asking how to grab 4Suite using yum. I have yet to post a follow-up based on Dave's earlier question (thanks to Eric Gaumer for carrying on the thread in some of the directions I'd planned), and I'll try to get back to that topic today. Anyway, Dave and I weren't really successful getting 4Suite 1.0b1 via yum. I'm posting here as a record of our journey, and in the hope that someone can help.

I use apt rather than yum, so I had to remember the right yum mojo again, but I started by looking at what I had on my system:

# rpm -q 4Suite
4Suite-1.0-3

OK. That's odd. 4Suite 1.0 is still in beta, so that's a strange version number. So I found out the real version number:

# rpm -ql 4Suite | grep "Xml/__packageInfo__.py$" | xargs grep "^version"
version = '1.0a3'

Ah. I see now. They omitted the "a" part. Well, it's one 4Suite release behind—not bad, but there are so many improvements in 4Suite 1.0b1 that you should really get the latest.

I went looking on Google and found a promising candidate, 4Suite-1.0-8.b1.i386. This looks like it's in fedora-devel, so I tried looking at how to add that repository. I found help on aaltonen.us, where you can find the following yum repo spec:

[development] 
name=Fedora Core $releasever - Development Tree
#baseurl=http://download.fedora.redhat.com/pub/fedora/linux/core/development/$basearch/
mirrorlist=http://fedora.redhat.com/download/mirrors/fedora-core-rawhide
enabled=1
gpgcheck=1

I handed this off to Dave to try out (turned out the magic incantation is yum install 4Suite.i386). But the resulting chain of dependencies was way too far out on the bleeding edge. Dave was seeing updates to the likes of "perl, python, libxml, mysql kde, gnome, k3b the list goes on!":

I can't see that this is a true dependency from 4suite Uche?
Error: Missing Dependency: libdb_cxx-4.2.so is needed by package openoffice.org-libs
Error: Missing Dependency: libedataserver.so.3 is needed by package openoffice.org
Error: Missing Dependency: libebook.so.8 is needed by package openoffice.org
Error: Missing Dependency: gcc = 3.4.3-22.fc3 is needed by package gcc-g77

Oops. Ouch. The problem with the RPMs seems to be that Fedora Core is still testing the transition from 4Suite 1.0a3 to 1.0b1, and that's quite understandable. I look forward to seeing the more recent version in Fedora Core base.

At this point I advised David to ditch yum, just use the .src.rpm from the official 4Suite download and use rpmbuild to make himself a package. That also turned out to be a dead end: the spec file in the 1.0b1 release appears to be borked. Our fault. Ay ay ay. One of those days. I'll make sure it's fixed before the next release.

In the end Dave installed 4Suite from source, using "setup.py install", and all was well. I should have just told him to do that from the start.

Meanwhile, some notes from the fedora-devel 4Suite-1.0-8.b1 RPM.

The description is way out of date. I think it's 2 years old or more. For one thing 4Suite hasn't included 4DOM in aeons. I suggest the Fedora maintainers take the description from 4Suite.org.

Also, it requires "PyXML >= 0.7", but we dropped that requirement in the 4Suite 1.0b1 release.

Finally, it says "python-abi=2.4" is required. I suppose that might be FC3 maintainer preference, but I did want to mention that Python 2.2.3 is sufficient (though we do recommend 2.3.5).

[Uche Ogbuji]

via Copia

Python/XML community:

xmldiff 0.6.7
Picket

Xmldiff is a utility for extracting differences between two xml files. It returns a set of primitives to apply on source tree to obtain the destination tree.

LogiLab's Xmldiff is interesting for several reasons, including the fact that it uses XUpdate to represent the XML differences. You can then use 4Suite's command-line XUpdate tool (or any other tool you like) to "patch" XML files with the diff. See Sylvain Thénault's announcement.

Picket is a CherryPy XSLT filter developed by Sylvain Hellegouarch.

The Picket filter is a simple CherryPy filter for processing XSLT as a template language. It uses 4Suite to do the job.

Nice. Preliminary inspection seems to recommend it as a good example of 4XSLT in server architecture in general. It makes good use of the API, and even implements processor object pooling (helps performance). As the CherryPy tutorial says,

A filter is an object that has a chance to work on a request as it goes through the usual CherryPy processing chain.
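To make the pooling remark concrete, here is a generic sketch of object pooling in Python 3. It is not Picket's code and does not use the 4Suite API: ObjectPool is an illustrative name, and StandInProcessor is a hypothetical placeholder for whatever expensive-to-build processor object a real filter would reuse.

```python
import queue
from contextlib import contextmanager


class ObjectPool:
    """Hand out a fixed set of pre-built objects, one borrower at a time."""

    def __init__(self, factory, size):
        self._items = queue.Queue()
        for _ in range(size):
            self._items.put(factory())

    @contextmanager
    def borrow(self):
        item = self._items.get()  # blocks until an object is free
        try:
            yield item
        finally:
            self._items.put(item)  # always return it to the pool


class StandInProcessor:
    """Hypothetical placeholder for a costly-to-construct processor."""

    created = 0

    def __init__(self):
        StandInProcessor.created += 1

    def run(self, source):
        return source.upper()
```

A request handler would wrap its work in "with pool.borrow() as processor:", so the construction cost is paid once per pooled object rather than once per request.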

[Uche Ogbuji]

via Copia

XML recursive directory listing, part 1

Dave Pawson asked for help with using Python's os.walk() to emit a nested XML representation of a directory listing. The semantics of os.walk make this a bit awkward, and I have a good deal to say on the matter, but I first wanted to post some code for David and others with such a need before diving into fuller discussion of the matter. Here's the code.

import os
import sys

root = sys.argv[1]

from Ft.Xml import MarkupWriter
writer = MarkupWriter(indent=u"yes")

def recurse_dir(path):
    for cdir, subdirs, files in os.walk(path):
        writer.startElement(u'directory', attributes={u'name': unicode(cdir)})
        for f in files:
            writer.simpleElement(u'file', attributes={u'name': unicode(f)})
        for subdir in subdirs:
            recurse_dir(os.path.join(cdir, subdir))
        writer.endElement(u'directory')
        break

writer.startDocument()
recurse_dir(root)
writer.endDocument()

Save it as dirwalker.py or whatever. The following is sample usage (in UNIXese):

$ mkdir foo
$ mkdir foo/bar
$ touch foo/a.txt
$ touch foo/b.txt
$ touch foo/bar/c.txt
$ touch foo/bar/d.txt
$ python dirwalker.py foo/
<?xml version="1.0" encoding="UTF-8"?>
<directory name="foo/">
  <file name="a.txt"/>
  <file name="b.txt"/>
  <directory name="foo/bar">
    <file name="c.txt"/>
    <file name="d.txt"/>
  </directory>
</directory>
$ rm -rf foo
$

Notice that the code is really preempting the recursiveness of os.walk in order to impose its own recursion. This is the touchy issue I want to expand on. Check in later on today...
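To see why the two kinds of iteration pull against each other, it helps to look at what os.walk actually hands back. The following is a small illustration, assuming Python 3; the helper name walk_records is invented here.

```python
import os


def walk_records(root):
    """Collect the (directory, subdirs, files) triples os.walk yields."""
    records = []
    for cdir, subdirs, files in os.walk(root):
        subdirs.sort()  # in place, which also fixes the descent order
        records.append((cdir, list(subdirs), sorted(files)))
    return records
```

For the foo example above, this returns one record for foo and a separate, sibling record for foo/bar. The sequence is flat: nothing in it marks where a directory's contents end, which is exactly the information a streaming writer needs in order to emit the closing tag.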

[Uche Ogbuji]

via Copia

A couple of Amara/CherryPy Demos

As I've mentioned, I've been playing with Amara/4Suite and CherryPy. Luis Miguel Morillas has been as well. We're both taking things slowly, pursuing it from different angles.

Luis has a "Web-based docbook browser and processor using CherryPy and Amara". It's a very simple script that renders an index and chapters of Mark Pilgrim's Dive into Python book as Web content, in XML and XML+CSS (which seems to be creeping into the mainstream?).

I also have a demo as part of Amara, cherrypy-xml-inspector.py, which allows you to "inspect" an XML document through a Web form using CherryPy and Amara. You can load any document off the Web and then enter an Amara expression, such as "doc.html.head.title", and get the result.

[Uche Ogbuji]

via Copia

Some 4Suite repository extensions

If you just want to try out some handy XSLT extension modules for 4Suite's repository and skip all the blather, just scroll to the bottom of this item...

Akara is an extensible information gathering and presentation framework implemented in 4Suite.

As I describe it on the site:

In simple terms, you put notes into Akara (like a notepad). You put FAQ entries in (like a FAQ wizard). You put links and comments on those links (like a Web log or bookmark manager). You put discussion logs in (like mailing list archives and instant messaging logs). You put code examples, articles, proposals, specifications, stories and reviews in (like a content manager). You put it all where it's convenient for the moment (like a Wiki). You can later on reorganize things relatively easily (like, ummm... like what?). You can see an example of Akara in action on my Akara site on XML processing in Python.

I never really got it mature enough for release, in part because it's the project that finally left me gob-smacked with the sense that although 4Suite's core libraries are super-useful, the server framework is rather rickety and could do with a lot less wheel reinvention (I've discussed this matter with regard to my recent advocacy of CherryPy as a protocol server backbone for 4Suite after 1.0).

Anyways I'm rebuilding Akara to be a proof of concept of 4Suite repository/CherryPy integration. It's going slowly due to workload, and since many of the Akara XSLT extension modules are useful independently from Akara, I'm posting them here for now. They are:

  • cachetool.py—an extension for caching results of common and slow XSLT templates. I use this heavily to cache the XML results of Versa queries in 4Suite. It stores and manages the caches as XML resources in the repository, with a given time-to-live. There is also a method to invalidate a cached value.
  • calwidget.py—a widget that inserts an XHTML calendar into the XSLT output
  • emailftext.py—a widget for reading UNIX mailboxes and using XSLT dispatch to process the items, and to send messages. Not vetted for security
  • feedtools.py—an extension for RSS aggregation. Uses Mark Pilgrim's Universal Feed Parser to read a list of feeds given by URL and then write the result to the XSLT output as a consolidated RSS 1.0 feed. You probably want to use this together with cachetool.py so it's not retrieving feeds on every request.
  • akaraftext.py—parses Akara markup (a wiki-like language) and inserts XHTML into the output stream
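As a rough picture of the caching idea behind cachetool.py, here is a time-to-live cache with explicit invalidation. This is not the Akara code: it is a Python 3 sketch that keeps entries in an in-memory dictionary rather than as XML resources in the repository, and the class name TTLCache and its methods are assumptions made for the example.

```python
import time


class TTLCache:
    """Cache computed values for a fixed time-to-live, in seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._entries = {}   # key -> (expiry time, value)

    def get(self, key, compute):
        """Return the cached value, recomputing it if missing or expired."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = compute()
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        """Drop a cached value so the next get() recomputes it."""
        self._entries.pop(key, None)
```

Wrapping a slow operation such as feed retrieval in cache.get(url, fetch) means the fetch runs at most once per time-to-live window, unless invalidate(url) is called first.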

[Uche Ogbuji]

via Copia

Clueful house

US House of Reps: FINAL VOTE RESULTS FOR ROLL CALL 161, via Derek Willis in private correspondence

OK, no, so I'm not so much of a political wonk that I'm doing the day's mathematics on the voting patterns behind "Making emergency supplemental appropriations for the fiscal year ending September 30, 2005, and for other purposes". No, what I'm interested in is the file extension of that URL: ".xml". View source, folks. Surely enough, our caveman congress is savvy enough that they are using XML, DTDs and XSLT in a ridiculously clueful manner. Knock me over with a feather. Next thing Tom DeLay will be running XQuery on these rolls so he could figure out which Dem he can play bogey-of-the-month with. Of course, that would put me in the odd position of feeling sorry for DeLay, for having to use XQuery...

Anyway, see also "Legislative Documents in XML at the United States House of Representatives". Willis says "they've been developing a system for votes and legislation (although that'll take some time to implement), and it deserves attention and support."

Vote for cloture on that, brother.

[Uche Ogbuji]

via Copia

Principles of XML design: When the order of XML elements matters

Subtitle: When to be strict and when to be lax as you decide how to order child elements

Editor's synopsis: When multiple XML elements occur within another element, does element order matter? Whether it's the order in which the parser reports elements to applications, or the question of whether or not to mandate specific order in schema patterns, things are not always as simple as they may seem. In this article, Uche Ogbuji covers design and processing considerations related to the order of XML elements.

This is the latest in my series on XML design. The other installments are:

Also along these lines are my discussion of ERH's excellent book Effective XML, and my article "Keep your XML clean".

[Uche Ogbuji]

via Copia

CherryPy 2.0

CherryPy

After several months of hard work the first stable release of CherryPy2 is finally available. Downloads are available here and the ChangeLog can be viewed here.

Remi Delon announced the 2.0 release of CherryPy. It's my favorite entry in the Python Web frameworks sweepstakes. It's very simple to learn and use, and it just makes sense. Very few surprising conventions. My own endorsement is among the many testimonials CherryPy has picked up.

I'm also pulling for CherryPy to form the heart of the protocol server for the next generation of 4Suite. As I said on the CherryPy discussion board:

I have a nefarious agenda: I regret our having reinvented some wheels in 4Suite, and most especially the Web framework wheel. To be fair to us, the likes of CherryPy were not available at the time and it was pretty much Zope, Webware, mod_python or bust, and we didn't like any of those options. But now we're saddled with really not-that-great re-implementations of HTTP [server] framework, session management, etc, all too tightly coupled into the XML database for my liking. I'd like to move to a more open architecture that decouples core XML libraries from XML DBMS from protocol framework (with CherryPy ideally as the latter). That way, [someone] could get CherryPy, and if they liked, a simple XML processing plug in, and if they liked, an XML DB plug in, and so on. If I can get [something] working sweet as sugar with CherryPy, I bet I could convince my fellow 4Suite developers to leave the Web frameworks to the dedicated Web frameworks projects.

I've been plugging slowly away with these ideas, but it's been hard to get to it with all the other items in the work queue. Perhaps this announcement will spur me to get something into shape.

[Uche Ogbuji]

via Copia