XML recursive directory listing, part 2

In part 1 I started to talk about dueling iterations for the use case of using Python's os.walk() to emit a nested XML representation of a directory listing. I presented a working but unsatisfactory approach and left the rest for part 2. Eric Gaumer wasted no time covering one of the key angles, so go read his follow-up.

It's the classic approach of turning recursion into iteration by managing one's own stack, which adds a lot more flexibility at the expense of slightly more opaque code. In this case it's not so bad, because the old os.path.walk() standby subsumes the recursive callback. Eric uses a closure, though he doesn't need to (it's a good choice, though, if only for modularity).
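To make the "manage your own stack" idea concrete, here is a minimal sketch. It is not Eric's code: it assumes modern Python 3 and the standard library's xml.etree.ElementTree in place of 4Suite, and the function name listing_with_stack is invented for this illustration.

```python
import os
import xml.etree.ElementTree as ET


def listing_with_stack(root):
    """Build a nested <directory>/<file> tree without recursive calls."""
    root_elem = ET.Element('directory', name=root)
    # Each stack entry pairs a directory path with the element that represents it.
    stack = [(root, root_elem)]
    while stack:
        path, elem = stack.pop()
        for entry in sorted(os.listdir(path)):
            full = os.path.join(path, entry)
            if os.path.isdir(full) and not os.path.islink(full):
                # Create the child element now; fill it in when it is popped later.
                child = ET.SubElement(elem, 'directory', name=entry)
                stack.append((full, child))
            else:
                ET.SubElement(elem, 'file', name=entry)
    return root_elem
```

Something like ET.tostring(listing_with_stack('.'), encoding='unicode') would then serialize the result. Because each element is created before its contents are known, this only works with a node-based tree; a streaming writer could not defer the children this way.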

Another place to turn for a bit of assistance is the XML API. 4Suite's MarkupWriter is a streaming output API, and so you pretty much have to process the files in the order in which you'll write their output. It would be neat if it supported modes or bookmarks, where you could move a "cursor" around to produce different sections of output. I know some tools in other languages have such facilities, and I've often considered adding these to MarkupWriter, using the power of Python's generators. Maybe this discussion will spur me on to doing so.
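The ordering constraint of a streaming writer can be sketched roughly as follows. This uses the standard library's xml.sax.saxutils.XMLGenerator as a stand-in for MarkupWriter (it is not the 4Suite API), assumes Python 3, and the helper names stream_listing and listing_as_string are hypothetical.

```python
import io
import os
from xml.sax.saxutils import XMLGenerator


def stream_listing(path, writer, name=None):
    """Write one directory: start tag, all of its contents, then end tag."""
    writer.startElement('directory', {'name': name or path})
    for entry in sorted(os.listdir(path)):
        full = os.path.join(path, entry)
        if os.path.isdir(full) and not os.path.islink(full):
            # The whole subtree must be written before the next sibling,
            # because there is no way to go back and add to it afterwards.
            stream_listing(full, writer, entry)
        else:
            writer.startElement('file', {'name': entry})
            writer.endElement('file')
    writer.endElement('directory')


def listing_as_string(root):
    """Stream the listing for root into a string and return it."""
    out = io.StringIO()
    writer = XMLGenerator(out, encoding='utf-8')
    writer.startDocument()
    stream_listing(root, writer)
    writer.endDocument()
    return out.getvalue()
```

The traversal order is dictated by the output order here, which is why os.walk(), with its own idea of iteration order, is an awkward fit for a purely streaming writer.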

But there is also the fall-back of a node-based output API. I discussed the contrast between stream and node-based XML writers in "Proper XML output with new APIs in 4Suite and Amara". The following is equivalent code using Amara:

import os
import sys
from amara import binderytools

root = sys.argv[1]

doc = binderytools.create_document()
name = unicode(root)
doc.xml_append(
    doc.xml_element(u'directory', attributes={u'name': name})
)
dirs = {root: doc.directory}

for cdir, subdirs, files in os.walk(root):
    # os.walk() is top-down, so cdir was registered when its parent was visited
    cdir_elem = dirs[cdir]
    for f in files:
        name = unicode(f)
        cdir_elem.xml_append(
            doc.xml_element(u'file', attributes={u'name': name})
            )
    for subdir in subdirs:
        # Join with the current directory, not root, so the key matches
        # the cdir value os.walk() yields later for nested directories
        full_subdir = os.path.join(cdir, subdir)
        name = unicode(full_subdir)
        subdir_elem = doc.xml_element(u'directory',
                                      attributes={u'name': name})
        cdir_elem.xml_append(subdir_elem)
        dirs[full_subdir] = subdir_elem

print doc.xml(indent=u"yes")  #Print it

It's not actually as much of a simplification as I'd thought it would be while working it out in my head. It's certainly more linear, but the need to track the mapping from directory name to directory element node adds back the cognitive load saved by eliminating the recursion. Ah well, it's another example.
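For readers without Amara installed, the same path-to-element bookkeeping can be sketched with the standard library alone. This is an approximation under stated assumptions, not the Amara API: it targets Python 3, uses xml.etree.ElementTree, and the function name listing_with_walk is invented here.

```python
import os
import xml.etree.ElementTree as ET


def listing_with_walk(root):
    """Build the nested listing from os.walk() using a path-to-element map."""
    root_elem = ET.Element('directory', name=root)
    # Maps each directory path to the element that represents it.
    dirs = {root: root_elem}
    for cdir, subdirs, files in os.walk(root):
        # Top-down traversal guarantees cdir was registered by its parent.
        cdir_elem = dirs[cdir]
        for f in sorted(files):
            ET.SubElement(cdir_elem, 'file', name=f)
        for subdir in sorted(subdirs):
            # The key must equal the cdir value os.walk() will yield later.
            full_subdir = os.path.join(cdir, subdir)
            dirs[full_subdir] = ET.SubElement(
                cdir_elem, 'directory', name=full_subdir)
    return root_elem
```

The dirs dictionary is exactly the extra state discussed above: it does the job the call stack would otherwise do in a recursive version.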

Meanwhile, Dave Pawson had taken off with the example from yesterday and turned it into a full-fledged command-line utility, dirlist.py. It's long, so I posted it for download rather than in-line. Dave has more on his blog. It was an interesting journey, but thanks to Python, he was happy with the result.

[Uche Ogbuji]

via Copia