XML recursive directory listing, part 1

Dave Pawson asked for help with using Python's os.walk() to emit a nested XML representation of a directory listing. The semantics of os.walk make this a bit awkward, and I have a good deal to say on the matter, but I first wanted to post some code for David and others with such a need before diving into fuller discussion of the matter. Here's the code.

import os
import sys

root = sys.argv[1]

from Ft.Xml import MarkupWriter
writer = MarkupWriter(indent=u"yes")

def recurse_dir(path):
    for cdir, subdirs, files in os.walk(path):
        writer.startElement(u'directory', attributes={u'name': unicode(cdir)})
        for f in files:
            writer.simpleElement(u'file', attributes={u'name': unicode(f)})
        for subdir in subdirs:
            recurse_dir(os.path.join(cdir, subdir))
        writer.endElement(u'directory')
        break

writer.startDocument()
recurse_dir(root)
writer.endDocument()

Save it as dirwalker.py or whatever. The following is sample usage (in UNIXese):

$ mkdir foo
$ mkdir foo/bar
$ touch foo/a.txt
$ touch foo/b.txt
$ touch foo/bar/c.txt
$ touch foo/bar/d.txt
$ python dirwalker.py foo/
<?xml version="1.0" encoding="UTF-8"?>
<directory name="foo/">
  <file name="a.txt"/>
  <file name="b.txt"/>
  <directory name="foo/bar">
    <file name="c.txt"/>
    <file name="d.txt"/>
  </directory>
</directory>[uogbuji@borgia tools]$ rm -rf foo
$

Notice that the code is really preempting the recursiveness of os.walk in order to impose its own recursion. This is the touchy issue I want to expand on. Check in later on today...

[Uche Ogbuji]

via Copia
2 responses
You could also push tokens on a stack and avoid the recursion. Same end result, you just control your own stack rather than rely on the call stack to unwind.



<pre>

import os

import sys



from Ft.Xml import MarkupWriter

writer = MarkupWriter(indent=u"yes")



root = sys.argv[1]

stack = []



def xml_tree(xml_root_dir):

def create(junk, dirpath, namelist):

writer.startElement(u'directory', attributes={u'name': unicode(dirpath)})

stack.append(u'directory')

for name in namelist:

if (os.path.isfile(os.path.join(dirpath,name))):

writer.simpleElement(u'file', attributes={u'name': unicode(name)})

os.path.walk(xml_root_dir, create, None)

for token in stack:

writer.endElement(token)



writer.startDocument()

xml_tree(root)

writer.endDocument()





gaumer@debiantosh:(~)$ python me.py foo/

<?xml version="1.0" encoding="UTF-8"?>

<directory name="foo/">

  <file name="a.txt"/>

  <file name="b.txt"/>

  <directory name="foo/bar">

  <file name="c.txt"/>

  <file name="d.txt"/>

  <directory name="foo/bar/dee">

  <file name="e.txt"/>

  <file name="f.txt"/>

  <directory name="foo/bar/dee/boo">

  <file name="g.txt"/>

  <file name="h.txt"/>

  <directory name="foo/bar/dee/boo/laa">

  <file name="i.txt"/>

  <file name="j.txt"/>

  </directory>

  </directory>

  </directory>

  </directory>

</directory>gaumer@debiantosh:(~)$



</pre>



Sorry... this may be a bit whitespace damaged. I added a few more levels just to make sure it worked.
Well that didn't work so well. I've posted the iterative code on my blog.