XML recursive directory listing, part 3

In parts 1 and 2 I discussed code to use Python to recursively walk a directory and emit a nested XML representation of the contents.

Dave Pawson built on my basic techniques and came up with dirlist.py, a fully tricked-out version with all sorts of options and amenities. Well, he wasn't even finished. He sent me a further version today in which he "tidied up [the] program, and added options [for file] date and size."

Cool. I've posted it here: dirlist2.py. If further versions are forthcoming, I'll move it into my CVS. Dave is a self-confessed Python newbie. I had to make some quick fixes just to get it to work on my machine, but I haven't had time to carefully vet the entire program. Please let us know if you run into trouble (a comment here should suffice).

Usage example:

$ mkdir foo
$ mkdir foo/bar
$ touch foo/a.txt
$ touch foo/b.txt
$ touch foo/bar/c.txt
$ touch foo/bar/d.txt
$ python dirlist2.py foo/
Processing /home/uogbuji/foo
<?xml version="1.0" encoding="UTF-8"?>
<directory name="/home/uogbuji/foo">
  <file name="a.txt"/>
  <file name="b.txt"/>
  <directory name="/home/uogbuji/foo/bar">
    <file name="c.txt"/>
    <file name="d.txt"/>
  </directory>
</directory>

$ python dirlist2.py -d foo
Adding file dates
Processing /home/uogbuji/foo
<?xml version="1.0" encoding="UTF-8"?>
<directory name="/home/uogbuji/foo">
  <file date="2005-05-09" name="a.txt"/>
  <file date="2005-05-09" name="b.txt"/>
  <directory name="/home/uogbuji/foo/bar">
    <file date="2005-05-09" name="c.txt"/>
    <file date="2005-05-09" name="d.txt"/>
  </directory>
</directory>

$ python dirlist2.py foo/ foo.xml
Processing /home/uogbuji/foo
$ cat foo.xml
<?xml version="1.0" encoding="UTF-8"?>
<directory name="/home/uogbuji/foo">
  <file name="a.txt"/>
  <file name="b.txt"/>
  <directory name="/home/uogbuji/foo/bar">
    <file name="c.txt"/>
    <file name="d.txt"/>
  </directory>
</directory>
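
For readers who missed parts 1 and 2, output of the shape shown above can be approximated in a few lines. This is only a minimal sketch, not Dave's dirlist2.py: the function name dir_to_element is made up for illustration, it uses the standard library's ElementTree rather than whatever the script itself uses, and it assumes Python 3.

```python
import os
import xml.etree.ElementTree as ET


def dir_to_element(path):
    """Build a <directory> element for path, recursing into subdirectories.

    Hypothetical helper for illustration; not taken from dirlist2.py.
    """
    elem = ET.Element("directory", name=os.path.abspath(path))
    entries = sorted(os.scandir(path), key=lambda e: e.name)
    # Files first, then subdirectories, mirroring the sample output.
    for entry in entries:
        if entry.is_file(follow_symlinks=False):
            ET.SubElement(elem, "file", name=entry.name)
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            elem.append(dir_to_element(entry.path))
    return elem
```

Calling ET.tostring(dir_to_element("foo"), encoding="unicode") then gives the nested XML as a string (unindented, and without the XML declaration).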

[Uche Ogbuji]

via Copia

4 responses

b / 1024*1024

Should be "b / (1024*1024)"

Watch those precedence rules!

— Amos Newcombe

Unless the intent was to display the size in bytes but rounded to the nearest kibibyte.

— John Cowan

Hi Uche,

This is a nice recipe, but I was wondering: say I want to generate XML for my complete filesystem. I guess you can imagine that the resulting XML would get huge, and I think it would also kill the performance of the script.

So instead of keeping the whole tree in one file, couldn't you split it across different files?

One simple way could be one file per directory. For example, something like:

<directory name="/home/uogbuji/foo">
  <file name="a.txt"/>
  <file name="b.txt"/>
  <directory ref="18z33dq9sdq" />
</directory>

The ref attribute would simply be a name for the sub-directory, perhaps an md5 sum of its full path.
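
As a rough sketch of that idea, assuming Python 3 and a made-up helper name dir_ref, such a reference could be derived with hashlib. A real MD5 hex digest is 32 hexadecimal characters, so it would look different from the placeholder value in the sample above.

```python
import hashlib
import os


def dir_ref(path):
    """Return the MD5 hex digest of the directory's absolute path.

    Hypothetical helper: the same path always yields the same ref,
    so it could double as the per-directory file name.
    """
    full = os.path.abspath(path)
    return hashlib.md5(full.encode("utf-8")).hexdigest()
```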

Of course you could end up with an amazing number of files.

So I see two options that could be included in your script:

-one : would give you only one huge file, as you are doing now

-multiple : would create as many files as you have directories

The issue is that there is no reason to keep a complete filesystem tree in one huge file. If you had to write a file browser, you wouldn't build a tree of the complete filesystem each time the user expands one node.

You could therefore also skip recursing into the sub-directories altogether. But then, when reading the XML file, the client would have to call the script again with the new directory path:

<directory name="/home/uogbuji/foo">
  <file name="a.txt"/>
  <file name="b.txt"/>
  <directory name="/home/uogbuji/foo/bar" />
</directory>

Then when the client reads this file, it has to call the script again with "/home/uogbuji/foo/bar". Sub-directories get loaded on demand.
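
A non-recursive variant along these lines might look like the sketch below. It assumes Python 3 and the standard library's ElementTree. The name list_one_level is invented for illustration and is not an option of dirlist2.py.

```python
import os
import xml.etree.ElementTree as ET


def list_one_level(path):
    """List one directory without descending into its sub-directories."""
    elem = ET.Element("directory", name=os.path.abspath(path))
    entries = sorted(os.scandir(path), key=lambda e: e.name)
    for entry in entries:
        if entry.is_file(follow_symlinks=False):
            ET.SubElement(elem, "file", name=entry.name)
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            # Empty placeholder: the client calls again with this
            # path when it wants the contents.
            ET.SubElement(elem, "directory",
                          name=os.path.abspath(entry.path))
    return elem
```

The client would call list_one_level again with the name attribute of whichever placeholder the user expands.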

My 2 cents :)

- Sylvain

— Sylvain Hellegouarch

Note the full line is:

return repr(b / 1024*1024)+"Mb"
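
To make the precedence point concrete, here is a small sketch of the two readings. It assumes Python 3, so it spells out integer division as //, which is how / behaved on integers in the Python 2 of the time. The function names are made up for the comparison.

```python
def size_as_written(b):
    # * and / have equal precedence and group left to right,
    # so this is (b // 1024) * 1024: b rounded down to a multiple of 1024.
    return repr(b // 1024 * 1024) + "Mb"


def size_with_parens(b):
    # Parentheses force the divisor to be 1024*1024, giving whole mebibytes.
    return repr(b // (1024 * 1024)) + "Mb"


size = 5 * 1024 * 1024 + 100  # 5242980 bytes
print(size_as_written(size))   # 5242880Mb
print(size_with_parens(size))  # 5Mb
```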