Recipe: fast scan of an XML file for one field

If you have a huge XML file and you need to grab the first instance of a particular field in a fast and memory efficient manner, a simple one-liner with Amara's pushbind does the trick.

val = unicode(amara.pushbind("book.xml", "title").next())

Returns the text of the first title element in a book.xml (which could be Docbook or any other format with a title element), loading hardly any of the file into memory. It also doesn't parse the file beyond the target element. It would be a shade slower to get such an element at the end of a file. For example, the following line gets the title of a Docbook index.

val = unicode(amara.pushbind("book.xml", "index/title").next())

Even when finding an element near the end of a file it's very fast. All my use cases were pretty much instantaneous working with a 150MB file (I'm working on convincing the client to avoid such huge files).

If the target element is not found, you'll get a StopIteration exception.

[Uche Ogbuji]

via Copia