Domlette and Saxlette: huge performance boosts for 4Suite (and derived code)

For a while now, 4Suite has had an 80/20 DOM implementation written completely in C: Domlette (formerly named cDomlette). Jeremy has been making a lot of performance tweaks to the C code, and the current CVS version is already 3-4 times faster than the Domlette in 4Suite 1.0a4.

In addition, Jeremy stealthily introduced a new feature to 4Suite: Saxlette. Saxlette uses the same Expat C code Domlette uses, but exposes it as SAX, so we get SAX implemented completely in C. It follows the standard Python/SAX API, so, for example, the following code uses Saxlette to count the elements in a document:

from xml import sax

furi = "file:ot.xml"

class element_counter(sax.ContentHandler):
    def startDocument(self):
        self.ecount = 0

    #Saxlette defaults to SAX2 behavior, so the namespace-aware
    #startElementNS callback fires for each element
    def startElementNS(self, name, qname, attribs):
        self.ecount += 1

#Passing the driver list makes make_parser return Saxlette
parser = sax.make_parser(['Ft.Xml.Sax'])
handler = element_counter()
parser.setContentHandler(handler)
parser.parse(furi)
print "Elements counted:", handler.ecount

If you don't care about PySax compatibility, you can use the more specialized API, which involves the following lines in place of the equivalents above:

from Ft.Xml import Sax
...
class element_counter:
...
parser = Sax.CreateParser()
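Put together with the first listing, the complete specialized version would read roughly like this (just the two fragments above combined; note that with the specialized API the handler need not derive from any base class):

from Ft.Xml import Sax

furi = "file:ot.xml"

#No base class needed with the specialized API
class element_counter:
    def startDocument(self):
        self.ecount = 0

    def startElementNS(self, name, qname, attribs):
        self.ecount += 1

parser = Sax.CreateParser()
handler = element_counter()
parser.setContentHandler(handler)
parser.parse(furi)
print "Elements counted:", handler.ecount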

The code changes needed from the first listing above to regular PySax are minimal. As Jeremy puts it:

Unlike the distributed PySax drivers, Saxlette follows the SAX2 spec and defaults feature_namespaces to True and feature_namespace_prefixes to False, both of which are not allowed to be changed (which is exactly what SAX2 says is required). Python/SAX defaults to SAX1 behavior and Saxlette defaults to SAX2 behavior.
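(For reference, stock Python/SAX can be put into SAX2 mode explicitly; a minimal sketch, using the standard feature flag from xml.sax.handler:)

from xml import sax
from xml.sax.handler import feature_namespaces

parser = sax.make_parser()
#Python/SAX defaults to SAX1 behavior; the namespace-aware
#SAX2 callbacks (startElementNS and friends) must be requested
parser.setFeature(feature_namespaces, True)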

The following is a PySax example:

from xml import sax

furi = "file:ot.xml"

#The handler has to derive from sax.ContentHandler,
#or, in practice, implement the same interface
class element_counter(sax.ContentHandler):
    def startDocument(self):
        self.ecount = 0

    #SAX1 startElement by default, rather than SAX2 startElementNS
    def startElement(self, name, attribs):
        self.ecount += 1

parser = sax.make_parser()
handler = element_counter()
parser.setContentHandler(handler)
parser.parse(furi)
print "Elements counted:", handler.ecount

The speed difference is huge. Jeremy did some testing with timeit.py (using more involved test code than the above), and in those limited tests Saxlette showed up as fast as, and in some cases a bit faster than, cElementTree and libxml/Python (much, much faster than xml.sax in all cases). Interestingly, Domlette is now within 30%-40% of Saxlette in raw speed, which is impressive considering that it is building a fully functional DOM. As I've said in the past, I'm done with the silly benchmarks game, so someone else will have to pursue matters in further detail if they really can't do without their hot dog eating contests.
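For the curious, a minimal sketch of that sort of timeit.py measurement (hypothetical; this is not Jeremy's actual test code, and the names come from the Saxlette listing above):

import timeit

stmt = """
parser = Sax.CreateParser()
parser.setContentHandler(element_counter())
parser.parse(furi)
"""
setup = "from __main__ import Sax, element_counter, furi"
#Best of three runs of ten parses each
print min(timeit.Timer(stmt, setup).repeat(3, 10))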

In another exciting development, Saxlette has gained a generator mode using Expat's suspend/resume capability. This means you can have a Saxlette handler yield results from the SAX callbacks. It will allow me, for example, to have Amara's pushdom and pushbind work without threads, eliminating a huge drag on their performance (context switching is basically punishment). I'm working this capability into the code in the Amara 1.2 branch. So far the effects are dramatic.
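To make the win concrete, here is a sketch (hypothetical, not the actual Amara code) of the thread-based pull pattern that generator mode renders unnecessary; every event handed from the parser thread to the consumer pays for a context switch:

import threading, Queue
from xml import sax

def pull_element_names(furi):
    q = Queue.Queue()

    class handler(sax.ContentHandler):
        def startElement(self, name, attribs):
            q.put(name)

    def run():
        parser = sax.make_parser()
        parser.setContentHandler(handler())
        parser.parse(furi)
        q.put(None)  #sentinel: parsing finished

    threading.Thread(target=run).start()
    while True:
        name = q.get()  #each get() crosses a thread boundary
        if name is None:
            break
        yield name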

[Uche Ogbuji]

via Copia
8 responses
You give out benchmark information with cElementTree and libxml2, saying the speed is now comparable, and then you say you're done with the "silly benchmarks game" and "hot dog eating contests". You can't have your cake (claim parity in benchmarks) and eat it (claim to be "above" benchmarks).



Benchmarking software can be useful. It enabled me to make lxml much faster, as I actually compared performance with ElementTree (this comparison is helped as the APIs are the same).



Regards,



Martijn
No Martijn, you should try reading what I actually said.  The other day we were talking about nuance, but this time it isn't even a matter of nuance.  You're flat out reading things into my message that aren't there.  I said that Jeremy ran some tests, and I noted the performance figures he got.  You should know very well that that does not add up to a benchmark, any more than running a program a few times to see that it does not crash adds up to a test suite.  Are you meaning to say that my brief mention does add up to "benchmark information"?  Do you think such a quiddity is useful?
Martijn,



Although I do agree the sentence "I'm done with the silly benchmarks game" has some weight considering Uche's old article on benchmarking, I find it hard to accept how ready people are to jump on Uche as soon as he pronounces the word "benchmark".



To me, performance is important, but not as important as the usefulness, ease of use and comprehensiveness of a package. In that regard I can't say that 4Suite was a winner at first, but neither was lxml (again, to me). But thank god Amara arrived, and now it is definitely the winner (even over ElementTree, since ElementTree has such limited support for XPath and other sweet X things). If Amara can be faster then great, but as far as I'm concerned I don't care that it used to be a bit slower than ElementTree or lxml at parsing a 3MB XML file into a DOM... how often do I load such big files? Hmmm, almost never...



But on the other hand, how long do I need to think before finding the right way to do some XPath query with Amara? Well, roughly 0s, since I just have to do:



node.xml_xpath('/look_for_something')
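(Spelled out, that's something like this sketch; the exact bindery entry point is from memory and may differ by Amara version:)

from amara import binderytools

doc = binderytools.bind_uri("file:ot.xml")
hits = doc.xml_xpath('/look_for_something')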



I'm sorry but there again lxml doesn't appeal to me very much (http://codespeak.net/lxml/xpath.html).


I don't mean to say that lxml is bad, just that maybe there are more important things than speed. You may benchmark lxml a lot; it will not make its API more user-friendly to me.



I'm very happy Uche realized that and released Amara, because it achieves what the ElementTree API tried to do... make XML a shadow for Python users. And I'm looking forward to Amara 1.2 very much.



Note: libxml2 has a bloated API IMO, so it must be fairly hard for you, and you have already widely improved on the old libxml2 Python binding's way of accessing it. However, you can't fix issues in libxml2 itself, such as, for instance, the RelaxNG package. I mean, libxml2 only offers you "valid or not against a RelaxNG schema". Well, yeah, it's nice, but it'd be even better to be told where it fails to be valid. 4Suite does that (and soon Amara will), and that makes RelaxNG something truly useful. (If libxml2 does it too, let me know and I'll be happy to retract my words.)



I do realise my message might be a bit rude, but I didn't see much use in your post either (for once, since usually I appreciate your entries).
This is probably a fruitless debate, but..



Once upon a time in January I was happily, informally, benchmarking lxml against (c)ElementTree and reporting on it on my blog. Fredrik was also posting informal benchmark results to his blog. It was a useful exercise for me, as I sped up lxml by quite a bit as a result, with relatively limited effort.



Then an article appeared by your hand, distributed to a broad audience, saying there were a lot of deceptive benchmarks in the Python XML world. You also said in the following discussion, and just now again, that you are pulling out of this benchmarking thing.



But you do this in the same paragraph as where you talk about comparing performance between 4Suite and various others. I'm fine with such statements, as I do so myself, and I was fine with Fredrik doing it. For you I think it's rather inconsistent though given your previous complaints about this kind of stuff going on.



It gives the impression that you're giving out the message "hey, my stuff is about as fast now on this limited benchmark" while at the same time you brook no discussion of it, as you're above such silly discussions.



Besides, it's not very nice to be described as someone who is playing the "silly benchmarks game" and as someone who "cannot do without their hot dog eating contests". I thought the name-calling part of that discussion was better left buried.
Sylvain, doing an XPath query in lxml is:



node.xpath('/look_for_something')
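(Spelled out, assuming lxml.etree, that is roughly:)

from lxml import etree

tree = etree.parse("ot.xml")
hits = tree.xpath('/look_for_something')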



The page on XPath extension functions you refer to is complicated, as it discusses a fairly complicated topic.



For the basic API you want to look at this:



http://codespeak.net/lxml/api.html



As to RelaxNG, the underlying libxml2 library does indeed report validation errors in more detail. I haven't had time to expose this in lxml yet, as exposing these messages is fairly complicated. I, or someone else, will of course eventually get around to it.
"This is probably a fruitless debate, but.."



Boy, you're not kidding.



Look, speed measurements are useful, but they are extremely context-dependent.  My problem with benchmarking is that if it's not very carefully constructed and deployed, you end up with specious claptrap for results.  It's the oldest story in the IT industry: vendors start with benchmarks that they cook up to flatter their product; then a standard benchmark emerges, and vendors instead just cheat, using benchmark-specific optimizations and the like.



As I did say before, if folks think they can come up with a standard Python/XML benchmark harness of some sort, one which allows users to check against their specific usage scenarios, I could possibly be interested, but that's a different matter.



It's fine for you to say "I've done some testing and for what I tested lxml is 10x faster than fooxml".  That is a properly qualified statement, and does not attempt to present a benchmark by any sane usage.  I do have a problem with the case of first PyRXP, and then cET, where you have someone posting, without context, a set of measurements expressed in absolutes (and thus a formal benchmark), especially when you find out that the scenario being tested is not one likely to be useful to any user (unless one is meaning to heat up their CPU, why parse an XML document and then throw it away?).  It just exacerbates things when it turns out that there are some truly appalling hacks employed by the tools, sacrificing good engineering for raw speed.  That's what I mean by "hot dog eating contest".



But I'm done with this rot.  I'll continue to make notes, in appropriate context, about my observations with regard to speed.  You seem to want to play "gotcha!!!! you're silly-benchmarking".  I simply don't accept that.  We'll have to agree to disagree.
Martijn,



I'm happy to see that I was wrong about lxml regarding XPath queries. My bad for not looking through the docs carefully enough.



Regarding RelaxNG validation error reporting, it'd be very useful to add, I must say, but I can understand that you can't rush it, seeing the libxml2 API:



http://xmlsoft.org/html/libxml-relaxng.html



So in the end, I'm glad to know that it was my knowledge of lxml that was at fault, and not lxml itself.



- Sylvain
My intent is not to say 'gotcha', and I would've accepted your statement on performance comparisons just fine if you hadn't tucked in that last line of the paragraph. Anything I had to say about the whole benchmarking debacle I said before, in that old discussion thread, so I won't repeat it here.



Congrats on speeding up 4Suite!