XTech: Mike Kay on XQuery and XSLT 2.0

As I mentioned in my more complete report, Mike Kay's presentation was worth a further entry (besides, my note-taking discipline went to hell right after his talk, so I don't have as much to work with on the rest).

The title was Comparing XSLT and XQuery. Much of what Mike discussed applies to XSLT 1.0 as well as 2.0. He did spend some time talking about the role of XPath 2.0 as the basis of both XSLT 2.0 and XQuery. As he puts it, XSLT 2.0 is a two-language system: you call XPath from specific constructs within XSLT. XQuery, on the other hand, has XPath incorporated into the basic language. The way I think of it, XSLT is a host language for XPath, while XQuery is a (greatly) extended version of XPath.

I think the most important contribution Mike made in this paper was a very sober appraisal of the barriers to learning XSLT and XQuery. The difficulties developers have with XSLT are well known: we've had some 6 years to discuss them. Mike summarizes them as follows:

  1. XML fundamentals: encoding, entities, white space, namespaces, etc.
  2. Declarative programming: variables, recursion, paths, grouping
  3. Data model: the mental shift from the angle brackets they see to the abstraction of nodes
    • confusion between what devs see in the XML versus what their program sees
    • confusion over proper output, e.g. the subtlety that creating an element in the output tree is not the same thing as creating text containing angle brackets
  4. Rule-based programming:
    • template dispatch, which forces a non-linear way of thinking about transforms. Mike Kay mentioned the parallels with GUI programming. (I tend to think this common comparison is generally right, but is just stretched enough to be unhelpful in determining how to get developers in the right mind-set).
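The data model point (item 3) is easy to demonstrate outside XSLT. Here is a minimal sketch in Python, using the standard library's ElementTree rather than any XSLT processor; the function names and the sample markup are my own invention for illustration. One function adds a real element node to the tree, the other adds text that merely looks like markup, and the serializer treats the two very differently.

```python
import xml.etree.ElementTree as ET


def build_with_node():
    """Create a <b> element as a real node in the output tree."""
    p = ET.Element("p")
    b = ET.SubElement(p, "b")
    b.text = "bold"
    return ET.tostring(p, encoding="unicode")


def build_with_text():
    """Create text that merely contains angle brackets."""
    p = ET.Element("p")
    p.text = "<b>bold</b>"  # a string, not a node
    return ET.tostring(p, encoding="unicode")


if __name__ == "__main__":
    print(build_with_node())  # <p><b>bold</b></p>
    print(build_with_text())  # <p>&lt;b&gt;bold&lt;/b&gt;</p>
```

The second result is what surprises newcomers: the serializer escapes the angle brackets because, as far as the tree is concerned, they are just characters in a text node.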

Mike Kay had Ken Holman in the audience, so he did the sensible thing and asked the foremost expert on XML-related training whether the list rang true. Ken agreed: "Yep. That hits the high points of the first day of getting people to know what's going on in XSLT."

In my opinion, there is one more category of difficulty, which is capability limitations in XSLT 1.0 (most of which are addressed in EXSLT or XSLT 2.0). This includes frustrations such as the result tree fragment/node set split, the poor facilities for string manipulation, node set operations, date/time processing, etc.

Mike feels that XQuery only eliminates the 4th barrier (it has no templates). Reading between the lines, this is a powerful indictment of the idea of a separate XQuery. I think it's hard to argue that we need such a complex separate language purely from the pedagogical viewpoint (no, I'm not saying "andragogical").

Mike pointed out that people coming to XQuery from SQL tend to write everything in FLWOR expressions (rather than, say, XPath with predicates). FLWOR is comfy and SQLey, but this just annoys me. I've pointed out in my bemoaning of SPARQL how unfortunate I think it is that SQL people insist on turning all other languages into some nasty mutation of SQL. I was suitably entertained by seeing Mike demonstrate how easy it is to get caught up in the subtle differences between SQL and FLWOR. Again I'm reading between the lines, but I got the sense that Mike was himself not unamused by the task of pointing out such trip-wires.
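To make the stylistic contrast concrete, here is a rough analogy in Python rather than XQuery itself. The catalog data and function names are hypothetical, made up for this sketch, and the XQuery in the comments is only an approximate rendering of each style. The first function iterates and filters explicitly, much as a FLWOR expression does; the second pushes the condition into a path predicate, using the limited XPath subset that ElementTree supports.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, invented for this illustration.
CATALOG = ET.fromstring(
    "<catalog>"
    "<book year='2005'><title>XSLT 2.0</title></book>"
    "<book year='1999'><title>XSLT 1.0</title></book>"
    "<book year='2005'><title>XQuery</title></book>"
    "</catalog>"
)


def titles_flwor_style(catalog, year):
    # Roughly: for $b in //book where $b/@year = $year return $b/title
    return [
        b.findtext("title")
        for b in catalog.iter("book")
        if b.get("year") == year
    ]


def titles_path_style(catalog, year):
    # Roughly: //book[@year = $year]/title
    return [t.text for t in catalog.findall(f".//book[@year='{year}']/title")]


if __name__ == "__main__":
    print(titles_flwor_style(CATALOG, "2005"))
    print(titles_path_style(CATALOG, "2005"))
```

Both return the same titles in document order. The path version simply says what is wanted in one expression, which is the habit SQL refugees tend not to pick up.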

Mike finished up with a benchmark which he prefaced with an armload of caveats (a healthy practice, as I've learned from experiences in benchmarking). The task involved the XSLT analogue of a relational join, run against document sizes of 1MB, 4MB and 10MB, and Saxon running XSLT trounced all processors except for MSXML. In a surprise result, Saxon running XSLT even beat Saxon running XQuery (as Mike said, "in the XQuery world implementors look to optimize joins"). All the XSLT processors suffered quadratic (N^2) performance degradation as document size grew. But strangely enough some of the XQuery tools did as well, including Galax. Qizx did show linear characteristics.

Kay then proved that there is no reason one cannot optimize joins for XSLT by writing a join optimizer for Saxon/XSLT. When he updated the benchmark result slide to show the fruit of this join optimization, we were all astonished to see how thoroughly Saxon ended up trouncing everything in the field at all three doc sizes. Now that, my friends, is the work of a superstar developer.
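I don't know the details of what Mike built into Saxon, so the following is only a sketch of the general idea behind join optimization, not his implementation. The record lists and function names are hypothetical. A naive join compares every item on one side with every item on the other, which is where N^2 behavior comes from; indexing one side by the join key first brings the cost down to roughly linear.

```python
from collections import defaultdict

# Hypothetical records standing in for two sets of XML nodes.
CUSTOMERS = [("c1", "Ada"), ("c2", "Grace"), ("c3", "Edsger")]
ORDERS = [("o1", "c2"), ("o2", "c1"), ("o3", "c2")]


def nested_loop_join(customers, orders):
    """Compare every customer with every order: O(N * M) comparisons."""
    return [
        (name, oid)
        for cid, name in customers
        for oid, ocid in orders
        if ocid == cid
    ]


def hash_join(customers, orders):
    """Index orders by customer id once, then look each customer up: O(N + M)."""
    by_customer = defaultdict(list)
    for oid, ocid in orders:
        by_customer[ocid].append(oid)
    return [
        (name, oid)
        for cid, name in customers
        for oid in by_customer.get(cid, [])
    ]


if __name__ == "__main__":
    print(nested_loop_join(CUSTOMERS, ORDERS))
    print(hash_join(CUSTOMERS, ORDERS))
```

Both functions produce the same pairs in the same order; only the amount of work differs as the inputs grow. In XSLT 1.0 terms, the hand-written equivalent of the second version is the familiar xsl:key and key() idiom.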

I'll be tinkering with how Amara handles some of Mike's XSLT and XQuery examples in a coming entry.

[Uche Ogbuji]

via Copia
2 responses
"Andragogy" being leading men around by the handle, I suppose?
It's one of those ghastly neologisms that smell like dead fish hauled up from the muck under the Pierian stream. Long time ago, I think on alt.usage.english, there was a debate about the word "pedagogy" where folks were claiming that the Greek root "paida" meant the word was unfit for modern-day use regarding adult education. Seemed silly to me, but as I recall there was significant support behind the "andragogy" term, which might make perfect sense in a parallel universe of Greek etymology, but is not really necessary in ours.