Since seeing Mike Kay's presentation at XTech 2005 I've been meaning to write up some Amara equivalents to the examples in the paper, "Comparing XSLT and XQuery". Here they are.
This is not meant to be an advocacy piece, but rather a set of useful examples. I think the Amara examples tend to be easier to follow for typical programmers (although they also expose some things I'd like to improve), but with XSLT and XQuery you get cleaner declarative semantics, and cross-language support.
It is by no means always true that an XSLT stylesheet (whether 1.0 or 2.0) is longer than the equivalent in XQuery. Consider the simple task: create a copy of a document that is identical to the original except that all NOTE attributes are omitted. Here is an XSLT stylesheet that does the job. It's a simple variation on the standard identity template that forms part of every XSLT developer's repertoire:
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="*">
        <xsl:copy>
          <xsl:copy-of select="@* except @NOTE"/>
          <xsl:apply-templates/>
        </xsl:copy>
      </xsl:template>
    </xsl:stylesheet>
In XQuery, lacking an apply-templates instruction and built-in template rules, the recursive descent has to be programmed by hand:
    declare function local:copy($node as element()) {
      element {node-name($node)} {
        ($node/@* except $node/@NOTE,
         for $c in $node/node()
         return
           typeswitch($c)
             case $e as element() return local:copy($e)
             case $t as text() return $t
             case $c as comment() return $c
             case $p as processing-instruction() return $p
             default return ())
      }
    };
    local:copy(/*)
Here is Amara code to do the same thing:
    def ident_except_note(doc):
        for elem in doc.xml_xpath(u'//*[@NOTE]'):
            del elem.NOTE
        print doc.xml()
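For readers without Amara to hand, the same "walk the tree and delete the attribute in place" idea can be sketched with Python's standard-library ElementTree. This is only an illustration, not Amara's API: the sample document and the function name strip_note_attributes are invented for the example.

```python
import xml.etree.ElementTree as ET

# Invented sample document; only the NOTE attributes should disappear.
SAMPLE = '<doc NOTE="draft"><para NOTE="check" id="p1">Hello <b NOTE="x">world</b>!</para></doc>'


def strip_note_attributes(root):
    """Remove the NOTE attribute from root and every descendant, in place."""
    for elem in root.iter():
        # pop() with a default is a no-op on elements that lack the attribute
        elem.attrib.pop("NOTE", None)
    return root


root = strip_note_attributes(ET.fromstring(SAMPLE))
result = ET.tostring(root, encoding="unicode")
print(result)
```

As in the Amara version, nothing is copied: the tree is mutated and then serialized, so everything other than the NOTE attributes comes through untouched.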
Later on in the paper:
...nearly every FLWOR expression has a direct equivalent in XSLT. For example, to take a query from the XMark benchmark:
    for $b in doc("auction.xml")/site/regions//item
    let $k := $b/name
    order by $k
    return <item name="{$k}">{ $b/location } </item>
is equivalent to the XSLT code:
    <xsl:for-each select="doc('auction.xml')/site/regions//item">
      <xsl:sort select="name"/>
      <item name="{name}">
        <xsl:value-of select="location"/>
      </item>
    </xsl:for-each>
In Amara:
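    from amara import binderytools

    def sort_by_name():
        doc = binderytools.bind_file('auction.xml')
        newdoc = binderytools.create_document()
        items = doc.xml_xpath(u'/site/regions//item')
        #Sort on the name child explicitly rather than relying on default object ordering
        items.sort(key=lambda item: unicode(item.name))
        for item in items:
            newdoc.xml_append(
                newdoc.xml_element(u'item', content=item)
            )
        print newdoc.xml()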
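To make the shape of the FLWOR query concrete without needing the XMark data or Amara, here is a rough standard-library sketch of the same "select, sort by name, emit a new item" pipeline. The two-item sample, the items wrapper element and the function name items_sorted_by_name are all made up for illustration.

```python
import copy
import xml.etree.ElementTree as ET

# Tiny invented sample in the general shape of XMark's auction.xml
SAMPLE = """
<site>
  <regions>
    <europe>
      <item id="item1"><location>France</location><name>violin</name></item>
    </europe>
    <asia>
      <item id="item2"><location>Japan</location><name>abacus</name></item>
    </asia>
  </regions>
</site>
"""


def items_sorted_by_name(site):
    """Build <item name="..."> elements, ordered by each source item's name."""
    result = ET.Element("items")  # hypothetical wrapper so the output has one root
    items = site.findall("./regions//item")
    for item in sorted(items, key=lambda i: i.findtext("name", default="")):
        out = ET.SubElement(result, "item", name=item.findtext("name", default=""))
        # Copy the location children so the source tree is left unchanged
        for location in item.findall("location"):
            out.append(copy.deepcopy(location))
    return result


result = items_sorted_by_name(ET.fromstring(SAMPLE))
print(ET.tostring(result, encoding="unicode"))
```

The explicit sort key plays the role of "order by $k" in the XQuery and of xsl:sort in the XSLT.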
This is the first of a couple of examples from XMark. To understand the examples more fully you might want to browse the paper, "The XML Benchmark Project". This was the first I'd heard of XMark, and it seems a pretty useful benchmarking test case, except that it's very heavy on records-like XML (not much on prosy, narrative documents with mixed content, significant element order, and the like). As such, I think it could only ever be a sliver of one half of any comprehensive benchmarking framework.
I think the main thing this makes me wonder about Amara is whether there is any way to make the element creation API a bit simpler, but that's not a new point for me to ponder, and if I can think of anything nicer, I'll work on it post 1.0.
Kay's paper next takes on a more complex example from XMark: "Q9: List the names of persons and the names of items they bought in Europe". In database terms this is a join across the person, closed_auction and item element sets. In XQuery:
    for $p in doc("auction.xml")/site/people/person
    let $a :=
      for $t in doc("auction.xml")/site/closed_auctions/closed_auction
      let $n :=
        for $t2 in doc("auction.xml")/site/regions/europe/item
        where $t/itemref/@item = $t2/@id
        return $t2
      where $p/@id = $t/buyer/@person
      return <item> {$n/name} </item>
    return <person name="{$p/name}">{ $a }</person>
Here is Mike Kay's XSLT 2.0 equivalent:
    <xsl:for-each select="doc('auction.xml')/site/people/person">
      <xsl:variable name="p" select="."/>
      <xsl:variable name="a" as="element(item)*">
        <xsl:for-each select="doc('auction.xml')/site/closed_auctions/closed_auction">
          <xsl:variable name="t" select="."/>
          <xsl:variable name="n"
              select="doc('auction.xml')/site/regions/europe/item[$t/itemref/@item = @id]"/>
          <xsl:if test="$p/@id = $t/buyer/@person">
            <item><xsl:copy-of select="$n/name"/></item>
          </xsl:if>
        </xsl:for-each>
      </xsl:variable>
      <person name="{$p/name}">
        <xsl:copy-of select="$a"/>
      </person>
    </xsl:for-each>
In Amara:
    from amara import binderytools

    def closed_auction_items_by_name():
        doc = binderytools.bind_file('auction.xml')
        newdoc = binderytools.create_document()
        #Iterate over each person
        for person in doc.xml_xpath(u'/site/people/person'):
            #Prepare the wrapper element for each person
            person_elem = newdoc.xml_element(
                u'person', attributes={u'name': unicode(person.name)}
            )
            newdoc.xml_append(person_elem)
            #Join to compute all the items this person bought in Europe
            items = [
                unicode(item.name)
                for closed in doc.xml_xpath(u'/site/closed_auctions/closed_auction')
                for item in doc.xml_xpath(u'/site/regions/europe/item')
                if (item.id == closed.itemref.item
                    and person.id == closed.buyer.person)
            ]
            #XML chunk with results of join
            for item in items:
                person_elem.xml_append(
                    newdoc.xml_element(u'item', content=item)
                )
        #All done.  Print out the resulting document
        print newdoc.xml()
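The list comprehension above is a nested-loop join: for every person it re-scans every closed auction and every European item. One possible alternative, sketched here with the standard library rather than Amara, is to index the European items by id once and group purchases by buyer in a single pass. The sample data and the function name european_purchases are invented for illustration, and the sketch returns a plain dictionary instead of building the person/item result document.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented miniature of the XMark structure that Q9 touches
SAMPLE = """
<site>
  <regions>
    <europe>
      <item id="item0"><name>clock</name></item>
      <item id="item1"><name>violin</name></item>
    </europe>
    <asia>
      <item id="item2"><name>abacus</name></item>
    </asia>
  </regions>
  <people>
    <person id="person0"><name>Ada</name></person>
    <person id="person1"><name>Bob</name></person>
  </people>
  <closed_auctions>
    <closed_auction><buyer person="person0"/><itemref item="item1"/></closed_auction>
    <closed_auction><buyer person="person0"/><itemref item="item2"/></closed_auction>
    <closed_auction><buyer person="person1"/><itemref item="item0"/></closed_auction>
  </closed_auctions>
</site>
"""


def european_purchases(site):
    """Map each person's name to the names of the European items they bought."""
    # Index European items by id once, instead of re-scanning them per auction
    europe = {
        item.get("id"): item.findtext("name")
        for item in site.findall("regions/europe/item")
    }
    # One pass over the closed auctions, grouping matching item names by buyer id
    bought = defaultdict(list)
    for closed in site.findall("closed_auctions/closed_auction"):
        item_id = closed.find("itemref").get("item")
        if item_id in europe:
            bought[closed.find("buyer").get("person")].append(europe[item_id])
    return {
        person.findtext("name"): bought.get(person.get("id"), [])
        for person in site.findall("people/person")
    }


purchases = european_purchases(ET.fromstring(SAMPLE))
print(purchases)
```

Like the Amara version, this only reports purchases that match a European item, so Ada's abacus (bought in Asia in the sample) is left out.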
I think the central loop in this case is much clearer as a Python list comprehension than in either the XQuery or XSLT 2.0 case, but I think Amara suffers a bit from the less literal element creation syntax, and from the need to "cast" to Unicode. I would like to lay out cases where casts from bound XML structures to Unicode make sense, so I can get user feedback and implement accordingly.
Kay's final example is as follows.
The following code, for example, replaces the text see [Kay, 93] with see <citation><author>Kay</author><year>93</year></citation>.
    <xsl:analyze-string select="$input" regex="\[(.*),(.*)\]">
      <xsl:matching-substring>
        <citation>
          <author><xsl:value-of select="regex-group(1)"/></author>
          <year><xsl:value-of select="regex-group(2)"/></year>
        </citation>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
The only way of achieving this transformation using XQuery 1.0 is to write some fairly convoluted recursive functions.
Here is the Amara version:
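    import re
    from amara import binderytools

    #Scratch document, used here only as an element factory
    doc = binderytools.create_document()
    #Escape the brackets so they match literally rather than forming a character class
    PATTERN = re.compile(r'\[(.*),(.*)\]')

    def repl_func(m):
        citation = doc.xml_element(u'citation')
        citation.xml_append(doc.xml_element(u'author', content=m.group(1)))
        citation.xml_append(doc.xml_element(u'year', content=m.group(2)))
        return citation.xml(omitXmlDeclaration=u'yes')

    text = u'see [Kay, 93]'
    print PATTERN.sub(repl_func, text)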
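The same replacement-callback technique can be tried with nothing but the standard library. The sketch below is a variation, not a transcription: the pattern differs from Kay's regex in two ways (negated character classes instead of greedy .*, and the space after the comma is discarded), and the second citation in the input is invented to show why that matters. With the greedy pattern, two citations in one string would be swallowed as a single match.

```python
import re
import xml.etree.ElementTree as ET

# Escaped brackets match literally. The negated classes stop each group at the
# first comma or closing bracket, so several citations in one string stay separate.
CITATION = re.compile(r"\[([^\],]*),\s*([^\]]*)\]")


def to_citation(match):
    """Serialize one matched [author, year] pair as a <citation> element."""
    citation = ET.Element("citation")
    ET.SubElement(citation, "author").text = match.group(1).strip()
    ET.SubElement(citation, "year").text = match.group(2).strip()
    return ET.tostring(citation, encoding="unicode")


# The second citation is invented sample input
result = CITATION.sub(to_citation, "see [Kay, 93] and [Doe, 01]")
print(result)
```

Note that only the matched pieces pass through an XML serializer here. The non-matching text is copied as-is, so input containing & or < would need escaping before the result could be treated as well-formed XML.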
I think this is very smooth, with the only possible rough patch again being the output generation syntax.
I should mention that Amara's output syntax isn't really bad. It's just verbose because of its Python idiom. XQuery and XSLT have the advantage that you can pretty much write XML in-line into the code (the templating approach), whereas Python's syntax doesn't allow for this. There has been a lot of discussion of more literal XML template syntax for Python and other languages, but I tend to think it's not worth it, even considering that it would simplify the XML generation syntax of tools such as Amara. Maybe it would be nice to have a lightweight templating system that allows you to write XSLT template chunks in-line with Amara code for data processing, but then, as with most such templating systems, you run into issues of poor model/presentation separation. Clearly this is a matter for much more pondering.