Amara equivalents of Mike Kay's XSLT 2.0, XQuery examples

Since seeing Mike Kay's presentation at XTech 2005 I've been meaning to write up some Amara equivalents to the examples in the paper, "Comparing XSLT and XQuery". Here they are.

This is not meant to be an advocacy piece, but rather a set of useful examples. I think the Amara examples tend to be easier to follow for typical programmers (although they also expose some things I'd like to improve), but with XSLT and XQuery you get cleaner declarative semantics, and cross-language support.

It is by no means always true that an XSLT stylesheet (whether 1.0 or 2.0) is longer than the equivalent in XQuery. Consider the simple task: create a copy of a document that is identical to the original except that all NOTE attributes are omitted. Here is an XSLT stylesheet that does the job. It's a simple variation on the standard identity template that forms part of every XSLT developer's repertoire:

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="*">
  <xsl:copy>
    <xsl:copy-of select="@* except @NOTE"/>
    <xsl:apply-templates/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>

In XQuery, lacking an apply-templates instruction and built-in template rules, the recursive descent has to be programmed by hand:

declare function local:copy($node as element()) {
  element {node-name($node)} {
    (@* except @NOTE,
    for $c in child::node
    return typeswitch($c) 
      case $e as element() return local:copy($a)
      case $t as text() return $t
      case $c as comment() return $c
      case $p as processing-instruction return $p
  }
};

local:copy(/*)

Here is Amara code to do the same thing:

def ident_except_note(doc):
    for elem in doc.xml_xpath(u'//*[@NOTE]'):
        del elem.NOTE
    print doc.xml()

Later on in the paper:

...nearly every FLWOR expression has a direct equivalent in XSLT. For example, to take a query from the XMark benchmark:

for    $b in doc("auction.xml")/site/regions//item
let    $k := $b/name
order by $k
return <item name="{$k}">{ $b/location } </item>

is equivalent to the XSLT code:

<xsl:for-each select="doc('auction.xml')/site/regions//item">
  <xsl:sort select="name"/>
  <item name="{name}"
     <xsl:value-of select="location"/>
  </item>
</xsl:for-each>

In Amara:

def sort_by_name():
    doc = binderytools.bind_file('auction.xml')
    newdoc = binderytools.create_document()
    items = doc.xml_xpath(u'/site/regions//item')
    items.sort()
    for item in items:
        newdoc.xml_append(
            newdoc.xml_element(u'item', content=item)
        )
    newdoc.xml()

This is the first of a couple of examples from XMark. To understand the examples more fully you might want to browse the paper, "The XML Benchmark Project". This was the first I'd heard of XMark, and it seems a pretty useful benchmarking test case, except that it's very heavy on records-like XML (not much on prosy, narrative documents with mixed content, significant element order, and the like). As, such I think it could only ever be a sliver of one half of any comprehensive benchmarking framework.

I think the main thing this makes me wonder about Amara is whether there is any way to make the element creation API a bit simpler, but that's not a new point for me to ponder, and if I can think of anything nicer, I'll work on it post 1.0.

Kay's paper next takes on more complex example from XMark: "Q9: List the names of persons an the names of items they bought in Europe". In database terms this is a joins across person, closed_auction and item element sets. In XQuery:

for $p in doc("auction.xml")/site/people/person
let $a := 
   for $t in doc("auction.xml")/site/closed_auctions/closed_auction
   let $n := for $t2 in doc("auction.xml")/site/regions/europe/item
                       where  $t/itemref/@item = $t2/@id
                       return $t2
       where $p/@id = $t/buyer/@person
       return <item> {$n/name} </item>
return <person name="{$p/name}">{ $a }</person>

Mike Kay's XSLT 2.0 equivalent.

<xsl:for-each select="doc('auction.xml')/site/people/person">
  <xsl:variable name="p" select="."/>
  <xsl:variable name="a" as="element(item)*">
    <xsl:for-each 
        select="doc('auction.xml')/site/closed_auctions/closed_auction">
      <xsl:variable name="t" select="."/>
      <xsl:variable name="n" 
           select="doc('auction.xml')/site/regions/europe/item
                               [$t/itemref/@item = @id]"/>
      <xsl:if test="$p/@id = $t/buyer/person">
        <item><xsl:copy-of select="$n/name"/></item>
      </xsl:if>
  </xsl:variable>
  <person name="{$p/name}">
    <xsl:copy-of select="$a"/>
  </person>
</xsl:for-each>

In Amara:

def closed_auction_items_by_name():
    doc = binderytools.bind_file('auction.xml')
    newdoc = binderytools.create_document()
    #Iterate over each person
    for person in doc.xml_xpath(u'/site/people/person'):
        #Prepare the wrapper element for each person
        person_elem = newdoc.xml_element(
            u'person',
            attributes={u'name': unicode(person.name)}
        )
        newdoc.xml_append(person_elem)
        #Join to compute all the items this person bought in Europe
        items = [ unicode(item.name)
          for closed in doc.xml_xpath(u'/site/closed_auctions/closed_auction')
          for item in doc.xml_xpath(u'/site/regions/europe/item')
          if (item.id == closed.itemref.item
              and person.id == closed.buyer.person)
        ]
        #XML chunk with results of join
        for item in items:
            person_elem.xml_append(
                newdoc.xml_element(u'item', content=item)
            )
    #All done.  Print out the resulting document
    print newdoc.xml()

I think the central loop in this case is much clearer as a Python list comprehension than in either the XQuery or XSLT 2.0 case, but I think Amara suffers a bit from the less literal element creation syntax, and for the need to "cast" to Unicode. I would like to lay out cases where casts from bound XML structures to Unicode make sense, so I can get user feedback and implement accordingly. Kay's final example is as follows.

The following code, for example, replaces the text see [Kay, 93] with see Kay93.

<xsl:analyze-string select="$input" regex="\[(.*),(.*)\]">
<xsl:matching-substring>
  <citation>
    <author><xsl:value-of select="regex-group(1)"/></author>
    <year><xsl:value-of select="regex-group(2)"/></year>
  </citation>
</xsl:matching-substring>
<xsl:non-matching-substring>
  <xsl:value-of select="."/>
</xsl:non-matching-substring>
</xsl:analyze-string>

The only way of achieving this transformation using XQuery 1.0 is to write some fairly convoluted recursive functions.

Here is the Amara version:

import re
PATTERN = re.compile(r'[(.*),(.*)]')
def repl_func(m):
    citation = doc.xml_element(u'item')
    citation.xml_append(doc.xml_element(u'author', content=m.group (1)))
    citation.xml_append(doc.xml_element(u'year', content=m.group (2)))
    return citation.xml(omitXmlDeclaration=u'yes')
text = u'see [Kay, 93]'
print PATTERN.subn(repl_func, text)

I think this is very smooth, with the only possible rough patch again being the output generation syntax.

I should mention that Amara's output syntax isn't really bad. It's just verbose because of its Python idiom. XQuery and XSLT have the advantage that you can pretty much write XML in-line into the code (the templating approach), whereas Python's syntax doesn't allow for this. There has been a lot of discussion of more literal XML template syntax for Python and other languages, but I tend to think it's not worth it, even considering that it would simplify the XML generation syntax of tools such as Amara. Maybe it would be nice to have a lightweight templating system that allows you to write XSLT template chunks in-line with Amara code for data processing, but then, as with most such templating systems, you run into issues of poor model/presentation separation. Clearly this is a matter for much more pondering.

[Uche Ogbuji]

via Copia
3 responses
Hi Uche,



This is nice but I really wonder what it will lead to.



As far as I'm concerned, the X technology has the HUGE advantage of being not only platform independant but language independant. My problem with starting doing stuff in Python instead of keeping XSLT/XPATH 2 is that it breaks the language independancy.



But on the other hand, when I look at some new functionnalities of XSLT/XPath 2, I am afraid since they make those languages too close to a real programming language. And developers might be tempted to do all their logic inside XSLT. Which as you says : "run into issues of poor model/presentation separation"



I don't know what to think of those evolution to be honest.



- Sylvain
Cool uche,



You've misspelled "items" in def sort_by_name():

should be:

...

  for item in items:

  newdoc.xml_append(

  newdoc.xml_element(u'item', content=item)

...



and last Amara code is bad parsed by pyblosxom





I think we need a simpler API in Amara too.
Sylvain,



You are spot on in your assessment.  This is precisely the tightrope we're all trying to walk here.



Luis,



Made the corrections you mention, thanks.  A better creation API is easier said than done (for example, none of the Python tool APIs come close to the clarity of the XQuery or XSLT, but again without template-like syntax in Python, this is hard to avoid.



If you have any ideas, do pipe up.