Sane template-like output for Amara

In an earlier entry I showed some Amara equivalents for XSLT 2 and XQuery examples. I think the main disadvantage of Amara in these cases was the somewhat clumsy XML output generation syntax. This is not an easy problem to fix. XSLT and XQuery basically work XML syntax directly into the language, to make output specification very seamless. This makes sense as long as they stick to the task of being very determinate black boxes taking one body of XML data and working it into another. But often you turn to a language like Python for XML processing because you want to blow the lid off the determinate black boxes a bit: you want to take up all the power of general-purpose computing.

With this power comes the need to streamline and modularize, and the usual first principle for such streamlining is the principle of separating the model from presentation. This is a much easier principle to state than to observe in real-life processing scenarios. We love template languages for XML and HTML generation because they are so convenient in solving real problems in the here and now. We look askance at them, however, because we know that they come with a tendency to mix model and presentation, and that we might regret the solution once it comes time to maintain it when (as inevitable) model processing requirements change or presentation requirements change.

Well, that was a longer preamble than I'd originally had in mind, but it's all boils down to my basic problem: how do I make Amara's output mechanism more readable without falling into the many pitfalls of template systems?

Here is one of the XSLT 2 examples:

<xsl:for-each select="doc('auction.xml')/site/people/person">
  <xsl:variable name="p" select="."/>
  <xsl:variable name="a" as="element(item)*">
    <xsl:for-each select="doc('auction.xml')/site/closed_auctions/closed_auction">
      <xsl:variable name="t" select="."/>
      <xsl:variable name="n" 
           select="doc('auction.xml')/site/regions/europe/item
                               [$t/itemref/@item = @id]"/>
      <xsl:if test="$p/@id = $t/buyer/person">
        <item><xsl:copy-of select="$n/name"/></item>
      </xsl:if>
  </xsl:variable>
  <person name="{$p/name}">
    <xsl:copy-of select="$a"/>
  </person>
</xsl:for-each>

In Amara 1.0b3 it goes something like:

def closed_auction_items_by_name():
    doc = binderytools.bind_file('auction.xml')
    newdoc = binderytools.create_document()
    #Iterate over each person
    for person in doc.xml_xpath(u'/site/people/person'):
        #Prepare the wrapper element for each person
        person_elem = newdoc.xml_element(
            u'person',
            attributes={u'name': unicode(person.name)}
        )
        newdoc.xml_append(person_elem)
        #Join to compute all the items this person bought in Europe
        items = [ unicode(item.name)
          for closed in doc.xml_xpath (u'/site/closed_auctions/closed_auction')
          for item in doc.xml_xpath(u'/site/regions/europe/item')
          if (item.id == closed.itemref.item
              and person.id == closed.buyer.person)
        ]
        #XML chunk with results of join
        for item in items:
            person_elem.xml_append(
                newdoc.xml_element(u'item', content=item)
            )
    #All done.  Print out the resulting document
    print newdoc.xml()

The following snippet is a good example:

person_elem = newdoc.xml_element(
            u'person',
            attributes={u'name': unicode(person.name)}
        )
        newdoc.xml_append(person_elem)

If I could turn all this into:

newdoc.xml_append_template("<person name='{person.name}'/>")

This would certainly be a huge win for readability. The curly brackets are borrowed from XSLT attribute value templates (AVTs), except that their contents are a Python expression rather than an XPath. The person element created is empty for now, but it becomes just part of the data binding and you can access it using the expected newdoc.person or newdoc.person.name.

One important note: this is very different from `"<person name='% s'/>"%(person.name)`. What I have in mind is a structured template that must be well-formed (it can have multiple root elements). The replacement occurs within the perfectly well-formed XML structure of the template. As with XSLT AVTs you can represent a literal curly bracket as {{ or }}.

The other output generation part in the example:

for item in items:
            person_elem.xml_append(
                newdoc.xml_element(u'item', content=item)
            )

Would become

for item in items:
            newdoc.person.xml_append_template("<item>{item}</item>")

This time we have the template substitution going on in the content rather than an attribute. Again I would want to restrict this entire idea to a very clean and layered template with proper XML semantics. There would be no tricks such as "<{element_name}>spam</{element_name}>". If you wanted that sort of thing you could use the existing API such as xml_element(element_name), or even use Python string operations directly.

The complete example using such a templating system would be:

def closed_auction_items_by_name():
    doc = binderytools.bind_file('auction.xml')
    newdoc = binderytools.create_document()
    #Iterate over each person
    for person in doc.xml_xpath(u'/site/people/person'):
        #Prepare the wrapper element for each person
        newdoc.xml_append_template("<person name='{person.name}'/>")
        #Join to compute all the items this person bought in Europe
        items = [ unicode(item.name)
          for closed in doc.xml_xpath (u'/site/closed_auctions/closed_auction')
          for item in doc.xml_xpath(u'/site/regions/europe/item')
          if (item.id == closed.itemref.item
              and person.id == closed.buyer.person)
        ]
        #XML chunk with results of join
        for item in items:
            newdoc.person.xml_append_template("<item>{item}</item>")
    #All done.  Print out the resulting document
    print newdoc.xml()

I think that this example is indeed more readable than the XSLT version.

One tempting thing about this idea is that all the building blocks are there. 4Suite already gives me the ability to parse and process this template very easily, and I could implement this logic without much trouble. But I also think that it deserves some serious thought (and, I hope, feedback from users). There's no hurry: I don't plan to add this capability in the Amara 1.0 cycle. I need to get Amara 1.0 out, and I'm expecting that 1.0b3 is the last stop before a release candidate.

So, things to ponder.

Security. Any time you support such arbitrary-code-in-template features the tainted string worry comes up: what happens if one is not careful with the expression that is used within a template? I think that this issue is not really Amara's responsibility. The developer using Amara should no more pass in untrusted Python expressions to a template than they would to an exec statement. They should be aware that Amara templates will execute arbitrary Python expressions, if they're passed in, and they should apply the usual precautions against tainting.

String or Unicode? Should the templates be specified as strings or Unicode? They are themselves well-formed XML, which makes me think they should be strings (XML is really defined in terms of encoded serialization, and the Unicode backbone is just an abstraction imposed on the actual encoded byte stream). But is this confusing to users? I've always preached that XML APIs should use Unicode, and my products reflect that, and for a user that doesn't understand the nuances, this could seem like a confusing exception. Then again, we already have this exception for 4Suite and Amara APIs that parse XML from strings. My leaning would be to have the template expressed as a string, but to have the results of expressions within templates coerced to Unicode. This is the right thing to do, and that's the strongest argument.

separation of model and presentation. The age-old question with such templates is whether they cause tangles that complicate maintenance. I think one can often make an empirical check for such problems by imagining what happens in a scenario where the data model operations need to change, and another scenario where the presentation needs to change.

As an example of a model change, imagine that the source for the item info was moved from an XML document to a database. I wouldn't need to change any of the templates as long as I could get the same values to pass in, and I think it's reasonable to assume I could do this. Basically, since my templates simply refer to host variables whose computation is nicely decoupled from the template code, the system passes the first test.

As an example of a presentation change, imagine that I now want to generate XHTML directly, rather than this <person><item>... business. I think the system passes this test as well. The templates themselves would have to change, but this change would be isolated from the computation of the host variables used by the templates. Some people might argue that I'm grading these tests too leniently, and that it's already problematic that the computation and presentation occurs so close together, in the same function in the same code file. I'm open to being convinced this is the case, but I'd want to hear of practical maintenance scenarios where this would be a definite problem.

So what do you think?

[Uche Ogbuji]

via Copia

4 responses

Hi Uche,

This looks great but I am a little confused about the all thing.

I feel like you are on about to re-write XSLT using Python only and I wonder why.

I mean for instance, one of the main reason I'm using XSLT in the first place is that whatever programming language I am using, I don't need to learn a new templating language over and over again. I can simply extend my knowledge of only one : XSLT, and then become better in that specific one.

It also really helps me making a difference between the presentation and the logic upon data since my the logic resides within the programming language itself, not the templating language.

Therefore, although your code looks great, I don't feel confident using it since it would go against what I just said above.

Obeviously it's a matter of taste :)

- Sylvain

— Sylvain

With Amara is it possible to pass a nodeset as a parameter to the stylesheet, meaning as a global parameter. Mainly just asking as I want to move an application over to Amara for demo purposes but that will be absolutely necessary for the demo to work.

— bryan

oops, posting error, anyway this is always assuming of course that one can use Amara to build up xml and then pass it directly to an xslt rather than using the quasi xslt structures Amara provides.

or would it make more sense to try to go from Amara to Ft.Xml.Xslt.Processor? are there any scenarios where the tools work together?

bryan,

I think the idea would indeed be to use Amara to build an XML and then pass the XML to an XSLT processor. I agree with Sylvain that Amara should not be used as a replacement for XSLT, but rather as a complement in the processing chain. Two things that you might find helpful:

1) I have prepared a test package which incoprporates Amara with all its prerequisites from 4Suite, for easier install. It's at:

ftp://ftp.4suite.org/pub/Amara/Amara-allinone-2...

Ft.Xml.Xslt is included in that, so you can basically just grab and install this package and be set for what you want to do. If you do try this, please send me any feedback. I'm planning to release Amara 1.0, maybe today, and it would include this convenience package.

2) Sylvain's Picket is a great way to simplify the XSLT processing API within the CherryPy XSLT Web server:

http://www.cherrypy.org/wiki/Picket

I've used it myself to whip up very quick XML -> XSLT dynamic Web sites.

HTH

— Uche