In an earlier entry I showed some Amara equivalents for XSLT 2 and XQuery examples. I think the main disadvantage of Amara in these cases was the somewhat clumsy XML output generation syntax. This is not an easy problem to fix. XSLT and XQuery basically work XML syntax directly into the language, to make output specification very seamless. This makes sense as long as they stick to the task of being very determinate black boxes taking one body of XML data and working it into another. But often you turn to a language like Python for XML processing because you want to blow the lid off the determinate black boxes a bit: you want to take up all the power of general-purpose computing.
With this power comes the need to streamline and modularize, and the usual first principle for such streamlining is the principle of separating the model from presentation. This is a much easier principle to state than to observe in real-life processing scenarios. We love template languages for XML and HTML generation because they are so convenient in solving real problems in the here and now. We look askance at them, however, because we know that they come with a tendency to mix model and presentation, and that we might regret the solution once it comes time to maintain it when (as is inevitable) model processing requirements change or presentation requirements change.
Well, that was a longer preamble than I'd originally had in mind, but it all boils down to my basic problem: how do I make Amara's output mechanism more readable without falling into the many pitfalls of template systems?
Here is one of the XSLT 2 examples:
<xsl:for-each select="doc('auction.xml')/site/people/person">
  <xsl:variable name="p" select="."/>
  <xsl:variable name="a" as="element(item)*">
    <xsl:for-each select="doc('auction.xml')/site/closed_auctions/closed_auction">
      <xsl:variable name="t" select="."/>
      <xsl:variable name="n"
          select="doc('auction.xml')/site/regions/europe/item
                  [$t/itemref/@item = @id]"/>
      <xsl:if test="$p/@id = $t/buyer/person">
        <item><xsl:copy-of select="$n/name"/></item>
      </xsl:if>
    </xsl:for-each>
  </xsl:variable>
  <person name="{$p/name}">
    <xsl:copy-of select="$a"/>
  </person>
</xsl:for-each>
In Amara 1.0b3 it goes something like:
from amara import binderytools

def closed_auction_items_by_name():
    doc = binderytools.bind_file('auction.xml')
    newdoc = binderytools.create_document()
    #Iterate over each person
    for person in doc.xml_xpath(u'/site/people/person'):
        #Prepare the wrapper element for each person
        person_elem = newdoc.xml_element(
            u'person',
            attributes={u'name': unicode(person.name)}
        )
        newdoc.xml_append(person_elem)
        #Join to compute all the items this person bought in Europe
        items = [ unicode(item.name)
                  for closed in doc.xml_xpath(u'/site/closed_auctions/closed_auction')
                  for item in doc.xml_xpath(u'/site/regions/europe/item')
                  if (item.id == closed.itemref.item
                      and person.id == closed.buyer.person)
                ]
        #XML chunk with results of join
        for item in items:
            person_elem.xml_append(
                newdoc.xml_element(u'item', content=item)
            )
    #All done. Print out the resulting document
    print newdoc.xml()
The following snippet is a good example of the clumsiness I mean:
person_elem = newdoc.xml_element(
    u'person',
    attributes={u'name': unicode(person.name)}
)
newdoc.xml_append(person_elem)
Suppose I could turn all this into:
newdoc.xml_append_template("<person name='{person.name}'/>")
This would certainly be a huge win for readability. The curly brackets are borrowed from XSLT attribute value templates (AVTs), except that their contents are a Python expression rather than an XPath. The person element created is empty for now, but it becomes just part of the data binding and you can access it using the expected `newdoc.person` or `newdoc.person.name`.
One important note: this is very different from `"<person name='%s'/>" % (person.name)`. What I have in mind is a structured template that must be well-formed (although it can have multiple root elements). The replacement occurs within the perfectly well-formed XML structure of the template. As with XSLT AVTs you can represent a literal curly bracket as `{{` or `}}`.
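To make the expression-expansion part concrete, here is a minimal sketch in plain Python 3. The name `expand_avt` is hypothetical and nothing here is Amara API; it only shows how `{expr}` could be evaluated as a Python expression while `{{` and `}}` yield literal curly brackets. It works on a bare string, so it does not yet capture the well-formedness requirement.

```python
import re
from types import SimpleNamespace

# Hypothetical helper for illustration only; not part of Amara.
# Matches '{{', '}}', or a single '{expr}' with no nested braces.
_TOKEN = re.compile(r"\{\{|\}\}|\{([^{}]*)\}")

def expand_avt(text, namespace):
    """Expand {expr} as a Python expression; '{{' and '}}' are literal braces."""
    def replace(match):
        expr = match.group(1)
        if expr is None:
            # Doubled bracket: emit a single literal '{' or '}'
            return match.group(0)[0]
        # eval runs arbitrary code, so the template text must be trusted
        return str(eval(expr, {}, namespace))
    return _TOKEN.sub(replace, text)

person = SimpleNamespace(name="Ann")
greeting = expand_avt(
    "{person.name} bought {{{count + 1}}} items",
    {"person": person, "count": 2},
)
```

With these inputs `greeting` comes out as `Ann bought {3} items`: the attribute access and the arithmetic are both ordinary Python expressions, and the doubled brackets survive as single ones.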
The other output generation part in the example:
for item in items:
    person_elem.xml_append(
        newdoc.xml_element(u'item', content=item)
    )
would become:
for item in items:
    newdoc.person.xml_append_template("<item>{item}</item>")
This time we have the template substitution going on in the content rather than an attribute. Again I would want to restrict this entire idea to a very clean and layered template with proper XML semantics. There would be no tricks such as "<{element_name}>spam</{element_name}>". If you wanted that sort of thing you could use the existing API such as xml_element(element_name), or even use Python string operations directly.
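The layering I have in mind can be sketched with the standard library's ElementTree standing in for the 4Suite and Amara machinery. The function name `instantiate_template` is hypothetical. The point is the order of operations: the template is parsed as XML first, and only then are expressions expanded inside attribute values and character data, so a trick like a templated element name fails at parse time.

```python
import re
import xml.etree.ElementTree as ET

_TOKEN = re.compile(r"\{\{|\}\}|\{([^{}]*)\}")

def _expand(text, namespace):
    def replace(match):
        if match.group(1) is None:
            return match.group(0)[0]        # '{{' -> '{', '}}' -> '}'
        return str(eval(match.group(1), {}, namespace))
    return _TOKEN.sub(replace, text)

def instantiate_template(template, namespace):
    """Hypothetical sketch: parse first, substitute second.

    Returns a list of elements, since a template may have several roots.
    """
    # A throwaway wrapper lets the template have multiple root elements.
    wrapper = ET.fromstring("<wrapper>%s</wrapper>" % template)
    for elem in wrapper.iter():
        for key, value in list(elem.attrib.items()):
            elem.set(key, _expand(value, namespace))
        if elem.text:
            elem.text = _expand(elem.text, namespace)
        if elem.tail:
            elem.tail = _expand(elem.tail, namespace)
    return list(wrapper)

elems = instantiate_template(
    "<person name='{name}'/><item>{item}</item>",
    {"name": "Ann & Bob", "item": "<b>"},
)
item_xml = ET.tostring(elems[1], encoding="unicode")

try:
    instantiate_template("<{n}>spam</{n}>", {"n": "eggs"})
    rejected = False
except ET.ParseError:
    rejected = True   # not well-formed XML, so never reaches substitution
```

Because the values are placed into an already-parsed tree, markup characters in them are data rather than structure, and the serializer escapes them on the way out. That is the practical difference from Python string formatting.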
The complete example using such a templating system would be:
from amara import binderytools

def closed_auction_items_by_name():
    doc = binderytools.bind_file('auction.xml')
    newdoc = binderytools.create_document()
    #Iterate over each person
    for person in doc.xml_xpath(u'/site/people/person'):
        #Prepare the wrapper element for each person
        newdoc.xml_append_template("<person name='{person.name}'/>")
        #Join to compute all the items this person bought in Europe
        items = [ unicode(item.name)
                  for closed in doc.xml_xpath(u'/site/closed_auctions/closed_auction')
                  for item in doc.xml_xpath(u'/site/regions/europe/item')
                  if (item.id == closed.itemref.item
                      and person.id == closed.buyer.person)
                ]
        #XML chunk with results of join
        for item in items:
            newdoc.person.xml_append_template("<item>{item}</item>")
    #All done. Print out the resulting document
    print newdoc.xml()
I think that this example is indeed more readable than the XSLT version.
One tempting thing about this idea is that all the building blocks are there. 4Suite already gives me the ability to parse and process this template very easily, and I could implement this logic without much trouble. But I also think that it deserves some serious thought (and, I hope, feedback from users). There's no hurry: I don't plan to add this capability in the Amara 1.0 cycle. I need to get Amara 1.0 out, and I'm expecting that 1.0b3 is the last stop before a release candidate.
So, things to ponder.
Security. Any time you support such arbitrary-code-in-template features the tainted string worry comes up: what happens if one is not careful with the expression that is used within a template? I think that this issue is not really Amara's responsibility. The developer using Amara should no more pass in untrusted Python expressions to a template than they would to an exec statement. They should be aware that Amara templates will execute arbitrary Python expressions, if they're passed in, and they should apply the usual precautions against tainting.
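A small sketch of where the risk sits, assuming the expansion is built on `eval` as a stand-in for whatever Amara would actually do. The `expand` and `audit` names are hypothetical. Untrusted data passed through a trusted template is inserted as a value and never evaluated, while an untrusted template runs whatever its expressions say.

```python
import re

_EXPR = re.compile(r"\{([^{}]+)\}")

def expand(template, namespace):
    # eval executes whatever expression the template author wrote
    return _EXPR.sub(lambda m: str(eval(m.group(1), {}, namespace)), template)

calls = []

def audit(label):
    """Stand-in for any side effect an expression could trigger."""
    calls.append(label)
    return label

# Trusted template, untrusted data: the value is inserted, not evaluated.
hostile_value = "{audit('pwned')}"
safe_result = expand("<item>{name}</item>", {"name": hostile_value, "audit": audit})
calls_after_data = list(calls)

# Untrusted template: its expressions run with the caller's privileges.
expand("<item>{audit('ran arbitrary code')}</item>", {"audit": audit})
calls_after_template = list(calls)
```

After the first call `calls` is still empty; after the second it records the side effect. So the precaution is the same as for exec: control where the template text comes from.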
String or Unicode? Should the templates be specified as strings or Unicode? They are themselves well-formed XML, which makes me think they should be strings (XML is really defined in terms of encoded serialization, and the Unicode backbone is just an abstraction imposed on the actual encoded byte stream). But is this confusing to users? I've always preached that XML APIs should use Unicode, and my products reflect that, and for a user that doesn't understand the nuances, this could seem like a confusing exception. Then again, we already have this exception for 4Suite and Amara APIs that parse XML from strings. My leaning would be to have the template expressed as a string, but to have the results of expressions within templates coerced to Unicode. This is the right thing to do, and that's the strongest argument.
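As a rough illustration of that leaning, here is a sketch in Python 3 terms, where the encoded "string" is `bytes` and "Unicode" is `str`. ElementTree stands in for the real parser, and the plain `replace` call stands in for the template expansion. The template is handed over as encoded bytes, the parser decodes it according to its XML declaration, and the expression result is coerced to text before it enters the tree.

```python
import xml.etree.ElementTree as ET

# Template as an encoded byte string; the XML declaration names the encoding.
template = (b"<?xml version='1.0' encoding='iso-8859-1'?>"
            b"<item>caf\xe9: {price}</item>")

elem = ET.fromstring(template)   # the parser decodes the bytes for us
values = {"price": 3}            # an expression result that is not text

# Coerce every result to text before it enters the tree
# (str in Python 3; the equivalent of unicode() in Python 2).
elem.text = elem.text.replace("{price}", str(values["price"]))
serialized = ET.tostring(elem, encoding="unicode")
```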
Separation of model and presentation. The age-old question with such templates is whether they cause tangles that complicate maintenance. I think one can often make an empirical check for such problems by imagining what happens in a scenario where the data model operations need to change, and another scenario where the presentation needs to change.
As an example of a model change, imagine that the source for the item info was moved from an XML document to a database. I wouldn't need to change any of the templates as long as I could get the same values to pass in, and I think it's reasonable to assume I could do this. Basically, since my templates simply refer to host variables whose computation is nicely decoupled from the template code, the system passes the first test.
As an example of a presentation change, imagine that I now want to generate XHTML directly, rather than this <person><item>... business. I think the system passes this test as well. The templates themselves would have to change, but this change would be isolated from the computation of the host variables used by the templates. Some people might argue that I'm grading these tests too leniently, and that it's already problematic that the computation and presentation occur so close together, in the same function in the same code file. I'm open to being convinced this is the case, but I'd want to hear of practical maintenance scenarios where this would be a definite problem.
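Both thought experiments can be acted out in a toy sketch. Everything in it is an assumption made for illustration: Python's `str.format` stands in for the proposed template mechanism, an in-memory SQLite table stands in for "a database", and the function and template names are made up. The model functions produce the same host values from either source, and the template sets can be swapped without touching them.

```python
import sqlite3
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def items_from_xml(xml_text):
    """Model, version 1: item names come from an XML document."""
    return [item.findtext("name") for item in ET.fromstring(xml_text).iter("item")]

def items_from_db(conn):
    """Model, version 2: the same values come from a database."""
    return [row[0] for row in conn.execute("SELECT name FROM item ORDER BY rowid")]

# Presentation: two interchangeable sets of templates.
PLAIN = {"person": '<person name="{name}">{items}</person>',
         "item": "<item>{item}</item>"}
XHTML = {"person": '<div class="person"><h2>{name}</h2><ul>{items}</ul></div>',
         "item": "<li>{item}</li>"}

def render(templates, name, items):
    """Fill the templates from host values; knows nothing about their source."""
    body = "".join(templates["item"].format(item=escape(i)) for i in items)
    return templates["person"].format(
        name=escape(name, {'"': "&quot;"}), items=body)

xml_text = ("<europe><item><name>Lamp</name></item>"
            "<item><name>Vase</name></item></europe>")
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (name TEXT)")
conn.executemany("INSERT INTO item VALUES (?)", [("Lamp",), ("Vase",)])

plain_out = render(PLAIN, "Ann", items_from_xml(xml_text))
xhtml_out = render(XHTML, "Ann", items_from_db(conn))
```

Changing the data source touches only the `items_from_*` functions, and changing the output vocabulary touches only the template dictionaries, even though everything lives in one file.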
So what do you think?
[Uche Ogbuji]