Solution: simple XML output "templates" for Amara

A few months ago in "Sane template-like output for Amara" I discussed ideas for making the Amara output API a little bit more competitive with full-blown templating systems such as XSLT, without adopting all the madness of template frameworks.

I just checked in the simplest patch that does the trick. Here is an example from the previous article:

Amara 1.0 code:

person_elem = newdoc.xml_element(
        u'person',
        attributes={u'name': unicode(person.name)}
    )
newdoc.xml_append(person_elem)

Proposed Amara 1.2 code:

newdoc.xml_append_template("<person name='{person.name}'/>")

What I actually checked into CVS today for Amara 1.2:

newdoc.xml_append_fragment("<person name='%s'/>"%person.name)

That has the advantage of leaning as much as possible on an existing Python concept (formatted strings). As the method name indicates, this is conceptually no longer a template, but rather a fragment of XML in text form. The magic for Amara is in allowing one to dynamically create XML objects from such fragments. I think this is a unique capability (shared with 4Suite's MarkupWriter) for Python XML output APIs (I have no doubt you'll let me know if I'm wrong).

Also, I think the approach I settled on is best in light of the three "things to ponder" from the older article.

  • Security. Again I'm leaning on a well-known facility of Python, and not introducing any new holes. The original proposal would have opened up possible issues with tainted strings in the template expressions.
  • String or Unicode? I went with strings for the fragments. It's up to the developer to make sure that however he constructs the XML fragment, the result is a plain string and not a Unicode object.
  • Separation of model and presentation. There is a very clear separation between Python operations to build a string XML fragment (these are usually the data model objects), and any transforms applied to the resulting XML binding objects (this is usually the separate presentation side). Sure, a determined developer can write spaghetti, but I think that with xml_append_fragment it's possible and natural to have a clean separation. With most template systems, this is very hard to achieve.
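On the security point, one caveat is that any value interpolated into a fragment has to be escaped first, or the fragment may not be well-formed. Here is a minimal sketch of that step. It assumes present-day Python 3 and uses only the standard library (xml.sax.saxutils and ElementTree), not Amara itself; person_fragment is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

def person_fragment(name):
    # Hypothetical helper: quoteattr() escapes &, < and > and returns
    # the value already wrapped in suitable quotes.
    return "<person name=%s/>" % quoteattr(name)

fragment = person_fragment("ben&jerry")

# The escaped fragment parses cleanly and yields the original value.
parsed_name = ET.fromstring(fragment).get("name")
```

Without the quoteattr() call, the bare ampersand would make the fragment ill-formed and a conforming parser would reject it.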

One other thing to mention is that the dynamic incorporation of the new fragment into the XML binding makes this a potential building block for a pipelined processing architecture. For example:

#Module-level index mapping each href to its link text
g_link_dict = {}

def process_link(body, href, content):
    #href and content must already be XML-escaped,
    #since the fragment has to be well-formed
    body.xml_append_fragment('<a href="%s">%s</a>'%(href, content))
    #Send the "a" element object that was just appended to
    #the next pipeline stage
    check_unique(body.a[-1])
    return

def check_unique(a_node):
    if a_node.href not in g_link_dict:
        #index the href to the link text (a element text content)
        g_link_dict[a_node.href] = unicode(a_node)
    return
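
For readers without Amara at hand, the same pipeline shape can be approximated with the standard library. This is only a rough analogue in Python 3, not Amara's API: append_fragment and link_index are hypothetical names, and ElementTree's fromstring() accepts just one root element per fragment.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

link_index = {}  # hypothetical stand-in for g_link_dict

def append_fragment(parent, fragment):
    # Parse a well-formed, single-rooted XML fragment and append the
    # resulting element to parent; return the new element.
    child = ET.fromstring(fragment)
    parent.append(child)
    return child

def check_unique(a_node):
    # Index the href to the link text, keeping the first occurrence.
    href = a_node.get("href")
    if href not in link_index:
        link_index[href] = a_node.text

def process_link(body, href, content):
    fragment = "<a href=%s>%s</a>" % (quoteattr(href), escape(content))
    # Hand the freshly appended element to the next pipeline stage.
    check_unique(append_fragment(body, fragment))

body = ET.Element("body")
process_link(body, "http://example.org/?a=1&b=2", "Example")
process_link(body, "http://example.org/?a=1&b=2", "Duplicate")
```

Both links end up under body, but only the first one is recorded in the index.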

[Uche Ogbuji]

via Copia
5 responses
>String or Unicode? I went with strings for the fragments.



A strange choice for someone who recommends unicode for all XML related Python work (and maybe more). No offense, but where is the advantage of strings here?

 

Maybe I am getting it wrong, but is it not best to do only I/O with strings (e.g. encoded in latin1 or whatever) and work with unicode inside the program? At least in the tiny and maybe not very high-quality scripts I wrote over the last months, this has always been the easiest.



For example even if I simply wanted to use my own name which is normally written with an umlaut (ö) I could not use



  newdoc.xml_append_fragment("<person name='%s'/>" % "höke")



or could I? Is it just the "template" part which should be a string? Does Amara do the magic behind the scenes? Does it actually also escape strings like "&" (single ampersand) and the like?

The % syntax looks very familiar but at the same time maybe too familiar and does it not make people forget what is actually happening? I would assume



  newdoc.xml_append_fragment("<person name='höke'/>")



should be the same, but would it not?



I hope I did understand the stuff right at all as I have not tried Amara yet but am following your optimization efforts for some time.



And generally I think this is a nice shortcut. I actually saw a colleague of mine using a similar thing (parsing XML bits) for an admittedly quite rough script (and by the way in Java, where even Amara's more verbose syntax would be much, much more verbose ;)



sorry if I did get it all wrong

 

chris
I understand it's a bit confusing, but I'll try to explain the situation as well as I can.  Please bear with me.



Actually, my position is very consistent with regard to string versus Unicode in XML processing.  I insist on Unicode for internal processing of markup, but strings for serialized XML, whether input (parsing) or output (serialization).  This is pretty much required by the character of XML 1.0.  XML only defines entities in encoded form.  There is no such thing as an abstract Unicode document entity.  This is so for the document entity (main XML document) as well as external parsed entities.  The XML fragment you pass in to xml_append_fragment() is really just an EPE.



Think of it as a magic Unicode firewall.  On one side of the firewall is serialized XML, which is always in encoded strings.  On the inside of the wall are markup objects, which are always Unicode.  You seem to have this wall in the wrong place.  It's not whether or not you're using I/O that is the deciding factor.  It's whether or not you're dealing with serialized XML.



I mandate string for xml_append_fragment() because it would have been fundamentally incorrect to accept Unicode in this case.
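
A rough sketch of that firewall in present-day terms, assuming Python 3 (where bytes and str play the roles of the old string and Unicode types) and the standard library's ElementTree rather than Amara:

```python
import xml.etree.ElementTree as ET

name = "höke"  # text inside the program

# Outside the wall: serialized XML is encoded bytes. UTF-8 is the
# default encoding when the entity carries no declaration.
utf8_fragment = ("<person name='%s'/>" % name).encode("utf-8")

# Any other encoding has to be declared in the serialized form.
latin1_fragment = (
    "<?xml version='1.0' encoding='iso-8859-1'?>"
    "<person name='%s'/>" % name
).encode("iso-8859-1")

# Inside the wall: the parser hands back decoded text either way.
name_from_utf8 = ET.fromstring(utf8_fragment).get("name")
name_from_latin1 = ET.fromstring(latin1_fragment).get("name")
```

The two byte strings differ, yet both produce the same text on the inside of the wall.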



For your first markup example:



newdoc.xml_append_fragment("<person name='%s'/>" % "höke")



Yes, this is fine, as long as that second string is defined in UTF-8.  If not, you have to do:



newdoc.xml_append_fragment("<person name='%s'/>" % "höke", encoding)



And yes,



  newdoc.xml_append_fragment("<person name='höke'/>")



Is the same, as long as the encoding is the same.



> Does [Amara] actually also escape strings like "&" (single ampersand) and the like?



The fragment is a simple XML EPE.  It must be well-formed.  e.g.



newdoc.xml_append_fragment("<person name='ben&amp;jerry'/>")
thanks Uche, I think I get it. Seeing xml_append_fragment() as actually being another form  of IO with serialized XML surely does make sense. (Maybe dumb, but what does EPE actually mean?)



Amara feels nice on first steps, by the way. I just seem to need an updated 4Suite, as I got an "ImportError: cannot import name ParseFragment" trying to use xml_append_fragment...





BTW, I got an error during setup, "ImportError: No module named amara", so I simply renamed lib to amara. The kw['package_dir'] = {'amara': 'lib'} line in setup.py does not seem to work (on WinXP, Python 2.4.1), which I find rather odd.



thanks

chris
Christof,



EPE means external parsed entity



I confess that in my recent examples I've been using latest CVS of both 4Suite and Amara.  I'm working today on packaging something up so that people can easily give that combo a test run and help shake out issues for the full releases (coming next week at the latest).



I'm not sure what's happening with your lib = amara issue.  I'll see if I can reproduce it.



Thanks.