XSLT 2.0 might be worth a second look, if...

XSLT 2.0 Is Way Cool, by Micah Dubinko

Micah. Kimber. Pawson. A handful of the folks who have, like me, turned up their nose at XSLT 2.0, are starting to reconsider. This is not a massive drugging campaign by XSLT 2.0 boosters: it seems all these folks still don't want anything to do with the oppressive type system of XPath and XSLT 2.0, and all balk at the stupendous complexity of the specifications. The key to me is that they see these specs as usable without choking on the types mess. Some folks were claiming this was possible 2 years ago or so, but when I checked, I wasn't convinced. Perhaps things have improved since then.

So I may be up for reconsidering my shunning of XSLT 2.0, but as Micah mentions, I'm not about to wade into 9 documents to work on implementation. (OK, so it would really be 4 or so, but those are 4 huge documents, compared to the 1.0 series, which was 2 modestly sized documents). If someone comes up with a coherent spec that omits the type info, it could somehow make its way into the 4Suite post 1.0.

Micah says, "XSLT 2.0 is a power tool. I don't think it will displace XSLT 1.0, which is remarkable for its power in a small package." For a while I've wanted to write a series of comparisons between XSLT 2.0 and Amara code (which includes XPath 1.0 support). Amara is my power tool, for when XSLT 1.0 + EXSLT is not enough, and I find it hard to imagine XSLT 2.0 as offering more power.

And I really need to get back to work on EXSLT. Folks are getting very restless with the fact that work on EXSLT has been fallow for most of 2005. I just wish I could count on some help. Part of what impedes me is a shrinking back from all the demands of the EXSLT community without many offers of help.

[Uche Ogbuji]

via Copia

A couple of Amara/CherryPy Demos

As I've mentioned, I've been playing with Amara/4Suite and CherryPy. Luis Miguel Morillas has been as well. We're both taking things slowly, pursuing it from different angles.

Luis has a "Web-based docbook browser and processor using CherryPy and Amara.". It's a very simple script for rendering as Web content an index and chapters of Mark Pilgrim's Dive into Python book as XML and XML+CSS (which seems to be creeping into the mainstream?).

I also have a demo as part of Amara, cherrypy-xml- inspector.py, which allows you to "inspect" an XML document, through a Web form using CherryPy and Amara. You can load any document off the Web and then enter in an amara expression, such as "doc.html.head.title" and get the result.

[Uche Ogbuji]

via Copia

TreeBind, and incidentally Nux

In my previous entry I said "When you compare the weary nature of, say Java XML data bindings, Amara is a nice advertisement for Python's dynamicism." Interestingly, Eric van der Vlist recently mentioned to me a project in which he attempts to address some of these deficiencies within Java. TreeBind, is "yet another XML <-> Java binding API." The TreeBind page says:

The difference between TreeBind and most of the other Java binding APIs is that we've tried to minimize the need for any type of schema or configuration file and to maximize the usage of introspection of Java classes in order to facilitate the integration with existing classes."

It's about time, but is Java's introspection really enough? It doesn't save you from welding to Java's type rigidity.

It makes me recall Wolfgang Hoschek's response to one of my Amara announcements on XML-DEV. The Amara example was:

The following is complete code for iterating through address labels in an XML document, while never loading more memory than needed to hold one label element:

from amara import binderytools
        for subtree in binderytools.pushbind('/labels/label',
        source='labels.xml'):
            print subtree.label.name, 'of', subtree.label.address.city

And Wolfgang responded:

Very handy!

FYI, analog example Java code for the current Nux version reads as follows:

StreamingTransform myTransform = new StreamingTransform() {
             public Nodes transform(Element subtree) {
                 System.out.println(XQueryUtil.xquery(subtree, "name").get(0).getValue());
                 System.out.println("of");
                 System.out.println(XQueryUtil.xquery(subtree, "address/city").get(0).getValue());
                 return new Nodes(); // mark current element as subject to garbage collection
             }
        };
        Builder builder = new Builder(new StreamingPathFilter("/labels/label", null).
            createNodeFactory(null, myTransform)); //[Line split by Uche for formatting reasons]
        builder.build(new File("labels.xml"));

It's not as compact as with Amara, but still quite handy...

This is certainly a great leap forward from Java/XML APIs I've seen, even from plain XOM. I'd expect a similar leap, albeit in a different direction, in TreeBind. But in my biased opinion, even such impressive leaps lose a lot of luster when compared to the gains in expressiveness provided by the Python example. Line count is just a bit of the picture. For overall idiom, and the amount of conceptual load buried in each construct, it's hard to even place the Python and Java examples on the same scale.

In here, I think, is the crux of where dynamic language advocates are unimpressed by the productivity gains claimed by XQuery advocates. Productivity gained through declarativity should complement rather than interfere with productivity gained through natural, expressive idiom. XQuery does not meet this test. By imposing a ponderous type framework over XML it provides productivity on one hand while stifling the power of dynamic languages. this is why in Amara I seek to harness the declarative power of unencumbered little languages such as XPath and XSLT patterns to the expressive power of Python. I think this gives us a huge head start over systems using XQuery and Java introspection to tame the chore of XML processing.

See also:

[Uche Ogbuji]

via Copia

Another on Amara

"XML Parsing with Python"--Derek Willis

Let’s face it, relational database types don’t like XML files. They’re structured, sure, but not in quite the way we’re used to. So pulling them apart is a chore for which there are many tools but few that seem to fit easily into the CAR Computer-Assisted Reporting] mindset. Enter [Python and the Amara toolkit. Amara builds on 4Suite, which processes XML and RDF, and it works in a very Pythonic way by essentially turning XML data into Python objects. If I have to parse XML into a relational database, Amara is my tool of choice.

One thing that I've especially appreciated about feedback on Amara is the way users cite it as an example of the essential power of Python, and why it is a draw from even outside of Python. This has always been my aim, more conventionally with 4Suite, and more subversively with Amara. When you compare the weary nature of, say Java XML data bindings, Amara is a nice advertisement for Python's dynamicism.

Later on Willis concludes:

CAR folks can think of [XML as processed through Amara] as calling field names, and instead of printing out elements you can insert them into a database. Nice and easy - the way everybody says XML should be.

And just the way I intended. Nice.

[Uche Ogbuji]

via Copia

Amara Appreciations

Last week I mentioned a kind message Sanjay Velamparambil sent to the 4Suite list. As I said, "it's always pleasing to me as a developer to hear a voice ring out from the noise clearly appreciating the value of the work we've done." This week, Amara gets some love.

In a message to the 4Suite list Wednesday Tom Lazar said:

i just wanted to chime in that just yesterday I had an urgent, real- world problem in where I needed to manipulate an XML Document
programmatically - grep/sed/awk on the textfile would have been too
difficult ("now you've got two problems"[tm]) and an XSLT alone
wouldn't have done it either.

using Amara i hacked a python script that did the whole job in
(literally) ten minutes. as I started the script I was a bit
apprehensive: afterall our script would pick out certain nodes and
assign new values to them (or delete them, depending) and then write
it back to the file - but it all worked without a hitch.

and looking at the script, you'd never think it was handling XML at
all ;-)

so thanks for making Amara and keep up the good work!

You're very welcome, Tom, and thanks for all the help with improving the bindery mutation API.

As if that wasn't enough, the same day I got a private message from another user. I haven't asked his permission so I won't identify him at the moment, but he actually put together a video clip of himself demonstrating Amara. In the clip he shows the eight or so custom Python modules for XML processing that were replaced with a one-liner using Amara. My only regret is that he and his team had to write all that other code in the first place, before finding Amara, but at least they don't have to maintain it any more.

Feedback like that makes all the long hours worthwhile.

[Uche Ogbuji]

via Copia

Deletion added to friendlier Amara mutation

As I've mentioned I added friendlier mutation API to Amara. Deletion didn't come up in the original discussion, but I just got around to addressing that as well. Now checked in are enhancements that support the following use cases:

Use case 10:

Source doc: spameggs
Code: del doc.a.b
Result: doc mutated to eggs

Use case 11:

Source doc: spameggs
Code: del doc.a.b[0]
Result: doc mutated to eggs

Use case 12:

Source doc: spameggs
Code: del doc.a.b[1]
Result: doc mutated to spam

Use case 13:

Source doc: spameggs
Code: del doc.a.b[2]
Result: IndexError

Use case 14:

Source doc: spam
Code: del doc.a.b
Result: doc mutated to spam

Of course there are oddities to go with the new convenience. Check out the following:

>>> from amara import binderytools
>>> doc = binderytools.bind_string("spamspam")
>>> unicode(doc.a.b)
u'spam'
>>> doc.a.b
<amara.bindery.b object at 0x685b2c>
>>> del doc.a.b
>>> unicode(doc.a.b)
u'spam'
>>> #Eh?  Still there, are ye?
...
>>> doc.a.b
<amara.bindery.b object at 0x685b8c>

Perfectly consistent with what the users seem to be saying, I think, but I'll be amazed if this doesn't trip up the odd fellow.

[Uche Ogbuji]

via Copia

Elements versus attributes in Amara

In the previous entry I discussed changes to Amara's mutation API. In the original discussion one of the things that came up was the old element/attribute conundrum. Take the following document:

Users like to be able to access both elements and attributes using friendly Python idiom, but here we have a name clash on the resulting a object.Right now Amara exposes the attribute as a.b and the element as a.b_, using name mangling to disambiguate.

The important thing to remember, however, is that such clashes are quite rare in practice, even when you throw in namespaces, so such mangling is rarely necessary, and I personally think Amara's current behavior makes sense. But I may just have a blind spot, so I've been paying attention to suggestions from others.

Jeremy Kloth suggested just always using different idioms. a.[u"b"] for the attribute and a.b for the element. This is not a bad idea, but I feel that given that clashes are rare, that it complicates the common case just to aid the rare case.

Luis Miguel Morillas had an idea I consider almost the opposite. Rather than completely separate element/attribute idioms, Luis suggests embracing how Amara has unified them. Right now Amara rolls up multiple elements of the same name in a convenient way:

Works such that a.b or a.b[0] yields the element with y and a.b[1] yields the element with z. Luis thinks that the following case should just be an extension of this:

And then a.b or a.b[0] would yields the attribute value (u"x"), a.b[1] would yield the element with y, and a.b[2] would yield the element with z. I kinda think of this idea as "so crazy it almost makes perfect sense", but it's way too big a change to introduce before Amara 1.0. I'd be curious to hear what others think of it. Luis actually brings it up in the context of mutation--see his original post (scroll to the bottom)--but I figure that the mutation API will follow naturally from the access API, so I'm focusing my thoughts a bit.

[Uche Ogbuji]

via Copia

Amara gets friendlier mutation

Tom Lazar asked for a friendlier idiom for mutating elements in Amara. I was reluctant at first because the simpler-on-the-surface idioms he wanted would require rather untidy idioms in the code. I relented to the argument that user convenience comes even before clean code. I finally got around to making and committing the changes today. I'd planned to release Amara 1.0b2 as soon as I'd made these changes, and the timing seems perfect since we've just released 4Suite 1.0b1, but the changes are intrusive enough that I think I'll give folks a chance to try things out from CVS and first see whether it craters for anyone. Please give it a go and give me feedback here or on the mailing list. Thanks.

Here are use cases illustrating the new idioms for Amara. I have added them to the test file mutation.py:

Use case 1:

Source doc: spam
Code: doc.a.b = u"eggs"
Result: doc mutated to eggs

Use case 2:

Source doc: spam
Code: doc.a.b[0] = u"eggs"
Result: doc mutated to eggs

Use case 3:

Source doc:
Code: doc.a.b = u"eggs"
Result: doc mutated to

Use case 4:

Source doc: spamspam
Code: doc.a.b = u"eggs"
Result: doc mutated to eggsspam

Use case 5:

Source doc: spamspam
Code: doc.a.b[0] = u"eggs"
Result: doc mutated to eggsspam

Use case 5:

Source doc: spamspam
Code: doc.a.b[1] = u"eggs"
Result: doc mutated to spameggs

Use case 6:

Source doc: spamspam
Code: doc.a.b[2] = u"eggs"
Result: IndexError

Use case 7:

Source doc: spam
Code: doc.a.b = u"eggs"
Result: doc mutated to spam

Note: attributes take precedence over same name elements in binding. See next use case.

Use case 8:

Source doc: spam
Code: doc.a.b_ = u"eggs"
Result: doc mutated to eggs

In a follow-up entry I'll talk about some other suggestions I've received on this matter.

[Uche Ogbuji]

via Copia