i18n for XSLT in 4Suite

Prodded by discussion on the CherryPy list, I have implemented and checked in a 4Suite XSLT extension for internationalization, using Python's gettext facilities for the underlying support. Here is how it works. Sample XML:


Sample XSLT:

<xsl:stylesheet version="1.0"
  <f:setup-translations domain="test"/>
  <xsl:template match="msg">

The f:setup-translations and f:gettext extension elements are the key. The former looks up and installs the domain "test" for use in your XSLT. Replace it with the domain used by your application. The latter extension evaluates its body to get a string value, and then looks up this string in the installed translation.

Say you have a test.mo installed in the right place that translates "hello" to "howdy" and "goodbye" to "so long". Running the transform then gives:

$ 4xslt test.xml test.xsl
<?xml version="1.0" encoding="UTF-8"?>
  so long

I trimmed some white space for formatting, but you get the idea. The translations are applied automatically according to your locale.

This operates via Python's gettext facilities, which means that it's much more efficient than, say, the DocBook XSLT approach to i18n.
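For readers who have not used gettext from Python, here is a minimal sketch of the kind of catalog lookup that sits underneath an extension like f:gettext. It is not the 4Suite implementation: the function name translate and its default arguments are assumptions made for illustration, and it is written for current Python 3 rather than the Python of 4Suite's day.

```python
import gettext


def translate(message, domain="test", localedir="/tmp/locale", languages=None):
    """Look up message in a gettext domain; return it unchanged if no catalog is found.

    The name and defaults are hypothetical, chosen to match the example above.
    """
    # fallback=True returns a NullTranslations object (an identity lookup)
    # instead of raising an error when no matching .mo file exists.
    catalog = gettext.translation(
        domain, localedir=localedir, languages=languages, fallback=True
    )
    return catalog.gettext(message)
```

With a real test.mo in place for the selected locale, translate("hello") would return whatever that catalog maps "hello" to; without one, the message comes back untranslated.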

For those who want to give it a whirl, here's a quick step-by-step. All the needed files are available here on copia.

Create a sandbox locale directory:

mkdir -p /tmp/locale/en_US/LC_MESSAGES/

Copy in the catalog. You may need to create a different catalog for your own language if your system will not be selecting en_US as its locale (remember that you can hack the locale via the environment).

cp en_US.mo /tmp/locale/en_US/LC_MESSAGES/test.mo

Your locale is probably not en_US. If not, you can:

  • temporarily override your locale to en_US using export LANG=en_US, or the equivalent command for your shell
  • create translations for your locale (just two strings to translate). I use poedit, which makes dealing with .po files simple enough. Then replace en_US in all the above instructions with your own locale and the .mo file you created.
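If the catalog does not seem to be picked up, it can help to check which file gettext would actually select. The sketch below uses the standard library's gettext.find against a sandbox laid out like the one above; the helper names build_sandbox, locate_catalog and demo are hypothetical, and the placeholder file it creates is empty (enough for a path check, not a usable catalog). When no language list is passed, gettext.find consults the LANGUAGE, LC_ALL, LC_MESSAGES and LANG environment variables in that order, so a LANG override only wins if the first three are unset.

```python
import gettext
import os
import tempfile


def build_sandbox(root, lang="en_US", domain="test"):
    """Create <root>/<lang>/LC_MESSAGES/ and return where <domain>.mo belongs."""
    directory = os.path.join(root, lang, "LC_MESSAGES")
    os.makedirs(directory, exist_ok=True)
    return os.path.join(directory, domain + ".mo")


def locate_catalog(root, lang="en_US", domain="test"):
    """Return the catalog path gettext would pick for lang, or None if there is none."""
    return gettext.find(domain, localedir=root, languages=[lang])


def demo():
    """Show the lookup result before and after a catalog file is put in place."""
    root = tempfile.mkdtemp()
    before = locate_catalog(root)
    mo_path = build_sandbox(root)
    open(mo_path, "wb").close()  # empty placeholder, not a valid catalog
    after = locate_catalog(root)
    return before, after == mo_path
```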

Anyway, the f:setup-translations and f:gettext extensions are now checked into 4Suite. You can either update to current 4Suite CVS, or just download the one changed file, Ft/Xml/Xslt/BuiltInExtElements.py, and copy it into your 4Suite codebase. It works fine as a drop-in to 4Suite 1.0b1.

[Uche Ogbuji]

via Copia

Rafting the crazy Clear Creek

This weekend's adventure was white water rafting Clear Creek, between Idaho Springs, CO and Golden. I drive by that creek all the time on I-70 and Route 6 on the way to and from snowboarding, and I've never thought of it as a big deal, but we've had a great year for snow and rain and this is high season, so it turns out that there are class 3 through class 5 rapids to ride. I've been rafting before, but class 3 tops, so I figured it should be a blast, especially since I'm rather scared of water.

I got in a group with 12 of my friends, and we went for a full day trip Saturday. In the morning, we started off on the class 3s, and had a couple of incidents in class 4 areas. Melisse fell out of the boat entirely at one point and Noah and Philippe had to haul her out of the water. And then we wrapped our raft around a high rock, and had to do some very frantic "high-side" maneuvers to avoid flipping the entire rig. I lost my paddle a couple of times, usually catching it on a rock in the middle of a stroke, but once I ditched it when Dawn lunged to prevent me from pitching off the boat as it lurched to port, and then I had to lunge to hold her in when the boat lurched back to starboard.

It was wild fun, though. Our guide Sean started by making us yell "Yee-haw" after we survived each class 4 section (he's from West Virginia but he's rafted a lot on the Zambezi). We, being the group we are, decided that wasn't multi-culti enough and added a French "Hourah" and an Igbo "Chineke" (literally "GOD ALMIGHTY!"). Zelda said "I think 'Chineke' is the most satisfying yell", and indeed, we all practiced it a good deal.

So what, after lunch, do we do after all that? Up it to class 4/5, of course. Melisse, Maggie and Noah had had enough and bailed, but the rest of us tackled the bottom, harder part of the course. Amazingly, though we went through ridiculously huge drops, spins and slides with names like "tornado turn", "guide ejector", "double knife", etc., we didn't eject anyone or flip the raft. A bloody good thing because looking at that churning mess, it would have been a pretty dire situation if someone ended up in the water. The guide company had a spotter/rescue dude with a mean hand at his kayak, but even he would have had difficulty getting to someone through all that.

I had a serious case of omni-pain (a term all too familiar to first-time snowboarders) all that night and the next day from all the hard paddling and lurching about, but man was that a rush. I'll have to give it a go again, soon. Especially if we see epic white water like that again.

Big up to our guide Sean who kept us undrowned, and to High side adventures, the guide company.

More pictures on Flickr

[Uche Ogbuji]

Pakistani comedic class terms

Via Language Log I came across this delightful conversation on some whimsical slang terms Pakistanis use to express class and class affectation. It's a hilarious exchange in its own right, and as a bonus it makes me think of similar terms in Nigeria (although I'm over a decade dated in my Naija slang).

[Mr. Fradia]: ...mummy-daddy refers to someone who is not [independent] enough etc. Burger is used more for ppl who are stuck up and wanna-be western types.

There are many terms for both in Naija slang, but it makes me think of the term (originally Lagosian, I think) aje-butter, which refers to someone who is a soft, namby-pamby mama's boy as a result of having lived too much of the supposed good life in the US, UK, etc. I know too well: I was viciously set upon as an aje-butter when we moved from Florida to Enugu, then Owerri, Nigeria in 1980.

[Zakiii]: I can understand someone wanting to Black/Latino but why English/American?

[Mr. Fradia (responding)]: i suppose they want to be preppie rather than ghetto [<grin>]

This exchange intrigued me. As far as I can tell these folks are all living in Pakistan. I have been getting the sense recently that, if US and UK culture seem to be universally soluble, lately it's been urban Hip-Hop or yardie culture that has been filling the aspirational role for youngsters in developing nations. I was early to Hip-Hop, and while others in my class wanted to be like Madonna (Travolta was never really that big there, as I recall), I was aspiring more toward The Furious Five and the Treacherous Three. It looks like that dissonance was a microcosm of the trend that has culminated in statements such as "I can understand someone wanting to Black/Latino but why English/American?"

But is that a good thing, when it so often involves a gross distortion of what it really means to be Black or Latino in the US? I suppose Madonna as picture of America is no less a distortion.

[Mr. Fradia]: what teh diff between soemone pretending he is james dean versus someone pretending he is anil kapoor (god knows there are tons of them in karachi..or were rather) except that the james dean wanna be probably does not smell as bad.

[ravage]: Those who ape Anil Kapoor are known as arsewipes in our circle. Dunno if its a generally accepted term though.

Ouch. I was rolling in the aisles at this point. You can't get laughs like this on your local corner.

[ravage]: Mummy Daddy is a catalyst for burgerness, but one may be mummy daddy without strictly belonging to the latter class. For instance I have come across mummy-daddy abcds, Mummy Daddy paindoos, and Mummy Daddy Nawab sahabs.

And so it goes on through "galli ka londa types", "pindi walay" and always back to "burgher".

[sadzzz]: The term Burgher was applied during the period of Dutch rule to European nationals living in Sri Lanka... ...the so called burghars of india are called "anglos"

[Hum Sa Ho To Samne Aaye (responding)]: Hey in Peshawar we call these kinda people "tommy" [<big grin>]

Back on Language Log, Hobson-Jobson is quoted as characterizing the term "burgher" (or "burghar") as follows.

The Dutch admitted people of mixt descent to a kind of citizenship, and these people were distinguished by this name from pure natives. The word now indicates any persons who claim to be of partly European descent, and is used in the same sense as 'halfcaste' and 'Eurasian' in India Proper.

I suppose that the two nuances of "burgher" in this entire thread tend to converge on the Hindi term "firanghi" (originally from Arabic, as I recall). And while I'm on "firanghi", I'm sure I'm not the only language geek who finds it hard to suppress a smile whenever the "Ferengi" show up on Star Trek TNG. The show was always cited for being culturally avant-garde, but not so often recognized for being culturally subversive.

[Uche Ogbuji]

Test post from Drivel

This is the first attempt to post from the new Drivel 2.0 weblogging tool and it will probably be the last. For one thing, Drivel doesn't seem to support MetaWeblog, just Blogger 1.0 (I hope PyBlosxom gets Atom API support soon). And then when I started it up for the first time I found myself staring at a "Sending/Receiving...Retrieving journal entries..." app-modal dialog for over three minutes while the progress indicator crawled along. Doesn't exactly fill me with confidence. I've got into such a groove posting via e-mail that I'd have to be wowed from the get-go before I make a switch at this point. Maybe Drivel 3.0?

[Uche Ogbuji]

Onye ma Uche?

"The Mythology of Igbo Names", Uche Nworah

One of my many namesakes muses about the name, and other Igbo names. Interestingly enough, it seems that his name is really just "Uche". Mine is actually "Uchenna", and it's not very usual for one to be given the name "Uche". This bare form is much more common as a surname. "Uche" is an Igbo word that approximates English words such as "will", "desire", "plan", "counsel", "intelligence", "knowledge", etc. It's sort of sophia meets consilium meets in animo habere.

As Uche Nworah says:

There is uchenna, uchechukwu, and uchechi which a man or woman can bear.

Yes, and there's also "Uchendu" ("thinking about life"/"will for life", etc.), "Ucheoma" ("good will", "sound mind", etc.), and, in rare cases, "Ucheji" ("will for yam", metonymic for "will for wealth") and "Uchegbum" ("Worries won't be the death of me"). Note: if you're wondering how Igbo packs so much meaning into such small packages, it's largely because of the tonality of the language. So for example, the way the "e" is pronounced in "Uchegbum" actually serves two purposes, one of which is to express the negative sense of the phrase.

"Uchenna" in my experience is by far the most common "Uche" name. I've probably known a hundred or more with that name. I'd say they're three quarters male. This makes it interesting that Nworah finds that people he encounters associate "Uche" with girls rather than boys.

Igbo names like most other names (non-Igbo) have symbolic meanings. These different versions of uche all mean the wishes or heart of God, As some people may think, uchenna does not mean the wishes or heart of the father of the child, Nna in this sense means God Almighty, if it meant the former, then feminists would argue and demand for the naming of children uchenne (the wishes of the mother). While there is no reason not to, I am yet to encounter nor hear of anybody bearing it, a task for modernists and feminists then, you may say.

It is always dangerous to make such generalizations about Igbo names. They are almost always loose formulations upon which a range of meanings can be attached, depending on circumstance. My own name is a counter-example to Nworah's assumption, with "Uchenna" literally meaning the will of my father, Dr. Ogbuji. My mother wanted me to be a girl, my father wanted me to be a boy, it turned out as my father wished, so I was named "Uchenna". Simple as that. I think the fact that you don't see "Uchenne" as a name has more to do with arbitrary convention than any specific code attached to "nna". After all, the name "Uchenna" predates the import of Christianity's single, male god into Igbo culture. The narrow meaning Nworah cites for "Uchenna" is often translated into the English name "Godswill", which feels very alien to me as a translation of my name.

Nworah later on mentions "Obiageli" and "Ifeoma" (also "Iheoma") as names reserved for girls, even though there is nothing in their meaning that has to do with female sex. Other such examples are "Nkechi" ("god's very own", "my spirit's own"), "Uloma" ("good house") and "Nkiruka" ("the future is bright", "the best is yet to come"). There are numerous examples the other way as well.

The rest of Nworah's article is interesting, but I wouldn't swallow it all whole. There is a great deal of generalization in it, and I think in many cases it papers over the huge complexity of Igbo culture whether in pre-colonial or modern times. He also laments a lack of Igbo scholarship over naming in our culture, which I think is very surprising. There is a metric tonne of scholarship on Igbo naming (as with every other aspect of Igbo culture, it seems). I often feel as if we have the most analyzed names on the planet, looking only at modern study. Just a casual poke at Google reveals a lot of material on Igbo names, and I've seen four or five books on the topic.

BTW, the title of this piece means "Who knows Uche?".

[Uche Ogbuji]

New life for PyXPCOM?

Way back in the day I wrote about PyXPCOM, a means for using Python to script the Mozilla browser, and the project had a lot of promise.

Mark Hammond was the considerable brains behind PyXPCOM, as well as the Win32 and .NET APIs through Python, and many other things. Indeed, he received the 2003 ActiveState Active Award in Python (the same year Mike Olson and I got one for XSLT). Unfortunately, he has been way below the radar for the past couple of years, and no one has really picked up the torch on PyXPCOM. The project has been languishing for so long that it was quite exciting to see Brendan Eich, keeper of the Mozilla roadmap, include the following among his "Mozilla 2.0 platform must-haves":

8. Python support, perhaps via Mono (if so, along with other programming languages).

I'm not sure just how Mono would fit in. Would they build a little CLR sandbox into Mozilla so that Python.NET code could run?

Anyway, if you care about being able to script Mozilla through Python (and I think you should), please leave a comment on Brendan's article. Here's a note about some of the comments already in place on the matter:

#8 scares me only for the potentially huge installer file. If it were optional this would be incredibly cool. If it were optional developers would have a headache.

I think it should be enough for Mozilla to include the PyXPCOM stubs, and use the user's own installed Python, which should alleviate this fear.

Hmm. What do you think about Parrot (Perl 6) support? Soon, Parrot will be something like [stable], and the hope is that it will support a lot of languages, includes Python. I would give it a chance, sounds good.

From what I've followed about Parrot and its intended use as a basis for other languages such as Python, I'm not comfortable with such an approach.

Python support can be provided via Jython which is much older than the .NET python implementation.

It seems people want to offer up every VM incarnation on the planet as a possible base for Mozilla/Python, but I'm spoiled by the potential I saw through Hammond's work, and I really would want the project to at least try picking up from there. I was therefore glad to see Brendan's response:

We already have Python integrated with XPCOM, thanks to Mark Hammond and Active State. If nothing better comes along in the way of a unified runtime, we will fully integrate Mark's work so you can write <script type="application/x-python"> in XUL.

Whether Python support will be bundled in libxul or not, I'm pushing for a scheme that lets extension languages be loaded dynamically. So if you have connectivity or can deploy an extra file, you should be able to use Python as well as JS from XUL. That's my goal, at least.

See my next entry for the Mozilla 2.0 "managed code" virtual machine goals that any would-be universal runtime has to meet, or come close to meeting, to win.

This sounds just right, and I'll keep my eye open for the follow-up article he mentions. Another poster mentions:

I would like to see the ability to talk to Mozilla from outside Python code. A program I am writing allows importing contacts from various data sources. I can do Outlook and Evolution easily, but have given up on Mozilla contacts.

In theory I need to use XPCom with the PyXPCom wrapper but I challenge anyone to actually get that working on Windows, Linux and Mac and have a redistributable program. (There are no binaries of PyXPCom for example).

Yes, PyXPCOM does allow this in theory, and I think Brendan's entire point is that it's important for Mozilla developers to put in the work to address the problem stated in the second paragraph.

If you're trying to work with PyXPCOM, keep an eye on the mailing list. Folks have been posting their problems, and others have been sharing their recipes for getting PyXPCOM to work, including Matt Campbell, Scott Robertson, Jean-François Rameau, and Michael Thornhill.

[Uche Ogbuji]

Wizard worries

"Just because..."—Sean McGrath

Just because everyone can now create an XML schema because of all the easy to use GUI tools, doesn't mean that everyone should create XML schemas.

Yes indeed. This has been a growing problem for quite a while now as people give over their XML tasks to bottled genies. As I concluded in "The worry about program wizards":

In the end, there is no substitute for programmer expertise and experience. Wizards do have their place, but it seems that their occasional convenience should not form a backbone consideration for the development of any technology. In particular, it is dangerous to lead developments in XML and Web services with a significant purpose of reviving the great age of wizards.

Sean again:

I can use AutoCAD but I wouldn't dream of designing a house because I don't know enough about houses or building or any of that stuff.

Very well put, as usual.

[Uche Ogbuji]

Pythonic SPARQL API over rdflib

I've recently been investigating the possibility of adapting an existing SPARQL parser/query engine on top of 4RDF, mostly for the eventual purpose of implementing a sparql-eval Versa extension function, and was pleased to see there has already been some similar work done: a Pythonic SPARQL API over rdflib.

Although this isn't exactly what I had in mind (the more robust option would be to write an adaptor for Redland's model API and execute SPARQL queries via rasqal), it provides an interesting Pythonic analog to querying RDF.

Chimezie Ogbuji

Amara equivalents of Mike Kay's XSLT 2.0, XQuery examples

Since seeing Mike Kay's presentation at XTech 2005 I've been meaning to write up some Amara equivalents to the examples in the paper, "Comparing XSLT and XQuery". Here they are.

This is not meant to be an advocacy piece, but rather a set of useful examples. I think the Amara examples tend to be easier to follow for typical programmers (although they also expose some things I'd like to improve), but with XSLT and XQuery you get cleaner declarative semantics, and cross-language support.

It is by no means always true that an XSLT stylesheet (whether 1.0 or 2.0) is longer than the equivalent in XQuery. Consider the simple task: create a copy of a document that is identical to the original except that all NOTE attributes are omitted. Here is an XSLT stylesheet that does the job. It's a simple variation on the standard identity template that forms part of every XSLT developer's repertoire:

<xsl:stylesheet version="1.0"

<xsl:template match="*">
    <xsl:copy-of select="@* except @NOTE"/>


In XQuery, lacking an apply-templates instruction and built-in template rules, the recursive descent has to be programmed by hand:

declare function local:copy($node as element()) {
  element {node-name($node)} {
    ($node/@* except $node/@NOTE,
    for $c in $node/child::node()
    return typeswitch($c)
      case $e as element() return local:copy($e)
      case $t as text() return $t
      case $c as comment() return $c
      case $p as processing-instruction() return $p
      default return ()
    )
  }
};


Here is Amara code to do the same thing:

def ident_except_note(doc):
    for elem in doc.xml_xpath(u'//*[@NOTE]'):
        del elem.NOTE
    print doc.xml()
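For comparison, here is a rough standard-library analog of the same task. This is not Amara code: it is a sketch using xml.etree.ElementTree on current Python 3, and the function name strip_note_attributes is made up for illustration. One caveat is that ElementTree's default parser discards comments and processing instructions, so it is not a faithful identity copy in the way the XSLT and XQuery versions aim to be.

```python
import xml.etree.ElementTree as ET


def strip_note_attributes(xml_text):
    """Return the document as a string with every NOTE attribute removed.

    Hypothetical helper; comments and processing instructions are dropped
    by ElementTree's default parser.
    """
    root = ET.fromstring(xml_text)
    # iter() walks the root element and all of its descendants
    for elem in root.iter():
        elem.attrib.pop("NOTE", None)
    return ET.tostring(root, encoding="unicode")
```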

Later on in the paper:

...nearly every FLWOR expression has a direct equivalent in XSLT. For example, to take a query from the XMark benchmark:

for    $b in doc("auction.xml")/site/regions//item
let    $k := $b/name
order by $k
return <item name="{$k}">{ $b/location } </item>

is equivalent to the XSLT code:

<xsl:for-each select="doc('auction.xml')/site/regions//item">
  <xsl:sort select="name"/>
  <item name="{name}">
     <xsl:value-of select="location"/>
  </item>
</xsl:for-each>

In Amara:

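def sort_by_name():
    doc = binderytools.bind_file('auction.xml')
    newdoc = binderytools.create_document()
    items = doc.xml_xpath(u'/site/regions//item')
    #Sort by item name, as the "order by" clause does in the XQuery
    items.sort(key=lambda item: unicode(item.name))
    for item in items:
        newdoc.xml_append(
            newdoc.xml_element(u'item', content=item)
        )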
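As a point of reference, the same query can be sketched with only the Python 3 standard library. This is an illustration rather than Amara code: the function name items_sorted_by_name and the wrapper element "items" are assumptions, and the input is passed in as a string instead of being read from auction.xml.

```python
import copy
import xml.etree.ElementTree as ET


def items_sorted_by_name(xml_text):
    """Build <items> holding one <item name="..."> per regional item, ordered by name.

    The wrapper element name "items" is hypothetical.
    """
    site = ET.fromstring(xml_text)
    # Every <item> at any depth below /site/regions
    items = [item
             for regions in site.findall("regions")
             for item in regions.iter("item")]
    result = ET.Element("items")
    for item in sorted(items, key=lambda i: i.findtext("name", default="")):
        new_item = ET.SubElement(
            result, "item", {"name": item.findtext("name", default="")}
        )
        location = item.find("location")
        if location is not None:
            # Copy so the output tree does not share nodes with the input
            new_item.append(copy.deepcopy(location))
    return result
```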

This is the first of a couple of examples from XMark. To understand the examples more fully you might want to browse the paper, "The XML Benchmark Project". This was the first I'd heard of XMark, and it seems a pretty useful benchmarking test case, except that it's very heavy on records-like XML (not much on prosy, narrative documents with mixed content, significant element order, and the like). As such, I think it could only ever be a sliver of one half of any comprehensive benchmarking framework.

I think the main thing this makes me wonder about Amara is whether there is any way to make the element creation API a bit simpler, but that's not a new point for me to ponder, and if I can think of anything nicer, I'll work on it post 1.0.

Kay's paper next takes on a more complex example from XMark: "Q9: List the names of persons and the names of items they bought in Europe". In database terms this is a join across person, closed_auction and item element sets. In XQuery:

for $p in doc("auction.xml")/site/people/person
let $a := 
   for $t in doc("auction.xml")/site/closed_auctions/closed_auction
   let $n := for $t2 in doc("auction.xml")/site/regions/europe/item
                       where  $t/itemref/@item = $t2/@id
                       return $t2
       where $p/@id = $t/buyer/@person
       return <item> {$n/name} </item>
return <person name="{$p/name}">{ $a }</person>

Mike Kay's XSLT 2.0 equivalent:

<xsl:for-each select="doc('auction.xml')/site/people/person">
  <xsl:variable name="p" select="."/>
  <xsl:variable name="a" as="element(item)*">
      <xsl:variable name="t" select="."/>
      <xsl:variable name="n" 
                               [$t/itemref/@item = @id]"/>
      <xsl:if test="$p/@id = $t/buyer/@person">
        <item><xsl:copy-of select="$n/name"/></item>
  <person name="{$p/name}">
    <xsl:copy-of select="$a"/>

In Amara:

def closed_auction_items_by_name():
    doc = binderytools.bind_file('auction.xml')
    newdoc = binderytools.create_document()
    #Iterate over each person
    for person in doc.xml_xpath(u'/site/people/person'):
        #Prepare the wrapper element for each person
        person_elem = newdoc.xml_element(
            u'person',
            attributes={u'name': unicode(person.name)}
        )
        newdoc.xml_append(person_elem)
        #Join to compute all the items this person bought in Europe
        items = [ unicode(item.name)
          for closed in doc.xml_xpath(u'/site/closed_auctions/closed_auction')
          for item in doc.xml_xpath(u'/site/regions/europe/item')
          if (item.id == closed.itemref.item
              and person.id == closed.buyer.person)
        ]
        #XML chunk with results of join
        for item in items:
            person_elem.xml_append(
                newdoc.xml_element(u'item', content=item)
            )
    #All done.  Print out the resulting document
    print newdoc.xml()
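As an aside, the join itself can be sketched with only the Python 3 standard library. This is not Amara code, and it swaps the nested-loop join for a dictionary index, so each closed auction is visited once instead of once per person. The function name european_purchases and the plain list-of-tuples return value are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def european_purchases(xml_text):
    """Return [(person name, [names of European items bought]), ...] in document order.

    Hypothetical helper; it returns plain Python data rather than XML.
    """
    site = ET.fromstring(xml_text)
    # Index European items by id so the join becomes a dict lookup
    europe = {item.get("id"): item.findtext("name", default="")
              for item in site.findall("regions/europe/item")}
    # Group the purchased item names by buyer id
    purchases = {}
    for auction in site.findall("closed_auctions/closed_auction"):
        buyer = auction.find("buyer")
        itemref = auction.find("itemref")
        if buyer is None or itemref is None:
            continue
        name = europe.get(itemref.get("item"))
        if name is not None:
            purchases.setdefault(buyer.get("person"), []).append(name)
    return [(person.findtext("name", default=""),
             purchases.get(person.get("id"), []))
            for person in site.findall("people/person")]
```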

I think the central loop in this case is much clearer as a Python list comprehension than in either the XQuery or XSLT 2.0 case, but I think Amara suffers a bit from the less literal element creation syntax, and from the need to "cast" to Unicode. I would like to lay out cases where casts from bound XML structures to Unicode make sense, so I can get user feedback and implement accordingly. Kay's final example is as follows.

The following code, for example, replaces the text see [Kay, 93] with see Kay93.

<xsl:analyze-string select="$input" regex="\[(.*),(.*)\]">
    <author><xsl:value-of select="regex-group(1)"/></author>
    <year><xsl:value-of select="regex-group(2)"/></year>
  <xsl:value-of select="."/>

The only way of achieving this transformation using XQuery 1.0 is to write some fairly convoluted recursive functions.

Here is the Amara version:

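import re
from amara import binderytools

#Document used as a factory for the new elements
doc = binderytools.create_document()
#Escape the square brackets so that they match literally
PATTERN = re.compile(r'\[(.*),(.*)\]')
def repl_func(m):
    #Build a small XML fragment from the two captured groups
    citation = doc.xml_element(u'item')
    citation.xml_append(doc.xml_element(u'author', content=m.group(1)))
    citation.xml_append(doc.xml_element(u'year', content=m.group(2)))
    return citation.xml(omitXmlDeclaration=u'yes')
text = u'see [Kay, 93]'
print PATTERN.subn(repl_func, text)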

I think this is very smooth, with the only possible rough patch again being the output generation syntax.
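The core of the technique is a regular expression substitution whose replacement is computed by a function. Here is a minimal standard-library sketch of that idea for Python 3, with no Amara involved. The wrapper element name "citation", the function name mark_up_citations, and the slightly stricter pattern (which trims the space after the comma) are assumptions made for illustration.

```python
import re
from xml.sax.saxutils import escape

# "[author, year]", with the brackets escaped so they match literally
CITATION = re.compile(r"\[([^,\]]*),\s*([^\]]*)\]")


def mark_up_citations(text):
    """Replace each [author, year] in text with a small XML fragment.

    The "citation" wrapper element is hypothetical.
    """
    def repl(match):
        author = escape(match.group(1).strip())
        year = escape(match.group(2).strip())
        return ("<citation><author>%s</author><year>%s</year></citation>"
                % (author, year))
    return CITATION.sub(repl, text)
```

Text outside the matched citations is passed through untouched, so anything that needs XML escaping there would have to be handled separately.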

I should mention that Amara's output syntax isn't really bad. It's just verbose because of its Python idiom. XQuery and XSLT have the advantage that you can pretty much write XML in-line into the code (the templating approach), whereas Python's syntax doesn't allow for this. There has been a lot of discussion of more literal XML template syntax for Python and other languages, but I tend to think it's not worth it, even considering that it would simplify the XML generation syntax of tools such as Amara. Maybe it would be nice to have a lightweight templating system that allows you to write XSLT template chunks in-line with Amara code for data processing, but then, as with most such templating systems, you run into issues of poor model/presentation separation. Clearly this is a matter for much more pondering.

[Uche Ogbuji]


Mountain winter defies the order,
Denies the bonding of elements.
The wooded snow and the falling wind
Force the repentance of birdsong.
Unbroken sun razes gooseflesh,
Floods snow, and drowns the senses,
Pitched in broken bottle rainbow battle
With trenchant ice-cold mountain streams.

—Uche Ogbuji—from "Mountain Summer"

Yesterday I happened to be going through some of my verse, and I noticed the date on which I wrote "Mountain Summer": 11 June 1995. A decade ago to the day, today. Remarkable coincidence. I wrote it on a road trip with best friend Arild (who just became the father of twins Thursday), as well as Rachel and Dagmara. A Nigerian, two Norwegians and a Pole: two guys, two gals, driving across the West. It was one of those magical trips that serve so many of us as a marker of our twenties. We were lolling about at Yosemite National Park, where, even though it was the heart of Summer, we sought and found a few snowy peaks (we'd already found a touch of Summer Zero by hiking up to St. Mary's Glacier in Colorado earlier on that trip). While up there we saw a mother and child riding a saucer in the snow, and I was moved to write.

This poem and two others written at about the same time were published in ELF: Eclectic Literary Forum, a respected but now defunct lit mag, in early 1996.

In another neat bit of coincidence, for me, today was also a long-planned white-water rafting trip with a lot of my more recent friends. I certainly got to sample a good deal of the "trenchant ice-cold mountain streams". In fact, I got soaked in it. It was a glorious adventure, and it further reminded me of that other glorious adventure a decade ago, and the writing to which I was inspired back then.

[Uche Ogbuji]
