Python/XML column #34 pubbed

"More Unicode Secrets"

In this month's Python and XML column, Uche Ogbuji continues his discussion of Unicode secrets with regard to XML processing in Python, especially BOMs and stream objects. [Jun. 15, 2005]

In the previous article I discussed Unicode compliance in XML tools, and the Python API for converting strings to Unicode objects, and vice versa. In this one I focus on file and stream APIs, including a bit of Byte Order Mark (BOM) 101.
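As a rough illustration of the BOM handling mentioned above, here is a minimal sketch, assuming Python 3 and only the standard codecs module. The helper names sniff_bom and decode_with_bom are hypothetical and are not taken from the article; the BOM constants are the ones the codecs module provides.

```python
import codecs

# Checked in order: the UTF-32 BOMs must come before the UTF-16 ones,
# because BOM_UTF32_LE begins with the same two bytes as BOM_UTF16_LE.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data):
    """Return (encoding, bom_length) for a byte string, or (None, 0) if no BOM."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0


def decode_with_bom(data, default="utf-8"):
    """Decode bytes, honouring a leading BOM and dropping it from the text."""
    encoding, skip = sniff_bom(data)
    return data[skip:].decode(encoding or default)
```

A real XML parser also has to consult the encoding declaration when no BOM is present; this sketch simply falls back to a default.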

[Uche Ogbuji]

via Copia

Fedora and the repository politics (and exploding inkscape)

My Fedora Core 4 upgrade has been remarkably smooth so far. Here are the first two complaints about Fedora Core 4. One big one and one little one.

The little one is that inkscape is useless in FC4, it seems. Segfaults when you blink at it. So just use apt or yum to update, eh? Well, then there is the big problem.

Fedora needs to fix its repository politics. All the following is my perspective as a user, not as a packaging expert. Just anticipating the nitpickers, let me say that I might be wrong on the factual background of some things I get from my impressions, but I know for sure what I go through as a user. Third party apt and yum repositories for Fedora Core come in two divided worlds. On one hand there is Fedora extras, considered the official repository, with FreshRPMs and Livna loosely certified as compatible. On the other hand there is the group of repositories coming under the banner of RPMForge (Web site still under construction), led by Dag and including others such as ATRPMS and Dries. You usually cannot mix these two worlds without screwing up your system.

This is the sort of thing that makes Debian folks laugh their heads off, and they're right to do so. (Of course my experience with Debian was so miserable that I'm not in the least tempted to give it another try.) Worse than the lack of repository integration is the fact that the various parties have spent energy flinging mud at each other that might have been better spent on integration.

Most of the time, this doesn't matter to me. I choose one side of the fence and chug along. Every Fedora Core release I give yum and the Fedora extras world a try for a couple of weeks. I can never stand it longer than that. Yum is terribly slow. Fedora Extras and friends are terribly slow to incorporate new software. As an example, when I run a script to count the number of RPMs I've got from Dag, AT or Dries because I can't get reasonably fresh versions from Fedora extras and friends, I come up with 89. This is a sure sign that Fedora extras needs to work better with RPMForge. If I were happy with being six months behind the software curve, I would have had one less problem with Debian (I could have stuck with "stable").

So I go on to Dag and friends, and actually, I'm fine from then. Those guys do an amazing job of keeping up on new and updated software without constantly breaking my system (the constant breakage was my other problem with Debian when I went with "testing"). This big repository split only really smacks me in the face on one occasion: at the point after upgrading Fedora when I've been trying yum and Fedora extras for a couple of weeks and realize it's time to jump to apt and RPMForge. At that point I have to do all the apt set-up for the right repositories and co., and deal with the initial wave of conflicts. I'm about at that point now, and hence this rant.

How does this schism serve anything except ego? Fedora extras and co say the other side is uncooperative and will not submit to their hard-core QA. Dag and co say the other side is uncooperative and insists on stomping on his repository all the time. Couldn't something be worked out so that in effect Fedora Extras is the equivalent of Debian stable and RPMForge the equivalent of Debian testing? I don't know if that makes sense, but surely some form of compromise is possible. The message boards are full of confused users and something really must change.

[Uche Ogbuji]

via Copia

Principle vs. Ritual

The difference between principle and ritual is the reason or motivation for why you do things in the first place. Typically, ritualistic actions are done primarily out of fear of repercussion and judgement (by peers or supernatural entities) or just not knowing any other alternative. In contrast, actions based on principle are motivated by doing what is right (at the most basic, instinctive level).

[Uche Ogbuji]

via Copia

No religious conversion to XML

Sylvain Hellegouarch's comments always seem to require another full blog entry for further discussion (that's a good thing: he asks good questions). In response to "Why support template-like output in Amara?", he said:

Regarding the point of bringing developers who dislike XML into the X-technology world, I think it's useful but I hope you won't try too hard. Whatever tools you could bring to them and how hard you may try, if they have a bad feeling about XML & co., you won't be able to change their mind.

That's not really what I meant. I don't go for religious conversions. The issue is not that there are people out there who will never have anything to do with XML. That's fine. The issue is that some people hate XML but at the same time have no choice but to use XML. You hear a lot of comments such as "I hate that stupid XML, but my job requires me to use it". XML is everywhere (I certainly agree it's overused) and most developers cannot avoid XML even if they dislike it. The idea is to give them sound XML tools that feel right in Python, so that they don't shoot themselves in the foot with kludgery such as parsing with regex, or even the infamous:

print "<foo>", spam, "</foo>"

Aside: if anyone who has to deal with XML is not aware of all the myriad ways that the above will bite you in the arse, they should really read "Proper XML Output in Python". The idea is that tools like Amara don't all of a sudden make people like XML, but rather they make XML safer and easier for people who hate it. Of course they also make things easier for people who like it, like me.
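To make one of those pitfalls concrete, here is a minimal sketch, assuming Python 3 and the standard library's xml.sax.saxutils. The element helper is hypothetical, written only for illustration; it is not Amara's API and it covers only character escaping, not encoding problems or element name validation.

```python
from xml.sax.saxutils import escape, quoteattr


def element(name, text, **attrs):
    """Serialize one element, escaping the text and quoting attribute values."""
    attr_text = "".join(
        " %s=%s" % (key, quoteattr(value)) for key, value in sorted(attrs.items())
    )
    return "<%s%s>%s</%s>" % (name, attr_text, escape(text), name)


spam = "eggs < bacon & toast"

# Naive concatenation leaves the raw "<" and "&" in place,
# so the result is not well-formed XML.
naive = "<foo>" + spam + "</foo>"

# Escaping first yields markup that a parser will accept.
safe = element("foo", spam)
```

Even this much only fixes markup characters in content; a proper XML output tool takes care of the remaining cases as well.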

I "categorise" people who don't like XML into three sections :

  • Those who never tried and simply judge-before-you-taste.
  • Those who tried XML but didn't use it for the right purpose. Some people only see XML as a language used by some dark J2SE application servers for their configuration file. They don't realise that XML is also a meta language that has brought some other fantastic tools to store, describe, transform, validate, query data.
  • Those who simply react to the hype XML had had in the last 5 years. A bit like when you here during months that a movie you haven't seen at the cinema is fantastic and that you should really watch it. You get so tired of hearing it that you don't want to watch it.

Nice classification. I think the good and the bad of XML is that it has brought so many areas of interest together. As I say in this Intel developer journal article:

XML was a development of the document management community: a way to put all their hard-won architectures on the wide, enticing Web, but when it burst on to the scene, the headlines proclaimed a new king of general-purpose data formats. Programmers pounced on XML to replace their frequent and expensive invention of one-off data formats, and the specialized parsers that handled these formats. Web architects seized XML as a suitable way to define content so that presentation changes could be made cleanly and easily. Database management experts adopted XML as a way to exchange data sets between systems as part of integration tasks. Supply-chain and business interchange professionals embraced XML as an inexpensive and flexible way to encode electronic transactions. When so many disparate disciplines find themselves around the same table, something special is bound to happen.

XML itself is not very special. It represents some refinement, some insight, and many important tradeoffs, but precious little innovation. It has, however, become one of the most important new developments in information systems in part because of the fact that so many groups have come to work with XML, but also because it has focused people's attention to important new ways of thinking about application development.

The reason XML is overhyped is because we live in the age of hype. People don't know how to say "X is useful" any more. They'd rather say "it's the gods' solution to every plague released by Pandora" or they say "It's the plaything of the guardians of every circle of Hell". XML is neither, of course. It's useful because it happens to be one data format that is respectable in a wide variety of applications. But like any compromise solution, it is bound to have some weaknesses in each specific area.

[Uche Ogbuji]

via Copia

Why support template-like output in Amara?

When I posted "Sane template-like output for Amara", Sylvain Hellegouarch asked:

I feel like you are on about to re-write XSLT using Python only and I wonder why.

I mean for instance, one of the main reason I'm using XSLT in the first place is that whatever programming language I am using, I don't need to learn a new templating language over and over again. I can simply extend my knowledge of only one : XSLT, and then become better in that specific one.

It also really helps me making a difference between the presentation and the logic upon data since my the logic resides within the programming language itself, not the templating language.

Therefore, although your code looks great, I don't feel confident using it since it would go against what I just said above.

This is a question well worth further discussion.

The first thing is that I have always liked XSLT 1.0, and I still do. Nothing I'm doing in Amara is intended to replace XSLT 1.0 for me. However, there are several things to consider. Firstly, Python programmers seem to have a deep (and generally inexplicable) antipathy towards XML technology. I often hear Pythoneers saying things like "I hate XML because it's over-hyped and used all the time, even when it's not the best choice". Well, this applies to every popular software technology from SQL to Python itself. Nothing new in the case of XML. But never mind all that: Pythoneers don't like XML, and it very often drives them to abuse XML (perhaps if something you dislike is ubiquitous, using it poorly seems a small measure of revenge?) Anyway, someone who doesn't like XML is never going to even look slant-wise at XSLT. One reason for making it easy to do in Amara the sorts of things Sylvain and I may prefer to do in XSLT is to accommodate other preferences.

The second thing to consider is that even if you do like XSLT (1.0 or 2.0), there are places where it's best, and places where it's not. I think the cleanest flow for Web output pipelines can be described by the following diagram. I call this the rendering pipeline pattern (no not "pattern" as in big-hype-ooh-this-is-crazy-deep, but rather "pattern" as in I-do-this-sorta-thing-often-and-here's-what-it-looks-like).

Aside: I've been trying out 2.0 beta, and I'm not sure what's up with the wrong-spelling squiggly on the p word. I also wish I didn't have to use screen capture as a poor man's export to PNG.

Separation of model from view should be in addition to separation of content from presentation, and this flow covers both. For my purposes the source data can be a DBMS, flat files, or whatever. The output content is usually an XML format specially designed to describe the information to be presented (the view) in the idiom of the problem domain. The rendered output is usually HTML, RSS, PDF, etc.

In this case, I would use something such as the proposed output API for Amara in the first arrow. Python code would handle the model logic processing and Amara would provide for convenient generation of the output XML. If some of the source data is in XML, then Amara would help further by making that XML a cinch to parse and process. I would use XSLT for the second arrow, whether on the server or, when it is feasible, through the browser.

The summary is that XSLT is not suitable for all uses that result in XML output. In particular it is not generally suitable for model logic processing. Therefore, it is useful for the tools you do use for model logic processing to have good XML APIs, and one can borrow the best bits of XSLT for this purpose without seeking to completely replace XSLT. That's all I'm looking to do.
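The first arrow of that pipeline could look roughly like the following sketch. It assumes Python 3 and uses the standard library's xml.etree.ElementTree as a stand-in, not the proposed Amara output API. The book-list vocabulary, the record fields and the build_view function are all hypothetical.

```python
import xml.etree.ElementTree as ET


def build_view(records):
    """First arrow: turn model data into XML in the idiom of the problem domain."""
    root = ET.Element("book-list")
    for record in records:
        book = ET.SubElement(root, "book", {"isbn": record["isbn"]})
        ET.SubElement(book, "title").text = record["title"]
    # encoding="unicode" returns a str with no XML declaration
    return ET.tostring(root, encoding="unicode")


# Hypothetical source data; in practice it might come from a DBMS or flat files.
records = [{"isbn": "0000000000", "title": "Cats & Dogs"}]
view_xml = build_view(records)
```

The resulting view_xml would then be handed to an XSLT processor for the second arrow, which turns it into HTML, RSS or whatever rendering is needed.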

[Uche Ogbuji]

via Copia


"Aestheticae"—Peter Saint-André

Peter's centerpiece is a very rich quote from Alexander Baumgarten. Do certainly read Peter's entry in its entirety, but two thoughts struck me upon reading it. First a reaction to Baumgarten.

The Greek philosophers and the Church fathers have already carefully distinguished between things perceived [ αισθητα ] and things known [ νοητα ]. It is entirely evident that they did not equate things known with things of sense, since they honored with this name things also removed from sense (therefore, images). Therefore, things known are to be known by the superior faculty as the object of logic; things perceived are to be known by the inferior faculty, as the object of the science of perception, or aesthetic [ aestheticae ].

Dangerous for me to be second-guessing such a figure, but this seems rather pat. The Greeks are too often used as faceless symbols of steely rationality, and this doesn't do them any service. Clearly "The Greek philosophers" here is code for Aristotle-and-not-blinking-Plato (oversimplifying for my part), and although I'm probably more of an Aristotelian myself (I'd guess most classicist computer scientists are), I shrink in horror from the characterization of Idea [ Ιδέα ] as medium of an inferior faculty. And of course even within νοητα there is the spill-over of dianoia [ Διάνοια ], which is in effect a marker between the perceived and the known. Yes, yes, in Plato's discourse, the perception was a matter of empirical judgment belief rather than sensory response (i.e. relating to episteme rather than techne), but I think the point remains that noos is not so easy to pin down.

Also, a reaction to Peter.

[It] is arguable how much logic has truly contributed to the clarifcation of human concepts (personally I think we are more indebted to the agonistic pursuits of scientists than to the armchair theorizing of philosophers and logicians)

I don't know whether the "agonistic" there is meant restrictively, but I think a large proportion of scientific pursuits are not agonistic, and isn't theoretical science as important as experimental science? Applying logic, mathematical induction and yes, even philosophy to abstract models from the comfort of the armchair or bicycle is, I think, essential to efficient construction of experimentation.

[Uche Ogbuji]

via Copia


Rahab was scarlet—a jolly whale,
Pelagic sex goddess—life-shaper of shale;
Ever jealous Jehovah declared her a whore:
His militant faithful knew cadence no more.

Rahab reigned loudly—a jolly muse,
Broad icon of rhythm—grand matron of blues;
But shunned with her kin after hierophant war,
Left the conquered world knowing cadence no more.

—Uche Ogbuji—"Plaint"

I wrote "Plaint" 15 January 1996 at the Omaha airport on my way home from discharging a contract. I redacted it 1 February 2004 on the Centennial Express 6 chair lift, Beaver Creek, CO.

If you've been watching world events lately, you know the hierophants are still as bellicose as ever, and just as lacking in cadence.

[Uche Ogbuji]

via Copia

Another 4Suite sighting

" - A minimal cross-platform “podcatcher”"—Randi Mooney

I’ve been listening to a lot of podcasts recently[...]. The standard podcast reciever is iPodder, a very feature rich program that is just too bloated for my needs: I want a cross platform downloader that can be scheduled from UNIX cron and works from the command line.

I went hunting for an alternative client[...]. Of course, what I really wanted was a Python based podcast reciever.

So I created - a pure Python Podcast reciever. It depends on the excellent 4suite XML processing library to do all the hard XML processing[...].

[Uche Ogbuji]

via Copia

Omnium gatherum macaronicorum

"Macaronics"—John Cowan

John posts on one of my favorite subjects (BTW, if you're not reading John's blog, you're in deep slumber), Macaronics. The first one he posted is probably the most oft cited example of English/Latin Macaronic verse, and with good reason. It's a wicked funny rhyme. My favorite Macaronic piece is by James Appleton Morgan:

Prope ripam fluvii solus
A senex silently sat;
Super capitum ecce his wig,
Et wig super, ecce his hat.

Another one I really like is Skelton's wry elegy:

Sepultus est among the weeds,
God forgive him his misdeeds,
With hey ho, rumbelo,
Per omnia saecula,
Saecula saeculorum.

Beyond English/Latin there is no end of brilliant stuff in macaronics of all sorts of languages, for example Charles Leland:

In cœlis wo die götter live, non semper est sereno,
Nor de wein ash goot ash decet in each spaccio di vino.

Lessee... Latin to German to English to Latin to Italian to English to German to Latin to English to Italian. Followed all that?

Aficionados (no pun intended) of Pepys's diary will remark his macaronic use of French and Spanish in a vain attempt to dignify some of his more salacious passages.

Macaronics are named after Maccheronea, an Italian renaissance work with passages of Italian/Latin macaronics.

And lest anyone wag their heads saying "people just aren't that clever any more" (for some value of "any more": Leland is of the 19th/20th century, Morgan of the 19th), some of the most clever macaronic language comes from modern singers reaching across cultures. Take the Renaud song from the early 80s:

When I have rencontred you
You was a jeune fille au pair
And I put a spell on you,
And you roule a pelle to me.

Together we go partout
On my mob il was super
It was friday on my mind,
It was story d'amour.

It is not because you are,
I love you because I do
C'est pas parc' que you are me,
qu'I am you, qu'I am you

You was really beautiful
In the middle of the foule.
Don't let me misunderstood
Don't let me sinon I boude.

My loving, my marshmallow,
You are belle and I are beau.
You give me all what You have
I say thank you, you are bien brave.

This is really French borrowing English for its macaronics, but regardless, gotta love "My loving, my marshmallow, you are belle and I are beau." Put that in rivum and bibe, senex.

I've written a bit of Macaronic verse myself. It's a fun exercise. More fun than regular composition, that's for sure.

[Uche Ogbuji]

via Copia

Python/XML community: Amara, lxml and Picket

Amara XML Toolkit 1.0b3
lxml 0.7
Picket 0.4

Amara XML Toolkit 1.0b3 "is a collection of Python tools for XML processing—not just tools that happen to be written in Python, but tools built from the ground up to use Python idioms and take advantage of the many advantages of Python. Amara builds on 4Suite [], but whereas 4Suite focuses more on literal implementation of XML standards in Python, Amara focuses on Pythonic idiom." In this release:

  • Add xmlsetattribute method to elements, in order to allow adding attributes with namespaces or with illegal Python names
  • Update manual source for markdown, and extensive improvements to the manual (with much help from Jamie Norrish)
  • Add xml_doc facility for nodes
  • Fix support for output parameters in xml()
  • Add support for rules to pushbind
  • Improve XSLT support for bindery objects (see demo/bindery/
  • Bug fixes

lxml 0.7 is an alternative, more Pythonic binding for the libxml2 and libxslt XML processing libraries. Martijn Faassen says "lxml 0.7 is a release with quite a few new features and bug fixes, including XPath expression parameters, XInclude support, XMLSchema validation support, more namespace prefix support, better encoding support, and more."

Sylvain Hellegouarch updated Picket, a simple CherryPy filter for processing XSLT as a template language. It uses 4Suite to do the job. This update is mostly in order to support CherryPy development snapshots that are soon to become CherryPy 2.1. A CherryPy "filter is an object that has a chance to work on a request as it goes through the usual CherryPy processing chain."

[Uche Ogbuji]

via Copia