Python/XML column #33 pubbed

"Unicode Secrets"

In his latest Python-XML column, Uche Ogbuji delves broadly and deeply into the world of Unicode, especially with regard to processing XML in Python.

In this one I started out talking about a quick spot check for Unicode compliance in XML tools, then went on to present some tips on Python's Unicode API. The intent was not to be comprehensive. I cherry-picked the particular Unicode facilities I tend to use the most. As one person mentioned in the comments, there are even more means at your disposal than I cover. I'll get to some of them in part 2, in the next column installment.
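As a rough illustration of the kind of spot check described above, here is a minimal sketch in modern Python. It assumes the standard library's xml.etree.ElementTree as the tool under test, which is not necessarily one of the tools covered in the column, and the sample document and the `spot_check` helper are hypothetical. The idea is to feed the parser a non-ASCII character written as a character reference, confirm that it comes back as the right code point, and then round-trip it through real UTF-8 bytes.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Hypothetical test document: U+2022 (BULLET) written as a character
# reference, so the source bytes themselves stay pure ASCII.
SAMPLE = b'<?xml version="1.0" encoding="UTF-8"?><doc>&#x2022; item</doc>'


def spot_check(xml_bytes):
    """Parse, inspect the first character, then round-trip through UTF-8."""
    root = ET.fromstring(xml_bytes)
    first = root.text[0]
    # Serialize to UTF-8 bytes and parse again; the text should survive intact.
    utf8 = ET.tostring(root, encoding="utf-8")
    reparsed = ET.fromstring(utf8)
    survived = reparsed.text == root.text
    return first, unicodedata.name(first), survived, utf8


if __name__ == "__main__":
    char, name, survived, utf8 = spot_check(SAMPLE)
    print(repr(char), name, survived)
```

A tool that hands back anything other than a single BULLET character here, or that loses it on the round trip, deserves a closer look.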

[Uche Ogbuji]

via Copia

Are XSLT 2.0 boosters the next XQuery boosters?

I've been reminded today that some folks have taken a very strident tone in their advocacy of XSLT 2.0. Just so there's no mis-connotation of "strident", I'll note that I'm a very enthusiastic booster of Python, and I exhort people to give it a try whenever I can, but I don't think I go about with the notion that "YOU HAVE TO BE NUTS NOT TO USE PYTHON, DAMMIT". I'd rather show code examples and let them come to such a conclusion. I used to associate strident advocacy with XQuery boosters, an impression reinforced earlier this year when one of them wandered into an XML-DEV thread with a supercilious attitude (in essence: "why would a sane programmer use anything except for XQuery?"). I'm starting to wonder whether the same dynamics are developing with XSLT 2.0. I wonder whether it's not a reaction to the fact that XSLT 2.0 has met with some of the same hostility as XQuery (though, as I admit, some people have been changing their minds), combined with the fact that XSLT 2.0's level of implementation seems to be slower in burgeoning than XQuery's. Of course, a counter-argument is that there is plenty of the right kind of advocacy as well, especially in the form of Bob DuCharme's XML.com columns.

I think people will have to understand that XML tools can no longer sue for universality in XML processing just because they come out of the XML oven, especially since the XML oven has been over-cooking its buns for a while now. W3C XML Schema, XQuery, and the SOAP Web services stack are just the egregious examples; all the committees seem to have turned into over-engineering shops. I long ago took the attitude that the 1.0 series of specs got us as far as we needed to go to tackle XML processing. We can easily get the rest done in "native" environments. This at least seems to be true of the Python and .NET camps. Sure, XPath and XSLT 1.0 are limited, but mix them into an expressive enough language or rich enough platform, and I just have trouble seeing the need for the leaps in conceptual load that come with, say, XPath 2.0.

One thing's for sure: we're all lucky we're so spoiled for choice.

[Uche Ogbuji]

via Copia

XML recursive directory listing, part 4

"Its hard to finish; or that Pythons tail's a long way away!", by Dave Pawson

Well, I posted Dave's dirlist.py as a little example, in part, of how quickly an XML expert/Python newbie could get something useful whipped up in 4Suite. Based on the very detail-oriented comments, it seems people in general have found it useful, and have run into limitations from the Python newbie side of that equation. Another example of people taking the code very seriously is Lars Trieloff's posting, "Your filesystem is an XML document".

As I mentioned in the posting, I have not put Dave's code through proper code review: I merely tweaked the command line code a bit to get it to work on my Linux box well enough for me to post an example of its workings. Dave has taken it all a bit to heart, but he shouldn't. He got very far in a short amount of time, and it's always the case in learning any new language or platform that the last 10% of polish is very hard won, and yet worth the experience.

I'm passing along to Dave all the comments on the other posting, and he's already sent me an updated version that fixes some issues. I'll post his version if he wishes, but I'll also give his code a proper, full review this weekend, and post that, for the folks who seem to want to use the code practically. The first thing I'll do is make it conform to PEP 8.

[Uche Ogbuji]

via Copia

XSLT 2.0 might be worth a second look, if...

XSLT 2.0 Is Way Cool, by Micah Dubinko

Micah. Kimber. Pawson. A handful of the folks who have, like me, turned up their noses at XSLT 2.0 are starting to reconsider. This is not a massive drugging campaign by XSLT 2.0 boosters: it seems all these folks still don't want anything to do with the oppressive type system of XPath and XSLT 2.0, and all balk at the stupendous complexity of the specifications. The key to me is that they see these specs as usable without choking on the types mess. Some folks were claiming this was possible 2 years ago or so, but when I checked, I wasn't convinced. Perhaps things have improved since then.

So I may be up for reconsidering my shunning of XSLT 2.0, but as Micah mentions, I'm not about to wade into 9 documents to work on implementation. (OK, so it would really be 4 or so, but those are 4 huge documents, compared to the 1.0 series, which was 2 modestly sized documents.) If someone comes up with a coherent spec that omits the type info, it could somehow make its way into 4Suite post-1.0.

Micah says, "XSLT 2.0 is a power tool. I don't think it will displace XSLT 1.0, which is remarkable for its power in a small package." For a while I've wanted to write a series of comparisons between XSLT 2.0 and Amara code (which includes XPath 1.0 support). Amara is my power tool, for when XSLT 1.0 + EXSLT is not enough, and I find it hard to imagine XSLT 2.0 as offering more power.

And I really need to get back to work on EXSLT. Folks are getting very restless with the fact that work on EXSLT has been fallow for most of 2005. I just wish I could count on some help. Part of what impedes me is a shrinking back from all the demands of the EXSLT community without many offers of help.

[Uche Ogbuji]

via Copia

Python/XML community:

lxml 0.6.0
Picket (updated)

lxml 0.6.0 is an alternative, more Pythonic binding for the libxml2 and libxslt XML processing libraries. Martijn Faassen says "lxml 0.6 contains important bugfixes, in particular better namespace support while handling attributes, as well as a fix for what turned out to be totally broken behavior for etree.tostring(). An upgrade is recommended."

Sylvain Hellegouarch updated Picket, a simple CherryPy filter for processing XSLT as a template language. It uses 4Suite to do the job. He incorporated feedback, including my own thoughts on Processor object management. A CherryPy "filter is an object that has a chance to work on a request as it goes through the usual CherryPy processing chain."

[Uche Ogbuji]

via Copia

PyBlosxom plug-in: latest_comments.py

latest_comments.py

You may have noticed a new feature on Copia. This one was inspired by a feature from Burningbird (Shelley Powers' blog). Copia now lists the last ten comments posted, with links to the author and the referenced entry. This weekend I wrote another plug-in, latest_comments.py, which implements this feature. From the docstring:

Generates a template variable, $latest_comments, which contains a listing of the most recent comments to the Weblog, in the form:

<div class="comment-link">
Author 1
on
Entry 1 title
</div>
<div class="comment-link">
Author 2
on
Entry 2 title
</div>

This plugin requires the comments plug-in (comments.py).

This module supports the following, optional config parameter:

latest_comment_count - the number of comments to include in the
                         output (default 5)

It's taken a beating over the past few days, and held up OK. James Governor exposed a Unicode bug when he tracked back to an entry with a title using high characters. That's all fixed now (it took down Copia for a little while).
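To make the output format in the docstring concrete, here is a minimal sketch of the rendering step. It is not the plug-in's actual code and does not use the PyBlosxom API; the `render_latest_comments` function and the (timestamp, author, entry title) tuple shape are assumptions made for illustration, and it leaves out the links mentioned above. It does escape author names and titles, so non-ASCII characters and markup-significant characters pass through safely.

```python
from html import escape


def render_latest_comments(comments, count=5):
    """Return HTML for the `count` newest comments.

    `comments` is an iterable of (timestamp, author, entry_title) tuples.
    That shape is an assumption for this sketch, not PyBlosxom's data model.
    """
    # Newest first, then keep only the requested number of comments.
    newest = sorted(comments, key=lambda c: c[0], reverse=True)[:count]
    blocks = []
    for _timestamp, author, title in newest:
        blocks.append(
            '<div class="comment-link">\n'
            f"{escape(author)}\non\n{escape(title)}\n"
            "</div>"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    sample = [
        (1, "Author 1", "Entry 1 title"),
        (2, "Author 2", "Entr\u00e9e 2 & more"),
    ]
    print(render_latest_comments(sample, count=2))
```

The `count` argument plays the role of the latest_comment_count config parameter; a real plug-in would read it from the PyBlosxom configuration and expose the result as $latest_comments.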

I release it under the Creative Commons Attribution-ShareAlike 2.0 License (I really need to iron out the CC licensing throughout Copia).

Let me know what you think. I need to get all these plug-ins into CVS and into the PyBlosxom registry one of these days.

[Uche Ogbuji]

via Copia

XML recursive directory listing, part 3

In parts 1 and 2 I discussed code to use Python to recursively walk a directory and emit a nested XML representation of the contents.
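For readers who missed the earlier parts, the basic technique can be sketched as follows. This is neither the code from parts 1 and 2 nor Dave's dirlist.py; it is a minimal modern-Python version that assumes the standard library's xml.etree.ElementTree in place of 4Suite, and the function names are hypothetical. It mirrors the element and attribute names of the output shown in the usage example further down.

```python
import os
import xml.etree.ElementTree as ET


def dir_to_element(path):
    """Build a <directory> element for `path`, recursing into subdirectories."""
    elem = ET.Element("directory", name=os.path.abspath(path))
    # Sort entries so the output is stable from run to run.
    for entry in sorted(os.listdir(path)):
        full = os.path.join(path, entry)
        if os.path.isdir(full):
            # Note: isdir() follows symlinks, so a symlink loop would
            # recurse until Python's recursion limit is hit.
            elem.append(dir_to_element(full))
        else:
            ET.SubElement(elem, "file", name=entry)
    return elem


def dir_to_xml(path):
    """Serialize the directory tree rooted at `path` as an XML string."""
    return ET.tostring(dir_to_element(path), encoding="unicode")


if __name__ == "__main__":
    import sys
    print(dir_to_xml(sys.argv[1] if len(sys.argv) > 1 else "."))
```

The recursion does the nesting: each directory becomes an element whose children are its files and, via the recursive call, its subdirectories.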

Dave Pawson built on my basic techniques and came up with dirlist.py, a fully tricked-out version with all sorts of options and amenities. Well, he wasn't even finished. He sent me a further version today in which he "tidied up [the] program, and added options [for file] date and size."

Cool. I've posted it here: dirlist2.py. If further versions are toward, I'll move it into my CVS. Dave is a self-confessed Python newbie. I had to make some quick fixes just to get it to work on my machine, but I haven't had time to carefully vet the entire program. Please let us know if you run into trouble (a comment here should suffice).

Usage example:

$ mkdir foo
$ mkdir foo/bar
$ touch foo/a.txt
$ touch foo/b.txt
$ touch foo/bar/c.txt
$ touch foo/bar/d.txt
$ python dirlist2.py foo/
Processing /home/uogbuji/foo
<?xml version="1.0" encoding="UTF-8"?>
<directory name="/home/uogbuji/foo">
  <file name="a.txt"/>
  <file name="b.txt"/>
  <directory name="/home/uogbuji/foo/bar">
    <file name="c.txt"/>
    <file name="d.txt"/>
  </directory>
</directory>

$ python dirlist2.py -d foo
Adding file dates
Processing /home/uogbuji/foo
<?xml version="1.0" encoding="UTF-8"?>
<directory name="/home/uogbuji/foo">
  <file date="2005-05-09" name="a.txt"/>
  <file date="2005-05-09" name="b.txt"/>
  <directory name="/home/uogbuji/foo/bar">
    <file date="2005-05-09" name="c.txt"/>
    <file date="2005-05-09" name="d.txt"/>
  </directory>
</directory>

$ python dirlist2.py foo/ foo.xml
Processing /home/uogbuji/foo
$ cat foo.xml
<?xml version="1.0" encoding="UTF-8"?>
<directory name="/home/uogbuji/foo">
  <file name="a.txt"/>
  <file name="b.txt"/>
  <directory name="/home/uogbuji/foo/bar">
    <file name="c.txt"/>
    <file name="d.txt"/>
  </directory>
</directory>

[Uche Ogbuji]

via Copia

XTech 2005 is coming

XML Europe has become XTech in 2005 (it now covers Web technologies overall, not just XML, and there will be a heavy Mozilla presence). The event is just around the corner, with tutorials on May 24 and the main conference from the 25th through the 27th in Amsterdam (http://www.xtech-conference.org/). Chaired by the very capable Edd Dumbill, this has consistently been my favorite conference. I'll be there again, presenting "Matching Python idioms to XML idioms" on the 25th. Check out the XTech 2005 wiki for more info.

Come to XTech for the technology, but you'll remember it for the people. Here are some of mine from XML Europe's past:

Bob DuCharme's panoramic shots of RDF heads at XML Europe 2004 (Amsterdam):

James Clark and me at XML Europe 2003 (London):

Edd and me at XML Europe 2003 (London):

I'll be in Amsterdam from the 24th (I'll probably be sleeping all that day) to the 29th (leaving early that day), and I look forward to having a good time with colleagues while there.

[Uche Ogbuji]

via Copia

Media type for .xslt

Apache uses /etc/mime.types by default to map file extensions to Internet media types (IMTs), a.k.a. MIME types. Unfortunately, for most Linux distros this file does not have an entry for .xslt, just .xsl (Fedora Core 3 doesn't even cover .dtd, though SuSE does). I prefer .xslt, although I admit that preference is a crufty one, dating from before .fo became the dominant extension for XSL-FO. I ended up hacking my IMT mapping to make sure it has:

text/xml                        xml dtd xsl xslt

I had to restart apache2 for this to take. I expect Apache caches the mappings.
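The mime.types line format (a media type followed by the extensions that map to it) is easy to experiment with outside Apache. The sketch below uses Python's standard mimetypes module to load the line above and check what each extension resolves to. It only illustrates the mapping and says nothing about how Apache itself reads or caches the file; the `load_mapping` helper is hypothetical.

```python
import mimetypes

# The mapping line from the post, in mime.types format: type, then extensions.
MIME_TYPES_LINE = "text/xml                        xml dtd xsl xslt"


def load_mapping(text):
    """Return a MimeTypes registry extended with `text` (mime.types syntax)."""
    registry = mimetypes.MimeTypes()
    for line in text.splitlines():
        # Drop any trailing comment, then split on whitespace.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue  # blank line, or a type listed with no extensions
        media_type, extensions = fields[0], fields[1:]
        for ext in extensions:
            registry.add_type(media_type, "." + ext)
    return registry


if __name__ == "__main__":
    registry = load_mapping(MIME_TYPES_LINE)
    for name in ("style.xsl", "style.xslt", "doc.dtd"):
        print(name, registry.guess_type(name)[0])
```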

Of course, in that move I sidestepped the whole debate over XML media types (1 2 3 4 5 6 7 8, etc.). In particular, I chose not to use application/xml or application/xslt+xml, in part because I was unsure of UA compatibility. And don't even ask about Microsoft's rogue text/xsl, lest I embark on a long polemic about corruption and pestilence.

[Uche Ogbuji]

via Copia

4Suite 1.0b1 for Fedora Core 4 Test 3

Hooray! The 4Suite RPM shipped with the Fedora Core 4 Test 3 release is "4Suite-1.0-8.b1.i386.rpm", according to the RPM list. As Dave Pawson and I found out a few days ago, this is 4Suite 1.0b1. I was worried it might not make it all the way through FC quality control in time, but it seems it did. I need to try out FC4T3 on one of my non-critical machines this weekend. I also need yet another non-critical machine so I can check out all the hype about Ubuntu.

[Uche Ogbuji]

via Copia