I finally broke down and made Copia safe for MSIE. When I first set up the site, I tweaked the IE look a bit, but it was such a frustrating exercise that I gave up once I was satisfied with its appearance in Firefox and Safari (I still need to install Opera for testing). Last night I found this excellent Wiki resource and soon got things sorted out. In the process I was alerted to the fact that Copia gets rendered in quirks mode, which is not what we want. I think I know how to fix most of the problems, but some issues are buried in PyBlosxom and plug-in code, so the rest may have to wait until my next burst of energy before we can sport one of those fly "valid HTML" icons.
Well, I'm heading off to catch the flight to Amsterdam for XTech 2005. I'll blog as much as I can, and I have some FOSS work to do as well, on Amara, especially, to prep the 1.0b2 release.
We've had the spam comment folks doing their thing here, and so far I've been able to keep them mostly in check by deleting them soon after they appear. The trip will probably leave too big a hole for them, so for now I've turned on draft mode for comments. All comments will be held until explicitly approved. I apologize for any inconvenience. I've been tinkering on a more solid spam fighting system, building on the great work others have done on black-listing the punks.
In his latest Python-XML column, Uche Ogbuji delves broadly and deeply into the world of Unicode, especially with regard to processing XML in Python.
In this one I started out talking about a quick spot check for Unicode compliance in XML tools, then went on to present some tips on Python's Unicode API. The intent was not to be comprehensive. I cherry-picked the particular Unicode facilities I tend to use the most. As one person mentioned in the comments, there are even more means at your disposal than I cover. I'll get to some of them in part 2, in the next column installment.
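For readers who want a feel for what a quick Unicode spot check might look like, here is a minimal sketch. It is not the test from the column: it uses the standard library's ElementTree as a stand-in for whatever XML tool you want to check, and the sample string and the `roundtrips` helper are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample text: a Latin-1 character, two CJK characters,
# and one character outside the Basic Multilingual Plane.
SAMPLE = "caf\u00e9 \u4e2d\u6587 \U0001d11e"


def roundtrips(text, encoding):
    """Return True if `text` survives serialize -> parse in `encoding`."""
    root = ET.Element("doc")
    root.text = text
    # With an explicit encoding, tostring() returns bytes that start
    # with an XML declaration naming that encoding.
    data = ET.tostring(root, encoding=encoding)
    return ET.fromstring(data).text == text


if __name__ == "__main__":
    for enc in ("utf-8", "utf-16"):
        print(enc, roundtrips(SAMPLE, enc))
```

A tool that mangles any of the sample characters on the way through would make the comparison fail, which is about all a spot check needs to tell you.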
I've been reminded today that some folks have taken a very strident tone towards advocacy of XSLT 2.0. Just so there's no mis-connotation of "strident", I'll note that I'm a very enthusiastic booster of Python, and I exhort people to give it a try whenever I can, but I don't think I go about with the notion that "YOU HAVE TO BE NUTS NOT TO USE PYTHON, DAMMIT". I'd rather show code examples and let them come to such a conclusion. I used to associate strident advocacy with XQuery boosters, an impression reinforced earlier this year when one of them wandered into an XML-DEV thread with a supercilious attitude (in essence: "why would a sane programmer use anything except for XQuery?"). I'm starting to wonder whether the same mechanics are developing with XSLT 2.0. I wonder whether it's not a reaction to the fact that XSLT 2.0 has met with some of the same hostility as XQuery (though as I admit, some people have been changing their minds), combined with the fact that XSLT 2.0's level of implementation seems to be slower in burgeoning than XQuery's. Of course, a counter-argument is that there is plenty of the right kind of advocacy as well, especially in the form of Bob DuCharme's XML.com columns.
I think people will have to understand that XML tools can no longer sue for universality in XML processing just because they come out of the XML oven, especially since the XML oven has been over-cooking its buns for a while now. W3C XML Schema, XQuery, and the SOAP Web services stack are just the most egregious examples; all the committees seem to have turned into over-engineering shops. I long ago took the attitude that the 1.0 series of specs got us as far as we needed to go to tackle XML processing. We can easily get the rest done in "native" environments. This at least seems to be true of the Python and .NET camps. Sure, XPath and XSLT 1.0 are limited, but mix them into an expressive enough language or rich enough platform, and I just have trouble seeing the need for the leaps in conceptual load that come with, say, XPath 2.0.
One thing's for sure: we're all lucky we're so spoiled for choice.
"Its hard to finish; or that Pythons tail's a long way away!", by Dave Pawson
Well, I posted Dave's dirlist.py as a little example, in part, of how quickly an XML expert/Python newbie could get something useful whipped up in 4Suite. Based on the very detail-oriented comments, it seems people in general have found it useful, and have run into limitations from the Python newbie side of that equation. Another example of people taking the code very seriously is Lars Trieloff's posting, "Your filesystem is an XML document".
As I mentioned in the posting, I have not put Dave's code through proper code review: I merely tweaked the command line code a bit to get it to work on my Linux box well enough for me to post an example of its workings. Dave has taken it all a bit to heart, but he shouldn't. He got very far in a short amount of time, and it's always the case in learning any new language or platform that the last 10% of polish is very hard won, and yet worth the experience.
I'm passing on all the comments on the other posting to Dave, and he's already sent me an updated version that fixes some issues. I'll post his version if he wishes, but I'll also give his code a proper, full review this weekend, and post that, for the folks who seem to want to use the code practically. The first thing I'll do is make it conform to PEP 8.
XSLT 2.0 Is Way Cool, by Micah Dubinko
Micah. Kimber. Pawson. A handful of the folks who have, like me, turned up their noses at XSLT 2.0 are starting to reconsider. This is not a massive drugging campaign by XSLT 2.0 boosters: it seems all these folks still don't want anything to do with the oppressive type system of XPath and XSLT 2.0, and all balk at the stupendous complexity of the specifications. The key to me is that they see these specs as usable without choking on the types mess. Some folks were claiming this was possible 2 years ago or so, but when I checked, I wasn't convinced. Perhaps things have improved since then.
So I may be up for reconsidering my shunning of XSLT 2.0, but as Micah mentions, I'm not about to wade into 9 documents to work on implementation. (OK, so it would really be 4 or so, but those are 4 huge documents, compared to the 1.0 series, which was 2 modestly sized documents). If someone comes up with a coherent spec that omits the type info, it could somehow make its way into the 4Suite post 1.0.
Micah says, "XSLT 2.0 is a power tool. I don't think it will displace XSLT 1.0, which is remarkable for its power in a small package." For a while I've wanted to write a series of comparisons between XSLT 2.0 and Amara code (which includes XPath 1.0 support). Amara is my power tool, for when XSLT 1.0 + EXSLT is not enough, and I find it hard to imagine XSLT 2.0 as offering more power.
And I really need to get back to work on EXSLT. Folks are getting very restless with the fact that work on EXSLT has been fallow for most of 2005. I just wish I could count on some help. Part of what impedes me is a shrinking back from all the demands of the EXSLT community without many offers of help.
lxml 0.6.0 is an alternative, more Pythonic binding for the libxml2 and libxslt XML processing libraries. Martijn Faassen says "lxml 0.6 contains important bugfixes, in particular better namespace support while handling attributes, as well as a fix for what turned out to be totally broken behavior for etree.tostring(). An upgrade is recommended."
Sylvain Hellegouarch updated Picket, a simple CherryPy filter for processing XSLT as a template language. It uses 4Suite to do the job. He incorporated feedback, including my own thoughts on Processor object management. A CherryPy "filter is an object that has a chance to work on a request as it goes through the usual CherryPy processing chain."
You may have noticed a new feature on Copia. This one was inspired by a feature from Burningbird (Shelley Powers' blog). Copia now lists the last ten comments posted, with links to the author and the referenced entry. This weekend I wrote another plug-in, latest_comments.py, which implements this feature. From the docstring:
Generates a template variable, $latest_comments, which contains a listing of the most recent comments to the Weblog, in the form:
    <div class="comment-link"> Author 1 on Entry 1 title </div>
    <div class="comment-link"> Author 2 on Entry 2 title </div>
This plugin requires the comments plug-in (comments.py).
This module supports the following, optional config parameter:
latest_comment_count - the number of comments to include in the output (default 5)
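To make the output format concrete, here is a rough sketch of how such a listing could be built. It is not the plug-in's actual code: the `latest_comments_html` function and the shape of the comment records (dicts with `author`, `title` and `posted` keys) are assumptions for illustration, and the real plug-in gets its data through comments.py and PyBlosxom.

```python
from html import escape


def latest_comments_html(comments, count=5):
    """Render the newest `count` comments as comment-link divs.

    `comments` is a hypothetical list of dicts with 'author', 'title'
    and 'posted' (any sortable timestamp) keys.
    """
    # Newest first, then keep only the requested number
    newest = sorted(comments, key=lambda c: c["posted"], reverse=True)[:count]
    # Escape markup characters so authors and titles cannot break the page;
    # non-ASCII text passes through unchanged as Unicode.
    return "\n".join(
        '<div class="comment-link"> %s on %s </div>'
        % (escape(c["author"]), escape(c["title"]))
        for c in newest
    )
```

The `count` argument plays the role of `latest_comment_count`, with the same default of 5.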
It's taken a beating over the past few days, and held up OK. James Governor exposed a Unicode bug when he tracked back to an entry with a title using high characters. That's all fixed now (it took down Copia for a little while).
I release it under the Creative Commons Attribution-ShareAlike 2.0 License (I really need to iron out the CC licensing throughout Copia).
Let me know what you think. I need to get all these plug-ins into CVS and into the PyBlosxom registry one of these days.
In parts 1 and 2 I discussed code to use Python to recursively walk a directory and emit a nested XML representation of the contents.
Dave Pawson built on my basic techniques and came up with dirlist.py, a fully tricked-out version with all sorts of options and amenities. Well, he wasn't even finished. He sent me a further version today in which he "tidied up [the] program, and added options [for file] date and size."
Cool. I've posted it here: dirlist2.py. If further versions are toward, I'll move it into my CVS. Dave is a self-confessed Python newbie. I had to make some quick fixes just to get it to work on my machine, but I haven't had time to carefully vet the entire program. Please let us know if you run into trouble (a comment here should suffice).
Usage example:
    $ mkdir foo
    $ mkdir foo/bar
    $ touch foo/a.txt
    $ touch foo/b.txt
    $ touch foo/bar/c.txt
    $ touch foo/bar/d.txt
    $ python dirlist2.py foo/
    Processing /home/uogbuji/foo
    <?xml version="1.0" encoding="UTF-8"?>
    <directory name="/home/uogbuji/foo">
      <file name="a.txt"/>
      <file name="b.txt"/>
      <directory name="/home/uogbuji/foo/bar">
        <file name="c.txt"/>
        <file name="d.txt"/>
      </directory>
    </directory>
    $ python dirlist2.py -d foo
    Adding file dates
    Processing /home/uogbuji/foo
    <?xml version="1.0" encoding="UTF-8"?>
    <directory name="/home/uogbuji/foo">
      <file date="2005-05-09" name="a.txt"/>
      <file date="2005-05-09" name="b.txt"/>
      <directory name="/home/uogbuji/foo/bar">
        <file date="2005-05-09" name="c.txt"/>
        <file date="2005-05-09" name="d.txt"/>
      </directory>
    </directory>
    $ python dirlist2.py foo/ foo.xml
    Processing /home/uogbuji/foo
    $ cat foo.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <directory name="/home/uogbuji/foo">
      <file name="a.txt"/>
      <file name="b.txt"/>
      <directory name="/home/uogbuji/foo/bar">
        <file name="c.txt"/>
        <file name="d.txt"/>
      </directory>
    </directory>
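For anyone who just wants the gist of the technique, here is a minimal sketch of a recursive walk that produces the same general shape of output. It is not Dave's dirlist2.py, nor the 4Suite-based code from the earlier parts: it uses the standard library's ElementTree, and the function names `dir_to_element` and `dir_to_xml` are made up for this illustration.

```python
import os
import time
import xml.etree.ElementTree as ET


def dir_to_element(path, with_dates=False):
    """Build a <directory> element for `path`, recursing into subdirectories."""
    path = os.path.abspath(path)
    elem = ET.Element("directory", {"name": path})
    entries = sorted(os.listdir(path))
    # Files first, then subdirectories, as in the listing above
    for name in entries:
        full = os.path.join(path, name)
        if os.path.isfile(full):
            attrs = {}
            if with_dates:
                # Last-modified date in local time, as YYYY-MM-DD
                mtime = os.path.getmtime(full)
                attrs["date"] = time.strftime("%Y-%m-%d", time.localtime(mtime))
            attrs["name"] = name
            ET.SubElement(elem, "file", attrs)
    for name in entries:
        full = os.path.join(path, name)
        # Skip symlinked directories to avoid walking in circles
        if os.path.isdir(full) and not os.path.islink(full):
            elem.append(dir_to_element(full, with_dates))
    return elem


def dir_to_xml(path, with_dates=False):
    """Return the nested listing for `path` as a Unicode string."""
    return ET.tostring(dir_to_element(path, with_dates), encoding="unicode")


if __name__ == "__main__":
    print(dir_to_xml(".", with_dates=True))
```

This sketch does no pretty-printing and handles no permission errors, so treat it as a starting point rather than a replacement for the full program.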
XML Europe has become XTech in 2005 (it now covers Web technologies overall, not just XML, and there will be a heavy Mozilla presence). The event is just around the corner, with tutorials on May 24 and the main conference from the 25th through the 27th in Amsterdam (http://www.xtech-conference.org/). Chaired by the very capable Edd Dumbill, this has consistently been my favorite conference. I'll be there again, presenting "Matching Python idioms to XML idioms" on the 25th. Check out the XTech 2005 wiki for more info.
Come to XTech for the technology, but you'll remember it for the people. Here are some of mine from XML Europe's past:
Bob DuCharme's panoramic shots of RDF heads at XML Europe 2004 (Amsterdam):
James Clark and me at XML Europe 2003 (London):
Edd and me at XML Europe 2003 (London):
I'll be in Amsterdam from the 24th (I'll probably be sleeping all that day) to the 29th (leaving early that day), and I look forward to having a good time with colleagues while there.