Thinking XML #31 pubbed

Thinking XML: Schema standardization for top-down semantic transparency

Subtitle: The state of the art in XML modeling includes reusing models designed by others

Synopsis: This installment continues the review of the many different approaches to semantic transparency, discussing what they mean to the developer using XML. One way to save resources on a long journey is to hitchhike. In XML, you can take advantage of countless open schema initiatives that, in effect, use schema standardization for top-down semantic transparency. But it's not all a free ride. In this article, Uche Ogbuji looks at the advantages and disadvantages of third-party schema reuse. He also takes a moment to discuss The Semantic Technology Conference 2005, and respond to some recent discussion on the difficulty of modeling people's names.

This is a continuation of Thinking XML: State of the art in XML modeling ("What do developers need to know about the various approaches to semantic transparency?"). One more to go in this sub-series, though I'm a bit worried I may not be able to squeeze all my wide-ranging thoughts on semantic anchors into one coherent article. We'll see. After that, back to the fun hacking, on Python + WordNet.

[Uche Ogbuji]

via Copia

Moving to MetaWebLog API

Getting BloGTK to play nice with PyBlosxom
Updated for PyBlosxom, as used on Copia

Eric Gaumer and Ted Leung continue to be my main support in getting up to speed with blogging in general and PyBlosxom in particular. Thanks, guys. Always easier to go a-hacking when you have fellow hackers in the trenches with you.

So Eric's entry was a nudge to finally get MetaWebLog going on Copia. First of all, I applied his patch to BloGTK. Then I set up But first I had to update it to follow my convention of storing entries within directories for each date. If anyone else would like to go with that convention, you might want to check out my custom version of, linked above.

I'll specify a trackback when posting, to see if it does the trick. I'll post a comment with the apparent results.

But first of all, I'll save my blog entry to a local file. I'll be doing that a lot while using BloGTK. It has exhibited a few weird, not-obviously-reproducible quirks from time to time that worry me a bit. Unfortunately, the author has apparently abandoned the 1.1 branch in favor of BloGTK2, so I guess there will be some more hacking to do. Lucky thing I know Python. Even luckier thing Eric's learning Python so rapidly. At least the BloGTK2 preview looks cleaner. I'll look forward to it.

[Uche Ogbuji]

via Copia


                       Now the hedgerow
Is blanched for an hour with transitory blossom
Of snow, a bloom more sudden
Than that of summer, neither budding nor fading,
Not in the scheme of generation.
Where is the summer, the unimaginable
Zero summer?

-- T.S. Eliot -- "Little Gidding"

I already posted a quote from the first movement of "Little Gidding". Leave it to Colorado to impishly reply with a reason to post more from that great work. I wouldn't be surprised if the unimaginable zero summer, when it did venture outside of Antarctica, teased a bit around the Front Range before returning to its home. Then again, winter never seems to have sure dominion here (300 annual days of sunshine and all that), so it's fitting that it gets to sneak up on us at odd times, and give us a smart blow.

[Uche Ogbuji]

via Copia


Six a.m. -- getting out of bed again
Can’t get back in -- ‘cause sleep ain’t gonna pay the rent
Day to day -- they've got you working like a slave
Taking credit for the work you gave and stealing your raise -- well I...
I know you’re down, when you gon’ get up
I see you're down, when you gon’ get up

-- Amel Larrieux -- "Get Up"

Ah, one of the best songs to wake up with (or fall asleep with, or just...). If anyone could be said to have a voice that caresses the ear, ex-Groove Theory chick has it on lock. Her voice is a soft, succulent marvel. And I love the mellow-but-odd stylings of the video. And "Get Up" is just the intro to one of the best albums that came out at the turn of the century. Oh, you slept on it like everyone else? I see you ain't down. When you gon' get up?

[Uche Ogbuji]

via Copia

Cookbooked, indeed

I learned from James Kew's blog that one of my Python Cookbook contributions made the second dead tree edition. He even has a nice pic scanned in:

12.10 Merging Continuous Text Events with a SAX Filter

Pas mal, ça. I should point out that I have an even more souped up version of that code available as part of Amara (class amara.saxtools.normalize_text_filter).

Hey, best news of all is that I suppose that means I'll be getting a copy of that book sometime soon (others, including Kew, have reported already getting their copies, but mail trucks struggle a bit making the chug up the hills to the Front Range). Bet. I'm looking forward to it.

[Uche Ogbuji]

via Copia

Serving up UTF-8 in PyBlosxom

It's 2005, people. No room for blogs and other sites without the most basic underpinnings of i18n. Turns out serving up UTF-8 from PyBlosxom is not the complete slam dunk I expected. To be fair, getting it to send UTF-8 was easy, given instructions in the config file. You want to add to your config a line such as:

py['blog_encoding'] = "utf-8"

This does cause PyBlosxom to render UTF-8 pages. But you also have to be sure the browser knows the pages are in UTF-8, or most browsers I've found will default to ISO-8859-1 and thus garble non-ASCII characters. I tried using HTML meta, by adding the following to the head template for my flavor (head.copia):

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

But this was not enough. Browsers rightly prefer to believe the HTTP content type header, so the trick lies in the content_type template for your flavor. Most HTML templates you find out there contain only text/html. You want to expand this to

text/html; charset=UTF-8

Your browser probably gives you a "page info" display of some sort with which you can check that your content type header is right. In Firefox 1.0.2 it's Tools -> Page Info or Ctrl+I. See the following example dialog:

Firefox page info dialog for Copia
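
If you would rather check this from a script than from a browser dialog, here is a minimal Python sketch of what is going on. It is not part of PyBlosxom; the helper name charset_of and the sample text are my own assumptions for illustration, and the ISO-8859-1 fallback simply mimics the browser behavior described above.

```python
from email.message import Message


def charset_of(content_type_header, default="iso-8859-1"):
    """Return the charset parameter of a Content-Type header value.

    Hypothetical helper: falls back to ISO-8859-1 when no charset is
    declared, mimicking the browser default described in this entry.
    """
    msg = Message()
    msg["Content-Type"] = content_type_header
    return msg.get_content_charset(default)


# Sample page text with one non-ASCII character (c-cedilla).
text = "Pas mal, \u00e7a"
body = text.encode("utf-8")  # what the server sends once blog_encoding is utf-8

bare = charset_of("text/html")                     # typical content_type template
declared = charset_of("text/html; charset=UTF-8")  # the expanded template

garbled = body.decode(bare)      # UTF-8 bytes read as ISO-8859-1
correct = body.decode(declared)  # UTF-8 bytes read as UTF-8

print(bare, repr(garbled))
print(declared, repr(correct))
```

The same bytes go over the wire in both cases; only the declared charset changes, which is why the HTTP header matters more than what is in the page itself.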

[Uche Ogbuji]

via Copia

Help with and autoping

Other PyBlosxomers have been nice enough to listen to my various gripes about plug-ins and extras, and to lend a hand (thanks guys: that was quick).

Ted Leung rolled in the various changes to that I'd accumulated from various other hackers. If you use trackbacks in your PyBlosxom install, get the latest from Ted's site. Looks as if he also has an updated version of, so I copped that. He added a plug-in,, but I wasn't sure precisely what it did since it didn't have the standard plug-in doc header. I may pore through the code when I get a chance.

Eric Gaumer put a lot of work into the autoping plugin. He hasn't released his patched version, but I'll look out for it soon. He tried to use it to ping back my site, but apparently my trackback/pingback RDF was wrong. I think I've fixed it. If you try to ping Copia through a comment URL and it fails again, please let me know.

[Uche Ogbuji]

via Copia

Using post date for PyBlosxom file hierarchy

download customized
download patch to

Most PyBlosxom tools seem to like the pattern of piling all blog entry files into the top-level datadir, or using at most one level of hierarchy in the form of simple categories. Yuck. Here is an example of resulting ugliness: I like to post entries entitled Quotidie every day. It used to be in Copia that the first one gets lumped into top-level as Quotidie.txt. When I post the next one using BloGTK, the XML-RPC back end would detect a filename clash and choose an alternate file name such as Quotidie5gJS68ade.txt. It would be nice if it could pick a prettier means for disambiguation (e.g. Quotidie.txt). I understand this wouldn't be all that easy to do because of problems with race conditions, but of course, it would be nice if such clashes were just extremely rare in the first place.
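
On the race condition point: one common way to pick a readable alternate name without racing another writer is to let the filesystem arbitrate, by creating the file with an exclusive-create flag and moving on to the next candidate if it already exists. The sketch below is only an illustration of that technique, not what the PyBlosxom XML-RPC back end does; the function name claim_entry_file and the numeric-suffix scheme are assumptions of mine.

```python
import os
import tempfile


def claim_entry_file(directory, stem, ext=".txt", max_tries=1000):
    """Atomically create an unused entry file and return its path.

    Hypothetical helper: tries stem.txt first, then stem-2.txt,
    stem-3.txt, and so on. O_EXCL makes the create fail if the file
    already exists, so two concurrent posters cannot claim one name.
    """
    for n in range(1, max_tries + 1):
        name = stem + ext if n == 1 else "%s-%d%s" % (stem, n, ext)
        path = os.path.join(directory, name)
        try:
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
        except FileExistsError:
            continue  # name taken, try the next suffix
        os.close(fd)
        return path
    raise RuntimeError("no free file name for %r" % stem)


# Usage in a scratch directory standing in for the datadir.
with tempfile.TemporaryDirectory() as datadir:
    first = os.path.basename(claim_entry_file(datadir, "Quotidie"))
    second = os.path.basename(claim_entry_file(datadir, "Quotidie"))
    print(first, second)
```

This still leaves same-titled entries piled into one directory, which is why I would rather make the clashes rare in the first place.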

IMO the obvious way to do this is to use the date for disambiguation. I'm sure others have done such things (there is at least one plug-in I've seen that uses an embedded date in the filename to preserve the posting date), but I couldn't find very much about this idea on the Net. I ended up just looking into what it would take to hack something for Copia. The goal is to have each entry in a hierarchy according to date so that yesterday's Quotidie would be found in:


and today's in


I use keywords rather than categories for PyBlosxom, so the directory structure has no semantic meaning for purposes of Copia. Since I usually post through blogging tools over xml-rpc, it turned out that it was enough to patch While working on this plug-in I recoiled at how the file naming code is thrown carelessly into the body of the newPost function. I broke that functionality out into a new function blog_document_name, which returns the name of the file for the blog entry. This makes it easier for people to hack their own naming algorithms. My specialization implements the date-based hierarchy described above.

It seems there is still some work to be done. For one thing, BloGTK doesn't seem to be able to find the files within this hierarchy when looking for postings to edit or delete. For another, PyBlosxom seems to get confused because permalinks to entries within the date-based hierarchy pull in other entries as well as the intended one. I guess I have to learn more about how PyBlosxom manages its $datadir.

Update: I figured out this problem. PyBlosxom looks for discrete dates and years in the request URL, in order to handle requests for "all entries on day D". Turns out that a URL component of the form "20050408" doesn't trip this algorithm, so I just went with a structure of the form:


See the top of this post for links to my custom and a patch from the shipped in the PyBlosxom contrib-1.2 package. I hope it helps someone else.
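
For anyone who wants to hack their own naming algorithm without reading the patch, here is a rough sketch of what a naming hook along the lines of blog_document_name might look like. To be clear, this is not the code from my plug-in: the signature, the slug rule, and the exact directory layout are assumptions made for illustration. The only thing taken from the update above is the use of a single compact date component such as "20050408".

```python
import os
import re
from datetime import date


def blog_document_name(datadir, title, posted, ext=".txt"):
    """Return a date-based file path for a new blog entry.

    Hypothetical signature and layout: <datadir>/<YYYYMMDD>/<slug><ext>.
    The single compact date component is an assumption based on the
    "20050408" form mentioned in the update.
    """
    # Reduce the title to a filesystem-friendly slug (assumed rule).
    slug = re.sub(r"[^A-Za-z0-9]+", "-", title).strip("-") or "untitled"
    day = posted.strftime("%Y%m%d")
    return os.path.join(datadir, day, slug + ext)


# Two entries with the same title on consecutive days no longer clash.
yesterday = blog_document_name("/blog/entries", "Quotidie", date(2005, 4, 7))
today = blog_document_name("/blog/entries", "Quotidie", date(2005, 4, 8))
print(yesterday)
print(today)
```

Keeping the naming decision in one small function is the whole point: swap in whatever scheme suits your blog and the rest of the posting code does not need to change.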

[Uche Ogbuji]

via Copia