Today's XML Wot He Said

Yaay! Starting 2006 with a WHS rather than a WTF. An auspicious sign. OK, OK, the original does not actually mention XML, but it's very relevant. Via Bill de hÓra, a good summary of why modeling people's names and related constructs is always trickier than you think.

In Thinking XML #31 I discussed this matter in the third section "Naming names", jumping off from a comment from John Cowan on the OpenDocument mailing list.

IMHO (and I've worked on the problem for some years), all attempts to structure names so that they work correctly across cultures (and with scholarship being international now, the problem comes up repeatedly) just don't work....

[Uche Ogbuji]

via Copia

Thinking XML #34: Search engine enhancement using the XML WordNet server system

Updated—Fixed link to "Serving up WordNet as XML"

"Thinking XML: Search engine enhancement using the XML WordNet server system"

Subtitle: Also, use XSLT to create an RDF/XML representation of the WordNet data
Synopsis: In previous installments of this column, Uche Ogbuji introduced the WordNet natural language database, and showed how to represent database nodes as XML and serve this XML through the Web. In this article, he shows how to convert this XML to an RDF representation, and how to use the WordNet XML server to enrich search engine technology.

This is the final part of a mini-series within the column. The previous articles are:

In this article I write my own flavor of RDF schema for WordNet, a transform for conversion from the XML format presented previously, and a little demo app that shows how you can use WordNet to enhance search with synonym capabilities (and this time it's a much faster approach).
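To give a flavor of the synonym-enhanced search idea, here is a minimal Python sketch. It is not the demo app from the article: the `SYNONYMS` table is a hypothetical in-memory stand-in for lookups against a WordNet XML server, and the function names are made up for illustration.

```python
# Hypothetical stand-in for synonym lookups against a WordNet server.
# A real app would fetch these sets from the server instead.
SYNONYMS = {
    "car": ["auto", "automobile"],
    "quick": ["fast", "speedy"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Return the query terms plus any known synonyms, without duplicates."""
    expanded = []
    for term in query.lower().split():
        for candidate in [term] + synonyms.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def search(documents, query):
    """Return the documents that contain any of the expanded query terms."""
    terms = set(expand_query(query))
    return [doc for doc in documents if terms & set(doc.lower().split())]
```

With this sketch, a query for "quick car" would also match a document that only says "fast automobile", which is the whole point of the enhancement.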

I hope to publicly host the WordNet server I've developed in this series once I get my home page's CherryPy setup updated for 2.2.

See other articles in the column. Comments here on Copia or on the column's official discussion forum. Next up in Thinking XML, RDF equivalents for the WordNet/XML.

[Uche Ogbuji]

via Copia

"Process Atom 1.0 with XSLT"

"Process Atom 1.0 with XSLT"

Learn XSLT techniques for processing Atom documents. In this tutorial, author Uche Ogbuji shows how with real-world use cases. (free registration required)

Atom 1.0 is [the] Internet Engineering Task Force (IETF) standard for Web feeds -- information updates on Web site contents. Since Atom is an XML format, XSLT is a powerful tool for processing it. In this tutorial, Uche Ogbuji looks at XSLT techniques for processing Atom documents, addressing real-life use cases.

This tutorial shows you how to:

  • Navigate the basic structure of Atom 1.0 documents using XPath expressions
  • Use these expressions to drive XSLT transformations of Atom source files
  • Deal with the complications of text and markup embedded in Atom files

You will also learn how to use XSLT templates to generate valid Atom files, and how to check the validity of the results.
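The tutorial itself works in XSLT, but the basic idea of walking an Atom 1.0 document with path expressions can be sketched in Python too. This is only an illustration under stated assumptions: it uses the limited XPath subset in the standard library's ElementTree, and the sample feed, its URLs and the function name are all invented for the example.

```python
import xml.etree.ElementTree as ET

# The Atom 1.0 namespace, bound to a prefix for use in path expressions.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# A made-up feed, just enough to exercise the paths below.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <id>urn:example:feed</id>
  <updated>2006-01-01T00:00:00Z</updated>
  <entry>
    <title>First entry</title>
    <id>urn:example:entry-1</id>
    <updated>2006-01-01T00:00:00Z</updated>
    <link rel="alternate" href="http://example.org/1"/>
  </entry>
  <entry>
    <title>Second entry</title>
    <id>urn:example:entry-2</id>
    <updated>2006-01-02T00:00:00Z</updated>
  </entry>
</feed>
"""


def entry_titles_and_links(feed_xml):
    """Return (title, alternate link href or None) for each entry in a feed."""
    root = ET.fromstring(feed_xml)
    results = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        link = entry.find("atom:link[@rel='alternate']", ATOM_NS)
        href = link.get("href") if link is not None else None
        results.append((title, href))
    return results
```

The main thing to notice is that every element name has to be qualified with the Atom namespace. Forgetting that is the usual reason a path expression silently matches nothing.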

A companion piece to my recent article "Handling Atom Text and Content Constructs", this is a task-driven tutorial, taking a more deliberate pace and focusing on XSLT.

developerWorks has had a lot to say about Atom lately, courtesy James Snell (who is also writing a lot of useful Atom extension drafts).

So how do you celebrate Atom's promotion to RFC 4287? Why, by cooking up even more reading material.

[Uche Ogbuji]

via Copia

4Suite XML 1.0b3

I posted the 4Suite XML 1.0b3 announcement today. This was supposed to be 1.0rc1 but then Jeremy went and added this little feature. Yeah, 4Suite now has full DTD validation, written in C. Just use the ValidatingReader. PyXML is no longer necessary for any 4Suite feature. I just need to figure out whether Jeremy ever sleeps. I hope to move quickly on a 1.0rc1. Perhaps even in January. We'll see.

I've updated my on-line manual

[Uche Ogbuji]

via Copia

AJAX and the Back button

Sylvain and I have discussed recently his discomfort with Web browser state of the art in the age of AJAX (to use a grand term, even though I strongly believe that AJAX is nothing but an incremental gathering of conventions rather than anything new and special). He has gathered his thoughts in a blog posting "The chicken and egg problem". I posted a comment, but I thought I might copy the comment here as well.

[Let me summarize] in brief my reasons for thinking that the current system is not broken, and that we do not need to change anything fundamental about browsers.

First of all the basic semantic of "link history" in a Web browser has not changed since the Mosaic days for a very good reason: it is empirical to HTTP, REST and all that. At each point a browser is at a particular resource, and it moves from one resource to another according to actuation of simple REST verbs. Within each resource the browser can do all sorts of complex things, including showing animations (Shockwave, SVG, etc.), providing mini-applications to the user (Java applets, Flash, AJAX, etc.) and more, but the resource has not changed. The boundary of resource is defined by the service provider, and the browser simply reflects that in the history, URL bar and other features. I don't think the back and forward buttons should be overloaded for any operation within a resource. They should not be used as hot buttons in Flash apps or in AJAX apps. This violates the layering that is so important to the success of the Web.

If service providers want to provide navigation within a particular resource, they should do so within the application, and not at the REST level. I want my Front office app to have an "Undo" button (which makes much more sense than "Back"). [Why do I need chameleon browser chrome when I can just do <xforms:button id="undo"><xforms:caption>Undo</xforms:caption>...</xforms:button>?] When I click browser "Back" I want that to exit the application and go to the previous resource.

IMO people think they have trouble with the back button and AJAX because they do not appreciate protocol layering very well, and because the AJAX tools do not yet help in this understanding. I think a better understanding of this layering and better tools are what's needed, not a major redesign of the browser idiom.

[Uche Ogbuji]

via Copia

Agile Web #2: "Handling Atom Text and Content Constructs"

"Handling Atom Text and Content Constructs"

Uche Ogbuji's Agile Web column returns with a look at handling some of the trickier issues in the Atom Syndication Format, which has recently become RFC 4287, an internet standard.

Second article in my new column is out. In this one I focus on Atom text and content constructs. I spent more time on the Atom examples and less on the sample processing code, but I thought more of the former would be especially useful. I've been working with and writing about Atom a lot lately, and in fact I have an IBM developerWorks tutorial for Atom processing in XSLT in production. It should be live some time today.

Joe Gregorio has been working the other half of the Atom pie (old joke for folks who've been following Atom), and he has a very timely new article out: "Catching Up with the Atom Publishing Protocol".

And once again, if you'd like to discuss Atom (syntax or publishing protocol), please do join us on the #atom channel on

[Uche Ogbuji]

via Copia

XSLT for converting from OPML to XBEL and XOXO

In all this Web feed hacking I've been working with my list originally exported from Lektora in OPML format. I wrote XSLT to convert from OPML to XBEL and XOXO. In the case of XOXO I really couldn't figure out any common conventions for Web feeds so I made up my own for now. The resulting XBEL looks a lot easier to work with, so I'm proposing extensions for feed URL / site URL coupling in the renewal of XBEL. I figured my XSLT might be useful to others, so here are the links:

Going from XBEL to OPML, I've been using Dan MacTough's XSLT. (He also has an XBEL to XHTML transform). I sometimes have to tweak the resulting attributes to deal with xmlUrl/url and title/text type OPML madness.
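For anyone who would rather see the OPML to XBEL direction without XSLT, here is a rough Python sketch of the same idea. It is not a port of my transform: the sample OPML, the URLs and the function names are invented, and it assumes plain XBEL 1.0 structure (folder, bookmark and title elements). It tolerates the xmlUrl/url and title/text variation by trying each spelling in turn.

```python
import xml.etree.ElementTree as ET

# A made-up OPML feed list showing both attribute spellings.
SAMPLE_OPML = """<opml version="1.1">
  <head><title>Feeds</title></head>
  <body>
    <outline text="Weblogs">
      <outline text="Copia" xmlUrl="http://example.org/copia.atom"/>
      <outline title="Other" url="http://example.org/other.rss"/>
    </outline>
  </body>
</opml>
"""


def outline_to_xbel(outline, parent):
    """Append an XBEL bookmark or folder for one OPML outline element."""
    # OPML in the wild uses either attribute for the label and for the URL.
    label = outline.get("title") or outline.get("text") or ""
    href = outline.get("xmlUrl") or outline.get("url")
    if href:
        bookmark = ET.SubElement(parent, "bookmark", {"href": href})
        ET.SubElement(bookmark, "title").text = label
    else:
        # No URL: treat the outline as a folder and recurse into it.
        folder = ET.SubElement(parent, "folder")
        ET.SubElement(folder, "title").text = label
        for child in outline.findall("outline"):
            outline_to_xbel(child, folder)


def opml_to_xbel(opml_text):
    """Convert OPML text to an XBEL 1.0 element tree root."""
    opml = ET.fromstring(opml_text)
    xbel = ET.Element("xbel", {"version": "1.0"})
    for outline in opml.findall("body/outline"):
        outline_to_xbel(outline, xbel)
    return xbel
```

Calling `ET.tostring(opml_to_xbel(SAMPLE_OPML), encoding="unicode")` gives the serialized XBEL. Like my feed list, the result carries only the feed URLs, since XBEL 1.0 has no standard slot for the matching site URL.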

I've also posted my Web feed list in XBEL form. It uses old school XBEL 1.0, and not any of the metadata additions I'm hoping to see in 1.2. As such, it's only a list of Web feeds and doesn't include the corresponding Weblog home pages.

[Uche Ogbuji]

via Copia

XML Bookmark Exchange Language (XBEL) gets a proper home

XML Bookmark Exchange Language (XBEL)

The Python XML SIG has had some really great times in its history. One of the highlights is the development of XML Bookmark Exchange Language (XBEL). In September of 1998, just as I was joining the group, they were developing this bookmarks exchange language that's still used in more browsers and bookmark management projects than any other particular format. The XML-SIG has fallen on quiet times, and one of the side effects of this is that additional work on XBEL has been neglected.

Earlier this year we agreed on the SIG to give XBEL its own home on SourceForge, but no one stepped up to make it happen, until John L. Clark got to it last week (thanks, John).

XBEL's new home is The old home is still up, but I think we should move it to, with some updates and maybe a design update (maybe make the page XHTML). We'll be discussing such things on the new XBEL mailing list, so please come join us. The main goal is to add more features to XBEL needed for its original role in browser bookmarks exchange, but I'm also interested in making it a useful format for general Web resource lists such as feed lists (e.g. a superior alternative to OPML).

John wrote up a good summary of recent discussions of XBEL.

I'll have more on our efforts summarized here on Copia as we progress.

[Uche Ogbuji]

via Copia