i18n for XSLT in 4Suite

Prodded by discussion on the CherryPy list, I have implemented and checked in a 4Suite XSLT extension for internationalization, using Python's gettext facilities for the underlying support. Here is how it works. Sample XML:

<doc>
  <msg>hello</msg>
  <msg>goodbye</msg>
</doc>

Sample XSLT:

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:f="http://xmlns.4suite.org/ext"
  extension-element-prefixes="f"
>
  <f:setup-translations domain="test"/>
  <xsl:template match="msg">
    <f:gettext><xsl:apply-templates/></f:gettext>
  </xsl:template>
</xsl:stylesheet>

The f:setup-translations and f:gettext extension elements are the key. The former looks up and installs the domain "test" for use in your XSLT. Replace it with the domain used by your application. The latter extension evaluates its body to get a string value, and then looks up this string in the installed translation.
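As a rough sketch of what these two elements correspond to in plain Python gettext terms (this is an illustration, not the code in BuiltInExtElements.py; the function names setup_translations and translate are made up for the example, and the locale directory is a placeholder):

```python
import gettext


def setup_translations(domain, localedir=None):
    """Look up the catalog for `domain`, as f:setup-translations is described as doing.

    With no explicit language list, gettext consults the LANGUAGE, LC_ALL,
    LC_MESSAGES and LANG environment variables. fallback=True returns a
    pass-through object instead of raising when no catalog is found.
    """
    return gettext.translation(domain, localedir=localedir, fallback=True)


def translate(translations, text):
    """Look up one string in the installed catalog, like the body of f:gettext."""
    return translations.gettext(text)


# No catalog exists at this placeholder path, so the string comes back unchanged.
catalog = setup_translations("test", localedir="/nonexistent/locale")
print(translate(catalog, "hello"))
```

The fallback behavior is a choice made for the sketch so that it runs anywhere; whether the extension itself falls back silently is not something stated above.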

Assume you have a test.mo installed in the right place that translates, say, "hello" to "howdy" and "goodbye" to "so long". Then:

$ 4xslt test.xml test.xsl
<?xml version="1.0" encoding="UTF-8"?>
howdy
  so long

I trimmed some white space for formatting, but you get the idea. The translations are applied automatically according to your locale.

This operates via Python's gettext facilities which means that it's much more efficient than, say, the docbook XSLT approach to i18n.

For those who want to give it a whirl, here's a quick step-by-step. All the needed files are available here on copia.

Create a sandbox locale directory:

mkdir -p /tmp/locale/en_US/LC_MESSAGES/

Copy in the catalog. You may need to create a different catalog for your own language if your system will not be selecting en_US as locale (remember that you can hack the locale via the environment).

cp en_US.mo /tmp/locale/en_US/LC_MESSAGES/test.mo

Your locale is probably not en_US. If not, you can:

  • temporarily override your locale to en_US using export LANG=en_US, or the equivalent command for your shell
  • create translations for your locale (just two strings to translate). I use poedit, which makes dealing with .po files simple enough. Then replace en_US in all the above instructions with your own locale and the .mo file you created.
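The sandbox steps above can also be tried from Python alone. The following is a minimal sketch, assuming a temporary directory in place of /tmp/locale and a hand-written catalog in place of the downloadable en_US.mo; write_mo and sandbox_lookup are hypothetical helpers, and in practice you would build the .mo with msgfmt or poedit rather than by hand.

```python
import gettext
import os
import struct
import tempfile


def write_mo(path, catalog):
    """Write a minimal little-endian GNU .mo file (no header entry, no hash table)."""
    ids = sorted(catalog)  # the format expects msgids in sorted order
    id_blob = b""
    str_blob = b""
    spans = []
    for msgid in ids:
        raw_id = msgid.encode("ascii")
        raw_str = catalog[msgid].encode("ascii")
        spans.append((len(raw_id), len(id_blob), len(raw_str), len(str_blob)))
        id_blob += raw_id + b"\0"
        str_blob += raw_str + b"\0"
    table_size = 8 * len(ids)
    id_table_at = 28  # right after the seven-word file header
    str_table_at = id_table_at + table_size
    id_data_at = str_table_at + table_size
    str_data_at = id_data_at + len(id_blob)
    out = struct.pack("<7I", 0x950412DE, 0, len(ids),
                      id_table_at, str_table_at, 0, 0)
    for id_len, id_off, _, _ in spans:
        out += struct.pack("<2I", id_len, id_data_at + id_off)
    for _, _, str_len, str_off in spans:
        out += struct.pack("<2I", str_len, str_data_at + str_off)
    with open(path, "wb") as f:
        f.write(out + id_blob + str_blob)


def sandbox_lookup(messages):
    """Create <tmp>/en_US/LC_MESSAGES/test.mo and translate each message with it."""
    with tempfile.TemporaryDirectory() as localedir:
        msgdir = os.path.join(localedir, "en_US", "LC_MESSAGES")
        os.makedirs(msgdir)
        write_mo(os.path.join(msgdir, "test.mo"),
                 {"hello": "howdy", "goodbye": "so long"})
        # Passing languages explicitly plays the role of export LANG=en_US
        catalog = gettext.translation("test", localedir=localedir,
                                      languages=["en_US"])
        return [catalog.gettext(m) for m in messages]


print(sandbox_lookup(["hello", "goodbye"]))
```

Strings that are not in the catalog come back untranslated, which is the usual gettext behavior.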

Anyway, the f:setup-translations and f:gettext extensions are now checked into 4Suite. You can either update to current 4Suite CVS, or just download the one changed file, Ft/Xml/Xslt/BuiltInExtElements.py and copy it into your 4Suite codebase. It works fine as a drop-in to 4Suite 1.0b1.

[Uche Ogbuji]

via Copia

Wizard worries

"Just because..."—Sean McGrath

Just because everyone can now create an XML schema because of all the easy to use GUI tools, doesn't mean that everyone should create XML schemas.

Yes indeed. This has been a growing problem for quite a while now as people give over their XML tasks to bottled genies. As I concluded in "The worry about program wizards":

In the end, there is no substitute for programmer expertise and experience. Wizards do have their place, but it seems that their occasional convenience should not form a backbone consideration for the development of any technology. In particular, it is dangerous to lead developments in XML and Web services with a significant purpose of reviving the great age of wizards.

Sean again:

I can use AutoCAD but I wouldn't dream of designing a house because I don't know enough about houses or building or any of that stuff.

Very well put, as usual.

[Uche Ogbuji]

via Copia

Amara equivalents of Mike Kay's XSLT 2.0, XQuery examples

Since seeing Mike Kay's presentation at XTech 2005 I've been meaning to write up some Amara equivalents to the examples in the paper, "Comparing XSLT and XQuery". Here they are.

This is not meant to be an advocacy piece, but rather a set of useful examples. I think the Amara examples tend to be easier to follow for typical programmers (although they also expose some things I'd like to improve), but with XSLT and XQuery you get cleaner declarative semantics, and cross-language support.

It is by no means always true that an XSLT stylesheet (whether 1.0 or 2.0) is longer than the equivalent in XQuery. Consider the simple task: create a copy of a document that is identical to the original except that all NOTE attributes are omitted. Here is an XSLT stylesheet that does the job. It's a simple variation on the standard identity template that forms part of every XSLT developer's repertoire:

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="*">
  <xsl:copy>
    <xsl:copy-of select="@* except @NOTE"/>
    <xsl:apply-templates/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>

In XQuery, lacking an apply-templates instruction and built-in template rules, the recursive descent has to be programmed by hand:

declare function local:copy($node as element()) {
  element {node-name($node)} {
    ($node/(@* except @NOTE),
    for $c in $node/child::node()
    return typeswitch($c) 
      case $e as element() return local:copy($e)
      case $t as text() return $t
      case $c as comment() return $c
      case $p as processing-instruction() return $p
      default return ())
  }
};

local:copy(/*)

Here is Amara code to do the same thing:

def ident_except_note(doc):
    for elem in doc.xml_xpath(u'//*[@NOTE]'):
        del elem.NOTE
    print doc.xml()
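For readers without Amara installed, the same task can be sketched with the standard library's ElementTree. This is only a comparison sketch, not part of Amara or of Kay's paper; the function name strip_note and the sample input are made up.

```python
import xml.etree.ElementTree as ET


def strip_note(xml_text):
    """Return the document as text with every NOTE attribute removed."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():  # the root and all of its descendants
        elem.attrib.pop("NOTE", None)
    return ET.tostring(root, encoding="unicode")


print(strip_note('<a NOTE="x" k="1"><b NOTE="y">text</b></a>'))
```

Like the Amara version, this edits the tree in place rather than building a copy, which is where both differ from the XSLT and XQuery formulations.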

Later on in the paper:

...nearly every FLWOR expression has a direct equivalent in XSLT. For example, to take a query from the XMark benchmark:

for    $b in doc("auction.xml")/site/regions//item
let    $k := $b/name
order by $k
return <item name="{$k}">{ $b/location } </item>

is equivalent to the XSLT code:

<xsl:for-each select="doc('auction.xml')/site/regions//item">
  <xsl:sort select="name"/>
  <item name="{name}">
     <xsl:value-of select="location"/>
  </item>
</xsl:for-each>

In Amara:

from amara import binderytools

def sort_by_name():
    doc = binderytools.bind_file('auction.xml')
    newdoc = binderytools.create_document()
    items = doc.xml_xpath(u'/site/regions//item')
    items.sort()
    for item in items:
        newdoc.xml_append(
            newdoc.xml_element(u'item', content=item)
        )
    #Print out the resulting document
    print newdoc.xml()
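As a cross-check on what the FLWOR query asks for (order by name, then emit an item carrying the name and its location), here is a small standard-library sketch. The SAMPLE data is a made-up, heavily simplified stand-in for auction.xml, and the result wrapper element is an assumption added so the output has a single root.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the XMark structure, for illustration only
SAMPLE = """<site><regions>
  <europe><item><name>violin</name><location>Vienna</location></item></europe>
  <asia><item><name>abacus</name><location>Osaka</location></item></asia>
</regions></site>"""


def items_sorted_by_name(xml_text):
    site = ET.fromstring(xml_text)
    regions = site.find("regions")
    # Equivalent of /site/regions//item
    items = list(regions.iter("item")) if regions is not None else []
    # Equivalent of "order by $k" with $k bound to the item's name
    items.sort(key=lambda item: item.findtext("name", default=""))
    result = ET.Element("result")
    for item in items:
        out = ET.SubElement(result, "item",
                            {"name": item.findtext("name", default="")})
        location = item.find("location")
        if location is not None:
            out.append(location)
    return result


print(ET.tostring(items_sorted_by_name(SAMPLE), encoding="unicode"))
```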

This is the first of a couple of examples from XMark. To understand the examples more fully you might want to browse the paper, "The XML Benchmark Project". This was the first I'd heard of XMark, and it seems a pretty useful benchmarking test case, except that it's very heavy on records-like XML (not much on prosy, narrative documents with mixed content, significant element order, and the like). As such, I think it could only ever be a sliver of one half of any comprehensive benchmarking framework.

I think the main thing this makes me wonder about Amara is whether there is any way to make the element creation API a bit simpler, but that's not a new point for me to ponder, and if I can think of anything nicer, I'll work on it post 1.0.

Kay's paper next takes on a more complex example from XMark: "Q9: List the names of persons and the names of items they bought in Europe". In database terms this is a join across person, closed_auction and item element sets. In XQuery:

for $p in doc("auction.xml")/site/people/person
let $a := 
   for $t in doc("auction.xml")/site/closed_auctions/closed_auction
   let $n := for $t2 in doc("auction.xml")/site/regions/europe/item
                       where  $t/itemref/@item = $t2/@id
                       return $t2
       where $p/@id = $t/buyer/@person
       return <item> {$n/name} </item>
return <person name="{$p/name}">{ $a }</person>

Mike Kay's XSLT 2.0 equivalent:

<xsl:for-each select="doc('auction.xml')/site/people/person">
  <xsl:variable name="p" select="."/>
  <xsl:variable name="a" as="element(item)*">
    <xsl:for-each 
        select="doc('auction.xml')/site/closed_auctions/closed_auction">
      <xsl:variable name="t" select="."/>
      <xsl:variable name="n" 
           select="doc('auction.xml')/site/regions/europe/item
                               [$t/itemref/@item = @id]"/>
      <xsl:if test="$p/@id = $t/buyer/@person">
        <item><xsl:copy-of select="$n/name"/></item>
      </xsl:if>
    </xsl:for-each>
  </xsl:variable>
  <person name="{$p/name}">
    <xsl:copy-of select="$a"/>
  </person>
</xsl:for-each>

In Amara:

def closed_auction_items_by_name():
    doc = binderytools.bind_file('auction.xml')
    newdoc = binderytools.create_document()
    #Iterate over each person
    for person in doc.xml_xpath(u'/site/people/person'):
        #Prepare the wrapper element for each person
        person_elem = newdoc.xml_element(
            u'person',
            attributes={u'name': unicode(person.name)}
        )
        newdoc.xml_append(person_elem)
        #Join to compute all the items this person bought in Europe
        items = [ unicode(item.name)
          for closed in doc.xml_xpath(u'/site/closed_auctions/closed_auction')
          for item in doc.xml_xpath(u'/site/regions/europe/item')
          if (item.id == closed.itemref.item
              and person.id == closed.buyer.person)
        ]
        #XML chunk with results of join
        for item in items:
            person_elem.xml_append(
                newdoc.xml_element(u'item', content=item)
            )
    #All done.  Print out the resulting document
    print newdoc.xml()

I think the central loop in this case is much clearer as a Python list comprehension than in either the XQuery or XSLT 2.0 case, but I think Amara suffers a bit from the less literal element creation syntax, and for the need to "cast" to Unicode. I would like to lay out cases where casts from bound XML structures to Unicode make sense, so I can get user feedback and implement accordingly. Kay's final example is as follows.

The following code, for example, replaces the text see [Kay, 93] with see <citation><author>Kay</author><year>93</year></citation>.

<xsl:analyze-string select="$input" regex="\[(.*),(.*)\]">
<xsl:matching-substring>
  <citation>
    <author><xsl:value-of select="regex-group(1)"/></author>
    <year><xsl:value-of select="regex-group(2)"/></year>
  </citation>
</xsl:matching-substring>
<xsl:non-matching-substring>
  <xsl:value-of select="."/>
</xsl:non-matching-substring>
</xsl:analyze-string>

The only way of achieving this transformation using XQuery 1.0 is to write some fairly convoluted recursive functions.

Here is the Amara version:

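import re
from amara import binderytools

#Escape the square brackets so they match literally
PATTERN = re.compile(r'\[(.*),(.*)\]')
#Document used as a factory for the new elements
doc = binderytools.create_document()

def repl_func(m):
    citation = doc.xml_element(u'citation')
    citation.xml_append(doc.xml_element(u'author', content=m.group(1)))
    citation.xml_append(doc.xml_element(u'year', content=m.group(2)))
    return citation.xml(omitXmlDeclaration=u'yes')

text = u'see [Kay, 93]'
print PATTERN.sub(repl_func, text)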

I think this is very smooth, with the only possible rough patch again being the output generation syntax.

I should mention that Amara's output syntax isn't really bad. It's just verbose because of its Python idiom. XQuery and XSLT have the advantage that you can pretty much write XML in-line into the code (the templating approach), whereas Python's syntax doesn't allow for this. There has been a lot of discussion of more literal XML template syntax for Python and other languages, but I tend to think it's not worth it, even considering that it would simplify the XML generation syntax of tools such as Amara. Maybe it would be nice to have a lightweight templating system that allows you to write XSLT template chunks in-line with Amara code for data processing, but then, as with most such templating systems, you run into issues of poor model/presentation separation. Clearly this is a matter for much more pondering.
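To make the trade-off concrete, here is a sketch of the citation example written in the "in-line template" style using nothing but the standard library. It is not Amara code; the names CITATION, TEMPLATE and mark_up_citations are made up, and the pattern is a slightly stricter variant of the one above that also trims the space after the comma.

```python
import re
from xml.sax.saxutils import escape

# Author is everything up to the comma, year is everything up to the closing bracket
CITATION = re.compile(r"\[([^,\]]*),\s*([^\]]*)\]")

# The output markup is written literally, template style
TEMPLATE = "<citation><author>%s</author><year>%s</year></citation>"


def mark_up_citations(text):
    """Wrap [Author, Year] citations in markup and escape everything else."""
    out = []
    pos = 0
    for m in CITATION.finditer(text):
        # Non-matching stretch: copied through as escaped text
        out.append(escape(text[pos:m.start()]))
        # Matching stretch: turned into markup
        out.append(TEMPLATE % (escape(m.group(1).strip()),
                               escape(m.group(2).strip())))
        pos = m.end()
    out.append(escape(text[pos:]))
    return "".join(out)


print(mark_up_citations("see [Kay, 93]"))
```

The literal template is easy to read, but the code now has to remember to escape every piece of text itself, which is exactly the kind of burden a tree-building API takes on for you.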

[Uche Ogbuji]

via Copia

Push vs pull XSLT

I often have to explain or refer to this distinction in XSLT programming style, and I've wanted a sort of all-in-one index of useful writing on the matter.

  • The discussion I most often refer people to is "2. Getting started with XSLT and XPath (cont'd)" by XSL tutor extraordinaire Ken Holman of Crane Softwrights Ltd. It's part of his long article "What is XSLT?". In this article subsection he starts by discussing the difference between "explicit" and "implicit" stylesheets. The latter are what the XSLT 1.0 spec calls "literal result element as root", and it's such a rare pattern in my observation that you might want to just skip to section 2.2.6, which is the relevant part for push vs. pull.
  • Dave Pawson's XSLT FAQ has one of the earliest attempts to gather notes on the push/pull distinction.
  • R. Alexander Milowski has a nice slide set, "How Not to Use XSLT". There are some other nice nuggets here besides illustration of the push/pull distinction.
  • Hack #51 in XML Hacks is "Write Push and Pull Stylesheets", and is a decent discussion of the topic.
  • I reluctantly include "XML for Data: XSL style sheets: push or pull?", by Kevin Williams in this round-up. I personally am heavily biased towards push-style XSLT (as are most of my XSLT expert colleagues) and Kevin seems to be heavily biased towards pull. I think his reasoning about the readability of pull is quite shallow, and might make sense to readers when dealing with simplistic transforms, but not with ones of more typical size and complexity.

Way back in 2001 or so I wrote my reasons for preferring push over pull. Reviewing what I wrote I think it still covers my point of view:

Pull is a bad idea from the didactic POV. If one wants people to learn how to generate HTML and other simple documents as quickly as possible, there is no doubt that most people with any background in the more popular computer languages would catch on to pull more quickly than push.

But it's a false simplicity. Pull is easy when the problem space is simple, as is the case with so many toy examples necessary when teaching beginners. But programming difficulty scales at an alarming rate with the complexity of the problem space. It doesn't take long to run into real-world examples where pull is nearly impossible to program correctly.

Push on the other hand, while for some people more difficult at first, is a much more powerful approach for solving complex problems. And in almost all cases it is less prone to defect and easier to maintain.

This is not functional programming bigotry for its own sake. Since the invasion of webmasters and amateurs of scripting, it is easy to forget that document processing is one of the most delicate areas of inquiry in computer science, and it has called for elegant solutions from Knuth's TeX to Clark & co's DSSSL, to XSLT. As Paul Tchistopolskii explained here, XSLT at its best is about pipes and filters. XSLT's weakest points are where this model breaks down.

Whether your favorite conceptual model is pipes and filters, tuple spaces, or just good ol' lambdas, a fundamental understanding of push techniques is essential if you want to ever do any serious development in XSLT. New arrivals to this field take short-cuts only to get lost later. From a purely practical point of view, I think it's important to teach apply-templates, modes and friends well before for-each, and bitchin' value-of tricks.
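For readers who think more readily in Python than in XSLT, the two styles can be mimicked in a few lines. This is an analogy only, not XSLT and not any library's API: RULES, template and apply_templates are made-up stand-ins for template rules and xsl:apply-templates, and DOC is a toy document.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

DOC = "<doc><title>Push</title><para>One <em>big</em> idea</para></doc>"


def pull(root):
    """Pull: the code reaches into the source and fetches what it wants."""
    parts = ["<h1>%s</h1>" % escape(root.findtext("title", default=""))]
    for para in root.findall("para"):
        # itertext() flattens the paragraph, so the <em> markup is lost
        parts.append("<p>%s</p>" % escape("".join(para.itertext())))
    return "".join(parts)


RULES = {}  # tag name -> handler, standing in for xsl:template match="..."


def template(tag):
    def register(func):
        RULES[tag] = func
        return func
    return register


def apply_templates(elem):
    """Push: hand each child to whichever rule matches it, in document order."""
    out = [escape(elem.text or "")]
    for child in elem:
        # Unmatched elements fall through to plain descent, like a built-in rule
        out.append(RULES.get(child.tag, apply_templates)(child))
        out.append(escape(child.tail or ""))
    return "".join(out)


@template("title")
def title_rule(elem):
    return "<h1>%s</h1>" % apply_templates(elem)


@template("para")
def para_rule(elem):
    return "<p>%s</p>" % apply_templates(elem)


@template("em")
def em_rule(elem):
    return "<i>%s</i>" % apply_templates(elem)


root = ET.fromstring(DOC)
print(pull(root))
print(apply_templates(root))
```

The pull version is shorter, but it silently drops the emphasis inside the paragraph; handling mixed content there means more and more special-case code, whereas the push version only needs one more small rule per element type.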

If anyone wants to incorporate this stuff into Wikipedia, XSLT FAQ, or whatever, go right ahead: as with almost everything on Copia, it's CC attribution-sharealike (main task this weekend is to actually assert that properly in the templates).

[Uche Ogbuji]

via Copia

Python community: XIST

XIST 2.10

XIST (simple, free license) is a very capable package for XML and HTML processing and generation. In the maintainers' words:

XIST is an extensible HTML/XML generator written in Python. XIST is also a DOM parser (built on top of SAX2) with a very simple and Pythonesque tree API. Every XML element type corresponds to a Python class, and these Python classes provide a conversion method to transform the XML tree (e.g. into HTML). XIST can be considered "object oriented XSL".

I covered it recently in "Writing and Reading XML with XIST". There are some API tweaks and bug fixes as well as some test suite infrastructure changes. The full, long list of changes is given in Walter Dörwald's announcement.

[Uche Ogbuji]

via Copia

What's up with the dc:type value recommendations?

In my work at Sun we've been looking for better ways to rationalize content purpose metadata for management of aggregated XML records. I had occasion to look at the DCMI Type Vocabulary, a DCMI Recommendation. This is an ancient document, and I was not sure what to make of it. One thing for sure is that we can't use it, or anything like it. We'll have to come up with our own values. I do wonder about the rationale behind that list. It seems quite the hotch-potch:

Now the definition of dc:type is: "The nature or genre of the content of the resource". I can see how one could fit parts of the above list into this definition, but when I read this definition before seeing the list, I assumed I'd see things such as "poem", "short story", "essay", "news report", etc. From the business point of view, I'd be looking for "brochure", "white paper", "ad copy", "memo", etc. I tend to think this would be more generally useful (if much harder to standardize). Maybe ease of standardization was the rationale for the above? But even if so, it seems an odd mix. I've run out of time for now to ponder the matter further (gotta get back to that client work), but I do wonder whether there are recommendations for dc:type that more closely meet my expectations.

[Uche Ogbuji]

via Copia

XTech: Mike Kay on XQuery and XSLT 2.0

As I mentioned in my more complete report, Mike Kay's presentation was worth a further entry (besides, my note-taking discipline went to hell right after his talk, so I don't have as much to work with on the rest).

The title was Comparing XSLT and XQuery. Much of what Mike discussed applies to XSLT 1.0 as well as 2.0. He did spend some time talking about the role of XPath 2.0 as the basis of both XSLT 2.0 and XQuery. As he puts it, XSLT 2.0 is a 2-language system. You call XPath from specific constructs within XSLT. XQuery on the other hand has XPath incorporated into the basic language. The way I think of it, XSLT is a host language for XPath, while XQuery is a (greatly) extended version of XPath.

I think the most important contribution Mike made in this paper was a very sober appraisal of the barriers to learning XSLT and XQuery. The difficulties developers have with XSLT are well known: we've had some 6 years to discuss them. Mike summarizes them as follows:

  1. XML fundamentals: encoding, entities, white space, namespaces, etc.
  2. Declarative programming: variables, recursion, paths, grouping
  3. Data model: the mental shift from the angle brackets they see to the abstraction of nodes
    • confusion between what devs see in the XML versus what their program sees
    • confusion over proper output, e.g. subtlety that creating an element in the output tree is not the same thing as creating text containing angle brackets
  4. Rule-based programming:
    • template dispatch, which forces a non-linear way of thinking about transforms. Mike Kay mentioned the parallels with GUI programming. (I tend to think this common comparison is generally right, but is just stretched enough to be unhelpful in determining how to get developers in the right mind-set).
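The data model point in item 3 (an element in the output tree versus text that merely contains angle brackets) is easy to see in a few lines of standard-library Python. This is a sketch using ElementTree rather than an XSLT processor, with made-up element names.

```python
import xml.etree.ElementTree as ET

# Case 1: a real element node is added to the output tree
tree_with_node = ET.Element("out")
ET.SubElement(tree_with_node, "b").text = "bold"
as_node = ET.tostring(tree_with_node, encoding="unicode")

# Case 2: text that happens to contain angle brackets is added instead
tree_with_text = ET.Element("out")
tree_with_text.text = "<b>bold</b>"
as_text = ET.tostring(tree_with_text, encoding="unicode")

print(as_node)  # the b element is serialized as markup
print(as_text)  # the angle brackets are escaped as &lt; and &gt;
```

The serializer escapes the second case because, in the tree, it is just character data; the same distinction holds for the result tree in XSLT.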

Mike Kay had Ken Holman in the audience so he did the sensible thing in asking the foremost expert on XML-related training. Ken agreed: "Yep. That hits the high points of the first day of getting people to know what's going on in XSLT."

In my opinion, there is one more category of difficulty, which is capability limitations in XSLT 1.0 (most of which are addressed in EXSLT or XSLT 2.0). This includes frustrations such as the result tree fragment/node set split, the poor facilities for string manipulation, node set operations, date/time processing, etc.

Mike feels that XQuery only eliminates the 4th barrier (it has no templates). Reading between the lines, this is a powerful indictment of the idea of a separate XQuery. I think it's hard to argue that we need such a complex separate language purely from the pedagogical viewpoint (no, I'm not saying "andragogical").

Mike pointed out that people coming to XQuery from SQL tend to write everything in FLWOR expressions (rather than, say, XPath with predicates). FLWOR is comfy and SQLey, but this just annoys me. I've pointed out in my bemoaning of SPARQL how unfortunate I think it is that SQL people insist on turning all other languages into some nasty mutation of SQL. I was suitably entertained by seeing Mike demonstrate how easy it is to get caught up in the subtle differences between SQL and FLWOR. Again I'm reading between the lines, but I got the sense that Mike was himself not unamused by the task of pointing out such trip-wires.

Mike finished up with a benchmark which he prefaced with an armload of caveats (a healthy practice, as I've learned from experiences in benchmarking). Saxon running XSLT trounced all processors except for MSXML on a certain task involving the XSLT analogue of a relational join, and with document sizes of 1MB, 4MB and 10MB. In a surprise result, Saxon running XSLT even beat Saxon running XQuery (As Mike said, "in the XQuery world implementors look to optimize joins"). All the XSLT processors suffered N^2 performance degradation with doc size. But strangely enough some of the XQuery tools did as well, including Galax. Qizx did show linear characteristics.

Kay then proved that there is no reason one cannot optimize joins for XSLT by writing a join optimizer for Saxon/XSLT. When he updated the benchmark result slide to show the fruit of this join optimization, we were all astonished to see how thoroughly Saxon ended up trouncing everything in the field at all three doc sizes. Now that, my friends, is the work of a superstar developer.
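To illustrate why join optimization matters so much here, the sketch below contrasts a naive nested-loop join, whose work grows with the product of the two input sizes, against a join that first builds an index, so that each person costs one dictionary lookup. It says nothing about how Saxon's optimizer actually works; the data and function names are made up.

```python
from collections import defaultdict

# Made-up stand-ins for person and closed_auction records
PEOPLE = [{"id": "p1", "name": "Ada"}, {"id": "p2", "name": "Bo"}]
AUCTIONS = [
    {"buyer": "p2", "item": "violin"},
    {"buyer": "p1", "item": "abacus"},
    {"buyer": "p2", "item": "kettle"},
]


def nested_loop_join(people, auctions):
    """Compare every person with every auction: len(people) * len(auctions) tests."""
    return [(p["name"], a["item"])
            for p in people
            for a in auctions
            if a["buyer"] == p["id"]]


def indexed_join(people, auctions):
    """Index the auctions by buyer once, then do one lookup per person."""
    by_buyer = defaultdict(list)
    for a in auctions:
        by_buyer[a["buyer"]].append(a["item"])
    return [(p["name"], item)
            for p in people
            for item in by_buyer.get(p["id"], [])]


print(nested_loop_join(PEOPLE, AUCTIONS))
print(indexed_join(PEOPLE, AUCTIONS))
```

Both functions return the same pairs in the same order; only the amount of work differs as the inputs grow.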

I'll be tinkering with how Amara handles some of Mike's XSLT and XQuery examples in a coming entry.

[Uche Ogbuji]

via Copia

Scattered notes from XTech

XTech 2005. Amsterdam. Lovely time. But first of all, I went for a conference. Edd Dumbill outdid himself this time. The first coup de maître was sculpting the tracks to increase the interdisciplinary energy of the meet. The browser track brought out a lot of new faces and provided a jolt of energy. There did seem to be a bit of a divide between the browser types and the XML types, but only as much as one would expect from the fact that XML types tend to know each other, and ditto browser types. There was plenty of crosstalk between the disciplines as well.

Second touch: focus on open data, and all the excitement in that area (Creative Commons, remixing/mash-ups, picture sharing, multimedia sharing, microformats, Weblogging, content syndication, Semantic technology, podcasting, screencasting, personal information spaces, corporate info spaces, public info spaces, etc.) and watch the BBC take over (with they bad selves). And don't fret: "damn, maybe we should lighten up on the BBC bias in the speakers". No, just go with it. Recognize that they are putting forth great topics, and that everyone is amped about how the BBC is leading the way on so many information technology and policy fronts.

Third touch: foster collaboration. Put up a Wiki, encourage folks to an IRC channel, aggregate people's Weblog postings and snapshots into one place, Planet XTech, and cook up a fun little challenge to go with the theme of open data. For that last bit Edd put out an XML representation of the conference schedule and asked folks to do something cool with it. I didn't do as much with it as I'd hoped. When I finally got my presentation going I used the posted grid.xml as a demo file for playing with Amara, but I wished it had more content, especially mixed content (it's very attribute heavy). I've suggested on the XTech Wiki that if Edd does the same thing next time, that he work in paper abstracts, or something like that, to get in more text content.

I said "When I finally got my presentation going", which hints at the story of my RAI (venue for XTech) jinx. Last year in Amsterdam I couldn't get my Dell 8600 running Fedora Core 3 to agree with the projectors at the RAI. As Elliotte Rusty Harold understates in his notes from the 2004 conference:

After some technical glitches, Uche Ogbuji is talking about XML good practices and antipatterns in a talk entitled "XML Design Principles for Form and Function"

In fact I ended up having to switch to OpenOffice on Windows, and the attendees endured a font only a hippie could love (Apparently Luxi Sans is not installed by default on Windows and OO/Win has a very strange way of finding a substitute). I'm vain enough not to miss quoting another bit about my talk from Elliotte:

A very good talk. I look forward to reading the paper. FYI, he's a wonderful speaker; probably the best I've heard here yet.

Gratifying to know I managed a good talk under pressure. I hope I did so again this time, because the RAI projectors were no more friendly. The topic was "Matching Python idioms to XML idioms". Remembering last year's headache, I asked about a projector to use to test out my presentation (I was on the first day, Weds). Usually conference speakers' rooms have a spare projector for this purpose, but it looks as if the RAI couldn't supply one. I crossed my fingers and arrived for my talk the dutiful 15 minutes early. Eric van der Vlist was up before me in the block. The AV guy came along and they spent quite a while struggling with Eric's laptop (Several speakers had trouble with the RAI projectors). They finally worked out a 640x480 arrangement that caused him to have to pan around his screen. This took a while, and the AV guy bolted right afterward and was not there to help me set up my own laptop. Naturally, neither I nor the very helpful Michel Biezunski (our session chair) were able to get it to work, and we had to turn things over to Eric to start his talk.

We then both went in search of the AV guy, and it took forever to find him. No, they didn't have a spare projector that we could use to set up my laptop in time for my talk. We'd just have to wait for Eric to finish and hope for the best (insert choice sailor's vocabulary here). My time slot came and we spent 20 minutes trying every setting on my laptop and every setting on their projector. The AV guys (yeah, when it was crisis time, they actually found they had more than one) muttered taunts about Linux, and it's a lucky thing I was bent on staying calm. I present quite often, and I do usually have to try out a few settings to get things to work, but in my encounters it's only the RAI projectors that seem completely incapable of projecting from my Linux laptop. In all, I witnessed 4 speakers (3 on Linux and surprisingly one on Mac OS X) who had big problems with the RAI projectors, including one of the keynote speakers. I suspect others had problems as well.

I couldn't take the obvious escape route of borrowing someone else's laptop because the crux of my talk was a demo of Amara and I'd have to install all that as well (Several kind volunteers including Michel had 4Suite installed, but not Amara). After 20 minutes, we agreed that I'd go on with my talk on Michel's computer (Thinkpad running Red Hat 9 and it worked with the projector immediately!), skip the demo, and we'd find another time slot for me to give the entire talk the next day. Quite a few people stuck around through this mess and I'm grateful to them.

The next day we installed Amara on Michel's computer and I gave the presentation in its proper form right after lunch. There was great attendance for this reprise, considering everything. The Amara demo went fine, except that the grid.xml I was using as a sample gave too few opportunities to show off text manipulation. I'll post a bit later on thoughts relating to Amara, stemming from the conference. Norm Walsh was especially kind and encouraging about my presentation woes, and he has also been kind in his notes on XTech 2005:

The presentation [deities] did not smile on Uche Ogbuji. He struggled mightily to get his presentation of Matching Python Idioms to XML Idioms off the ground. In vain, as it turned out (AV problems were all too common for a modern conference center), but he was generous enough try again the next day and it was worth it (thanks Uche!). I'm slowly becoming a Python convert and some of the excellent work that he's done at Fourthought to provide Python access to standard XML in ways that feel natural in Python is part of the appeal.

That's the precise idea. A tool for processing XML that works for both Python heads and XML heads. The whole point of my presentation was how hard this is to accomplish, and how just about every Python tool (including earlier releases of 4Suite) accommodates one side and not the other. The response to Amara from both the Python heads and XML heads makes me feel I've finally struck the right balance.

I got a lot out of the other XTech talks. Read Norm on the keynotes: he pretty much had the same impressions as I did. Props to Michael Kay for his great presentation comparing XSLT 2.0 and XML Query. I took enough notes at that one for a separate entry, which will follow this one. I missed a lot of the talks between Kay's and my own while I was trying (unsuccessfully) to head off the AV gremlins.

Other talks to highlight: Jon Trowbridge's on Beagle (who, you guessed it, had AV problems that ate up a chunk of his time slot). From the project Wiki:

Beagle is a search tool that ransacks your personal information space to find whatever you're looking for. Beagle can search in many different domains: documents, emails, web history, IM/IRC conversation, source code, images, music files, applications and [much more]

Edd had already introduced me to Beagle, but it was really cool to see it in action. I'll have to check it out. Jon also pointed out TomBoy, "a desktop note-taking application for Linux and Unix. Simple and easy to use, but with potential to help you organize the ideas and information you deal with every day." Two projects I'll have to give a spin. Props to Jon for shrugging off the AV woes and giving a fun and relaxed talk.

Robert O'Callahan's talk on the new canvas tag for Mozilla and Safari was memorable if for nothing else than the experience of surfing Google at a 45° angle, with no apparent loss in snappiness. This canvas thingie looks wicked cool, and it's good to see them working to incorporate SVG. I've heard a lot of grumbling from W3C types about canvas, and all we poor browser users in the middle can hope for is some rapid conversion of cool technologies such as XAML, XUL, canvas, SVG, etc. Others have blogged about the opportunities and anxieties opened up by the WHATWG, which one commentator said should have been the "WHAT Task Force" because "WTF" would have been a better acronym. I'm a neutral in these matters, except that I really do wish browser folks would do what they can to push people along to XHTML 2.0 rather than cooking up HTML 5.0 and such.

Matt Biddulph was one of the BBC Massive on hand, and his talk "The Application of Weblike Design to Data - Designing Data for Reuse" offered a lot of practical tips on how to usefully open up a large body of data from a large organization.

Dominique Hazaël-Massieux gave a talk on GRDDL (O most unfortunate project name), which was my first hearing of the technology. My brief characterization of GRDDL is as an attempt to pull the Wild West ethos of microformats into the rather more controlled sphere of RDF. It touches on topics in which I've been active for years, including tools for mapping XML to RDF. I've argued all these years that RDF folks will have to embrace general XML in place of the RDF/XML vocabulary if they are to make much progress. They will have to foster tools that make extracting RDF model data from XML a no-brainer. It's great to see the W3C finally stirring in this direction. Dom presented very well. I asked about the use of other systems, such as schema annotation, for the XML to RDF mapping. It seemed to me that GRDDL was mostly geared towards XSLT. Dom said it is meant to be independent of the mapping mechanism, but in my post-conference reading I'm not so sure. I'll have to ponder this matter more and perhaps post my thoughts. Dom also mentioned PiggyBank, "the Semantic Web extension for Firefox". Kingsley Idehen has a nice blurb on this software. I do hesitate to use it because someone mentioned to me how PiggyBank had gone into crazy thrash mode at one point. I don't muck with my Firefox set-up lightly.

Rick Jelliffe showed off Topologi's lightweight browser TreeWorld, which is XML-oriented and suitable for embedding into other applications.

Others have blogged Jean Paoli's closing keynote (http://glazman.org/weblog/dotclear/index.php?2005/05/29/1059-adam-3, Leigh, etc.). Seems I'm not the only one who was put off by the straight-up product pitch. At least he did a bit of a service by clearly saying "Binary XML: No please". Check out more quotes from XTech.

The conference was superb. Do be sure not to miss it next year. It's looking like Amsterdam will be the venue again. And what of Amsterdam? Besides the conference I had a great time with friends. I'll post on that later.

For the most comprehensive report I've seen to date, see Micah Dubinko's article.

[Uche Ogbuji]

via Copia

Towards EXSLT "1.0"

For a backgrounder on EXSLT, see my article "EXSLT by example". EXSLT was born of general contribution of the community on the XSL mailing list. Jim Fuller, Dave Pawson and Jeni Tennison started off a private thread to turn the list discussion into action, and I soon joined them. The result was the EXSLT web site, mailing list and initial set of extension specifications. We all agree that Jeni put the most work into it, and is the leading light of EXSLT, but she has been very busy lately (genius tends to occupy itself in overwhelming volume), and the rest of us have also had a hard time giving EXSLT the effort it deserves. Last week, however, Jim Fuller and I decided to take some steps towards getting EXSLT back on course. In part, this is because some people want to get things moving on EXSLT for XSLT 2.0.

Here is a list of the things we're considering taking on in order to get to something we can call "EXSLT 1.0".

Clearing up licensing

I have proposed switching from the intended (but unstated) public domain to a CC attribution license for all EXSLT work products. It seems everyone accepts this, but the main remaining question is assignment of copyright. Some of the suggestions:

  • Assign to all four managers. But do copyright decisions then have to be unanimous amongst us? Is it a problem that we reside in different countries?
  • Assign to Jeni Tennison alone. But does she have the bandwidth to dispatch all copyright matters?
  • Assign to the EXSLT.org domain. I can't now find who suggested this to me, but my main question is the legality of such a move.

Creating some caretaker foundation for EXSLT is not really an option, because no one (I think) has time for all the work that entails.

Improving the information content of the EXSLT Web site

We need a good overview. We need news that we can keep up to date without too much effort, including references to the many good articles on EXSLT. We need better documentation for almost all modules (the perennial example is that users get confused as to whether or not they have to download stuff from EXSLT.org in order to use the extensions). We need a better way to manage information about implementations. We need a FAQ. We should address outdated URLs from EXSLT specs (e.g. http://lists.fourthought.com/pipermail/exslt/2005-January/002169.html).

We also need to think about the confusion over the namespace URLs used in EXSLT. The extension namespace URIs are different from the specification URLs. I understand how this distinction came about, but I think it has proven too confusing in practice. I suggest that we should put a RDDL gloss at all the namespace URL end locations, including a summary of the extension module and a pointer to the specifications.
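To make the mismatch concrete, here is a minimal Python sketch that pairs a few module namespace URIs with the pages that document them. The namespace URIs are the ones stylesheets declare. The specification URLs are my assumption from memory of the site layout, and the helper names are made up for illustration.

```python
# Each entry: conventional prefix -> (namespace URI, specification URL).
# The namespace URI is what goes into xmlns:... in a stylesheet.
# The specification URLs are assumptions about the site layout.
EXSLT_MODULES = {
    "exsl": ("http://exslt.org/common", "http://www.exslt.org/exsl/"),
    "set": ("http://exslt.org/sets", "http://www.exslt.org/set/"),
    "date": ("http://exslt.org/dates-and-times", "http://www.exslt.org/date/"),
}


def xmlns_declarations(prefixes):
    """Build the xmlns attributes for the requested module prefixes."""
    return " ".join(
        'xmlns:%s="%s"' % (prefix, EXSLT_MODULES[prefix][0])
        for prefix in prefixes
    )


def spec_url(prefix):
    """Return the (assumed) URL of the page documenting the module."""
    return EXSLT_MODULES[prefix][1]


if __name__ == "__main__":
    print(xmlns_declarations(["exsl", "set"]))
    print(spec_url("date"))
```

Note that neither the host nor the path segment lines up between the two columns ("sets" versus "set", "dates-and-times" versus "date"), which is the sort of thing a RDDL document at each namespace URL could paper over.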

Packaging EXSLT

We need to provide a package we can call "EXSLT 1.0". Something clear, recognizable and ready for download. It should include at least documentation on modules that proved useful over time. We should update the reference stylesheet implementations and examples (see e.g. http://lists.fourthought.com/pipermail/exslt/2005-February/002174.html).

There have also been suggestions and proposals for new modules on the list, including:

My feeling is that we should go to 1.0 with the stuff that's already established on the site. We can add modules for a 1.1 release.

And then we would set up for EXSLT 2.0 extensions, which I understand people are itching to seed (I'll probably watch from the sidelines as I work through my overall opinions on XSLT 2.0).

Jim has promised to jump-start all these tasks. I'll keep folks posted.

I welcome comments on this topic here on Copia, but if you really want to help, or want to engage in discussion with the overall community, please do join the mailing list and chip in.

[Uche Ogbuji]

via Copia

Off to Amsterdam (XTech), and a note about comments on Copia

Well, I'm heading off to catch the flight to Amsterdam for XTech 2005. I'll blog as much as I can, and I have some FOSS work to do as well, on Amara, especially, to prep the 1.0b2 release.

We've had the spam comment folks doing their thing here, and so far I've been able to keep them mostly in check by deleting them soon after they appear. The trip will probably leave too big a hole for them, so for now I've turned on draft mode for comments. All comments will be held until explicitly approved. I apologize for any inconvenience. I've been tinkering on a more solid spam fighting system, building on the great work others have done on black-listing the punks.

[Uche Ogbuji]

via Copia