XML Universal names (namespaces): To fuse or not to fuse

I ran into Ken MacLeod on the Atom IRC channel today. Actually I think I've chatted with him before but I didn't know the nick I was responding to was Ken. Certainly a fortuitous discovery, but more importantly Ken drew my attention to an old Weblog posting I'd somehow missed. In it he makes two separate points that I think overrun in perhaps a confusing way. Firstly he advocates that XML APIs strictly treat a node's universal (i.e. namespace-qualified) name as a tightly bound unit. I (arbitrarily) call this fusing the universal name. An example is APIs such as ElementTree that use James Clark notation for namespaces.

The second point is that some XML APIs have really ungainly syntax for handling namespaces, with SAX and DOM being the worst offenders (we both agree that the Java/IDL heritage of these APIs is the worst problem).

To take the first point first, I disagree that APIs based on fused names are superior. Yes an XML universal name should conceptually be a unit, but in practice it is not, and people often have a real need to work severally with either piece of the tuple. I used the analogy of complex numbers in the IRC discussion. The complex number 3 + 2j is a single number, but there is nothing wrong with an API's making it easy for a user to manipulate its real and imaginary parts. It's up to the developer not to somehow abuse this flexibility.

The second point is well taken, but I strongly believe that it is not the lack of fused names that makes an API poor. SAX (SAX 2, to be precise) is poor because of the redundancy and odd structural conventions in reporting information such as prefixes. DOM (DOM Level 2, to be precise) is poor because of the bewildering decision to maintain namespace declarations as separate attribute objects in addition to the redundant information offered as node object properties. Both owe much of their weakness to concerns for backwards compatability with non-namespace-aware APIs.

For the most part APIs that were born and bred in the era of namespaces are much less tortured, regardless of how tightly or loosely they expose the parts of a universal name. If anything, I believe that it's better for the API to make it easy for the developer to separate namespace name, local name and prefix. Yes, even prefix, because the golden world in which prefixes are irrelevant does not exist. It was ruined by the advent of QNames in content, or "hidden namespaces". Yes, this happens to be one negative side-effect of the success of XSLT and XPath, which were great specs for the most part, but also represented the first real triumph of the hidden namespaces idea that has left such a mess in its wake.

Ken ended his Weblog post with a proposed notation for fused names, which is just like one I had mulled and discarded for Amara, He also showed me a derived convention for his Orchard software, which I didn't know was still in development (it looks very interesting). This convention looks just like the mapping accessors I somewhat grudgingly added to Amara in February. I'm still open to changing this until Amara 1.2, so I'll give the whole matter some thought (including the Orchard approach). Feedback is welcome. I think the fact that Ken and I independently came up with a such a series of similar ideas makes me think we're on the right track.

I do want to make sure it's clear that giving the user a better API for namespaces is not bound to insistence on fused names.

[Uche Ogbuji]

via Copia