Some thoughts on QNames in content (including proposal for a better, ahem, name)

I'll cut your ass in half and leave you with a semi-colon

—Mr. Man

QNames in content have been on my brain today. See a follow-up posting for more on why.

First of all, I think we should find a new name for this phenomenon, because I don't think QNames qua QNames are key to the problem. For one thing, you have the problem even if you only use a prefix in content, as XSLT does in, say, the extension-element-prefixes attribute, or XPath in html:*/html:span (the second step is a QName, but not the first). I think a better name for this problem is "hidden namespaces" because that's exactly the problem: the document depends on a construct that is hiding a namespace in a separate layer where generic processing cannot see it.
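To make "hidden" concrete, here is a minimal sketch using Python's standard ElementTree. The tiny stylesheet is made up for illustration; only the XPath expression comes from the paragraph above. The parser resolves the prefix on the element name, but the prefix inside the attribute value passes through as plain text.

```python
import xml.etree.ElementTree as ET

# Illustrative document: the match attribute carries a "hidden" namespace.
DOC = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:html="http://www.w3.org/1999/xhtml">
  <xsl:template match="html:*/html:span"/>
</xsl:stylesheet>"""

root = ET.fromstring(DOC)
template = root[0]

# The element name is resolved by the parser: the prefix is replaced by the URI.
print(template.tag)           # {http://www.w3.org/1999/XSL/Transform}template

# The attribute value is opaque text: the "html" prefix is still just a string.
print(template.get("match"))  # html:*/html:span
```

Anything that wants to interpret the match value has to find out separately what "html" was bound to at that point in the document, which is exactly the layer generic processing does not see.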

Whatever the name, I re-read today a couple of important documents regarding the issue. First of all there is the TAG finding on QNames, which is unfortunately not much more than an agglomeration of existing wisdom. Norm Walsh, the editor of that document, wrote of a more radical direction as part of his "XML 2.0" article. I like his ideas (though I'm partial to Jeffrey Yasskin's ampersand variation), and I hope conversation soon drives towards something along those lines. XML is almost ten years old, and I see nothing wrong with a bit of a shake-up.

Until then, I think we can deploy two safeguards to protect ourselves from the subtle problems of namespaces. I call them: "sanity within the document, and registries without". The two components are very different in character.

Firstly, we need to discard the idea of in-document scoping of namespaces. It seemed a great idea at the time, even to me, but in practice it's a mess, and Joe English was the first one to illuminate the mess in the light of a brilliant metaphor (Google cache of original since XML-DEV is down now). (See my article "Principles of XML design: Use XML namespaces with care" for more on this.) If we can rely on sanity in XML documents, we can at least simplify state processing a good deal. Ideally all the XML sources in an XML processing pipeline would emit sane XML.
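As a rough sketch of what a pipeline stage could check, the following Python treats a document as "sane" when no prefix is bound to more than one namespace and no namespace is bound to more than one prefix anywhere in the document. That reading of "sane" is my working assumption for the example; Joe English's message is the authority on the actual definitions. The function names are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict


def namespace_conflicts(xml_text):
    """Return (prefixes bound to several URIs, URIs bound to several prefixes)."""
    uris_by_prefix = defaultdict(set)
    prefixes_by_uri = defaultdict(set)
    source = io.BytesIO(xml_text.encode("utf-8"))
    # "start-ns" events report every namespace declaration as (prefix, uri).
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        uris_by_prefix[prefix].add(uri)
        prefixes_by_uri[uri].add(prefix)
    rebound = {p: u for p, u in uris_by_prefix.items() if len(u) > 1}
    aliased = {u: p for u, p in prefixes_by_uri.items() if len(p) > 1}
    return rebound, aliased


def is_sane(xml_text):
    """True when the document has neither rebound prefixes nor aliased URIs."""
    rebound, aliased = namespace_conflicts(xml_text)
    return not rebound and not aliased


sane_doc = '<a:x xmlns:a="urn:one"><a:y/></a:x>'
insane_doc = '<a:x xmlns:a="urn:one"><a:y xmlns:a="urn:two"/></a:x>'
print(is_sane(sane_doc))    # True
print(is_sane(insane_doc))  # False
```

In a sane document under this assumption, a consumer needs only one prefix table for the whole document rather than a scope stack that changes from element to element.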

Secondly, I think the time has come for namespace registries. It would definitely be nice to build on the unfortunately stalled RDDL, but whereas the goal of RDDL is to provide human-readable information, what I think we really need in a namespace registry is a little nugget of machine-readable data. Drumroll please...

A list of preferred prefixes for a namespace (supporting lookup of namespace name to well-known prefix, and vice versa). I know this will be controversial. Prefixes are supposed to be insignificant. Users should have flexibility to use whatever prefix, blah blah blah. I'm sorry, but that's all theoretically nice, but we have practical problems to solve. The fact that the most powerful constructs in XPath depend for their semantics on the whimsy of prefix choices should bother you a bit. The fact that Canonical XML had to abandon the idea of normalizing prefixes should bother you even more. It's time to just say that xsl means "The XSLT namespace" (yeah, yeah: "what version?" etc.; hard problems would still remain) and that if you choose to use it for a different namespace, you're technically compliant to namespaces, but you're asking for a heaping helping of trouble, buddy.
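Just to pin down the shape of that nugget, here is a minimal in-memory sketch in Python of the two-way lookup. Nothing like this exists as a real service; the class name, method names and the particular prefix choices ("xsl", "html", "xhtml") are all hypothetical, and only the namespace names themselves are the real XSLT and XHTML ones.

```python
class PrefixRegistry:
    """Hypothetical two-way map between namespace names and preferred prefixes."""

    def __init__(self):
        self._prefixes_by_uri = {}  # namespace name -> prefixes, most preferred first
        self._uri_by_prefix = {}    # prefix -> the one namespace it is reserved for

    def register(self, uri, *prefixes):
        # Refuse the whole registration if any prefix is reserved for another namespace.
        for prefix in prefixes:
            owner = self._uri_by_prefix.get(prefix, uri)
            if owner != uri:
                raise ValueError(f"prefix {prefix!r} is already reserved for {owner}")
        known = self._prefixes_by_uri.setdefault(uri, [])
        for prefix in prefixes:
            if prefix not in known:
                known.append(prefix)
            self._uri_by_prefix[prefix] = uri

    def preferred_prefix(self, uri):
        """Namespace name -> most preferred prefix, or None if unregistered."""
        prefixes = self._prefixes_by_uri.get(uri)
        return prefixes[0] if prefixes else None

    def namespace_for(self, prefix):
        """Prefix -> namespace name, or None if the prefix is not reserved."""
        return self._uri_by_prefix.get(prefix)


registry = PrefixRegistry()
registry.register("http://www.w3.org/1999/XSL/Transform", "xsl")
registry.register("http://www.w3.org/1999/xhtml", "html", "xhtml")

print(registry.preferred_prefix("http://www.w3.org/1999/xhtml"))  # html
print(registry.namespace_for("xsl"))  # http://www.w3.org/1999/XSL/Transform
```

The one design decision baked in here is that a prefix can be reserved for only one namespace while a namespace may list several acceptable prefixes. That is one possible policy, not a settled part of the proposal.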

For now I'm just throwing out ideas to help organize my thoughts, and for discussion. It seems to me that if we could rely on authors and tools, supported by a registry, to produce sane documents that (wherever possible) used essentially reserved prefixes, including, of course, for hidden namespaces, we could simplify namespace-aware processing a great deal. I can think of some practical hurdles for the registry idea, but I can't think of any reason why it's not even worth a try.

[Uche Ogbuji]

via Copia
2 responses
"hidden namespaces", ya, that's an improvement. I'm not so sure about the prefix registry idea though, doesn't it undermine the whole idea of using URIs in the first place?



I can't get to the xml-dev archives right now (404), but I get the picture from your IBM piece. I don't actually see a problem - the XML processor is more like a person with autism than a psychiatrist...



Also I'm not sure of the argument against in-document scoping. The argument for is mostly that a registry isn't needed - that's good, isn't it? Assuming of course that the namespaces can be unhidden locally.



I'm sure the following has come up elsewhere, but I don't have pointers. Namespaces can be unhidden with fairly minimal application-level extensions to XML, either something like:

<namespace prefix uri="http://example.org/blah#" prefix="b" />

or:

<?namespace prefix uri="http://example.org/blah#" prefix="b" ?>



As I type it occurs to me that there may be a weird little application of GRDDL here, using data-view to extract:

[
  :namespace <http://example.org/blah#> ;
  :prefix "b"
]

I switched to Stylus Corp's version of the archive link, and added the Google cache of the original, since the original is down. I'm in a work crunch this morning, so I'll have to come back and address your comment at greater length later, but I did want to say you should read Joe's message for a good explanation of the problems that come with insanity (really, this is no longer even a matter of any controversy: it's well known that insane documents are hard to process). And no, having sane documents does *not* eliminate the need for registries. They are orthogonal problems. Finally, I don't see how your last bit does anything to "unhide" namespaces. Please clarify. Do you mean another syntactic take on Norm's XML 2.0 ideas? (If so, I much prefer Norm's and Jeffrey's syntaxes.)