I'll cut your ass in half and leave you with a semi-colon
—Mr. Man
QNames in content have been on my brain today. See a follow-up posting for more on why.
First of all, I think we should find a new name for this phenomenon, because I don't think QNames qua QNames are key to the problem. For one thing, you have the problem even if you only use a prefix in content, as XSLT does in, say, the extension-element-prefixes attribute, or as XPath does in html:*/html:span (the second step is a QName, but the first is not). I think a better name for this problem is "hidden namespaces", because that's exactly the problem: the document depends on a construct that hides a namespace in a separate layer where generic processing cannot see it.
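To make the "hidden namespace" problem concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree (my choice of tool for illustration; nothing in the argument depends on it). The stylesheet is a made-up example whose only use of the XHTML namespace is the prefix inside an XPath expression. A generic parser resolves the prefixes in element names, but the select value stays an opaque string, and a parse-and-reserialize round trip silently drops the declaration that string relies on.

```python
import xml.etree.ElementTree as ET

XSLT_NS = "http://www.w3.org/1999/XSL/Transform"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Illustrative stylesheet: XHTML is used only as a prefix inside an XPath
# expression, i.e. as a hidden namespace.
STYLESHEET = f"""<xsl:stylesheet version="1.0"
    xmlns:xsl="{XSLT_NS}" xmlns:html="{XHTML_NS}">
  <xsl:template match="/">
    <xsl:copy-of select="html:*/html:span"/>
  </xsl:template>
</xsl:stylesheet>"""

root = ET.fromstring(STYLESHEET)

# Prefixes in element names are resolved to {namespace}local-name form...
copy_of = root.find(f"{{{XSLT_NS}}}template/{{{XSLT_NS}}}copy-of")
print(copy_of.tag)            # {http://www.w3.org/1999/XSL/Transform}copy-of

# ...but the attribute value is just a string; "html" is bound to nothing here.
select = copy_of.get("select")
print(select)                 # html:*/html:span

# Serialization declares only namespaces used in element or attribute names,
# so the binding the XPath expression depends on disappears.
roundtrip = ET.tostring(root, encoding="unicode")
print(XHTML_NS in roundtrip)  # False
```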
Whatever the name, today I re-read a couple of important documents on the issue. First there is the TAG finding on QNames, which is unfortunately not much more than an agglomeration of existing wisdom. Norm Walsh, the editor of that document, wrote of a more radical direction as part of his "XML 2.0" article. I like his ideas (though I'm partial to Jeffrey Yasskin's ampersand variation), and I hope the conversation soon moves toward something along those lines. XML is almost ten years old, and I see nothing wrong with a bit of a shake-up.
Until then, I think we can deploy two safeguards to protect ourselves from the subtle problems of namespaces. I call them "sanity within the document, and registries without." The two components are very different in character.
Firstly, we need to discard the idea of in-document scoping of namespaces. It seemed a great idea at the time, even to me, but in practice it's a mess, and Joe English was the first to illuminate the mess with a brilliant metaphor. (See my article "Principles of XML design: Use XML namespaces with care" for more on this.) If we can rely on sanity in XML documents, we can at least greatly simplify the scoping state that namespace-aware processing has to track. Ideally, every XML source in an XML processing pipeline would emit sane XML.
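Sanity of this kind can be checked mechanically. The sketch below is a minimal Python illustration that assumes a strict working definition chosen for the example (not necessarily Joe English's exact formulation): every namespace declaration sits on the document element, and no namespace name is bound to more than one prefix. It relies on ElementTree's iterparse "start-ns" events, which report each declaration just before the start event of the element that carries it. The function name is_namespace_sane is hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def is_namespace_sane(xml_text: str) -> bool:
    """Hypothetical check under an assumed, strict notion of sanity:
    all namespace declarations are on the document element, and each
    namespace URI is bound to exactly one prefix."""
    prefix_for_uri = {}   # namespace URI -> the single prefix declared for it
    inside_root = False   # becomes True once the document element has started
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, item in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start":
            inside_root = True
            continue
        prefix, uri = item  # a "start-ns" event delivers a (prefix, uri) pair
        if inside_root:
            return False    # declaration below the document element
        if prefix_for_uri.setdefault(uri, prefix) != prefix:
            return False    # the same namespace under a second prefix
    return True


print(is_namespace_sane('<a xmlns:x="urn:x"><x:b/></a>'))         # True
print(is_namespace_sane('<a><b xmlns:x="urn:x"/></a>'))           # False
print(is_namespace_sane('<a xmlns:x="urn:x" xmlns:y="urn:x"/>'))  # False
```

Because a sane document has exactly one set of bindings, a processor that has verified it can keep a single prefix table for the whole document instead of a stack of scopes.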
Secondly, I think the time has come for namespace registries. It would certainly be nice to build on the unfortunately stalled RDDL, but whereas the goal of RDDL is to provide human-readable information, what I think we really need in a namespace registry is a little nugget of machine-readable data. Drumroll, please...
A list of preferred prefixes for a namespace (supporting lookup from namespace name to well-known prefix, and vice versa). I know this will be controversial. Prefixes are supposed to be insignificant. Users should have the flexibility to use whatever prefix they like, blah blah blah. I'm sorry; that's all theoretically nice, but we have practical problems to solve. The fact that the most powerful constructs in XPath depend for their semantics on the whimsy of prefix choices should bother you a bit. The fact that Canonical XML had to abandon the idea of normalizing prefixes should bother you even more. It's time to just say that xsl means "the XSLT namespace" (yeah, yeah: "what version?" etc.; hard problems would still remain) and that if you choose to use it for a different namespace, you're technically compliant with Namespaces in XML, but you're asking for a heaping helping of trouble, buddy.
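Here is a rough sketch of what that machine-readable nugget might enable, in Python and under loudly stated assumptions: the PrefixRegistry class, its method names, and the sample entries are all hypothetical, and no such registry exists. It supports lookup in both directions, refuses to reserve one prefix for two namespaces, and hands its preferred prefixes to ElementTree's findall(), so the meaning of the prefixes in a path expression comes from the registry rather than from whatever the document author happened to pick.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample entries: namespace name -> well-known prefixes.
# The first prefix in each list is treated as the preferred one.
SAMPLE_ENTRIES = {
    "http://www.w3.org/1999/XSL/Transform": ["xsl"],
    "http://www.w3.org/1999/xhtml": ["html", "xhtml"],
    "http://www.w3.org/2001/XMLSchema": ["xs", "xsd"],
}


class PrefixRegistry:
    """Hypothetical two-way lookup between namespace names and reserved prefixes."""

    def __init__(self, entries):
        self._prefixes_by_uri = {}
        self._uri_by_prefix = {}
        for uri, prefixes in entries.items():
            if not prefixes:
                raise ValueError(f"no prefix registered for {uri}")
            for prefix in prefixes:
                # A reserved prefix may belong to only one namespace.
                owner = self._uri_by_prefix.setdefault(prefix, uri)
                if owner != uri:
                    raise ValueError(f"prefix {prefix!r} already reserved for {owner}")
            self._prefixes_by_uri[uri] = list(prefixes)

    def preferred_prefix(self, uri):
        """Namespace name -> well-known prefix."""
        return self._prefixes_by_uri[uri][0]

    def namespace_for(self, prefix):
        """Well-known prefix -> namespace name."""
        return self._uri_by_prefix[prefix]

    def namespaces(self):
        """Preferred prefix -> namespace name, in the shape ElementTree's
        find()/findall() accept as their namespaces argument."""
        return {prefixes[0]: uri for uri, prefixes in self._prefixes_by_uri.items()}


registry = PrefixRegistry(SAMPLE_ENTRIES)
print(registry.preferred_prefix("http://www.w3.org/2001/XMLSchema"))  # xs
print(registry.namespace_for("xhtml"))  # http://www.w3.org/1999/xhtml

# The document uses "h", but the query's prefixes come from the registry.
doc = ET.fromstring(
    '<h:html xmlns:h="http://www.w3.org/1999/xhtml">'
    "<h:body><h:p>hello</h:p></h:body></h:html>"
)
paras = doc.findall("html:body/html:p", registry.namespaces())
print([p.text for p in paras])  # ['hello']
```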
For now I'm just throwing out ideas, both to help organize my thoughts and to invite discussion. It seems to me that if we could rely on authors and tools, supported by a registry, to produce sane documents that (wherever possible) used essentially reserved prefixes, including, of course, for hidden namespaces, we could simplify namespace-aware processing a great deal. I can think of some practical hurdles for the registry idea, but no reason it isn't at least worth a try.
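On the tooling side, a minimal Python sketch of emitting reserved prefixes: it reserializes a document with ElementTree after calling ET.register_namespace for each reserved prefix. The RESERVED table and the normalize_prefixes name are hypothetical stand-ins for a registry lookup. Note the limits: register_namespace changes a process-wide table, and only prefixes in element and attribute names are rewritten. A prefix hidden in content, such as one inside an XPath select expression, keeps its old spelling, which is why reserved prefixes would need to be used in content as well.

```python
import xml.etree.ElementTree as ET

# Hypothetical reserved prefixes, standing in for entries looked up in a registry.
RESERVED = {
    "xsl": "http://www.w3.org/1999/XSL/Transform",
    "html": "http://www.w3.org/1999/xhtml",
}


def normalize_prefixes(xml_text: str) -> str:
    """Reserialize xml_text so registered namespaces use their reserved prefixes.
    Prefixes inside attribute values or text are left untouched."""
    for prefix, uri in RESERVED.items():
        # Process-wide: affects every later ElementTree serialization.
        ET.register_namespace(prefix, uri)
    root = ET.fromstring(xml_text)
    # ElementTree emits all namespace declarations on the root element.
    return ET.tostring(root, encoding="unicode")


source = (
    '<t:stylesheet xmlns:t="http://www.w3.org/1999/XSL/Transform" version="1.0">'
    '<t:template match="/"/></t:stylesheet>'
)
print(normalize_prefixes(source))
```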