Small fix to atom.rnc, and what about xml:space?

RobertBachmann stopped by #atom to mention that he'd tried to validate an Atom file against the non-normative RELAX NG schema for the Atom RFC draft (I haven't seen an RNC for the final RFC itself). Validation failed because he used xml:lang in an atom:name child of atom:author. That failure contradicts the Atom spec, which says:

Any element defined by this specification MAY have an xml:lang attribute, whose content indicates the natural language for the element and its descendents.

The RNC did not specify this attribute in a couple of cases. The RNC is non-normative, but in this case there is no reason for divergence from the spec. I whipped up an atom.rnc that fixes the bug. Here's the diff from the version I found on-line.

This did set up a discussion between Anne van Kesteren and me. I feel that xml:lang only makes sense for some Atom elements, and that perhaps allowing it on all of them could be confusing. What, for example, does it mean to have xml:lang on the atom:uri child of atom:author? I suppose an outlandish (pun intended) interpretation could be references to localized sites, but that's really the province of the likes of XHTML's hreflang attribute. Moreover, I'm a bit puzzled by the bit from the Atom spec that seems to support my leaning:

The language context is only significant for elements and attributes declared to be "Language-Sensitive" by this specification.

So if it's not significant, why allow it? I think maybe there should have been a split in attribute sets between atomCommonAttributes and an atomCommonLanguageSensitiveAttributes, where the former would omit xml:lang.

Also, I'm used to the convention where xml:lang is used with content models that allow a language-sensitive element to be repeated, providing for multiple language versions in the same document. There are many cases in Atom where this would not be possible. For example, you could not have an English atom:title and a French one within the same atom:entry element. You could get tricky by using a single atom:title with type="xhtml" and multiple language versions within the xhtml:div, but this feels a bit constricting.

Anne doesn't mind xml:lang everywhere, and pointed out that xml:lang="" is an option for specifying no language context (rather than language context inherited from parent). I think in the end I could go either way on xml:lang everywhere.
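
To make the inheritance point concrete, here is a minimal Python sketch of how a consumer might compute the language context of each element, including the xml:lang="" reset Anne mentioned. The sample entry and the function name effective_languages are made up for illustration; nothing here comes from the Atom spec or the RNC.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ATOM = "{http://www.w3.org/2005/Atom}"


def effective_languages(root):
    """Map each element to its language context (None means no context)."""
    result = {}

    def walk(elem, inherited):
        # An explicit xml:lang overrides whatever was inherited.
        lang = elem.get(XML_LANG, inherited)
        # xml:lang="" explicitly clears the inherited language context.
        if lang == "":
            lang = None
        result[elem] = lang
        for child in elem:
            walk(child, lang)

    walk(root, None)
    return result


# Hypothetical sample entry, for illustration only.
SAMPLE = """<entry xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
  <title>Hello</title>
  <author xml:lang="">
    <name xml:lang="fr">Jean</name>
    <uri>http://example.org/</uri>
  </author>
</entry>"""

root = ET.fromstring(SAMPLE)
langs = effective_languages(root)
for elem, lang in langs.items():
    print(elem.tag.replace(ATOM, "atom:"), lang)
```

In this sample atom:title inherits "en" from atom:entry, atom:author clears the context, atom:name sets its own, and atom:uri ends up with no language context at all.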

This discussion also made me think of xml:space. This special attribute might get a mention right in the XML spec, but that doesn't mean it doesn't have to be addressed in XML applications. Even in the case of DTDs, the spec says:

In valid documents, this attribute, like any other, must be declared if it is used.

The same goes for RELAX NG, the conventional schema language for Atom. There is no xml:space to be found in either the normative RFC or non-normative schema, but the rules for Atom undefinedAttribute do allow for this attribute (as well as xml:id and just about any other XML or 'global' attribute). I assume that the intention is for applications to treat this attribute using the suggested semantics in the XML 1.0 spec. I do wish Atom had been explicit about this as is, for example, the XSLT 1.0 spec.
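
As a rough illustration of why xml:space slips through, here is a small Python predicate that mimics my reading of the undefinedAttribute pattern: any attribute in some namespace, other than xml:base and xml:lang, matches. That reading is an assumption on my part, and the function name matches_undefined_attribute is invented for this sketch; check it against the schema itself before relying on it.

```python
XML_NS = "http://www.w3.org/XML/1998/namespace"


def matches_undefined_attribute(namespace, local_name):
    """Approximate the undefinedAttribute name class (assumed reading).

    Assumption: the pattern accepts any namespaced attribute except
    xml:base and xml:lang, and rejects attributes in no namespace.
    """
    if not namespace:
        # Unqualified ("local") attributes are excluded.
        return False
    if namespace == XML_NS and local_name in ("base", "lang"):
        # These two are handled by the common attributes instead.
        return False
    return True


print(matches_undefined_attribute(XML_NS, "space"))  # xml:space
print(matches_undefined_attribute(XML_NS, "id"))     # xml:id
print(matches_undefined_attribute(XML_NS, "lang"))   # xml:lang
print(matches_undefined_attribute(None, "foo"))      # unqualified attribute
```

Under that reading, xml:space and xml:id are accepted as foreign attributes, while xml:lang is only valid where the schema explicitly lists it.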

[Uche Ogbuji]

via Copia
5 responses
I pointed out that it was currently impossible for an author to have, say, both a Chinese and an American name, but this was rejected as being on the wrong side of 80/20.
Hi Uche,

As we were saying yesterday, to a large extent the fact that the RFC offers a RelaxNG schema is not a good idea, even a non-normative one, because many people will assume it is a correct interpretation of the specification text itself and will base their Atom construction on the schema rather than on the spec.

Therefore, either the schema should be fixed to the point where it is a mirror of the spec, or it should be left out of the spec altogether.

In its current form it is more misleading IMO.

- Sylvain
Uche, I would love to know how you go about embedding Schematron rules within a compact RelaxNG schema. Do you have any references for how to do this? It's just what I need at the moment.
"So if it's not significant, why allow it?"

If it doesn't hurt, why ban it?
I guess the bottom line, Robert, is that the specification should have been more precise and explicit about what it meant, whether they keep it or not.