There has been a lot of chatter about Tim Bray's piece "Don’t Invent XML
Languages".
Good. I'm all for anything that makes people think carefully about XML
language design and problems of semantic transparency (communicated
meaning of XML structure). I'm all for it even though I generally
disagree with Tim's conclusions. Here are some quick thoughts on Tim's
essay, and some of the responses I've seen.
Here’s a radical idea: don’t even think of making your own language
until you’re sure that you can’t do the job using one of the Big Five:
XHTML, DocBook, ODF, UBL, and Atom.—Bray
This is a pretty biased list, and happens to make sense for the circles
in which he moves. Even though I happen to move in much the same
circles, the first thing I'd say is that there could hardly ever be an
authoritative "big 5" list of XML vocabs. There is too much debate and
diversity, and that's too good a thing to sweep under the rug. MS Office
XML or ODF? OAGIS or UBL? RSS 2.0 or Atom? Sure I happen to plump for
the latter three, as Tim does, but things are not so clear cut for the
average punter. (I didn't mention TEI or DocBook because that is much less
of a head-to-head battle.)
I made my own list in "A survey of XML standards: Part 3—The most
important vocabularies" (IBM developerWorks, 2004). It goes:
- XHTML
- DocBook
- XSL-FO
- SVG
- VoiceXML
- MathML
- SMIL
- RDF
- XML Topic Maps
And in that article I admit I'm "just scratching the surface". The list
predates the first full releases of Atom and ODF, or they would have been
on it. I should also mention XBEL, which is, I think, not as widely
trumpeted, but just about as important as those other entrants. BTW, see
the full cross-reference of my survey of XML standards.
Designing XML Languages is hard. It’s boring, political,
time-consuming, unglamorous, irritating work. It always takes longer
than you think it will, and when you’re finished, there’s always this
feeling that you could have done more or should have done less or got
some detail essentially wrong.—Bray
This is true. It's easy to be flip and say "sure, that's true of
programming, but we're not being advised to write no more programs". But
then I think this difficulty is even more true of XML design than of
programming, and it's worth reminding people that a useful XML
vocabulary is not something you toss off in the spare hour. Simon
St.Laurent has always been a sound analyst of the harm done by
programmers who take shortcuts and abuse markup to suit their
conventions. The lesson, however, should be to learn best practices of
markup design rather than to become a helpless spectator.
If you’re going to design a new language, you’re committing to a
major investment in software development. First, you’ll need a
validator. Don’t kid yourself that writing a schema will do the trick;
any nontrivial language will have a whole lot of constraints that you
can’t check in a schema language, and any nontrivial language needs an
automated validator if you’re going to get software to interoperate.
If people would just use decent schema technology, this point would be
very much weakened. Schema designers rarely see beyond plain W3C XML
Schema or RELAX NG. Too bad. RELAX NG plus Schematron (with XPath
1.0/XSLT 1.0 drivers) covers a huge number of constraints.
Add in EXSLT 1.0 drivers for Schematron and you can cover probably 95+%
of Atom's constraints (probably more, actually). Throw in user-defined extensions and you have a
very powerful and mostly declarative validation engine. We should do a
better job of rendering such goodness to XML developers, rather than
scaring them away with duct-tape-validator bogeymen.
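To make the point concrete, here is a minimal Python sketch of the kind of co-occurrence constraint that a grammar-based schema handles poorly and a rule-based layer handles easily. It is not Schematron itself, only the same rule/assert idea in miniature. The rule table and function names are hypothetical, and the constraint is a simplified reading of Atom's requirement that a feed carry an author unless every entry does.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def feed_has_author_coverage(feed):
    """Co-occurrence rule: the feed has an author, or every entry does."""
    if feed.find(ATOM + "author") is not None:
        return True
    entries = feed.findall(ATOM + "entry")
    return all(e.find(ATOM + "author") is not None for e in entries)


# Hypothetical rule table in the spirit of Schematron's rule/assert pairs:
# (context element, test function, message reported when the test fails).
RULES = [
    (ATOM + "feed", feed_has_author_coverage,
     "feed needs an author unless every entry has one"),
]


def validate(xml_text):
    """Return the messages of all failed assertions."""
    root = ET.fromstring(xml_text)
    failures = []
    for context, test, message in RULES:
        # iter() visits the root itself as well as its descendants.
        for node in root.iter(context):
            if not test(node):
                failures.append(message)
    return failures


GOOD = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><author><name>A</name></author></entry>
</feed>"""

BAD = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>No author anywhere</title></entry>
</feed>"""
```

Calling validate(GOOD) reports no failures, while validate(BAD) reports the one message from the rule table. In real Schematron the test would be a declarative XPath expression rather than a Python function, which is what keeps the validator mostly declarative.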
Yes, XHTML is semantically weak and doesn’t really grok hierarchy and
has a bunch of other problems. That’s OK, because it has a
general-purpose class attribute and ignores markup it doesn’t know
about and you can bastardize it eight ways from center without anything
breaking. The Kool Kids call this “Microformats”...
This understated bit is, I think, the heart of Tim's argument. The
problem is that I still haven't been able to figure out why Microformats
have any advantage in semantic transparency over new vocabularies.
Despite the fuzzy claims of μFormatters, a microformat requires just as
much specification as a small, standalone format to be useful. It didn't
take me long kicking around XOXO to solve a real-world problem before
this became apparent to me.
Some interesting reactions to the piece
Dare Obasanjo.
Dare indirectly brought up that Ian Hickson had argued against
inventing XML vocabularies in 2003. I remember violently and
negatively reacting to the idea that everyone should stick to XHTML and
its elite companions. Certainly such limitations make sense for some,
but the general case is more nuanced (thank goodness). Side note:
another pioneer of the pessimistic side of this argument is Mark
Pilgrim (http://www.xml.com/pub/au/164). Needless to say I disagree with
many of his points as well.
I've always considered it a gross hack to think that instead of
having an HTML web page for my blog and an Atom/RSS feed, instead I
should have a single HTML page with <div class="rss:item"> or
<h3 class="atom:title"> embedded in it instead. However given that one of
the inventors of XML (Tim Bray) is now advocating this approach, I
wonder if I'm simply clinging to old ways and have become the kind of
intellectual dinosaur I bemoan.—Obasanjo
Dare is, I think, about as stubborn and tart as I am, so I'm amazed to
see him doubting his convictions in this way. Please don't, Dare. You're
quite correct. Microformats are just a hair away from my pet reductio
ad absurdum: <tag type="title"> rather than just <title>. I still haven't
heard a decent argument for such periphrasis. And I don't see how the
fact that tag is semantically anchored does anything special for the
stepchild identifier title in the microformats scenario.
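A small Python sketch may help show why the indirection buys so little. It pulls a title out of microformat-style markup (using the class names from Dare's example) and out of a tiny dedicated vocabulary. The <item> wrapper and both function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Microformat style: the meaning rides on class tokens inside HTML elements.
MICROFORMAT = '<div class="rss:item"><h3 class="atom:title">Hello</h3></div>'

# Dedicated vocabulary (hypothetical): the meaning rides on element names.
VOCABULARY = '<item><title>Hello</title></item>'


def title_from_class(xml_text):
    """Find the title via a class token, as a microformat consumer must."""
    root = ET.fromstring(xml_text)
    for node in root.iter():
        if "atom:title" in node.get("class", "").split():
            return node.text
    return None


def title_from_element(xml_text):
    """Find the title via the element name of a small vocabulary."""
    node = ET.fromstring(xml_text).find("title")
    return None if node is None else node.text
```

Both functions return the same string for their respective inputs, and both depend on an out-of-band agreement about what "title" means. Knowing that the carrier element is an h3 contributes nothing to either lookup.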
BTW, there is a priceless quote in comments to Dare:
OK, so they're saying: don't create new XML languages - instead,
create new HTML languages. Because if you can't get people to [separate
presentation from data], hijack the presentation!—"Steve"
Wot he said. With bells on.
Danny Ayers.
I think most XML languages have been created by one of three
processes - translating from a legacy format; mapping directly from the
domain entities to the syntax; creating an abstract model from the
domain, then mapping from that to the XML. The latter two of these are
really on a greyscale: a language designer probably has the abstract
entities and relationships in mind when creating the format, whether or
not they have been expressed formally.—Ayers
I've had my tiffs with RDF gurus lately, but this is the sort of point
you can trust an RDF guru to nail, and Danny does so. XML languages are,
like all languages, about expression. The farther the expression lies
from the abstraction being expressed (the model), the more expensive the
maintenance. Punting to an existing format that might have some vague
ties to the problem space is a much worse economic bet than the effort
of designing a sound and true format for that problem space.
To slightly repurpose another Danny quote towards XML,
...in most cases it’s probably best to initially make up afresh a new
representation that matches the domain model as closely as
possible(/appropriate). Only then start looking to replacing the new
terms with established ones with matching semantics. But don’t see
reusing things as more important than getting an (appropriately)
accurate model.—Ayers
Ned Batchelder.
He correctly identifies that Tim Bray's points tend to be most
applicable to document-style XML. I've long since come to the conclusion
(again with a lot of influence from Simon St.Laurent) that XML is too
often the wrong solution for programmer-data-focused formats (including
software configuration formats). Yeah, of course I've already
elaborated in the Python context.
[Uche Ogbuji]