Thinking “Thinking XML: The XML decade”

I've been very pleased at the response to my article “Thinking XML: The XML decade”. I like to write two types of articles: straightforward expositions of running code, and pieces providing more general perspective on some aspect of technology. I've felt some pressure in the publishing world lately to dispense with some of the latter in favor of puff pieces for the hottest technology fad; right now, having "SOA", "AJAX" or "podcast" in the title is the formula for selling a book or article. I've been lucky throughout my career to build relationships with editors who trust my judgment, even if my writing is so often out of the mainstream. As such, whenever an article with no obvious mass appeal strikes a chord and provokes intelligent response, I welcome it as more ammunition for following the road less trampled in my future writing.

Karl Dubost of the W3C especially appreciated my effort to bring perspective to the inevitable tensions that emerged as XML took hold in various communities.

Uche Ogbuji has written an excellent article The XML Decade. The author goes through XML development history as well as tensions between technological choices in XML communities. One of the fundamental choices in creating XML was to remove the data jail created by some applications or programming languages.

Mike Champion, with whom I've often jousted (we have very different ideas of what's practical), was very kind in "People are reflecting on XML after 10 years".

A more balanced assessment of the special [IBM Systems Journal issue] is from Uche Ogbuji. There is quite a nice summary of the very different points of view about what XML is good for and how it can be used. He reminds us that today's blog debates about simplicity/complexity, tight/loose coupling, static/dynamic typing, etc. reflect debates that go back to the very beginning. I particularly like his pushback on one article's assertion that XML leverages the value of "information hiding" in OO design.

It was a really big leap for me from OO (and structured programming/abstract data type) orthodoxy to embracing XML's open data philosophy (more on that in “Objects. Encapsulation. XML?”). It did help that I'd watched application interface wars from numerous angles in my career: RPC granularity, mixins and implementation versus interface inheritance, SQL/C interface coupling, etc. It started to become apparent to me that something was wrong when we were translating natural business domain problems into such patently artificial forms. RDBMS purists have been making a similar point for ages, but in my opinion, they just want to replace N-tier application artifice with their own brand of artifice. XML is far from some magic mix for natural expression of the business domain, but I believe that XML artifacts tend to be fundamentally more transparent than other approaches to computing. In my experience, it's easier to maintain even a poorly-designed XML-driven system than a well-designed system where the programming is preeminent.
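A contrived illustration of what I mean (my own sketch, not from the journal issue; all names invented): the same order record that an OO design would lock behind accessors and a class hierarchy is, as XML, open to any consumer with a parser.

    <!-- A hypothetical order record. Nothing here hides behind an
         accessor or an interface: any tool with an XML parser can
         read, query, or transform it, long after the application
         that wrote it is gone. -->
    <order id="PO-1042" date="2006-02-15">
      <customer>Acme Corp</customer>
      <line-item sku="W-7" quantity="12">
        <description>Widget, standard</description>
        <unit-price currency="USD">4.95</unit-price>
      </line-item>
    </order>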

Mike's entry goes on to analyze the usefulness, in perspective, of the 10 guiding principles for XML. When he speaks of "a more balanced assessment" he's contrasting it with the Slashdot thread on the article, which is mostly filled with the sort of half-educated nonsense that drove me from that site a few years ago (these days I find the most respectable discussion on reddit.com). Poor Liam Quin spent an awful lot of patience on a gang of inveterate flame-throwers. Besides his calm explanations, the best bits in that thread were on HL7 and ASN.1.

Supposedly the new version 3 [HL7] standard (which uses the "modeling approach") will be much more firm with the implementors, which will hopefully mean that every now and then one implementation will actually be compatible with another implementation. I've looked over their "models" and they've modelled a lot of the business use-case stuff for patient data, but not a lot of the actual data itself. Hopefully when it's done, it'll come out a bit better baked than previous versions.

That does not sound encouraging. I know Chimezie, my brother and Copia co-conspirator, is doing some really exciting work on patient data records. More RDF than XML, but I know he has a good head for the structured/unstructured data continuum, so I hope his work propagates more widely than just the Cleveland Clinic Foundation.

J. Andrew Rogers made the point that Simon St.Laurent and I (among others) have often made: that in many cases people misuse XML where something else, such as ASN.1, would be more suitable.

The "slow processing" is caused by more than taking a lot of space. XML is basically a document markup but is frequently and regular used as a wire protocol, which has very different design requirements if you want a good standard. And in fact we already have a good standard for this kind of thing called "ASN.1", which was actually engineered to be extremely efficient as a wire protocol standard. (There is also an ITU standard for encoding XML as ASN.1 called XER, which solves many of the performance problems.)

Of course, I think he goes a bit too far.

The only real advantage XML has is that it is (sort of) human readable. Raw TLV formatted documents are a bit opaque, but they can be trivially converted into an XML-like format with no loss (and back) without giving software parsers headaches. There is buckets of irony that the deficiencies of XML are being fixed by essentially converting it to ASN.1 style formats so that machines can parse them with maximum efficiency. Yet another case of computer science history repeating itself. XML is not useful for much more than a presentation layer, and the fact that it is often treated as far more is ridiculous.

I'd actually argue that XML is suited for a (semi-structured) model layer, not a presentation layer. For one thing, wire efficiency often counts in presentation as well. But his essential point is correct that XML is an awful substitute for ASN.1 as a wire protocol. By the same token, the Web services stack is an awful substitute for CORBA/OMA and even Microsoft's answers to same. It seems the industry is slowly beginning to realize this. I love all the many articles I see with titles such as "We know your SOA investment is stuck firmly in the toilet, but honest, believe us, there is an effective way to use this stuff. Really."
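To make the TLV point concrete (my own sketch, not from the thread): an ASN.1 DER record is just nested tag/length/value triples, and rendering those triples as markup, roughly what the XER encoding rules do, loses nothing. A DER SEQUENCE holding the INTEGER 42 and the IA5String "hello" takes 12 bytes on the wire:

    30 0a                     -- SEQUENCE, 10 content bytes
       02 01 2a               -- INTEGER, 1 byte: 42
       16 05 68 65 6c 6c 6f   -- IA5String, 5 bytes: "hello"

The same triples as markup (field names invented for the example):

    <record>
      <count>42</count>
      <greeting>hello</greeting>
    </record>

The markup form costs several times the bytes and the parsing effort, which is the commenter's whole complaint about XML on the wire.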

Anyway, later on in that sub-thread:

The company I work for has had a lot of success with XML, and is planning to move the internal data structure for our application from maps to XML. There is one simple reason for our success with it: XSLT. A customer asks for output in a specific format? Write a template. Want to display the data on a web page? Write a template that converts to HTML. Want to print to PDF? Write a template that converts to XSL, and use one of many available XSL->PDF processors. Want to use PDF forms to input data? Write a template to convert XFDF to our format. Want to import data from a competitor and steal their customer? You get the picture.

Bingo! The secret to XML's value is transformation. Pass it on.
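For instance (a sketch of my own, reusing the sort of invented order record above), one small stylesheet renders an order as HTML; retarget the templates and the same source data becomes FO, XFDF, or a competitor's format:

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <!-- One template per structural unit of the source document -->
      <xsl:template match="/order">
        <html><body>
          <h1>Order <xsl:value-of select="@id"/></h1>
          <ul><xsl:apply-templates select="line-item"/></ul>
        </body></html>
      </xsl:template>
      <xsl:template match="line-item">
        <li>
          <xsl:value-of select="description"/>:
          <xsl:value-of select="@quantity"/> at
          <xsl:value-of select="unit-price"/>
        </li>
      </xsl:template>
    </xsl:stylesheet>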

In "XML at 10: Big or Little?" Mark writes:

What the article ultimately ends up being about is the "Big" idea of XML vs. the oftentimes "Little" implementation of it. The Big idea is that XML can be used in a bottom-up fashion to model the grammar of a particular problem domain in an application- and context-independent manner. The little implementation is when XML is essentially used as a more verbose protocol for data interchange between existing applications. I would guess that is 90+ percent of what it is currently used for.
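The distinction is easy to see in miniature (a contrived example of my own, not Mark's):

    <!-- "Big" XML: the markup models the problem domain itself -->
    <prescription>
      <drug>amoxicillin</drug>
      <dose unit="mg">500</dose>
      <frequency per-day="3"/>
    </prescription>

    <!-- "Little" XML: the markup is a verbose envelope around one
         application's internal data structures -->
    <map>
      <entry><key>rx.drug</key><value>amoxicillin</value></entry>
      <entry><key>rx.sig</key><value>500mg tid</value></entry>
    </map>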

He's right, and I think this overuse is one of the reasons XML so often elicits reflex hostility. Then again, anything that doesn't elicit such hostility in some quarters is, of course, entirely irrelevant; I think it's a good thing that the same cannot be said of XML.

[Uche Ogbuji]

via Copia