Mark Baker's "Validation considered harmful" touched off a fun series of responses.
[C]onsider the scenario of two parties on the Web which want to exchange a certain kind of document. Party A has an expensive support contract with BigDocCo that ensures that they're always running the latest-and-greatest document processing software. But party B doesn't, and so typically lags a few months behind. During one of those lags, a new version of the schema is released which relaxes an earlier stanza in the schema which constrained a certain field to the values "1", "2", or "3"; "4" is now a valid value. So, party A, with its new software, happily fires off a document to B as it often does, but this document includes the value "4" in that field. What happens? Of course B rejects it; it's an invalid document, and an alert is raised with the human [administrator], dramatically increasing the cost of document exchange. All because evolvability wasn't baked in, because a schema was used in its default mode of operation: to restrict rather than permit.
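To make the breakage concrete, here is a minimal sketch of the sort of closed-world constraint Mark describes, expressed as a WXS enumeration (the "priority" element name is my own invention for illustration). A validator on party B's side, still holding the v1 schema, has no choice but to reject any document carrying the new value "4", regardless of whether the downstream software could have coped with it:

  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <!-- v1 of the schema: the field is locked down to exactly three values -->
    <xs:element name="priority">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="1"/>
          <xs:enumeration value="2"/>
          <xs:enumeration value="3"/>
          <!-- v2 adds <xs:enumeration value="4"/>; until party B upgrades,
               any document using "4" fails validation at the gate -->
        </xs:restriction>
      </xs:simpleType>
    </xs:element>
  </xs:schema>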
Upon reading this I had 2 immediate reactions:
Yep. Walter Perry was going on about all this sort of thing a long time ago, and the industry would be in a much saner place without, for example, crazy ideas such as WS-Kaleidoscope and the tight binding of documents to data records (read: WXS and XQuery). For an example of how Perry absolutely skewered class-conscious XML using a scenario somewhat similar to Mark's, read this incisive post. To me the perils of bondage-and-discipline validation are akin to those of B&D datatyping. It's all one more example of the poor design that results when you follow twopenny Structured Programming too far and let early binding rule the cosmos.
Yep. This is one of the reasons why, once you use Schematron and actually deploy it in real-life scenarios where schema evolution is inevitable, you never feel sanguine about using a grammar-based schema language (not even RELAX NG) again.
Dare's response took me aback a bit.
The fact that you enforce that the XML documents you receive must follow a certain structure or must conform to certain constraints does not mean that your system cannot be flexible in the face of new versions. First of all, every system does some form of validation because it cannot process arbitrary documents. For example an RSS reader cannot do anything reasonable with an XBRL or ODF document, no matter how liberal it is in what it accepts. Now that we have accepted that there are certain levels of validation that are no-brainers, the next question is to ask what happens if there are no constraints on the values of elements and attributes in an input document. Let's say we have a purchase order format which in v1 has a <currency> element which can have a value of "U.S. dollars" or "Canadian dollars", then in v2 we now support any valid currency. What happens if a v2 document is sent to a v1 client? Is it a good idea for such a client to muddle along even though it can't handle the specified currency format?
Dare is not incorrect, but I was surprised at his reading of Mark. When I considered it carefully, though, I realized that Mark did leave himself open to that interpretation by not being explicit enough. As he clarified in a comment to Dare:
The problem with virtually all uses of validation that I've seen is that this document would be rejected long before it even got to the bit of software which cared about currency. I'm arguing against the use of validation as a "gatekeeper", not against the practice of checking values to see whether you can process them or not ... I thought it goes without saying that you need to do that! 8-O
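A quick sketch of the distinction Mark is drawing, reusing Dare's purchase order scenario with invented element names (and the ISO Schematron namespace assumed): the currency check belongs with the bit of software that actually processes currency, for example as a narrowly scoped Schematron assertion, rather than in a gatekeeper grammar that rejects the whole document before that code ever sees it:

  <schema xmlns="http://purl.oclc.org/dsdl/schematron">
    <!-- The only constraint this v1 client cannot live without:
         a currency it knows how to handle -->
    <pattern id="currency-handling">
      <rule context="order/currency">
        <assert test=". = 'U.S. dollars' or . = 'Canadian dollars'">
          This client can only process U.S. or Canadian dollars.
        </assert>
      </rule>
    </pattern>
  </schema>

Everything else in a v2 document sails through untouched; only the one value the client genuinely cannot handle triggers a report, and the report names the actual business problem.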
I actually think this is a misunderstanding that other readers might easily have, so it's good that Dare called him on it and teased out the needed clarification. I missed it because I know Mark too well to imagine he'd ever go so far off in the weeds.
Of course the father of Schematron would have a response to reckon with in such a debate, but I was surprised to find Rick Jelliffe so demure about Schematron. His formula:
a schema used for validating incoming data only validates traceable business requirements
is flash-bam-alakazam spot on, but somewhat understated. Most forms of XML validation do us a disservice by making us nit-pick every detail of what we can live with, rather than letting us make brief declarations of what we cannot live without. Yes, Schematron's phases provide a powerful mechanism for elegantly modularizing the expression of rules and requirements, but long before you go that deep, Schematron sets you free by making validation an open rather than a closed operation. The gains in expressiveness thereby provided are nearly astonishing, and this despite the fact that Schematron is a less terse schema language than DTD, WXS or RELAX NG.
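As an illustration of what that openness plus phases buys you (a hypothetical sketch, not drawn from any of the posts above): the same Schematron schema can carry one lean set of assertions for documents arriving at intake and a stricter set for fulfillment, with a validator activating only the phase that matters at each stage, asserting what cannot be lived without and saying nothing about the rest of the document:

  <schema xmlns="http://purl.oclc.org/dsdl/schematron">
    <!-- Each phase activates only the patterns relevant to that processing stage -->
    <phase id="intake">
      <active pattern="order-identity"/>
    </phase>
    <phase id="fulfillment">
      <active pattern="order-identity"/>
      <active pattern="shippable-address"/>
    </phase>

    <pattern id="order-identity">
      <rule context="order">
        <assert test="@id">Every order must carry an id.</assert>
      </rule>
    </pattern>

    <pattern id="shippable-address">
      <rule context="order">
        <assert test="address/postal-code">Fulfillment needs a postal code on the order's address.</assert>
      </rule>
    </pattern>
  </schema>

Anything else a newer partner adds to the order is simply none of the schema's business, which is exactly the openness that grammar-based validation makes so hard to express.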
Of course XML put us on the road to unnecessary bondage and discipline on day one, when it made it so easy, and even a matter of recommendation, to top each document off with a lordly DTD. Despite the fact that I think Microformats are a rickety foundation for almost anything useful at Web scale, I am hopeful they will act as a powerful incentive for moving the industry away from knee-jerk validation.