XML data bindings, static languages, dynamic languages

A discussion about the brokenness of W3C XML Schema (WXS) on XML-DEV turned interestingly to the topic of the limitations of XML data bindings. This thread crystallized into a truly bizarre subthread where we had Mike Champion and Paul Downey actually trying to argue that the silly WXS wart xsi:nil might be more important in XML than mixed content (honestly the arrogance of some of the XML gentry just takes my breath away). As usual it was Eric van der Vlist and Elliotte Harold patiently arguing common sense, and at one point Pete Cordell asked them:

How do you think a data binding app should handle mixed content? We lump a complex type's mixed content into a string and stop there, which I don't think is ideal (although it is a common approach). Another approach could be to have strings in your language binding classes (in our case C++) interleaved with the data elements that would store the CDATA parts. Would this be better? Is there a need for both?

Of course as author of Amara Bindery, a Python data binding, my response to this is "it's easy to handle mixed content." Moving on in the thread he elaborates:

Being guilty of being a code-head (and a binding one at that - can it get worse!), I'm keen to know how you'd like us to make a better fist of it. One way of binding the example of "<p>This is <strong>very</strong> important</p>" might be to have a class structure that (with any unused elements ignored) looks like:-

class p
    string cdata1;        // = "This is "
    class strong strong;
    string cdata2;        // = " important"

class strong
    string cdata1;        // = "very"

as opposed to (ignoring the CDATA):

class p
    class strong strong;

class strong

or (lumping all the mixed text together):

class p
    string mixedContent;    // = "<p>This is <strong>very</strong> important</p>"

Or do you just decide that binding isn't the right solution in this case, or a hybrid is required?

It looks to me like a problem with poor expressiveness in a statically, strongly typed language. Of course, static versus dynamic is a hot topic these days, and has been since the "scripting language" diss started to wear thin. But the simple fact is that Amara doesn't even blink at this, and needs a lot less superstructure:

>>> from amara.binderytools import bind_string
>>> doc = bind_string("<p>This is <strong>very</strong> important</p>")
>>> doc.p
<amara.bindery.p object at 0xb7bab0ec>
>>> doc.p.xml()
'<p>This is <strong>very</strong> important</p>'
>>> doc.p.strong
<amara.bindery.strong object at 0xb7bab14c>
>>> doc.p.strong.xml()
'<strong>very</strong>'
>>> doc.p.xml_children
[u'This is ', <amara.bindery.strong object at 0xb7bab14c>, u' important']

There's the magic. All the XML data is there; it uses the vocabulary of the XML itself in the object model (as expected for a data binding); and it maintains the full structure of the mixed content in a form that is very easy for the user to process. And if we ever decide we just want the content, unmixed, we can just use the usual XPath technique:

>>> doc.p.xml_xpath(u"string(.)")
u'This is very important'
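
For readers without Amara at hand, the same idea can be roughed out with nothing but the standard library. This is only a sketch of the approach, not Amara's implementation: the `Element` class, its `string_value` method and the `bind` helper are names made up for illustration, and it is written for Python 3 whereas the session above is Python 2. It keeps mixed content as an ordered list of strings and child objects, and exposes child elements as attributes named after their tags.

```python
from xml.dom import minidom


class Element:
    """Hypothetical binding object: one per XML element."""

    def __init__(self, name):
        self.xml_name = name
        # Text strings and Element objects, kept in document order.
        self.xml_children = []

    def string_value(self):
        # Same result as XPath string(.): all descendant text, concatenated.
        return "".join(
            c if isinstance(c, str) else c.string_value()
            for c in self.xml_children
        )


def bind(node):
    """Recursively turn a minidom element into an Element."""
    elem = Element(node.tagName)
    for child in node.childNodes:
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            elem.xml_children.append(child.data)
        elif child.nodeType == child.ELEMENT_NODE:
            bound = bind(child)
            elem.xml_children.append(bound)
            # Expose the first child of each name as an attribute.
            if not hasattr(elem, child.tagName):
                setattr(elem, child.tagName, bound)
    return elem


dom = minidom.parseString("<p>This is <strong>very</strong> important</p>")
p = bind(dom.documentElement)

# Show text as-is and child elements by name, in document order.
kinds = [c if isinstance(c, str) else c.xml_name for c in p.xml_children]
print(kinds)
print(p.strong.string_value())
print(p.string_value())
```

The point of the sketch is that nothing has to be declared up front: the attribute `strong` exists because the document contains a `strong` element, and the text on either side of it simply sits in the same list.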

So there. Mixed content easily handled. Imagine my disappointment at the despairing responses of Paul Downey and even Elliotte Harold:

Personally I'd stay away from data binding for use cases like this. Dealing with mixed content is hardly the only problem. You also have to deal with repeated elements, omitted elements, and order. Child elements just don't work well as fields. You can of course fix all this, but then you end up with something about as complicated as DOM.

Data binding is a plausible solution for going from objects and classes to XML documents and schemas; but it's a one-way ride. Going the other direction: from documents and schemas to objects and classes is much more complicated and generally not worth the hassle.

As I hope my Amara example shows, you do not need to end up with anything nearly as complex as DOM, and it's hardly a one-way ride. I think it should be made clear that a lot of the difficulties that seem to stem from Java's own limitations are not general XML processing problems, and thus I do not think they should properly inform a problem such as the emphasis of an XML schema language. In fact, I've always argued that it's the very marrying of XML technology to the limitations of other technologies such as statically-typed OO languages and relational DBMSes that results in horrors such as WXS and XQuery. When designers focus on XML qua XML, as the RELAX NG folks did and the XPath folks did, for example, the results tend to be quite superior.
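
To make the point about repeated and omitted elements a little more concrete, here is a small sketch in Python 3 using only the standard library. It is not how Amara does it; the `Bound` wrapper, its `all` method and the `order` document are assumptions invented for the illustration. The idea is simply that a dynamic language can resolve child names at access time, so an omitted element is a missing attribute and a repeated element is a list in document order.

```python
import xml.etree.ElementTree as ET


class Bound:
    """Hypothetical wrapper giving attribute-style access to child elements."""

    def __init__(self, node):
        self._node = node

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        found = self._node.find(name)
        if found is None:
            # Omitted element: behaves like any other missing attribute.
            raise AttributeError(name)
        return Bound(found)

    def all(self, name):
        """Every child called `name`, in document order (possibly empty)."""
        return [Bound(n) for n in self._node.findall(name)]

    def __str__(self):
        # Concatenated text of this element and its descendants.
        return "".join(self._node.itertext())


doc = Bound(ET.fromstring(
    "<order><item>tea</item><item>milk</item><note>rush</note></order>"
))

first_item = str(doc.item)                   # first of the repeated elements
items = [str(i) for i in doc.all("item")]    # all of them, in order
has_phone = hasattr(doc, "phone")            # omitted element
print(first_item, items, has_phone)
```

That is a couple of dozen lines, which is a long way short of DOM, though a real binding obviously has more to worry about (namespaces, attributes, name clashes with the wrapper's own methods).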

Eric did point out Amara in the thread.

An interesting side note—a question about non-XHTML use cases of mixed content (one even needs to ask?!) led once again to mention of the most widely underestimated XML modeling problem of all time: the structure of personal names. Peter Gerstbach provided the reminder this time. I've done my bit in the past.

[Uche Ogbuji]

via Copia
6 responses
BTW don't miss Rick Jelliffe's brilliant message about one of the frequently overlooked reasons for the importance of mixed content:


And he's not done dropping jewels with that one.

"I guess you could summarize the problem by saying that, for good internationalization, free text is rich text. It is not so much that DBMS and ancillary systems need to support mixed content or XML, but rather that they need to support rich, freely annotatable text. Supporting XML and mixed content is a solution rather than a use case, IYSWIM."

I agree that dynamically typed languages make life significantly simpler. In fact, XML and scripting seem to have been betrothed to each other at birth.

However, in cases where one is stuck with a static language like Java, is there a good solution? In other words, if one only has access to Java, do the arguments you refute stand?
Amara looks cool, and like you say, it helps if you use an /expressive/ programming language like Python.

fwiw i'm very much a fan of mixed content and would take it without a moment's thought over xsi:nil.

otoh I'm currently tasked to make life more bearable for people who are determined to view XML as "nasty horrible angle brackets", so find myself having to sympathise with the code binding mindset, or at least find ways of helping them have a better experience when processing documents using their tools of abstraction.

Maybe my "a mixed message" blog entry will reassure you of my intent:

I just saw this post yesterday. I've written a longish post on my weblog, "Regarding XML data bindings, static languages, dynamic languages", that discusses your post, pretty much supporting it :-) but also shows how a Java binding tool deals with this.
Interesting, Bob.  Posted my thoughts here:

Bindings seem to be gaining a lot of attention lately - indeed, if you filtered out the GIS side from the discussions at the SVG Open 2005 conference, you'd think that the conference should have been named Open Bindings (which...