"Semantic Transparency" by any other name

In response to "Semantic hairball, y'all" Paul Downey responded with approval of my skewering of some of the technologies I see dominating the semantics space, but did say:

..."semantic transparency" in "XML Schema" sounds just a little too scary for my tastes....

This could mean that the names sound scary, or that his interpretation of the idea itself sounds scary. If it's the latter, I'll try to show soon that the idea is very simple and shouldn't be at all scary. If it's the former, the man has a point. "Semantic Transparency" is a very ungainly name. As far as I can tell, it was coined by Robin Cover, and I'm sure it was quite suitable at the time, but for sure right now it's a bit of a liability in the pursuit that most interests me.

The pursuit is of ways to build on the prodigious success of XML to make truly revolutionary changes in data architecture within and across organizations. Not revolutionary in terms of the technology to be used. In fact, as I said in "No one ever got fired for...", the trick is to finally give age-old and well proven Web architecture more than a peripheral role in enterprise architecture. The revolution would be in the effects on corporate culture that could come from the increased openness and collaboration being fostered in current Web trends.

XML ushered in a small revolution by at least codifying a syntactic basis for general purpose information exchange. A common alphabet for sharing. Much is made of the division between data and documents in XML (more accurate terms have been proposed, including records versus narrative, but I think people understand quite well what's meant by the data/documents divide, and those terms are just fine). The key to XML is that even though it's much more suited to documents, it adapts fairly well to data applications. Technologies born in the data world such as relational and object databases have never been nearly as suitable for document applications, despite shrill claims of relational fundamentalists. XML's syntax mini-revolution means that for once those trying to make enterprise information systems more transparent by consolidating departmental databases into massive stores (call them the data warehouse empire), and those trying to consolidate documents into titanic content management stores (call them the CMS empire) can use the same alphabet (note: the latter group is usually closely allied with those who want to make all that intellectual capital extremely easy to exchange under contract terms. Call them the EDI empire). The common alphabet might not be ideal for any one side at the table, but it's a place to start building interoperability, and along with that the next generation of applications.

All over the place I find in my consulting and speaking that people have embraced XML just to run into the inevitable limitations of its syntactic interoperability and scratch their head wondering OK, what's the next stop on this bus route? People who know how to make money have latched onto the suspense, largely as a way of re-emphasizing the relevance of their traditional products and services, rather than as a way to push for further evolution. A few more idealistic visionaries are pushing such further evolution, some rallying under the banner of the "Semantic Web". I think this term is, unfortunately, tainted. Too much of the 70s AI ambition has been cooked into recent iterations of Semantic Web technologies, and these technologies are completely glazing over the eyes of the folks who really matter: the non-Ph.Ds (for the most part) who generate the vast bodies of public and private documents and data that are to drive the burgeoning information economy.

Some people building on XML are looking for a sort of mindshare arbitrage between the sharp vendors and the polyester hippies, touting sloppy, bottom-up initiatives such as microformats and folksonomies. These are cheap, and don't make the head spin to contemplate, but it seems clear to anyone paying attention that they don't scale as a way to consolidate knowledge any more than the original Web does.

I believe all these forces will offer significant juice to next generation information systems, and that the debate really is just how the success will be apportioned. As such, we still need an umbrella term for what it means to build on a syntactic foundation by sharing context as well. To start sharing glossaries as well as alphabets. The fate (IMO) of the term "Semantic Web" is instructive. I often joke about the curse of the s-word. It's a joke I picked up from elsewhere (I forget where) to the effect that certain words starting with "s", and "semantic" in particular are doomed to sound imposing yet impossibly vague. My first thought was: rather than "semantic transparency", how about just "transparency? The problem is that it's a bit too much of a hijack of the generic. A data architect probably will get the right picture from the term, but we need to communicate to ithe wider world.

Other ideas that occur to me are:

"information transparency"
"shared context" or "context sharing"
"merged context"
"context framing"
"Web reference"

The latter idea comes from my favorite metaphor for these XML++ technologies: that they are the reference section (plus card catalog) of the library (see "Managing XML libraries"). They are what makes it possible to find, cross-reference and understand all the information in the actual books themselves. I'm still fishing for good terms, and would welcome any suggestions.

[Uche Ogbuji]

via Copia