What Do Closed Systems Have to Gain From SW Technologies?

Aaron Swartz asked that I elaborate on a topic that is dear to me, and I didn't think a blog comment would do it justice, so here we are :)

The question is what single-purpose (closed) databases have to gain from SW technologies. I think the most important misconception to clear up first is the idea that XML and Semantic Web technologies are mutually exclusive.
They most certainly are not.

It's not that I think Aaron shares this misconception. Rather, I think the alternative approach to applying SW technologies that he suggests isn't very well spoken for mainly because quite a few people on opposing sides of the issue assume that XML (and its whole strata of protocols and standards) and RDF/OWL (the traditionally celebrated components of the SW) are mutually exclusive. There are other misconceptions that hamper this discussion, such as the assumption that the SW is an all-or-nothing proposition, but that is a whole other thread :)

As we evolve towards a civilization where the value of information and its synthesis is of increasing importance, 'traditional' data mining, expressiveness of representation, and portability become more important for most databases (single-purpose or not).

These are areas that these technologies are meant to address, explicitly because “standard database” software / technologies are simply not well suited for these specific requirements. Not all databases are alike and so it follows that not all databases will have these requirements: consider databases where the primary purpose is the management of financial transactions.

Money is money, arithmetic is arithmetic, and the domain of money exchange and management is for the most part static, so traditional / standard database technologies will suffice. Sure, it may be useful to be able to export a bank statement in a portable (perhaps XML-based) format, but the value in using SW-related technologies there is minimal.

Of course, you could argue that online banking systems have a lot to gain from these technologies, but the example was one of pure transactional management. The portal that manages the social aspects of money management is a layer on top.

However, where there is a need to leverage:

  • More expressive mechanisms for data collection (think XForms)
  • (Somewhat) unambiguous interpretation of content (think FOL and DL)
  • Expressive data mining (think RDF querying languages)
  • Portable message / document formats (think XML)
  • Data manipulation (think XSLT)
  • Consistent addressing of distributed resources (think URLs)
  • General automation of data management (think Document Definitions and GRDDL)

These technologies will have an impact on how things are done. It's worth noting that these needs aren't restricted to distributed databases (which is the other assumption about the Semantic Web: that it only applies within the context of the 'Web'). Consider the Wiki example and the advantages that Semantic Wikis have over plain wikis:

  • Much improved possibility of data mining from a more formal representation of content
  • 'Out-of-the-box' interoperability with tools that speak in SW dialects
  • Possibility of a certain amount of automation from the capabilities that interpretation brings
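
To make the data mining point a little more concrete, here is a minimal sketch in Python of what querying formally represented content might look like. It is not how any particular Semantic Wiki or RDF query language works; the page names and statements are invented for illustration, and the pattern matcher is only a stand-in for a real RDF query engine. The `foaf:name` and `foaf:knows` labels refer to FOAF properties, written here as plain strings.

```python
# Hypothetical statements a Semantic Wiki might extract from annotated pages.
# Each statement is a (subject, predicate, object) triple.
TRIPLES = [
    ("page:Alice", "foaf:name", "Alice"),
    ("page:Bob", "foaf:name", "Bob"),
    ("page:Alice", "foaf:knows", "page:Bob"),
]


def match(pattern, triples=TRIPLES):
    """Return the triples matching an (s, p, o) pattern; None is a wildcard."""
    return [
        t for t in triples
        if all(want is None or want == got for want, got in zip(pattern, t))
    ]


# "What names are recorded anywhere in the wiki?"
names = [o for _, _, o in match((None, "foaf:name", None))]

# "Who does Alice know?"
known_by_alice = [o for _, _, o in match(("page:Alice", "foaf:knows", None))]

print(names)
print(known_by_alice)
```

The same questions asked of free-form wiki text would need text search and guesswork; against triples they reduce to simple pattern matching.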

It's also worth noting that recently the Semantic Wiki project introduced mechanisms for using other vocabularies for 'marking-up' content (FOAF being the primary vocabulary highlighted).

This is important in two ways: 1) it demonstrates the value in incorporating well-established vocabularies with relative ease, and 2) the policed way in which these additional vocabularies can be used demonstrates precisely the middle ground between a very liberal, open-world-assumption approach to distributed data in the SW and a controlled, closed (single-purpose) systems approach.

Such constraints can allow for some level of uniformity that can have very important consequences in very different areas: XML as a messaging interlingua and extraction of RDF.

Consider the value in developing a closed vocabulary with its semantics spelled out very unambiguously in RDF/RDFS/OWL and a uniform XML representation of its instances with an accompanying XSLT transform (something the AtomOWL project is attempting to achieve).

What do you gain? For one thing, XForms-based data entry for the uniform XML instances and a direct, (relatively) unambiguous mapping to a more formal representation model, each of which has its own very long list of advantages by itself, let alone in tandem!
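
As a rough illustration of that mapping, here is a sketch in Python rather than XSLT. The `<entry>` format, the `http://example.org/` URLs, and the `xml_to_triples` function are all assumptions made up for this example (they are not the AtomOWL vocabulary or transform); the only point is that a uniform XML shape makes the extraction mechanical.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for the closed vocabulary (illustration only).
VOCAB = "http://example.org/vocab#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# A uniform XML instance, e.g. what a data entry form might submit.
ENTRY_XML = """
<entry id="http://example.org/entries/1">
  <title>Closed systems and the SW</title>
  <author>Chimezie Ogbuji</author>
</entry>
"""


def xml_to_triples(xml_text):
    """Map an <entry> document to (subject, predicate, object) triples.

    The root's id attribute becomes the subject and every child element
    becomes one statement about it.
    """
    root = ET.fromstring(xml_text)
    subject = root.get("id")
    triples = [(subject, RDF_TYPE, VOCAB + "Entry")]
    for child in root:
        triples.append((subject, VOCAB + child.tag, (child.text or "").strip()))
    return triples


triples = xml_to_triples(ENTRY_XML)
for triple in triples:
    print(triple)
```

Because every instance has the same shape, the mapping needs no per-document special cases, which is what makes a single accompanying transform feasible.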

Stand-alone databases (where their needs intersect with the value in SW technologies) stand to gain: portable, declarative data entry mechanisms; interoperability; much improved capabilities for interpretation and synthesis of existing information; increased automation of data management (by closing the system, certain operations become much more predictable); and the additional possibilities for alternative reasoning heuristics that take advantage of closed world assumptions.
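
The closed world point can be sketched in a few lines of Python. The facts, names, and the two functions are hypothetical; the sketch only contrasts how a missing statement is read under each assumption and is not a reasoner.

```python
# Hypothetical facts held by a closed, single-purpose database.
FACTS = {
    ("alice", "memberOf", "staff"),
    ("bob", "memberOf", "staff"),
    ("alice", "hasBadge", "B-17"),
}


def has_badge_fact(person):
    """True if any hasBadge statement is recorded for the person."""
    return any(s == person and p == "hasBadge" for s, p, _ in FACTS)


def lacks_badge_closed_world(person):
    """Closed world: whatever is not recorded is taken to be false."""
    return not has_badge_fact(person)


def lacks_badge_open_world(person):
    """Open world: a missing statement proves nothing.

    Returns False when a badge is recorded, otherwise None (unknown).
    """
    return False if has_badge_fact(person) else None


print(lacks_badge_closed_world("bob"))  # the closed system can answer outright
print(lacks_badge_open_world("bob"))    # the open system cannot decide
```

Closing the system is what licenses the first, more decisive answer.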

Chimezie Ogbuji

via Copia