GRDDL: A Knowledge Representation Architectural Style? - part one

Yay. GRDDL is now a W3C Recommendation!

I'm very proud to have been a part of that, and there is a lot about this particular architectural style that I have always wanted to write about. I recently came upon the opportunity to consider one particular facet.

This is why it seems the same as with GRDDL. There are transformations you can make, but they are not entailments in a logic; they just go from one graph to a different graph.

Yes, that is one part of the larger framework that is well considered. GRDDL does not rely on logical entailment for its normative definition. It is defined operationally, but it can also be described via a declarative (formal) semantics. It defines a mapping (not a function in the true sense; the specification clearly identifies ambiguity at the level of the infoset) from an XML representation of an "information resource" to a typed RDF representation of the same "information resource". The output is required to have a well-defined mapping of its own into the RDF abstract syntax.

The less formal definition uses a dialect of Notation 3 that is a bit more expressive than Datalog Logic Programming (it uses function symbols - builtins - in some of the clauses). The proof at the bottom of that page justifies the assertion that http://www.w3.org/2001/sw/grddl-wg/td/titleauthor.html has a GRDDL result which is composed entirely of the following RDF statement:

<http://musicbrainz.org/mm-2.1/album/6b050dcf-7ab1-456d-9e1b-c3c41c18eed2> is named "Are You Experienced?" .

Frankly, I would have gone with "Bold as Love", myself =)

Once you have a (mostly) well-defined function for rendering RDF from information resources, you enable the deployment of useful (and re-usable) interpretations for intelligent agents (more on these later). For example, the test suite is a large semantic web of XML documents that GRDDL-aware agents can traverse, performing Quality Assurance tests (using EARL) of their conformance to the operational semantics of GRDDL.
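To make that operational flavor concrete, here is a minimal sketch of what a GRDDL-aware agent does for the general XML case (the root-attribute mechanism; the XHTML profile/link mechanism is omitted). lxml and rdflib stand in for whatever XSLT and RDF machinery a real agent ships with, and the source URL is purely illustrative:

    # A minimal sketch of a GRDDL-aware agent for plain XML source documents.
    # lxml and rdflib stand in for the agent's XSLT and RDF machinery; the
    # XHTML profile/link mechanism defined by GRDDL is not handled here.
    from urllib.request import urlopen

    from lxml import etree
    from rdflib import Graph

    GRDDL_NS = 'http://www.w3.org/2003/g/data-view#'

    def glean(source_url):
        """Apply the transformations declared on an XML document's root
        element and merge the resulting RDF into a single graph."""
        doc = etree.parse(urlopen(source_url))
        root = doc.getroot()
        result = Graph()
        # grddl:transformation holds a space-separated list of transform URIs
        transforms = (root.get('{%s}transformation' % GRDDL_NS) or '').split()
        for transform_url in transforms:
            xslt = etree.XSLT(etree.parse(urlopen(transform_url)))
            result.parse(data=str(xslt(doc)), format='xml')
        return result

    # Illustrative usage; the URL is hypothetical
    graph = glean('http://example.org/some-grddl-source.xml')
    print(graph.serialize(format='n3'))

Nothing in that pipeline depends on entailment: the agent just fetches, transforms, and parses.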

However, it was very important to leave entailment out of the equation until it serves a justifiable purpose. For example, a well-designed RDF querying language does not require logical entailment (RDF, RDFS, OWL, or otherwise) for it to be useful in the general case. You can calculate a closure (or Herbrand base) and then dispatch structural graph queries. This was always true with Versa. You can glean (pun intended) quite a bit from only the structural nature of a Graph. A whole generation of graph theoretical literature demonstrates this.
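A small illustration of the point, using rdflib and SPARQL (rather than Versa) over made-up data: the query is answered purely by structural matching against the asserted triples, with no entailment regime in sight.

    # Purely structural querying: no RDF/RDFS/OWL entailment, just pattern
    # matching over the triples explicitly asserted in the graph.
    # The data is made up; SPARQL via rdflib stands in for Versa here.
    from rdflib import Graph

    g = Graph()
    g.parse(data="""
    @prefix dc: <http://purl.org/dc/elements/1.1/> .
    <http://example.org/album/1> dc:title "Are You Experienced?" .
    <http://example.org/album/2> dc:title "Axis: Bold as Love" .
    """, format='n3')

    query = """
    PREFIX dc: <http://purl.org/dc/elements/1.1/>
    SELECT ?album ?title WHERE { ?album dc:title ?title }
    """
    for row in g.query(query):
        print(row.album, row.title)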

In addition, once you have a well-defined set of semantics for identifying information resources with RDF assertions that are (logically) relevant to the closure, you have a clear separation between manipulation of surface syntax and full-blown logical reasoning systems.

It should be considered a semantic web architectural style (if you will) to constrain the use of entailment to only where it has some demonstrated value to the problem space. Where it makes sense to use entailment, however, you will find the representations are well-engineered for the task.

Chimezie Ogbuji

via Copia

The Architectural Style of a Simple Interlingua

It just occurred to me that there is a strong correlation between the hardest nuance to get (or grok, as the saying goes) about REST and the hardest one to get about RDF.

With RDF, there is the pervasive Clay Shirky misconception that the semantic web is about one large ontology to rule them all. I've made it a point to start every semantic web-related presentation with some background information about Knowledge Representation (yes, that snow-covered relic of the AI winter).

[Image: the Knowledge Representation triangle]

My favorite initial read on the subject is "How To Tell Stuff To A Computer - The Enigmatic Art of Knowledge Representation". As a follow-up, I'd suggest "What is a Knowledge Representation?".

The thing that we miss (or forget) most often is that formal knowledge representations are first about a common syntax (and its interpretation: semantics) and then about the vocabularies you build with that common syntax. A brief read of the history of knowledge representation emphasizes this subtle point. At each point in the progression, the knowledge representation becomes more expressive or sophisticated, but the masonry is the same.

With RDF, first there is the RDF abstract syntax, and then there are the vocabularies (RDFS, OWL, FOAF, DC, SKOS, etc.). Similarly (but more recursively), a variety of grammars can each be written to define a distinct class of XML documents, all via the same language (RELAX NG, for instance). An Application Programming Interface (API) defines a common dialect for a variety of applications to communicate with. And, finally, the REST architectural style defines a uniform interface for services, to which a variety of messages (HTTP messages) conform.
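A tiny rdflib sketch of the first point in that chain (the resource URIs are invented for illustration): Dublin Core and FOAF terms sit side by side in one graph because both are just URIs plugged into the same triple syntax.

    # One abstract syntax (triples), many vocabularies layered on top of it.
    # The resource URIs below are invented for illustration.
    from rdflib import Graph, Literal, URIRef
    from rdflib.namespace import DC, FOAF, RDF

    g = Graph()
    doc = URIRef('http://example.org/essays/simple-interlingua')
    author = URIRef('http://example.org/people/chimezie')

    # Dublin Core terms describe the document...
    g.add((doc, DC.title, Literal('The Architectural Style of a Simple Interlingua')))
    g.add((doc, DC.creator, author))
    # ...while FOAF terms describe the author, all in the same graph.
    g.add((author, RDF.type, FOAF.Person))
    g.add((author, FOAF.name, Literal('Chimezie Ogbuji')))

    print(g.serialize(format='turtle'))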

In each case, it is simplicity that is the secret catalyst. The RDF abstract syntax is nowhere near as expressive as Horn Logic or Description Logic (this was the original motivation for DAML+OIL and OWL), but it is this limitation that makes it useful as a simple metadata framework. RELAX NG is (deceptively) much simpler than W3C XML Schema (syntactically), but its simple syntax makes it much more malleable for XML grammar contortions and easier to understand. The REST architectural style is dumbfounding in its simplicity (compared to WS-*), but it is this simple uniformity that scales so well to accommodate every kind of messaging between remote components. In addition, classes of such messages are trivial to describe.

So then, the various best practices in the Semantic Web canon (content negotiated vocabulary addresses, http-range14, linked data, etc..) and those in the REST architectural style are really manifestations of the same principle in two different arenas: knowledge representation and network protocols?

Chimezie Ogbuji

via Copia

What's good and bad in agile methodology?

Frank Kelly has some good Thoughts on Agile Methods - XP and the like. He's a skeptic of dynamic languages, and of course I'm an avid user of and advocate for these, so I was almost put off by his first couple of paragraphs, but I think in the end he nails the essential points.

Whether you create a design or not - the second you write a line of code you are realizing a design - it may be in your head but it's still a design. If you are on a team of more than about 7-8 developers then to really "scale" communication a written and agreed upon design is a very helpful (I would say 'necessary') task on the path to success.

but, as he admits:

As I've said before Agile has taught me that at many times "less is more" - so I tend to write smaller design documents with lots more pictures and try to always keep them under 20-30 pages or so. From there on, 1-on-1 and team meetings can help get the details out. Also you can farm out individual component designs to individual developers - each creating a small 5-10 page document. Beyond that you do get the "glaze over" effect in people's eyes.

This is exactly right, and it has been my attitude to agile methodology. You can't ignore design, but you can make it less of a white elephant. As I wrote in "What is this ‘agility’?":

It’s not easy to come to criticism of BDUF. After all, it brings to the young profession of software engineering the rigor and discipline that have established other engineering disciplines so respectably. No one would commission a skyscraper or build a jet plane without mountains of specifications, models, surveys and procedural rules. It would seem that similar care is the only solution for the bad reputation software has earned for poor quality.

Despite this, there has been a steady movement lately toward “agile” techniques, which contradict BDUF. Agile boosters claim that BDUF is impractical for all but the largest and most mission-critical systems, and causes a lot of problems because inevitable change in requirements and environment are very difficult to accommodate in the process. The track is laid out during analysis and design, and any variation therefrom is a catastrophic derailment. Agile methods focus on flexibility and accommodation of change, including greater involvement of the end user throughout the process.

One area where I tend to disagree with Frank is his discussion of the "waterfalls" approach to the software development life cycle (SDLC):

Here's my issue with rejecting waterfall - it's like rejecting Gravity - that's all well and good if you live in a parallel universe where the laws of physics don't apply :-)

He goes on to imply that you are rejecting integration testing and design when you reject waterfalls. I strongly disagree. Let's say there are strong and weak agile methodology supporters, and that Frank and I are examples of the weak sort, based on our attitude towards design (with our belief that some design is always necessary). I think the part of waterfalls that most weak agile supporters reject is the part that gives it its name, i.e. the irreversible flow between stages of the SDLC. The problem with waterfalls is that it is contrary to iteration, and I think iterative development is important. I think Frank does as well, given his acceptance of more, smaller releases, so I think our difference is less substantive and more a matter of differing understandings of what it is in waterfalls that agile methodology rejects.

[Uche Ogbuji]

via Copia

The Versatility of XForms

I'll be giving a presentation at the upcoming XML 2006 Conference in Boston on Tuesday December 5th at 1:30pm: The Essence of Declarative, XML-based Web Applications: XForms and XSLT.

I've been doing some hardcore XSLT/XForms development over the last 2 years or so and have come to really admire the versatility of using XSLT to generate XForms user interfaces. Using XSLT to generate XHTML from compact XML documents is a well-known design pattern for separating content from presentation. Using XSLT to generate XHTML+XForms takes this to the nth degree by separating content from both presentation and behavior (the Model View Controller design pattern).

The icing on the cake is the XPath processing capabilities native to both XSLT and XForms. It makes for easily-managed and relatively compact applications with very little redundancy.
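For a flavor of the pattern, here is a toy sketch using lxml; the compact content document, the field names, and the stylesheet are all invented for illustration, not taken from the presentation.

    # A toy sketch of the content/presentation/behavior split: a compact XML
    # document is transformed into XHTML+XForms markup. The document and
    # stylesheet are invented for illustration.
    from lxml import etree

    # A compact content document with no presentation or behavior in it.
    content = etree.XML('<contact><name/><email/></contact>')

    # The XSLT supplies both: XHTML for presentation, XForms for behavior.
    transform = etree.XSLT(etree.XML("""
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xf="http://www.w3.org/2002/xforms">
      <xsl:template match="/contact">
        <html xmlns="http://www.w3.org/1999/xhtml">
          <head>
            <xf:model>
              <xf:instance><xsl:copy-of select="."/></xf:instance>
            </xf:model>
          </head>
          <body>
            <!-- One XForms control per element in the content document -->
            <xsl:for-each select="*">
              <xf:input ref="{name()}">
                <xf:label><xsl:value-of select="name()"/></xf:label>
              </xf:input>
            </xsl:for-each>
          </body>
        </html>
      </xsl:template>
    </xsl:stylesheet>
    """))

    print(etree.tostring(transform(content), pretty_print=True).decode())

The same XPath machinery does double duty: the stylesheet uses it to walk the content document, and the generated ref attributes use it to bind the XForms controls back to the instance data.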

The presentation doesn't cover this, but the XForms framework also includes transport-level components and mechanisms that are equally revolutionary in how they tie web clients into the overall web architecture context very comprehensively (Rich Web Application Backplane has good coverage of patterns to this effect). I've always thought of XForms as a complete infrastructure for web application development and AJAX as more of an interim scripting gimmick that enables capabilities that are a small portion of what XForms has to offer.

[Uche Ogbuji]

via Copia

What Do Closed Systems Have to Gain From SW Technologies?

Aaron Swartz asked that I elaborate on a topic that is dear to me and I didn't think a blog comment would do it justice, so here we are :)

The question is what single-purpose (closed) databases have to gain from SW technologies. I think the most important misconception to clear up first is the idea that XML and Semantic Web technologies are mutually exclusive. They are most certainly not.

It's not that I think Aaron shares this misconception, but I think the main reason why the alternative approach to applying SW technologies that he suggests isn't very well spoken for is that quite a few people on the opposing sides of the issue assume that XML (and its whole strata of protocols and standards) and RDF/OWL (the traditionally celebrated components of the SW) are mutually exclusive. There are other misconceptions that hamper this discussion, such as the assumption that the SW is an all-or-nothing proposition, but that is a whole other thread :)

As we evolve towards a civilization where the value in information and its synthesis is of increasing importance, 'traditional' data mining, expressiveness of representation, and portability become more important for most databases (single-purpose or not).

These are areas that these technologies are meant to address, precisely because “standard database” software and technologies are simply not well suited to these specific requirements. Not all databases are alike, and so it follows that not all databases will have these requirements: consider databases whose primary purpose is the management of financial transactions.

Money is money, arithmetic is arithmetic, and the domain of money exchange and management is, for the most part, static, so traditional/standard database technologies will suffice. Sure, it may be useful to be able to export a bank statement in a portable (perhaps XML-based) format, but inevitably the value in using SW-related technologies is very minimal.

Of course, you could argue that online banking systems have a lot to gain from these technologies, but the example was of pure transaction management; the portal that manages the social aspects of money management is a layer on top.

However, where there is a need to leverage:

  • More expressive mechanisms for data collection (think XForms)
  • (Somewhat) unambiguous interpretation of content (think FOL and DL)
  • Expressive data mining (think RDF querying languages)
  • Portable message / document formats (think XML)
  • Data manipulation (think XSLT)
  • Consistent addressing of distributed resources (think URLs)
  • General automation of data management (think Document Definitions and GRDDL)

these technologies will have an impact on how things are done. It's worth noting that these needs aren't restricted to distributed databases (which touches on the other common assumption about the Semantic Web: that it only applies within the context of the 'Web'). Consider the Wiki example and the advantages that Semantic Wikis have over conventional Wikis:

  • Much improved possibility of data mining, thanks to a more formal representation of content
  • 'Out-of-the-box' interoperability with tools that speak in SW dialects
  • The possibility of a certain amount of automation from the capabilities that interpretation brings

It's also worth noting that the Semantic Wiki project recently introduced mechanisms for using other vocabularies for 'marking up' content (FOAF being the primary vocabulary highlighted).

This is doubly important in that 1) it demonstrates the value in incorporating well-established vocabularies with relative ease, and 2) the policed way in which these additional vocabularies can be used demonstrates precisely the middle ground between a very liberal, open-world-assumption approach to distributed data in the SW and a controlled, closed (single-purpose) systems approach.

Such constraints can allow for some level of uniformity that can have very important consequences in very different areas: XML as a messaging interlingua and extraction of RDF.

Consider the value in developing a closed vocabulary with its semantics spelled out very unambiguously in RDF/RDFS/OWL and a uniform XML representation of its instances with an accompanying XSLT transform (something the AtomOWL project is attempting to achieve).

What do you gain? For one thing, XForms-based data entry for the uniform XML instances and a direct, (relatively) unambiguous mapping to a more formal representation model, each of which has its own very long list of advantages on its own, let alone in tandem!
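A rough sketch of the shape of the thing, with rdflib and lxml; the vocabulary URIs and the instance format are invented, and a plain Python mapping stands in for the XSLT transform described above.

    # A small closed vocabulary whose semantics are spelled out in RDFS,
    # plus a uniform XML instance mapped into the formal model. The
    # vocabulary, instance format, and mapping are all illustrative; in
    # practice the mapping would live in an XSLT transform.
    from lxml import etree
    from rdflib import Graph, Literal, Namespace, RDF, RDFS, URIRef

    EX = Namespace('http://example.org/vocab#')

    vocabulary = Graph()
    vocabulary.add((EX.Entry, RDF.type, RDFS.Class))
    vocabulary.add((EX.title, RDF.type, RDF.Property))
    vocabulary.add((EX.title, RDFS.domain, EX.Entry))
    vocabulary.add((EX.title, RDFS.range, RDFS.Literal))

    # A uniform XML instance of the vocabulary
    instance = etree.XML('<entry id="e1"><title>Hello</title></entry>')

    data = Graph()
    subject = URIRef('http://example.org/entries/' + instance.get('id'))
    data.add((subject, RDF.type, EX.Entry))
    data.add((subject, EX.title, Literal(instance.findtext('title'))))

    print(data.serialize(format='turtle'))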

Stand-alone databases (where their needs intersect with the value in SW technologies) stand to gain: portable, declarative data-entry mechanisms; interoperability; much improved capabilities for interpretation and synthesis of existing information; increased automation of data management (by closing the system, certain operations become much more predictable); and additional possibilities for alternative reasoning heuristics that take advantage of closed-world assumptions.

Chimezie Ogbuji

via Copia

Updating Metacognition Software Stack

Metacognition was down for some time as I updated the software stack that it runs on (namely 4Suite and RDFLib). There were some core changes:

  • The 4Suite repository was reinitialized using the more recent persistence drivers.
  • I added a separate section for archived publications (I use Google Reader's label service for Copia archives).
  • I switched to using RDFLib as the primary RDF store (using a recently written mapping between N3/FOL and a highly efficient SQL schema) and the filesystem for everything else.
  • I added the SIOC ontology to the core index of ontologies.
  • I updated my FOAF graph and Emeka's DOAP graph.

Earlier, I wrote up the mapping that RDFLib's FOPLRelationalModel was based on in a formal notation (using MathML, so it will only be viewable in a browser that supports it, like Firefox).
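For flavor, here is a minimal sketch of the pluggable-store pattern that this kind of swap relies on. The rdflib-sqlalchemy plugin and the SQLite URL are stand-ins for the FOPLRelationalModel-backed store and schema described above; the identifiers are assumptions, not the actual Metacognition configuration.

    # A sketch of opening an rdflib graph against a pluggable SQL-backed
    # store. rdflib-sqlalchemy and the SQLite URL stand in for the
    # FOPLRelationalModel store actually used; identifiers are illustrative.
    from rdflib import Graph, Literal, URIRef
    from rdflib_sqlalchemy import registerplugins

    registerplugins()
    graph = Graph('SQLAlchemy', identifier=URIRef('http://example.org/metacognition'))
    graph.open(Literal('sqlite:///metacognition.db'), create=True)

    print(len(graph), 'triples in the store')
    graph.close()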

The net result is that Metacognition is much more responsive and better organized. Generally, I hope for it to serve two purposes: 1) to organize my thoughts on, and software related to, applying Semantic Web Technologies to 'closed' content management systems, and 2) to serve as a (markdown-powered) whiteboard and playground for tools and demos for advocacy on best practices in problem solving with these technologies.

Below is a marked-up diagram of some of these ideas.

[Diagram: Metacognition roadmap]

The publications are stored in a single XML file and are rendered (at run time, on the server) using a pre-compiled XSLT stylesheet against a cached Domlette. Internally, the document is mapped into RDF persistence using an XSLT document definition associated with the document, so all modifications are synched into an RDF/XML equivalent.

Mostly as an academic exercise - since the 4Suite repository (currently) doesn't support document definitions that output N3 and the content-type of XSLT transform responses is limited to HTML/XML/Text - I wrote an equivalent publications-to-n3.xslt. The output is here.

Chimezie Ogbuji

via Copia