Why Web Architecture Shouldn't Dictate Meaning

This is a very brief demonstration motivated by some principled arguments I've been making over the last week or so about Web Architecture dictates that are ill-conceived and may do the Semantic Web more damage than good. A more fully articulated argument is sketched out in "HTTP URIs are not Without Expense" and "Semiotics of RDF Signs", in particular the argument that most of the httpRange-14 dialog confuses dereference with denotation. I've touched on some of this before.

Anywho, the URI I've minted for myself is

http://metacognition.info/profile/webwho.xrdf#chime

When you 'dereference' it, the server responds with:

chimezie@otherland:~/workspace/Ontologies$ curl -I http://metacognition.info/profile/webwho.xrdf#chime
HTTP/1.1 200 OK
Date: Thu, 30 Aug 2007 06:29:12 GMT
Server: Apache/2.2.3 (Debian) DAV/2 SVN/1.4.2 mod_python/3.2.10 Python/2.4.4 PHP/4.4.4-8+etch1 proxy_html/2.5 mod_ssl/2.2.3 OpenSSL/0.9.8c mod_perl/2.0.2 Perl/v5.8.8
Last-Modified: Mon, 23 Apr 2007 03:09:22 GMT
Content-Length: 6342
Via: 1.1 www.metacognition.info
Expires: Thu, 30 Aug 2007 07:28:26 GMT
Age: 47
Content-Type: application/rdf+xml

According to the TAG dictate, a 'web' agent can assume the URI refers to a document (yes, apparently this blog you are reading was composed by an RDF document).

Update: Bijan points out that my example is mistaken. The TAG dictate only allows the assumption to be made of the URI that goes across the wire (the URI with the fragment stripped off), and the RDF (FOAF) doesn't make any assertions about this (stripped) URI being a foaf:Person. This is technically correct; however, the concern I was highlighting still holds (albeit one more likely to confuse folks who are already confused about dereference and denotation). The assumption still gets in the way of 'proper' interpretation. Consider if I had used the FOAF graph URL as the URL for me: under which mantra would this be taboo? Furthermore, if I wanted to avoid confusing unintelligent agents such as the one above, which URI scheme would I be likely to use? Hmmm...

Okay, a more sophisticated semantic web agent parses the RDF and understands (via the referential mechanics of model theory) that the URI denotes a foaf:Person (much more reasonable). This agent is also much better equipped to glean 'meaning' from the model-theoretic statements made about me instead of jumping to binary conclusions.
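
Here's a minimal sketch (not from the original post; it uses present-day rdflib and assumes the profile URL still resolves) of what that more sophisticated agent does: fetch the document, parse the RDF it gets back, and read off what the fragment URI denotes from the statements themselves.

from rdflib import Graph, Namespace, URIRef, RDF

FOAF = Namespace("http://xmlns.com/foaf/0.1/")
me = URIRef("http://metacognition.info/profile/webwho.xrdf#chime")

g = Graph()
# The URI that actually goes across the wire is the document URI
# (fragment stripped); the fragment only matters for denotation.
g.parse("http://metacognition.info/profile/webwho.xrdf", format="xml")

if (me, RDF.type, FOAF.Person) in g:
    print("The URI denotes a foaf:Person, whatever the 200 OK implied.")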

So I ask you: which agent is hampered by a dictate that has everything to do with misplaced pragmatics and nothing to do with semantics? Until we understand that the 'Semantic Web' is not 'Web-based Semantics', Jim Hendler's question about where all the agents are ("Where are all the agents?") will continue to go unanswered, and Tim Bray's challenge will never be fulfilled.

A little tongue-in-cheek, but I hope you get the point.

Chimezie Ogbuji

via Copia

Why FuXi?

So, I updated the cheeseshop entry for FuXi (should that be a capital 'X'?). This is the freeware I forced myself to write in order to better express myself (I don't always do a good job of that in person) and to engage people generally. It is very fast, so I use it wherever I need to do any OWL/N3 inference. I hope to port its serialize/parse capabilities to also handle SWRL, the "new" Rule Interchange Format, and CycML (since this is trivial with 4Suite, and OpenCyc is, well, "open").
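
For readers unfamiliar with what "OWL/N3 inference" amounts to, here is a toy illustration (this is not FuXi's actual API, and the data is made up): a forward-chaining loop that applies a couple of RDFS entailment rules to an rdflib graph until no new triples appear, which is the general flavor of what a rule engine like FuXi automates.

from rdflib import Graph, Namespace, RDF, RDFS

EX = Namespace("http://example.org/ns#")  # made-up vocabulary

def rdfs_closure(graph):
    # Naive fixpoint over two RDFS entailment rules.
    while True:
        new = set()
        # rdfs9: (x rdf:type C) and (C rdfs:subClassOf D) => (x rdf:type D)
        for x, _, c in graph.triples((None, RDF.type, None)):
            for _, _, d in graph.triples((c, RDFS.subClassOf, None)):
                new.add((x, RDF.type, d))
        # rdfs11: rdfs:subClassOf is transitive
        for c, _, d in graph.triples((None, RDFS.subClassOf, None)):
            for _, _, e in graph.triples((d, RDFS.subClassOf, None)):
                new.add((c, RDFS.subClassOf, e))
        new -= set(graph)
        if not new:
            return
        for triple in new:
            graph.add(triple)

g = Graph()
g.add((EX.Cat, RDFS.subClassOf, EX.Mammal))
g.add((EX.Mammal, RDFS.subClassOf, EX.Animal))
g.add((EX.tom, RDF.type, EX.Cat))

rdfs_closure(g)
print((EX.tom, RDF.type, EX.Animal) in g)  # True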

I host it on Google Code because I like their combined service: um, it's free, and it provides Subversion, a mailing list, a Wiki, and other community services. In addition, I can synchronize my license(s); in this case FuXi's license is bare-bones BSD (I wonder if I should switch to an Apache license?). I link my cheeseshop entry to the Google Code page, and this is the primary "entry point" for package management. Cheeseshop + easy_install + Python = very painless. I'm planning on setting up triclops (a WSGI-based SPARQL service) the same way.

Update: I added a Google Group for FuXi: All discussion on FuXi

Doing this brought me back to the question of why I gave this piece of software a name (see: origin) which conventional wisdom might consider "odd". I named it after a very coherent philosophy written a very long time ago. Sometime in 2004, I started reading a lot of text from that canon and then did some experimentation with 1) capturing the trigrams in OWL and 2) generating SVG diagrams of them as an additional serialization. These were some of my older Copia entries.

The text is very mathematical; in fact, it is based (almost entirely) on the binary numeral system. My formal "study" was Computer Engineering, which emphasized microprocessor theory (all of which is based on the binary numeral system as well), so my interest was not just "spiritual" but also very practical, as I have come to a better appreciation of microprocessor theory many years after graduating from the University of Illinois at Urbana-Champaign.

My interest is also very historical. I believe that the theory these texts are based on represents some of the oldest human analysis of semiotics, binary numerics, psychology, and ontology. I have heard that the oldest ontology is purported to be Aristotle's, but I think this is very much mistaken if you consider the more mathematical aspects of "classic" semiotics. This is why I thought it would be interesting (at the time) to capture the trigrams in OWL (i.e., the formal theory) with annotations consisting of the better English translations of the original text (the Yijing) as well as SVG diagram exports.
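
Purely to make the idea concrete, a hypothetical sketch of that modelling exercise might look like the following (the namespace, class names, and annotation text are illustrative, not the actual ontology I built): each trigram becomes an owl:Class annotated with an English gloss and a pointer to an SVG rendering.

from rdflib import Graph, Literal, Namespace, URIRef, RDF, RDFS
from rdflib.namespace import OWL

YI = Namespace("http://example.org/yijing#")  # hypothetical namespace

g = Graph()
g.add((YI.Trigram, RDF.type, OWL.Class))
g.add((YI.Qian, RDF.type, OWL.Class))
g.add((YI.Qian, RDFS.subClassOf, YI.Trigram))
g.add((YI.Qian, RDFS.label, Literal("Qian (Heaven)", lang="en")))
g.add((YI.Qian, RDFS.comment,
       Literal("Three unbroken lines; binary 111.", lang="en")))
# Link the class to an SVG rendering of the trigram.
g.add((YI.Qian, RDFS.seeAlso, URIRef("http://example.org/yijing/qian.svg")))

print(g.serialize(format="xml"))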

This could serve as a good tool for older generations that study these texts via conventional methods (consider the nature of the more oral traditions). Igbo tradition (my tradition) is very much "oral". I had thought at the time that a tool which relied on inference to interpret this ancient theory (for students of that theory) would make for a good demonstration of "a" value proposition for Semantic Web technologies in a (very) unintended way. In many ways, the "philosophies" of open source, open communities, and open standards are a contemporary manifestation of this older way of life. It gives me some relief amidst a modern society obsessed with military expenditure (one of the oldest human archetypes).

However, at that point, my day job picked up. Even though I use FuXi every day to do inference for reasons other than the original intent, I decided to keep the original name as motivation to (someday) go back to that particular "project", at least as a way to exercise my self-expression (which, as I said earlier, I normally do a poor job of).

Chimezie Ogbuji

via Copia

SemanticDB: A CMS Methodology for the Enterprise Semantic Web

This is a heads-up that this weekend I will be writing a full-length article giving an architectural overview of SemanticDB, a Content Management System methodology (and implementation) we have been developing at the Cleveland Clinic Foundation over the last 4 years (since I've been there). I hope this may trigger more public dialog (which unfortunately has not been happening) about how we've been leveraging Semantic Web and document management (XML-related) W3C standards to build a robust platform that facilitates all aspects of clinical research.

I believe most of our success with SemanticDB can be attributed to the strengths of the standards being leveraged, the open source tools (specifically: Python, 4Suite, RDFLib, FuXi, Paste, and FormsPlayer) which serve as primary infrastructure, and the enthusiasm of the relevant communities behind these open standards and open source software tools. As such, it is only fair that these communities become aware of examples which demonstrate how this arena is slowly transitioning from the realm of pure research to practical problem solving. In addition, I hope it may contribute to the growing body of literature demonstrating concrete problems being solved with these technologies in a way that simply would not have been possible via legacy means.

Below is the abstract from a technological whitepaper that has been in clandestine distribution as we have sought to ramp up our efforts:

SemanticDB represents a methodology for building and maintaining a highly flexible content management system built around a centralized vocabulary.
This vocabulary incorporates a formal semantics (via a combination of an OWL ontology and an N3 ruleset) for hierarchical document composition and an abstract framework for modelling terms in a domain in a way that facilitates semi-automated use of native XML and RDF representations.

It relies on a very recent set of technologies (Semantic Web technologies) to automate certain aspects of data entry, structure, storage, display, and retrieval with minimal intervention by traditional database administrators and computer programmers. A SemanticDB instance is built around a domain model expressed using a Data Node Model. From the domain model various XML/RDF management components are generated. The Data Mason takes a domain model and generates XML schemas, formal ontologies, XSLT transforms, stored queries, XML templates, document definitions, etc.

The ScreenCompiler takes an abstract representation of a data entry form (with references to terms in a domain model) and generates a user interface in a particular host language (currently XForms).

Below is a core diagram of the methodology which facilitates semi-automated data management of XML & RDF content:

The Data Mason and automated XML/RDF Data Management

Watch this space..

Chimezie Ogbuji

via Copia

Investor's Business Daily, Zepheira, and what some insist on calling "Web 3.0"

O yes, I've been quiet. What a year. My consulting work at Sun is pretty much a full-time job. As if that's not enough, I moved from Fourthought to Kadomo. I'd scarcely been there for a month when most of us decided we needed to restructure and start afresh, and thus was born Zepheira. Joining one company at an early stage and then launching another two months later is no way to have a life left over. I've had no time whatsoever for Weblogging, and what little time I have had for such things I've given over to my OSS projects such as 4Suite, Amara and Bright Content.

Then a curious thing happened today. We learned that our company (which has been getting oodles of good press lately) was featured in an Investor's Business Daily story, but that the article would likely disappear from the Web forever by the end of the day. (We've all heard that "Cool URIs don't change", so I'm dismayed that here's a cool URI that might just completely vanish.) I guess when those cats say "Daily" they are not playing. Something about the ephemeral nature of the news inspired me to throw in my tuppence.

Anyways "After All This Interactivity, Look Out For Web 3.0 Leap" (linked via PURL, just in case) discusses Semantic Web at suitable high level, and since it includes an interview with Eric Miller, includes a good dose of practicality. Quoting the article:

Software giants Oracle and Adobe Systems already support or plan to back the RDF and OWL standards to represent data in some of their products.

These Web standards should help companies spot new relationships among huge sets of data and use the findings for better conclusions about their business, says Eric Miller, president of Web startup Zepheira.

"We want the ability to free data from applications and use the data in other applications for which it was not originally intended," said Miller.

Current Web 2.0 firms could apply the future benefits of metadata in Web 3.0.

For instance, MySpace might let personal pages share information with the pages of relevant friends or colleagues in the social network.

Take someone whose MySpace page describes a fondness for vintage jazz. By entering that information once, that person could automatically be linked to others who share the same interest.

Furthermore, that information could be applied to future Web searches for new music releases. In effect, using metadata could become a way to make MySpace "truly mine," said Miller.

And no, this is not magic. It's no more than taking the open data precepts people already associate with Web 2.0 and making them a bit easier to aggregate. And yes, this makes them easier for sharp types to run game: the Web has been easy to game from day one, and we've managed just fine. Even phishies in Russia, despite dire warnings of Total Internet Meltdown, have never posed more of an actual threat than any other scam mechanism, such as those that come through that other perilous instrument: your phone. As Eric goes on to say:

"This means there is a much more flexible, personalized integration point to really connect people," he said. "The notion here is to enter data just once, but to use it often."

In recent years, Miller led the Semantic Web program at the World Wide Web Consortium.

Yeah, that simple. DRY for Web data. Now does that sound like strong AI redux to you? Apparently some people are incapable of hearing anything else, as the very end of this article shows. Then again, the W3C themselves can take some of the blame for creating that straw man.

Web 3.0 involves building a Web of interconnected data, Miller says. This approach will let companies quickly change computer processes as their business needs evolve.

"What we've got here is a set of useful technologies that when combined become very powerful," he said. "This makes it easier to free the data from the application that created it and make it more useful and easier to combine with other little bits of information."

Yeah. Separating data from applications has kinda been an obsession of mine for a while now, and it really ain't that hard, and it's about time someone brought such practical solutions to the enterprise. It's a very important generalization of the "separate content from presentation" mantra that just about every Web developer has heard 1000 times, and reading Eric, you might get a sense of why I've poured so much of myself into this startup. I think that not only are we architecting like Google, but we're following the natural lessons Google brings to a new generation of architecture. And I'm a bit surprised as well as gratified at how well this message has been playing in a growing number of enterprises.

Certainly that's the sort of thing I might mention to an investor type at a cocktail party, although you wouldn't hear me say "Web 3.0" (reminds me too uncomfortably of the RSS wars). And I'd certainly be wary of mentioning to said investor type exactly where I work.

Update: it seems I was too cryptic in the last paragraph. I'm definitely proud of my company; I meant that neither I nor any of the other partners is courting investment.

[Uche Ogbuji]

via Copia

The Architectural Style of a Simple Interlingua

It just occurred to me that there is a strong correlation between the hardest nuance to get (or grok, as the saying goes) about REST and RDF.

With RDF, there is the pervasive Clay Shirky misconception that the semantic web is about one large ontology to rule them all. I've made it a point to start every semantic web-related presentation with some background information about Knowledge Representation (yes, that snow-covered relic of the AI winter).

Knowledge Representation Triangle

My favorite initial read on the subject is "How To Tell Stuff To A Computer - The Enigmatic Art of Knowledge Representation". As a follow-up, I'd suggest "What is a Knowledge Representation?".

The thing that we miss (or forget) most often is that formal knowledge representations are first about a common syntax (and its interpretation: semantics) and then about the vocabularies you build with the common syntax. A brief read on the history of knowledge representation emphasizes this subtle point. At each point in the progression, the knowledge representation becomes more expressive or sophisticated, but the masonry is the same.

With RDF, first there is the RDF abstract syntax, and then there are the vocabularies (RDFS, OWL, FOAF, DC, SKOS, etc.). Similarly (but more recursively), a variety of grammars can each be written to define a distinct class of XML documents, all via the same language (RELAX NG, for instance). An Application Programming Interface (API) defines a common dialect for a variety of applications to communicate with. And, finally, the REST architectural style defines a uniform interface for services, to which a variety of messages (HTTP messages) conform.
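
A tiny example (with made-up data) of that first point: whether the terms come from FOAF, Dublin Core, or the RDF core itself, every statement below has the same subject / predicate / object shape in the RDF abstract syntax; only the vocabulary changes.

from rdflib import Graph, Literal, Namespace, URIRef, RDF

FOAF = Namespace("http://xmlns.com/foaf/0.1/")
DC = Namespace("http://purl.org/dc/elements/1.1/")

doc = URIRef("http://example.org/doc")
author = URIRef("http://example.org/people#alice")

g = Graph()
g.add((author, RDF.type, FOAF.Person))        # FOAF vocabulary
g.add((author, FOAF.name, Literal("Alice")))  # FOAF vocabulary
g.add((doc, DC.creator, author))              # Dublin Core vocabulary

for s, p, o in g:
    print(s, p, o)  # the same three-part shape throughout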

In each case, it is simplicity that is the secret catalyst. The RDF abstract syntax is nowhere near as expressive as Horn Logic or Description Logic (this is the original motivation for DAML+OIL and OWL), but it is this limitation that makes it useful as a simple metadata framework. RELAX NG is (deceptively) much simpler than W3C XML Schema (syntactically), but its simple syntax makes it much more malleable for XML grammar contortions and easier to understand. The REST architectural style is dumbfounding in its simplicity (compared to WS-*), but it is this simple uniformity that scales so well to accommodate every nature of messaging between remote components. In addition, classes of such messages are trivial to describe.

So then, are the various best practices in the Semantic Web canon (content-negotiated vocabulary addresses, httpRange-14, linked data, etc.) and those in the REST architectural style really manifestations of the same principle in two different arenas: knowledge representation and network protocols?

Chimezie Ogbuji

via Copia

A Content Repository API for Rich, Semantic Web Applications?

[by Chimezie Ogbuji]

I've been working with roll-your-own content repositories long enough to know that open standards are long overdue.

The slides for my Semantic Technology Conference 2007 session are up: "Tools for the Next Generation of CMS: XML, RDF, & GRDDL" (OpenOffice) and (PowerPoint)

This afternoon, I merged documentation of the 4Suite repository from old bits (and some new) into a Wiki that I hope to contribute to (every now and then).
I think there is plenty of mature, supporting material upon which a canon of best practices for XML/RDF CMSes can be written, with normative dependencies on:

  • GRDDL
  • XProc
  • Architecture of the World Wide Web, Volume One
  • URI RFCs
  • Rich Web Application Backplane
  • XML / XML Base / XML Infoset
  • RDDL
  • XHTML 1.0
  • SPARQL / Versa (RDF querying)
  • XPath 2.0 (JSR 283 restriction) with 1.0 'fallback'
  • HTTP 1.0/1.1, ACL-based HTTP Basic / Digest authentication, and a convention for Web-based XSLT invocation
  • Document/graph-level ACL granularity

The things that are missing:

  • RDF equivalent of DOM Level 3 (transactional, named graphs, connection management, triple patterns, ... ) with events.
  • A mature RIF (there is always SWRL, Notation 3, and DLP) as a framework for SW expert systems (and sentient resource management)
  • A RESTful service description to complement the current WSDL/SOAP one

For a RESTful service description, RDF Forms can be employed to describe transport semantics (to help with Agent autonomy), or a mapping to the Atom Publishing Protocol (and thus a subset of GData) can be written.

In my session, I emphasized how closely JSR 283 overlaps with the 4Suite Repository API.

The delta between them mostly has to do with RDF, other additional XML processing specifications (XUpdate, XInclude, etc.), ACL-based HTTP authentication (basic, and sessions), HTTP/XSLT bindings, and other miscellaneous bells and whistles.

Chimezie Ogbuji

via Copia

Linked Data and Overselling the HTTP URI Scheme

So, I'm going to do something which may not be well-received: I'm going to push back (slightly) on the Linked Data movement, because, frankly, I think it is a bit draconian with respect to the way it oversells the HTTP URI scheme (points 2 and 3, quoted below):

2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information.

There is some interesting overlap as well between this overselling and a recent W3C TAG finding which takes a close look at motivations for 'inventing' URI schemes instead of re-using HTTP. The word 'inventing' seems to suggest that the URI specification discourages the use of URI schemes beyond the most popular one. Does this really only boil down to an argument of popularity?

So, here is an anecdotal story, based partly in fiction and partly in fact. A vocabulary author within an enterprise is at the very beginning: she has a small domain in mind that she wants to build some consensus around by developing an RDF vocabulary. She doesn't have any authority with regard to web space within (or outside) the enterprise. Does she really have to stop developing her vocabulary until she has selected a base URI from which she can guarantee that something useful can be dereferenced for the URIs she mints for her terms? Is it really the case that her vocabulary has no 'semantic web' value until she does so? Why can't she use the tag scheme (for instance) to identify her terms first and worry later about the location of the vocabulary definition? After all, those who push the HTTP URI scheme as a panacea must be aware that URIs are about identification first and location second (and this latter characteristic is optional).
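
To make that concrete, here is a sketch of the scenario (the author's tag entity and the terms are illustrative): the vocabulary author mints terms with the tag: URI scheme (RFC 4151). The terms are perfectly good RDF names even though nothing can be dereferenced from them, and the vocabulary can later be published at a dereferenceable HTTP location without losing its meaning.

from rdflib import Graph, Literal, Namespace, RDF, RDFS
from rdflib.namespace import OWL

# A tag: URI needs only a domain or email address and a date the author
# controlled it on; no web space is required.
VOCAB = Namespace("tag:alice@example.com,2007:vocab/")

g = Graph()
g.add((VOCAB.Widget, RDF.type, OWL.Class))
g.add((VOCAB.Widget, RDFS.label, Literal("Widget")))
g.add((VOCAB.partOf, RDF.type, OWL.ObjectProperty))
g.add((VOCAB.partOf, RDFS.label, Literal("part of")))

print(g.serialize(format="n3"))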

Over the years, I've developed an instinct to immediately question arguments that suggest a monopoly on a particular approach. This seems to be the case here. Proponents of an HTTP URI scheme monopoly for follow-your-nose mechanics (or auto-discovery of useful RDF data) seem to suggest (quite strongly) that using anything besides the HTTP URI scheme is bad practice, without actually saying so. So, if this is not the case, my original question remains: is it just a URI scheme popularity contest? If the argument is about making it easy for clients to build web closure, then I've argued before that there are better ways to do this without stressing the protocol with brute force and unintelligent term 'sniffing'.

It seems a much better approach to be unambiguous about the trail left for software agents by using an explicit term (within a collection of RDF statements) to point to where additional useful information can be retrieved for said collection of RDF statements. There is already decent precedent in terms such as rdfs:seeAlso and rdfs:isDefinedBy. However, these terms are very poorly defined and woefully abused (the latter term especially).

Interestingly, I was introduced to this "meme" during a thread on the W3C HCLS IG mailing list about the value of the LSID URI scheme and whether it is redundant with respect to HTTP. I believe this disconnect was part of the motivation behind the recent TAG finding: URNs, Namespaces and Registries. Proponents of an HTTP URI scheme monopoly should educate themselves (as I did) on the real problems faced by those who found it necessary to 'invent' a URI scheme to meet needs they felt were not properly addressed by the mechanics of the HTTP protocol. They reserve that right, as the URI specification does not endorse any monopolies on schemes. See: LSID Pros & Cons

Frankly, I think fixing what is broken with rdfs:isDefinedBy (and pervasive use of rdfs:seeAlso; FOAF networks do this) is sufficient for solving the problem the Linked Data theme is trying to address, but much less heavy-handedly. What we want is a way to say:

this collection of RDF statements is 'defined' (ontologically) by these other collections of RDF statements.

Or we want to say (via rdfs:seeAlso):

with respect to this current collection of RDF statements, you might want to look at this other collection

It is also worth noting the FOAF namespace URI issue which recently 'broke' Protege. It appears some OWL tools (Protege, at the time) were making the assumption that the FOAF OWL RDF graph would always be resolvable from the base namespace URI of the vocabulary: http://xmlns.com/foaf/0.1/ . At some point, recently, the namespace URI stopped serving up the OWL RDF/XML from that URI and instead served up the specification. Nowhere in the human-readable specification (which, during that period, was what was being served up from that URI) is there a declaration that the OWL RDF/XML is served up from that URI. The only explicit link is to: http://xmlns.com/foaf/spec/20070114.rdf

However, how did Protege come to assume that it could always get the FOAF OWL RDF/XML from the base URI? I'm not sure, but the short of it was that any vocabulary which referred to FOAF (at that point) could not be read by Protege (including my foundational ontology for Computerized Patient Records - which has since moved away from using FOAF for reasons that included this break in Protege).

The problem here is that Protege should not have been making that assumption but should have (instead) only attempted to assume an OWL RDF/XML graph could be dereferenced from a URI if that URI is the object of an owl:imports statement. I.e.,

<http://example.com/ont> owl:imports <http://xmlns.com/foaf/spec/20070114.rdf> .

This is unambiguous, as owl:imports is very explicit about what the URI at the other end points to. If you set up semantic web clients to assume they will always get something useful from a URI used within an RDF statement, or that HTTP-schemed URIs in an RDF statement are always resolvable, then you set them up for failure, or at least for a lot of unnecessary web crawling in random directions.
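
A rough sketch of the alternative I'm describing (the function and its parameters are hypothetical): an agent that builds its web closure only by following URIs that appear as objects of explicit pointer terms such as owl:imports or rdfs:seeAlso, instead of speculatively dereferencing every URI it encounters in a statement.

from rdflib import Graph, URIRef
from rdflib.namespace import OWL, RDFS

def discover(start_graph_url, max_fetches=10):
    """Crawl only along explicit pointers, never arbitrary term URIs."""
    g = Graph()
    seen = set()
    queue = [URIRef(start_graph_url)]
    while queue and len(seen) < max_fetches:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        g.parse(url)  # fetch and merge one graph
        for pointer in (OWL.imports, RDFS.seeAlso):
            for _, _, target in g.triples((None, pointer, None)):
                queue.append(target)
    return g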

My $0.02

Chimezie Ogbuji

via Copia

Planet Atom's Information Pipeline

The hosting of Planet Atom has moved (perhaps temporarily) over to Athena: Christopher Schmidt's excellent hosting service.
Metacognition is hosted there. In addition, the planetatom source was extended to support additional representational modes for the aggregated Atom feed: GRDDL, RDFa, Atom OWL RDF/XML (via content negotiation), and Exhibit.

The latter was the subject of my presentation at XTech 2007. As I mentioned during my session, you can go to http://planetatom.net/exhibit to see the live faceted-browsing of the aggregated Atom feed. An excerpt from the Planet Atom front page describes the nature of the project:

Planet Atom focuses Atom streams from authors with an affinity for syndication and Atom-specific issues. This site was developed by Sylvain Hellegouarch, Uche Ogbuji, John L. Clark, and Chimezie Ogbuji

I wrote previously (in my XML 2006 paper) on this methodology of splicing out multiple (disjoint) representations from a single XML source, and the Exhibit mode is yet another facet: specifically, quick, cross-browser filtering of the aggregated feed.

Planet Atom Pipeline

The Planet Atom pipeline as a whole is pretty interesting. First, an XBEL bookmark document is used as the source for aggregation. RESTful caching minimizes load on the sources during aggregation. The aggregation groups the entries by calendar groups (months and days). The final aggregated feed is then sent through several XML pipelines to generate the JSON which drives the Exhibit view, an HTML version of the front page, an XHTML version of the front page (one of the prior two is served to the client depending on the kind of agent which requested the front page), and an RDF/XML serialization of the feed expressed in Atom OWL.

Note in the diagram that a Microformat approach could have been used instead to embed the Atom OWL statements. RDFa was used instead as it was much easier to encode the statements in a common language and not contend with adding profiles for each Microformat used. Elias's XTech 2007 presentation touched a bit on this contrast between the two approaches. In either case, GRDDL is used to extract the RDF.

These representations are stored statically at the server and served appropriately via a simple CherryPy module. As mentioned earlier, the XHTML front page now embeds the Atom OWL assertions about the feed (as well as assertions about the sources, their authors, and the Planet Atom developers) in RDFa and includes hooks for a GRDDL-aware agent to extract a faithful rendition of the feed in RDF/XML. The same XML pipeline which extracts the RDF/XML from the aggregated feed is identified as a GRDDL transform. So, the RDF can be fetched either via content negotiation or by explicit GRDDL processing.
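
The CherryPy module itself isn't shown here, but a guess at its general shape (the file names and media-type list are illustrative, not the actual Planet Atom source) would be something like this: pick one of the statically generated representations based on the Accept header.

import cherrypy

# Statically generated representations, in order of preference.
REPRESENTATIONS = [
    ("application/rdf+xml", "planetatom.rdf"),
    ("application/xhtml+xml", "planetatom.xhtml"),
    ("text/html", "planetatom.html"),
]

class PlanetAtom(object):
    @cherrypy.expose
    def index(self):
        accept = cherrypy.request.headers.get("Accept", "")
        for media_type, filename in REPRESENTATIONS:
            if media_type in accept:
                cherrypy.response.headers["Content-Type"] = media_type
                return open(filename, "rb").read()
        # Agents with no useful Accept header get plain HTML.
        cherrypy.response.headers["Content-Type"] = "text/html"
        return open("planetatom.html", "rb").read()

if __name__ == "__main__":
    cherrypy.quickstart(PlanetAtom())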

Unfortunately, the RDFa is broken. RDFa can be extracted by an explicit parser (which is what Elias Torrez's Python-based RDFa parser, his recent work on Operator, and Ben Adida's RDFa bookmarklets do) or via XSLT as part of a GRDDL mechanism. Due to a quirk in the way RDFa uses namespace declarations (which may or may not be a necessary evil), the various vocabularies used in the resulting RDF/XML are not properly expanded from CURIEs to their full URI form. I mentioned this quirk to Steven Pemberton.

As it happens, the stylesheet which transforms the aggregated Atom feed into the XHTML host document defines the namespace declarations:

xmlns:dc="http://purl.org/dc/elements/1.1/" 
  xmlns:foaf="http://xmlns.com/foaf/0.1/" 
  xmlns:aOwl="http://bblfish.net/work/atom-owl/2006-06-06/#" 
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" 
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"

However, since no elements use QNames formed from these declarations, they are not included in the XSLT result! This trips up the RDFa -> RDF/XML transformation (written by Fabien Gandon, a fellow GRDDL WG member) and results in RDF statements where the URIs are simply the CURIEs as originally expressed in the RDFa. This seems to be a problem only for XSLT processors which explicitly strip out unused namespace declarations. They have a right to do this, as it has no effect on the underlying infoset. Something for the RDF-in-XHTML task group to consider, especially for scenarios such as this, where the XHTML+RDFa is not hand-crafted but produced from an XML pipeline.

[Uche Ogbuji]

via Copia

Musings of a Semantic / Rich Web Architect: What's Next?

I'm writing this on my flight back from XTech 2007, Paris, France. This gives me a decent block of time to express some thoughts and recent developments. This is the only significant time I've had in a while to do any writing.
My family

Between raising a large family, software development / evangelism, and blogging I can only afford to do two of these. So, blogging loses out consistently.

My paper (XML-powered Exhibit: A Case Study of JSON and XML Coexistence) is now online. I'll be writing a follow-up blog on how http://planetatom.net demonstrates some of what was discussed in that paper. I ran into some technical difficulties with projecting from Ubuntu, but the paper covers everything in detail. The slides are here.

My blog todo list has become ridiculously long. I've been heads-down on a handful of open source projects (mostly semantic web related) when I'm not focusing on work-related software development.
Luckily there has been a very healthy intersection of the open source projects I work on and what I do at work (and have been doing non-stop for about 4 years). In a few cases, I've spun these 'mini-projects' off under an umbrella project I've been working on called python-dlp. It is meant (in the end) to be a toolkit for semantic web hackers (such as myself) who want to get their hands dirty and have an aptitude for Python. There is more information on the main python-dlp page (linked above).

sparql-p evaluation algorithm

Some of the other things I've been working on I'd prefer to submit to appropriate peer-reviewed outlets, considering the amount of time I've put into them. First, I really would like to do a 'proper' write-up on the map/reduce approach for evaluating SPARQL Algebra expressions and the inner mechanics of Ivan Herman's sparql-p evaluation algorithm. The latter is one of those hidden gems I've become closely familiar with over time, and I would very much like to examine it in a peer-reviewed paper, especially if Ivan is interested in doing so in tandem =).

Since joining the W3C DAWG, I've had much more time to get even more familiar with the formal semantics of the Algebra and how to efficiently implement it on top of sparql-p to overcome the original limitation on the kinds of patterns it can resolve.

I was hoping (also) to release and talk a bit about a SPARQL server implementation I wrote in CherryPy / 4Suite / RDFLib for those who may find it useful as a quick and dirty way to contribute to the growing number of SPARQL endpoints out there. A few folks in irc:///freenode.net/redfoot (where the RDFLib developers hang out) have expressed interest, but I just haven't found the time to 'shrink-wrap' what I have so far.
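
The rough shape of such an endpoint (this is not the unreleased implementation mentioned above, just a minimal sketch using CherryPy and RDFLib; the dataset file name is a placeholder) is small enough to show here: accept a query parameter per the SPARQL protocol and return SPARQL XML results.

import cherrypy
from rdflib import Graph

class SparqlEndpoint(object):
    def __init__(self, graph):
        self.graph = graph

    @cherrypy.expose
    def sparql(self, query=None):
        if not query:
            raise cherrypy.HTTPError(400, "Missing 'query' parameter")
        results = self.graph.query(query)
        cherrypy.response.headers["Content-Type"] = (
            "application/sparql-results+xml")
        return results.serialize(format="xml")

if __name__ == "__main__":
    g = Graph()
    g.parse("dataset.rdf")  # placeholder for whatever data the endpoint exposes
    cherrypy.quickstart(SparqlEndpoint(g), "/")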

On a different (non-sem-web) note, I spoke some with Mark Birbeck (at XTech 2007) about my interest in working on a 4Suite / FormsPlayer demonstration. I've spent the better part of 3 years working with FormsPlayer as a client-side platform for XML-driven applications served from a 4Suite repository, and I've found the combination quite powerful. FormsPlayer (and XForms 1.1 specifically) is really the icing on the cake which takes an XML / RDF Content Management System like the 4Suite repository and turns it into a complete platform for deploying next generation rich web applications.

The combination is a perfect realization of the Rich Web Application Backplane (a recurring theme in my last two presentations / papers), and it is very much worth noting that some of the challenges / requirements I've been able to address with this methodology simply cannot be reproduced in any other approach: not vanilla DHTML, .NET, J2EE, Ruby on Rails, Django, or Jackrabbit. The same is probably the case with Silverlight and Apollo.

In particular, when it comes to declarative generation of user interfaces, I have yet to find a more complete approach than via XForms.

Mark Birbeck's presentation on Skimming is a good read (the slides / paper are not up yet) for those not quite familiar with the architectural merits of this larger methodology. However, in his presentation eXist was used as the XML store, and it struck me that you could do much more with 4Suite instead. In particular, as a CMS with native support for RDF as well as XML, it opens up additional avenues. Consider extending Skimming by leveraging the SPARQL protocol as an additional mode of expressive communication beyond 'vanilla' RESTful operations on XML documents.

These are very exciting times, as the value proposition of rich web (I much prefer this term over the much-beleaguered Web 2.0+) and semantic web applications has, in my estimation, fully transitioned from vacuous / academic musings to concretely demonstrable. This value proposition is still not being communicated as well as it could be, but having bundled demos can bridge this gap significantly in my opinion, much more so than literature alone.

This is one of the reasons why I've been more passionate about doing much less writing / blogging and more hands-on hacking (if you will). The original thought (early this year) was that I would have plenty to write about towards the middle of the year, and that time spent discussing the ongoing work before then would be premature. As it happens, things turned out exactly this way.

There is a lesson to be learned from how the Joost project progressed to where it is. The approach of talking only about deployed / tested / running code has worked perfectly for them. I don't recall much public dialog about that particular effort until very recently, and now they have running code doing unprecedented things and the opportunity (I'm guessing) to switch gears to do more evangelism with a much more effective 'wow' factor.

Speaking of wow, I must say that of all the sessions at XTech 2007, the Joost session was the most impressive. The number of architectures they bridged, the list of demonstrable value propositions, the slick design, the incredibly agile and visionary use of the most appropriate technology in each case, etc., make for an absolutely stunning achievement.

The fact that they did all this while remembering their roots (open standards, open source, open communities) leaves me with a deep sense of respect for all those involved in the project. I hope this becomes a much larger trend. Intellectual property paranoia and a cloak-and-dagger competitive edge are things of the past in today's software problem-solving landscape. It is a ridiculously outdated mindset, and I hope those who can effect real change in this regard (those higher up in their respective org charts than the enthusiastic hackers) are paying close attention. Oh boy. I'm about to launch into a rant, so I think I'll leave it at that.

The short of it is that I'm hoping (very soon) to switch gears from heads-down design / development / testing to much more targeted write-ups, evangelism, and such. The starting point (for me) will be the Semantic Technology Conference in San Jose. If the above topics are of interest to you, I strongly suggest you attend my colleague Dr. Chris Pierce's session on SemanticDB (the flagship XML & RDF CMS we've been working on at the Clinic as a basis for Computerized Patient Records) as well as my session on how we need to pave a path to a new generation of XML / RDF CMSes, with a few suggestions on how to go about paving it. They are complementary sessions.

Jackrabbit architecture

JSR 170 is a start in the right direction, but the work we've been doing with the 4Suite repository for some time leaves me with the strong, intuitive impression that CMSes with a natural (and standardized) synthesis with XML processing are only half the step towards eradicating the stronghold that monolithic technology stacks have over those (such as myself) with 'enterprise' requirements that can truly only be met with the newly emerging set of architectural patterns: Semantic / Rich Web Applications. This stronghold can only be eradicated by addressing the absence of a coherent landscape with peer-reviewed standards. Dr. Macro has an incredibly visionary series of 'write-ups' on XML CMSes that paints a comprehensive picture of some best practices in this regard.

However (as with JSR 170), there is no reason why there isn't a bridge or some form of synthesis with RDF processing within the confines of a CMS.

There is no good reason why I shouldn't be able to implement an application written against an abstract API for document and knowledge management, irrespective of how this API is implemented (this is very much aligned with the larger goal of JSR 170). There is no reason why the 4Suite repository should be the only available infrastructure supporting both XML and RDF processing in (standardized) synthesis.

I should be able to 'hot-swap' RDFLib with Jena or Redland, 4Suite XML with Saxon, libxml, etc., and the 4Suite repository with an implementation of a standard API for synchronized XML / RDF content management. The value of setting a foundation in this arena is applicable to virtually any domain in which a CMS is a necessary first component.
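
Purely as a strawman (every name below is invented for illustration; this is neither JSR 283 nor the 4Suite API), the kind of abstract API I have in mind would pair document operations with graph operations behind one interface, so concrete backends could be swapped without touching application code:

from abc import ABC, abstractmethod

class ContentRepository(ABC):
    """Hypothetical synthesis of document (XML) and knowledge (RDF) management."""

    # --- document management ---
    @abstractmethod
    def get_document(self, path): ...

    @abstractmethod
    def put_document(self, path, xml_source): ...

    @abstractmethod
    def xquery(self, expression): ...

    # --- knowledge management ---
    @abstractmethod
    def get_graph(self, graph_uri): ...

    @abstractmethod
    def sparql(self, query): ...

# A 4Suite-, Jackrabbit-, or Jena-backed class would each implement this
# interface; the application only ever sees ContentRepository.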

Until such a time, I will continue to start with the 4Suite repository / RDFLib / FormsPlayer as a platform for Semantic / Rich Web applications. However, I'm hoping (with my presentation at San Jose) to paint a picture of this vacuum, with the intent of contributing towards enough of a critical mass to (perhaps) start putting together some standards towards this end.

Chimezie Ogbuji

via Copia