Musings of a Semantic / Rich Web Architect: What's Next?

I'm writing this on my flight back from XTech 2007, Paris, France. This gives me a decent block of time to express some thoughts and recent developments. This is the only significant time I've had in a while to do any writing.
My family

Between raising a large family, software development / evangelism, and blogging I can only afford to do two of these. So, blogging loses out consistently.

My paper (XML-powered Exhibit: A Case Study of JSON and XML Coexistence) is now online. I'll be writing a follow-up blog on how http://planetatom.net demonstrates some of what was discussed in that paper. I ran into some technical difficulties with projecting from Ubuntu, but the paper covers everything in detail. The slides are here.

My blog todo list has become ridiculously long. I've been heads-down on a handful of open source projects (mostly semantic web related) when I'm not focusing on work-related software development.
Luckily there has been a very healthy intersection of the open source projects I work on and what I do at work (and have been doing non-stop for about 4 years). In a few cases, I've spun these 'mini-projects' off under an umbrella project I've been working on called python-dlp. It is meant (in the end) to be a toolkit for semantic web hackers (such as myself) who want to get their hands dirty and have an aptitude for Python. There is more information on the main python-dlp page (linked above).

sparql-p evaluation algorithm Some of the other things I've been working on I'd prefer to submit to appropriate peer-reviewed outlets considering the amount of time I've put into them. First, I really would like to do a 'proper' write-up on the map/reduce approach for evaluating SPARQL Algebra expressions and the inner mechanics of Ivan Herman's sparql-p evaluation algorithm. The latter is one of those hidden gems I've become closely familiar with for some time that I would very much like to examine in a peer-reviewed paper especially if Ivan is interested doing so in tandem =).

Since joining the W3C DAWG, I've had much more time to get even more familiar with the formal semantics of the Algebra and how to efficiently implement it on-top of sparql-p to overcome the original limitation on the kinds of patterns it can resolve.

I was hoping (also) to release and talk a bit about a SPARQL server implementation I wrote in CherryPy / 4Suite / RDFLib for those who may find it useful as a quick and dirty way to contribute to the growing number of SPARQL endpoints out there. A few folks in irc:///freenode.net/redfoot (where the RDFLib developers hang out) have expressed interest, but I just haven't found the time to 'shrink-wrap' what I have so far.

On a different (non-sem-web) note, I spoke some with Mark Birbeck (at XTech 2007) about my interest in working on a 4Suite / FormsPlayer demonstration. I've spent the better part of 3 years working on FormsPlayer as a client-side platform for XML-driven applications served from a 4Suite repository and I've found the combination quite powerful. FormsPlayer (and XForms 1.1 specifically) is really the icing on the cake which takes an XML / RDF Content Management System like the 4Suite repository and turns it into a complete platform for deploying next generation rich web applications.

The combination is a perfect realization of the Rich Web Application Backplane (a reoccurring theme in my last two presentations / papers) and it is very much worth noting that some of the challenges / requirements I've been able address with this methodology can simply not be reproduced in any other approach: neither vanilla DHTML, .NET, J2EE, Ruby on Rails, Django, nor Jackrabbit. The same is probably the case with Silverlight and Apollo.

In particular, when it comes to declarative generation of user interfaces, I have yet to find a more complete approach than via XForms.

Mark Birbeck's presentation on Skimming is a good read (slides / paper is not up yet) for those not quite familiar with the architectural merits of this larger methodology. However, in his presentation eXist was used as the XML store and it struck me that you could do much more with 4Suite instead. In particular, as a CMS with native support for RDF as well as XML it opens up additional avenues. Consider extending Skimming by leveraging the SPARQL protocol as an additional mode of expressive communication beyond 'vanilla' RESTful operations on XML documents.

These are very exciting times as the value proposition of rich web (I much prefer this term over the much beleaguered Web 2.0+) and semantic web applications has fully transitioned from vacuous / academic musings to concretely demonstrable in my estimation. This value proposition is still not being communicated as well as it could, but having bundled demos can bridge this gap significantly in my opinion; much more so than just literature alone.

This is one of the reasons why I've been more passionate about doing much less writing / blogging and more hands-on hacking (if you will). The original thought (early on this year) was that I would have plenty to write about towards the middle of this year and time spent discussing the ongoing work would be premature. As it happens, things turned out exactly this way.

There is a lesson to be learned for how the Joost project progressed to where it is. The approach of talking about deployed / tested / running code has worked perfectly for them. I don't recall much public dialog about that particular effort until very recently and now they have running code doing unprecedented things and the opportunity (I'm guessing) to switch gears to do more evangelism with a much more effective 'wow' factor.

Speaking of wow, I must say of all the sessions at XTech 2007, the Joost session was the most impressive. The number of architectures they bridged, the list of demonstrable value propositions, the slick design, the incredibly agile and visionary use the most appropriate technology in each case etc.. is an absolutely stunning achievement.

The fact that they did this all while remembering their roots: open standards, open source, open communities leaves me with a deep sense of respect for all those involved in the project. I hope this becomes a much larger trend. Intellectual property paranoia and cloak / dagger completive edge is a thing of the past in today's software problem solving landscape. It is a ridiculously outdated mindset and I hope those who can effect real change (those higher up in their respective ORG charts than the enthusiastic hackers) in this regard are paying close attention. Oh boy. I'm about to launch into a rant, so I think I'll leave it at that.

The short of it is that I'm hoping (very soon) to switch gears from heads-down design / development / testing to much more targeted write-ups, evangelism, and such. The starting point (for me) will be Semantic Technology Conference in San Jose. If the above topics are of interest to you, I strongly suggest you attend my colleague's (Dr. Chris Pierce) session on SemanticDB (the flagship XML & RDF CMS we've been working on at the Clinic as a basis for Computerized Patient Records) as well as my session on how we need to pave a path to a new generation of XML / RDF CMSes and a few suggestions on how to go about paving this path. They are complementary sessions.

Jackrabbit architecture

JSR 170 is a start in the right direction, but the work we've been doing with the 4Suite repository for some time leaves me with the strong, intuitive impression that CMSes that have a natural (and standardized) synthesis with XML processing is only half the step towards eradicating the stronghold that monolithic technology stacks have over those (such as myself) with 'enterprise' requirements that can truly only be met with the newly emerging sets of architectural patterns: Semantic / Rich Web Applications. This stronghold can only be eradicated by addressing the absence of a coherent landscape with peer-reviewed standards. Dr. Macro has an incredibly visionary series of 'write-ups' on XML CMS that paints a comprehensive picture of some best practices in this regard:

However (as with JSR 170), there is no reason why there isn't a bridge or some form of synthesis with RDF processing within the confines of a CMS.

There is no good reason why I shouldn't be able to implement an application which is written against an abstract API for document and knowledge management irrespective of how this API is implemented (this is very much aligned with larger goal of JSR 170). There is no reason why the 4Suite repository is the only available infrastructure for supporting both XML and RDF processing in (standardized) synthesis.

I should be able to 'hot-swap' RDFLib with Jena or Redland, 4Suite XML with Saxon / Libxml / etc.., and the 4Suite repository with an implementation of a standard API for synchronized XML / RDF content management. The value of setting a foundation in this arena is applicable to virtually any domain in which a CMS is a necessary first component.

Until such a time, I will continue to start with 4Suite repository / RDFLib / formsPlayer as a platform for Semantic / Rich Web applications. However, I'm hoping (with my presentation at San Jose) to paint a picture of this vacuum with the intent of contributing towards enough of a critical mass to (perhaps) start putting together some standards towards this end.

Chimezie Ogbuji

via Copia
2 responses
Hey Chemezie,



This is GREAT!  In the background I have been working on the next generation of AspectXML-based projects, the next being something I have been designing in my head for about 2 years, and recently started putting into code about two weeks ago.  The project?  Atomictalk > a Smalltalk-influenced client-side XSLT-based inter and intra document language that uses an Aspect-Oriented MVC framework where Atom feeds represent the data source for the data models, views are written in a cross-breed of XML and Smalltalk, and the document is in and of itself the controller.



Russ Miles is now working with me to fine tune the language to ensure it fits well with the various AOP paradigms where it makes sense to apply proper AOP paradigms.  We've created a Google Code project @ http://code.google.com/p/atomictalk and I recently checked in the first batch of code.  If you visit http://test.atomictalk.org/ you will see the first Hello, World! sample.  The sample is lame as it doesn't showcase any of what is, if you view source, obviously a much more feature rich platform.  Please keep in mind that this is a rapidly evolving syntax, so what you see now is changing quickly.  In fact, I have a few hours worth of work left before I check in the next batch of changes of which, while the syntax will look similar, the changes in features and functionality are fairly dramatic.



Connecting this back to your post, if you visit http://groups.google.com/group/llup/msg/e50f2f7d46128518 you will find a link to a comment on an entry from Bruce D'Arcus from just under a year ago regarding the connection between Atomictalk and SPARQL+RDF, complete with use-case.  That's where this project is headed. :)



I'll be at the Semantic Web conference on Wednesday morning, flying out Thursday afternoon (it's the only time I could take off.  Have a deadline at the end of next week to meet.) I'm staying at the Fairmont, so hopefully we will have some time to hang out and possibly chat about some integration/cross-over work between these two projects.  Could be very interesting to see where these two projects might be able to help one another. :D



I'll ping you offline with a URI and uname/password to a site that provides a MUCH better demo of the Atomictalk code base in action.
Chemezie,



I think we're seeing a fair amount of convergent evolution going on in this space. I've been approaching many of the same problems from the XSLT/XQuery/XH (XHTML+XForms) direction, the idea of synthesizing from an XML-centric server space (we're using eXist) and rich generated XForms clients a cohesive architecture.



I'm beginning to see myself that the workflow aspect is key, and that in order for it to be effective you have to be able to encode this as an XML "role-state matrix" with XML documents (or their proxies) maintaining their own state within the workflow.



I wish I could make it to the Semantic Web conference myself (time and money constraints at the moment). Have fun, and I look forward to seeing your paper when its done.