Copia

True Knowledge - A logic based web question-answering platform

The world's first AI question-answering platform.

We are using our unique semantic technology to build the first internet-scale platform for directly answering the world's questions. As knowledge is added to the platform we understand and answer more and more.

via trueknowledge.com

I first heard of this at the True Knowledge session at the Semantic Technologies Conference 2010. I tried their web-based interface during the presentation and was very impressed. It includes a justification trace of how answers are reached and handles things such as temporal reasoning as well. There is also a Google Chrome extension that enhances Google results with answers to the same questions.

-- Chimezie

Musings of a Semantic / Rich Web Architect: What's Next?

I'm writing this on my flight back from XTech 2007, Paris, France. This gives me a decent block of time to express some thoughts and recent developments. This is the only significant time I've had in a while to do any writing.
My family

Between raising a large family, software development / evangelism, and blogging I can only afford to do two of these. So, blogging loses out consistently.

My paper (XML-powered Exhibit: A Case Study of JSON and XML Coexistence) is now online. I'll be writing a follow-up blog on how http://planetatom.net demonstrates some of what was discussed in that paper. I ran into some technical difficulties with projecting from Ubuntu, but the paper covers everything in detail. The slides are here.

My blog todo list has become ridiculously long. I've been heads-down on a handful of open source projects (mostly semantic web related) when I'm not focusing on work-related software development.
Luckily there has been a very healthy intersection of the open source projects I work on and what I do at work (and have been doing non-stop for about 4 years). In a few cases, I've spun these 'mini-projects' off under an umbrella project I've been working on called python-dlp. It is meant (in the end) to be a toolkit for semantic web hackers (such as myself) who want to get their hands dirty and have an aptitude for Python. There is more information on the main python-dlp page (linked above).

sparql-p evaluation algorithm

Some of the other things I've been working on I'd prefer to submit to appropriate peer-reviewed outlets considering the amount of time I've put into them. First, I really would like to do a 'proper' write-up on the map/reduce approach for evaluating SPARQL Algebra expressions and the inner mechanics of Ivan Herman's sparql-p evaluation algorithm. The latter is one of those hidden gems I've been closely familiar with for some time that I would very much like to examine in a peer-reviewed paper, especially if Ivan is interested in doing so in tandem =).

Since joining the W3C DAWG, I've had much more time to get even more familiar with the formal semantics of the Algebra and how to efficiently implement it on top of sparql-p to overcome the original limitation on the kinds of patterns it can resolve.

I was hoping (also) to release and talk a bit about a SPARQL server implementation I wrote in CherryPy / 4Suite / RDFLib for those who may find it useful as a quick and dirty way to contribute to the growing number of SPARQL endpoints out there. A few folks in irc:///freenode.net/redfoot (where the RDFLib developers hang out) have expressed interest, but I just haven't found the time to 'shrink-wrap' what I have so far.

On a different (non-sem-web) note, I spoke some with Mark Birbeck (at XTech 2007) about my interest in working on a 4Suite / FormsPlayer demonstration. I've spent the better part of 3 years working on FormsPlayer as a client-side platform for XML-driven applications served from a 4Suite repository and I've found the combination quite powerful. FormsPlayer (and XForms 1.1 specifically) is really the icing on the cake which takes an XML / RDF Content Management System like the 4Suite repository and turns it into a complete platform for deploying next generation rich web applications.

The combination is a perfect realization of the Rich Web Application Backplane (a recurring theme in my last two presentations / papers) and it is very much worth noting that some of the challenges / requirements I've been able to address with this methodology simply cannot be reproduced in any other approach: neither vanilla DHTML, .NET, J2EE, Ruby on Rails, Django, nor Jackrabbit. The same is probably the case with Silverlight and Apollo.

In particular, when it comes to declarative generation of user interfaces, I have yet to find a more complete approach than via XForms.

Mark Birbeck's presentation on Skimming is a good read (slides / paper are not up yet) for those not quite familiar with the architectural merits of this larger methodology. However, in his presentation eXist was used as the XML store and it struck me that you could do much more with 4Suite instead. In particular, as a CMS with native support for RDF as well as XML it opens up additional avenues. Consider extending Skimming by leveraging the SPARQL protocol as an additional mode of expressive communication beyond 'vanilla' RESTful operations on XML documents.
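To make the SPARQL protocol idea a little more concrete, here is a minimal sketch (in Python, standard library only) of how a client might address a repository with a SPARQL protocol GET request alongside its plain RESTful document fetches. The endpoint URL and the query are invented for illustration; only the 'query' and 'default-graph-uri' parameter names come from the protocol itself.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical endpoint: the post does not name one.
ENDPOINT = "http://example.org/sparql"

# Illustrative query only; the vocabulary choice is an assumption.
QUERY = (
    'PREFIX dc: <http://purl.org/dc/elements/1.1/> '
    'SELECT ?doc WHERE { ?doc dc:creator "Chimezie Ogbuji" }'
)


def sparql_get_url(endpoint: str, query: str,
                   default_graph: Optional[str] = None) -> str:
    """Build a SPARQL protocol GET URL.

    The query text travels in the 'query' parameter; an optional
    'default-graph-uri' parameter selects the graph to query.
    """
    params = [("query", query)]
    if default_graph is not None:
        params.append(("default-graph-uri", default_graph))
    return endpoint + "?" + urlencode(params)


url = sparql_get_url(ENDPOINT, QUERY)
```

The resulting URL can be dereferenced like any other resource, which is what lets it sit next to ordinary GET / PUT operations on XML documents.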

These are very exciting times as the value proposition of rich web (I much prefer this term over the much beleaguered Web 2.0+) and semantic web applications has fully transitioned from vacuous / academic musings to concretely demonstrable in my estimation. This value proposition is still not being communicated as well as it could, but having bundled demos can bridge this gap significantly in my opinion; much more so than just literature alone.

This is one of the reasons why I've been more passionate about doing much less writing / blogging and more hands-on hacking (if you will). The original thought (early this year) was that I would have plenty to write about towards the middle of this year and that time spent discussing the ongoing work before then would be premature. As it happens, things turned out exactly this way.

There is a lesson to be learned for how the Joost project progressed to where it is. The approach of talking about deployed / tested / running code has worked perfectly for them. I don't recall much public dialog about that particular effort until very recently and now they have running code doing unprecedented things and the opportunity (I'm guessing) to switch gears to do more evangelism with a much more effective 'wow' factor.

Speaking of wow, I must say that of all the sessions at XTech 2007, the Joost session was the most impressive. The number of architectures they bridged, the list of demonstrable value propositions, the slick design, the incredibly agile and visionary use of the most appropriate technology in each case, etc. add up to an absolutely stunning achievement.

The fact that they did this all while remembering their roots (open standards, open source, open communities) leaves me with a deep sense of respect for all those involved in the project. I hope this becomes a much larger trend. Intellectual property paranoia and a cloak / dagger competitive edge are things of the past in today's software problem solving landscape. It is a ridiculously outdated mindset and I hope those who can effect real change (those higher up in their respective org charts than the enthusiastic hackers) in this regard are paying close attention. Oh boy. I'm about to launch into a rant, so I think I'll leave it at that.

The short of it is that I'm hoping (very soon) to switch gears from heads-down design / development / testing to much more targeted write-ups, evangelism, and such. The starting point (for me) will be the Semantic Technology Conference in San Jose. If the above topics are of interest to you, I strongly suggest you attend my colleague's (Dr. Chris Pierce) session on SemanticDB (the flagship XML & RDF CMS we've been working on at the Clinic as a basis for Computerized Patient Records) as well as my session on how we need to pave a path to a new generation of XML / RDF CMSes and a few suggestions on how to go about paving this path. They are complementary sessions.

Jackrabbit architecture

JSR 170 is a start in the right direction, but the work we've been doing with the 4Suite repository for some time leaves me with the strong, intuitive impression that CMSes that have a natural (and standardized) synthesis with XML processing are only half the step towards eradicating the stronghold that monolithic technology stacks have over those (such as myself) with 'enterprise' requirements that can truly only be met with the newly emerging sets of architectural patterns: Semantic / Rich Web Applications. This stronghold can only be eradicated by addressing the absence of a coherent landscape with peer-reviewed standards. Dr. Macro has an incredibly visionary series of 'write-ups' on XML CMS that paints a comprehensive picture of some best practices in this regard:

However (as with JSR 170), there is no reason why there isn't a bridge or some form of synthesis with RDF processing within the confines of a CMS.

There is no good reason why I shouldn't be able to implement an application which is written against an abstract API for document and knowledge management irrespective of how this API is implemented (this is very much aligned with the larger goal of JSR 170). There is no reason why the 4Suite repository should be the only available infrastructure for supporting both XML and RDF processing in (standardized) synthesis.

I should be able to 'hot-swap' RDFLib with Jena or Redland, 4Suite XML with Saxon / Libxml / etc., and the 4Suite repository with an implementation of a standard API for synchronized XML / RDF content management. The value of setting a foundation in this arena is applicable to virtually any domain in which a CMS is a necessary first component.
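As a rough illustration of what 'written against an abstract API' could look like, here is a small Python sketch. Everything in it is hypothetical: the ContentRepository interface, its method names, and the in-memory backend are invented for this example and do not correspond to JSR 170, the 4Suite repository API, or any existing standard. The point is only that the application function at the bottom never names a concrete backend.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List, Optional, Protocol, Tuple

Triple = Tuple[str, str, str]


class ContentRepository(Protocol):
    """Hypothetical abstract API for synchronized XML / RDF content management."""

    def put_document(self, path: str, xml_source: str) -> None: ...
    def get_document(self, path: str) -> str: ...
    def add_triples(self, triples: Iterable[Triple]) -> None: ...
    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> List[Triple]: ...


class InMemoryRepository:
    """Toy backend; a real one might wrap 4Suite, Jena, Redland, etc."""

    def __init__(self) -> None:
        self._docs = {}
        self._triples = set()

    def put_document(self, path: str, xml_source: str) -> None:
        ET.fromstring(xml_source)  # raises ParseError on ill-formed XML
        self._docs[path] = xml_source

    def get_document(self, path: str) -> str:
        return self._docs[path]

    def add_triples(self, triples: Iterable[Triple]) -> None:
        self._triples.update(triples)

    def match(self, s=None, p=None, o=None) -> List[Triple]:
        # None acts as a wildcard in each position.
        return sorted(
            t for t in self._triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


def documents_about(repo: ContentRepository, topic: str) -> List[str]:
    """Application code: depends only on the abstract API, not on a backend."""
    return [repo.get_document(s) for s, _, _ in repo.match(p="dc:subject", o=topic)]
```

Swapping the backend then means supplying another class with the same four methods; documents_about does not change.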

Until such a time, I will continue to start with 4Suite repository / RDFLib / formsPlayer as a platform for Semantic / Rich Web applications. However, I'm hoping (with my presentation at San Jose) to paint a picture of this vacuum with the intent of contributing towards enough of a critical mass to (perhaps) start putting together some standards towards this end.

Chimezie Ogbuji

via Copia

XML 2006 Synopsis: Are we there yet?

Well, XML 2006 came and went with a rather busy bang. My presentation on using XSLT to generate XForms (from XUL/XHTML) was well attended and I hope it helped increase awareness of the importance and value of XForms, (perhaps) the only comprehensive vehicle by which XML can be brought to the web in the way proponents of XML have had in mind for some time. As Simon puts it:

XML pretty (much) completely missed its original target market. SGML culture and web developer culture seemed like a poor fit on many levels, and I can't say I even remember a concerted effort to explain what XML might mean to web developers, or to ask them whether this new vision of the Web had much relationship to what they were doing or what they had planned. SGML/XML culture and web culture never really meshed.

Most of the questions I received had to do with our particular choice of FormsPlayer (an Internet Explorer plugin) instead of other alternatives such as Orbeon, Mozilla, Chiba, etc. This was a bit unfortunate and an indication of a much larger problem in this particular area of innovation we lovingly call 'web 2.0'. I will get back to this later.

I was glad to hear John Boyer tell me he was pleasantly surprised to see mention of the Rich Web Application Backplane W3C Note. Mark Birbeck and Micah Dubinko (fellow XForms gurus and visionaries in their own rights) didn't let this pass over their radar, either.

I believe the vision outlined in that note is much more lucid than a lot of the hype-centered notions of 'web 2.0' which seem more focused on painting a picture of scattered buzzwords ('mash-ups', AJAX etc..) than commonalities between concrete architectures.

Though this architectural style accommodates solutions based on scripting (AJAX) as well as more declarative approaches, I believe the primary value is in freeing web developers from the 80% of scripting that is a result of not having an alternative (READ: browser vendor monopolies) rather than of being the appropriate solution for the job. I've jousted with Kurt Cagle before on this topic and Mark Birbeck has written extensively on this as well.

In writing the presentation, I sort of stumbled upon some interesting observations about XUL and XForms:

XUL relies on a static, inarticulate means of binding components to their behavior
XForms relies on XPath for doing the same
XUL relies completely on JavaScript to define the behavior of its widgets / components
A more complete mapping from XUL to XForms (than the one I composed for my talk) could be valuable to those more familiar with XUL as a bridge to XForms.

At the very least, it was a great way to familiarize myself with XUL.

In all, I left Boston feeling like I had experienced a very subtle anti-climax as far as innovation was concerned.
If I were to plot a graph of innovative progression over time, it would seem to me that the XML space has plateaued as of late and that political in-fighting and spec proliferation have overtaken truly innovative ideas. I asked Harry Halpin about this and his take on it was that perhaps "XML has won". I think there is some truth to this, though I don't think XML has necessarily made the advances that were hoped for in the web space (as Simon St. Laurent put it earlier).

There were a few exceptions, however.

XML Pipelines

I really enjoyed Norm Walsh's presentation on XProc and it was an example of scratching a very real itch: consensus on a vocabulary for XML processing workflows. Ironically, though, it probably wouldn't take much to implement in 4Suite, as support for most (if not all) of the pipeline operations is already there.

I did ask Norm if XProc would support setting up XPath variables for operations that relied on them and was pleased to hear that they had that in mind. I also asked about support for non-standard XML operations such as XUpdate and was also pleased to hear that they had that covered as well. It is worth noting that XUpdate by itself could make the viewport operation rather redundant.

The Semantic Web Contingent

There was noticeable representation by semantic web enthusiasts (myself, Harry Halpin, Bob DuCharme, Norm Walsh, Elias Torres, Eric Prud'hommeaux, Ralph Hodgson, etc.) and their presentations had somewhat subdued tones (perhaps) so as not to incite ravenous bickering from narrow-minded enthusiasts. There was still some of that, however, as I was asked by someone why RDF couldn't be persisted natively as XML, queried via XQuery, and inferred over via extension functions! Um... right... There is some irony in that as I have yet to find a legitimate reason myself to even use XQuery in the first place.

The common scenario is when you need to query across a collection of XML documents, but I've personally preferred to index XML documents with RDF content (extracted from a subset of the documents), match the documents via RDF, isolate a document, and evaluate an XPath against it, essentially bypassing the collection extension to XPath with a 'semantic' index. Of course, this only makes sense where there is a viable mapping from XML to RDF, but where there is one I've preferred this approach. But to each his/her own.
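A toy sketch of that index-then-XPath pattern, in Python with only the standard library, might look like the following. It is a stand-in rather than the real thing: a plain set of (subject, predicate, object) tuples plays the role of the RDF graph (where RDFLib would normally sit), the two sample documents and the dc:creator mapping are made up, and ElementTree only supports a limited subset of XPath.

```python
import xml.etree.ElementTree as ET

# Made-up sample collection, keyed by document URI.
DOCUMENTS = {
    "a.xml": "<article><title>XProc notes</title><author>Norm</author></article>",
    "b.xml": "<article><title>GRDDL primer</title><author>Harry</author></article>",
}


def build_index(documents):
    """Extract a few statements about each document (the XML -> RDF mapping)."""
    index = set()
    for uri, source in documents.items():
        author = ET.fromstring(source).findtext("author")
        if author is not None:
            index.add((uri, "dc:creator", author))
    return index


def xpath_via_index(documents, index, predicate, obj, path):
    """Select documents through the index, then evaluate the path per document."""
    results = []
    for subject, p, o in sorted(index):
        if p == predicate and o == obj:
            root = ET.fromstring(documents[subject])
            results.extend(el.text for el in root.findall(path))
    return results


INDEX = build_index(DOCUMENTS)
titles = xpath_via_index(DOCUMENTS, INDEX, "dc:creator", "Harry", "./title")
```

Only the documents matched through the index are parsed and queried, which is the sense in which the index replaces a collection-wide XPath.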

Content Management APIs

I was pleasantly surprised to learn from Joel Amousou that there is a standard (a datastore and language-agnostic? standard) for CMS APIs called JSR-170. The 4Suite repository is the only Content Management System / API with a well thought-out architecture for integrating XML & RDF persistence and processing in a way that emphasizes their strengths with regard to content management. Perhaps there is some merit in investigating the possibility of porting (or wrapping) the 4Suite repository API as JSR-170? Joel seems to think so.

Meta-stylesheets

Michael Kay had a nice synopsis of the value of generating XSLT from XSLT, a novel mechanism I've been using for some time. It was interesting to note that one of his prior client projects involved a pipeline that started with an XForm, post-processed by XSLT and aggregated with results from an XQuery (also generated from XSLT).

Code generation is a valuable pattern that has plenty of unrecognized value in the XML space and I was glad to see Michael Kay highlight this. He had some choice words on when to use XSLT and when to use XQuery that I thought were on point: use XSLT for re-purposing and formatting, and use XQuery for querying your database.

GRDDL

Finally, I spent quite some time with Harry Halpin (chair of the GRDDL Working Group) helping him install and use the 4Suite / RDFLib client I recently wrote for use with the GRDDL test suite. You can take what I say with a grain of salt (as I am a member and loud, vocal supporter), but I think that GRDDL will end up having more impact on the semantic web vision (which I believe is much less important than the technological components it relies on to fulfill the vision) and on XML adoption on the web than anything else, primarily because it allows content publishers to leverage the full spectrum of both XML and RDF technologies.
Within my presentation, I mention an architectural style I call 'modality segregation' that captures the value proposition of XSLT for drawing sharp, distinguishable boundaries (where there were once none) between:

content
presentation
meaning (semantics)
application behavior

Modality Segregation

I believe it's a powerful idiom for managing, publishing, and consuming data & knowledge (especially over the web).

Harry demonstrated how easy it is to extract review data, vocabulary mappings, and social networks (the primary topic of his talk) from XHTML that would ordinarily be dormant with regards to everything other than presentation.
We ran into a few snafus with 4Suite when we tried to run Norm Walsh's hCard2RDF.xslt against Dan Connolly's web site and Harry's home page. We also ran into problems with the client (which is mostly compliant with the Working Draft).

I also had the chance to set Harry up with my blazingly fast RETE-based N3 reasoner, which we used to test GRDDL-based identity consolidation by piping multiple GRDDL results (from XHTML with embedded XFN) into the reasoner, performing an OWL DL closure, and identifying duplicate identities via Inverse Functional Properties (smushing).
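For readers unfamiliar with smushing, the core rule is small enough to sketch in plain Python: if two nodes share the same value for an inverse functional property, they denote the same individual. The sketch below is not the RETE reasoner and does not compute a full OWL closure; it applies only that one rule with a union-find structure, and the sample triples and the choice of foaf:mbox / foaf:homepage as the inverse functional properties are assumptions made for the example.

```python
def smush(triples, inverse_functional):
    """Group subjects that share a value for any inverse functional property."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the lexicographically smaller id as the representative.
            if root_b < root_a:
                root_a, root_b = root_b, root_a
            parent[root_b] = root_a

    first_seen = {}  # (property, value) -> first subject that carried it
    for s, p, o in triples:
        find(s)  # register every subject
        if p in inverse_functional:
            key = (p, o)
            if key in first_seen:
                union(first_seen[key], s)
            else:
                first_seen[key] = s

    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    # Only groups with more than one member are actual duplicates.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Made-up data standing in for merged GRDDL results.
TRIPLES = [
    ("_:a", "foaf:mbox", "mailto:harry@example.org"),
    ("_:b", "foaf:mbox", "mailto:harry@example.org"),
    ("_:b", "foaf:homepage", "http://example.org/harry"),
    ("_:c", "foaf:homepage", "http://example.org/harry"),
    ("_:d", "foaf:name", "Someone Else"),
]

duplicates = smush(TRIPLES, {"foaf:mbox", "foaf:homepage"})
```

Here _:a, _:b, and _:c end up in one group because the shared mailbox links the first two and the shared homepage links the last two, while _:d is left alone.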

As a result of our 5+ hour hackathon, I ended up writing 3 utilities that I hope to release once I find a proper place for them:

FOAFVisualizer - A command-line tool for merging and rendering FOAF networks in a 'controlled' and parameterized manner
RDFPiedPipe - A command-line tool for converting between the syntaxes that RDFLib supports: N3, N-Triples, RDF/XML
Kaleidos - A library used by FOAFVisualizer to control every aspect of how an RDF graph (or any other network structure) is exported to a Graphviz diagram via BGL-Python bindings.

In the final analysis, I feel as if we have reached a climax in innovation only to face a bigger challenge from politics than anything else:

RDFa versus eRDF
SPARQL without entailment versus SPARQL with OWL entailment
XHTML versus HTML5
Web Forms versus XForms
Web 2.0 versus Web 3.0
AJAX versus XForms
XQuery versus XSLT
XQuery over RDF/XML versus SPARQL over abstract RDF
XML 1.0 specifications versus the new 2.0 specifications

The list goes on. I expressed my concerns about the danger of technological camp warfare to Liam Quin (XML Activity Lead) and he concurred. We should spend less time arguing over whether or not my spec is more l33t than yours and more time asking the more pragmatic questions about which solutions work best for the problem(s) at hand.

[Uche Ogbuji]