Planet Atom's Information Pipeline

The hosting of Planet Atom has moved (perhaps temporarily) over to Athena: Christopher Schmidt's excellent hosting service.
Metacognition is hosted there. In addition, the planetatom source was extended to support additional representational modes for the aggregated Atom feed: GRDDL, RDFa, Atom OWL RDF/XML (via content negotiation), and Exhibit.

The latter was the subject of my presentation at XTech 2007. As I mentioned during my session, you can go to http://planetatom.net/exhibit to see the live faceted-browsing of the aggregated Atom feed. An excerpt from the Planet Atom front page describes the nature of the project:

Planet Atom focuses Atom streams from authors with an affinity for syndication and Atom-specific issues. This site was developed by Sylvain Hellegouarch, Uche Ogbuji, John L. Clark, and Chimezie Ogbuji

I wrote previously (in my XML 2006 paper) on this methodology of splicing out multiple (disjoint) representations from a single XML source and the Exhibit mode is yet another facet: specifically for quick, cross-browser, filtering of the aggregated feed.

The Planet Atom pipleline as a whole is pretty interesting. First an XBEL bookmark document is used as the source for aggregation. RESTful caching minimizes load on the sources during aggregation. The aggregation groups the entries by calendar groups (months and days). The final aggregated feed is then sent through several XML pipelines to generate the JSON which drives the Exhibit view, an HTML version of the front page, an XHTML version of the front page (one of the prior two is served to the client depending on the kind of the agent which requested the front page), and an RDF/XML serialization of the feed expressed in Atom OWL.

Note in the diagram that a Microformat approach could have been used instead to embed the Atom OWL statements. RDFa was used instead as it was much easier to encode the statements in a common language and not contend with adding profiles for each Microformat used. Elias's XTech 2007 presentation touched a bit on this contrast between the two approaches. In either case, GRDDL is used to extract the RDF.

These representations are stored statically at the server and served appropriately via a simple CherryPy module As mentioned earlier, the XHTML front page now embeds the Atom OWL assertions about the feed (as well as assertions about the sources, their authors, and the Planetatom developers) in RDFa and includes hooks for a GRDDL-aware Agent to extract a faithful rendition of the feed in RDF/XML. The same XML pipeline which extracts the RDF/XML from the aggregated feed is identified as a GRDDL transform. So, the RDF can either be fetched via content negotiation or by explicit GRDDL processing.

Unfortunately, the RDFa is broken. RDFa can be extracted by an explicit parser (which is how Elias Torrez's Python-based RDFa parser, his recent work on operator, and Ben Adida's RDFa bookmarklets ) or via XSLT as part of a GRDDL mechanism. Due to a quirk in the way RDFa uses namespace declarations (which may or may not be a necessary evil ), the various vocabularies used in the resulting RDF/XML are not properly expanded from the CURIES to their full URI form. I mentioned this quirk to Steven Pemberton.

As it happens, the stylesheet which transforms the aggregated Atom feed into the XHTML host document defines the namespace declarations:

xmlns:dc="http://purl.org/dc/elements/1.1/" 
  xmlns:foaf="http://xmlns.com/foaf/0.1/" 
  xmlns:aOwl="http://bblfish.net/work/atom-owl/2006-06-06/#" 
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" 
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"

However, since there are not any elements which use QNames formed from these declarations, they are not included in the XSLT result! This trips up the RDF -> RDF/XML transformation (written by Fabien Gandon, a fellow GRDDL WG member) and results in RDF statements where the URIs are simply the CURIEs as originally expressed in RDFa. This seems to only be a problem for XSLT processors which explicitly strip out the unused namespace declarations. They have a right to do this as it has no effect on the underlying infoset. Something for the RDF-in-XHTML task group to consider, especially for scenarios such as this where the XHTML+RDFa is not hand-crafted but produced from an XML pipeline.

[Uche Ogbuji]

via Copia