OPML, XOXO, RDF and more on outlining for note-taking

There has been a lot of good comment on my earlier entry on outline formats in XML. I've been pretty busy the past week or so, but I'd better get my thoughts down before they deliquesce.

Bob DuCharme pointed me at Micah's article which includes mention of XOXO. Henri Sivonen asked what I think of it.

Taking the name "outlining format" literally, it's actually just fine. As Micah says:

Some people might feel warmer and fuzzier with elements named outline, topic, item, and so on, or with elements in a freshly minted namespace, but microformats can still claim the semantic high ground, even when reusing XHTML. In the above, the parts of an outline are ordered lists and list items, exactly as the XHTML element names say.

The problem is that what made me start looking into outlining formats was the fact that I'd heard from others that these make such a great format for personal information space organization, and XOXO is just about useless in that regard.

Along that vector, I wonder what a pure outline format is useful for, anyway? I can't remember having ever needed a stand-alone outline document separate from what I'm outlining. If I'm writing a presentation or a long article, I'd prefer to have the table of contents or presentation outline section generated from the titles and structure of the full work. Sure, XOXO might be suitable for such a generated outline, but my exploration is really about hand editing.

In short I think XOXO is just fine for outlining, and yet I can't imagine when I'd ever use it. As others have mentioned, and as I suspected, the entire idea of outlining formats for general note-taking is a big stretch. Danny Ayers mentioned in a comment on the earlier format that for some attraction to OPML is a matter of neat outlining UIs. I've always been conservative in adopting UIs. I use emacs plus the command line for most of my coding, and after trying out a half dozen blog posting tool for posting to Copia, I ended up writing an e-mail-to-post gateway so that I can enter text (markdown) into a UI I'm already familiar with, Evolution's e-mail composition window.

As I said in the earlier entry, full-blown XHTML 2.0 makes more sense than an outlining format for managing a personal information space, and yet it seems too weak to me for this purpose. The weakness, as Danny points out, is semantic. If everything in my personal information space is just a para or an anchor or a list, I'll quickly get lost. As followers of Copia know, my brain is a rat trap of wandering thoughts, and I'm a poster child for the need for clearly expressed semantics.

As an RDF pioneer, I'm happy to use ideas from RDF, but I do not want to type RDF/XML by hand. I've always argued, as Danny Ayers hinted, that RDF should strive hard to by syntax agnostic, especially because RDF/XML is awful syntax. I agree with him that GRDDL is a good way to help rescue XHTML microformats from their semantic soup, and I think this is a better approach than trying to shovel all the metadata into the XHTML header (Dan Brickley mentions this possibility, but I wonder whether he tends to prefer it to GRDDL). GRDDL has a natural draw for me since I've been working with and writing tools for the XML+XSLT=RDF approach for about four years. But when I'm using markup for markup (e.g. in a personal information space) I'd rather have semantic transparency fitting comfortably within the markup, rather than dangling off it as an afterthought. In a nutshell, I want to use the better markup design of:

<to-do>

rather than the kludge of:

<ul class="to-do">

I think there's little excuse for why we don't have the best of both worlds. People should be able to enjoy the relative semantic cleanliness of RDF, but within the simplest constructs of markup, without having to endure the added layer of abstraction of RDF. That added layer of abstraction should only be for those aggregating models. The fact that people would have to pay the "RDF tax" every time they start to scribble in some markup explains why so many markup types dislike RDF. I'm not sure I've found as clear a case for this point than this discussion of extended uses for outlining formats.

Microformats are generally a semantic mess, from what I've seen of them. They do best when they just borrow the semantics of existing formats, as XOXO does, but I think they're not the solution to lightweight-syntax +clean-semantics that the GRDDL pioneers hope. GRDDL has too much work to do in bringing the rigor of RDF to microformats, and this work should be part of the format itself, not something like GRDDL. I think the missed opportunity here is that XML schema systems cling so stubbornly to syntax-only-syntax. As I've been exploring in articles such as "Use data dictionary links for XML and Web services schemata" (I have a more in-depth look at this in the upcoming Thinking XML article), one can make almost all the gains of RDF by putting the work into the XML schema, rather than heaping the abstraction directly into the XML format. And the schema is where such sophistication belongs.

But back to outlining and personal information spaces, I've tried the personal Wiki approach, and it doesn't work for me. Again Danny nails it: Wiki nodes and links are untyped. This is actually similar to the problem that I have with XHTML, but Wikis are even more of a semantic shambles. In XHTML there is at least a bit of an escape with class="foo". The difficulty of navigating and managing Wikis increases at a much greater rate than their information content, in my experience. My Akara project was in effect an attempt at a more semantically transparent Wiki, but since I wrote that paper I've had almost no time for Akara, unfortunately. I do plan to make it the showcase application for my vision of 4Suite 2.0, and in doing so I have an ally Luis Miguel Morillas, so there is still hope for Akara, perhaps even more so if I am able to build on Rhizome, which might help eliminate some wheel reinvention.

[Uche Ogbuji]

via Copia

Wow! XML formats for outlining are complete rubbish

There's no other way to put it other than the title. After having heard a lot about OPML (and having used it as a blind RSS feed exchange format), Ian Forrester's comment that he is using OPML for all his personal note-taking finally pushed me to look seriously at the format. It is just complete and utter garbage. OPML might possibly be the best example of the worst abuses of XML markup. It's really hard to fully express how horrible OPML is, and there is no way in the world that I'll ever be dealing with it directly. The language that I use as a hub for my personal information space needn't be perfect, but it can't make me gag at every other tag.

I looked a bit further and found OML. It's a reaction to the ugliness of OPML and so I expected it would be the ticket for me. It does partially fix perhaps the most immediate and visceral abomination of OPML: the abuse of attributes for prosaic textual content (although why it doesn't completely eliminate the text attribute in favor of a title child element is beyond me). But it leaves a lot of nastiness and introduces some of its own (the idea of item as generic extensibility element is hugely ill-begotten). OML isn't even widely-used enough to just compromise and deal with its flaws. I think I'll consider creating my own language. I can export to OPML via XSLT when I really have to. But I think I can use some of the fixes in OML as a starting point.

"Sharing, the web way", by Danny Ayers is a good outline [n/m] of the horrors of OPML. He does get into the politics as well, which I think are less important to me than the technical flaws. He does state what has always been my reaction to the OPML hype:

But more and more I'm thinking things like blogrolls or whatever are much better handled using something more specific - XBEL or simply (X)HTML (like Netscape bookmarks).

Of course I'd plump for XBEL, but this expresses my general viewpoint. I wanted to look at outlining formats because so many people go on about outline editors and formats as productivity tools. I want to be sure I'm not missing anything. Based on what I've found so far, I'm really confused at what people are gaining in this space.

Danny goes on to say:

If what you want is versatility and be able to combine material from different sources and of potentially different data types, then you really do need something like RDF - the glue of OPML isn't strong enough.

I think that O*ML is just one of those examples that illustrate the limits of RDF. RDF is not the best model for content with a significant prose quotient, and RDF/XML not the best syntax. I think that a format I came up with would have the following characteristics:

  • Lightweight overall framework using other formats such as XBEL and XHTML 2.0 to do the heavy lifting
  • Sound XML design overall
  • Metadata sections that can readily be mapped to RDF model without using RDF/XML directly

Where would that take me that no one else has already charted? Maybe nowhere. I certainly wouldn't plan to spend much time on it (and I might not even bother at all). It's just a bit of exploration suggested to me by what I've found poking around what's there already.

[Uche Ogbuji]

via Copia