I must be missing something about XOXO (and maybe microformats in general)

As I mentioned, I started working with XBEL as a way to manage my Web feeds. It occurred to me that I should consider XOXO for this purpose, since it has more traditionally been put up in opposition to OPML.

Well, I don't get it. Sure, it's simple XHTML with some conventions for overlaid semantics, but how does that do anything for interoperability of Webfeed subscription lists? I've taken a look at attention.xml, and that seems more thoroughly specified, but it's way overkill for my needs.

Look, all I need to do is represent a categorized structure of feed items with the following information per item:

Web feed URL (e.g. RSS or Atom link)
title
optional description or notes
optional Web site URL (e.g. link to HTML or XHTML page for Weblog)
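As a rough sketch, the data model above could be written down in Python as follows. The class names, field names and the example.org URLs are illustrative assumptions, not part of XBEL, XOXO or OPML.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FeedItem:
    feed_url: str                      # Web feed URL (RSS or Atom)
    title: str
    description: Optional[str] = None  # optional description or notes
    site_url: Optional[str] = None     # optional Web site URL


@dataclass
class Category:
    name: str
    items: List[FeedItem] = field(default_factory=list)
    subcategories: List["Category"] = field(default_factory=list)

    def walk(self):
        """Yield every FeedItem in this category and its descendants."""
        yield from self.items
        for sub in self.subcategories:
            yield from sub.walk()


# Hypothetical sample data: one category nested under a root.
root = Category("Feeds", subcategories=[
    Category("XML", items=[
        FeedItem("http://example.org/weblog/index.atom",
                 "That good ole Weblog",
                 site_url="http://example.org/weblog/"),
    ]),
])
```

Any candidate format only has to round-trip this much: a tree of categories, and four fields per item, two of them optional.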

The trouble with OPML is that there are dozens of ways to encode these four bits of information, and as I've found, tools pretty much range all across that dozen. Besides, OPML is really poor XML design. That's a practical and not just aesthetic concern, because I expect to manage this information partly by hand.

XBEL is much better markup design, but I don't know that it has a natural way to represent the distinction between feed and content URL for the same item ("bookmark").

Everything I heard about XOXO led me to believe that this is a slam dunk, but hardly. The XOXO "spec" is not all that illuminating, and from what I can gather there or elsewhere, there are also a dozen ways I could encode the above information. Perhaps:

<li>
  Weblog home
  Weblog feed
  <dl>
    <dt>description</dt>
    <dd>That good ole Weblog</dd>
  </dl>
</li>

Perhaps (so that the likely HTML rendering is not jumbled):

<li>
  <ul>
    <li>Weblog home</li>
    <li>Weblog feed</li>
  </ul>
  <dl>
    <dt>description</dt>
    <dd>That good ole Weblog</dd>
  </dl>
</li>

But since Weblog contents could be XML, is it really safe to use media type as the distinguishing mark between Web site and Web feed links? OK, so perhaps:

<li>
  <ul>
    <li><a rel="website" href="...">Weblog home</a></li>
    <li><a rel="webfeed" href="...">Weblog feed</a></li>
  </ul>
  <dl>
    <dt>description</dt>
    <dd>That good ole Weblog</dd>
  </dl>
</li>

But now I've invented a relationship vocabulary (I guess this is technically my own microformat) and why would I expect another XOXO tool to use rel="website" and rel="webfeed"?
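To make the interoperability worry concrete, here is a minimal sketch of what a consumer of that last variant would have to do, using Python's standard ElementTree. The rel values "website" and "webfeed" are the ad hoc ones from this post; the example.org URLs, the enclosing ol element and the function names are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical document: the URLs are placeholders, and the rel values
# are this post's invented vocabulary, not anything XOXO defines.
SAMPLE = """\
<ol xmlns="http://www.w3.org/1999/xhtml" class="xoxo">
  <li>
    <ul>
      <li><a rel="website" href="http://example.org/weblog/">Weblog home</a></li>
      <li><a rel="webfeed" href="http://example.org/weblog/index.atom">Weblog feed</a></li>
    </ul>
    <dl>
      <dt>description</dt>
      <dd>That good ole Weblog</dd>
    </dl>
  </li>
</ol>
"""


def q(tag):
    """Return the namespace-qualified name ElementTree expects."""
    return "{%s}%s" % (XHTML_NS, tag)


def read_items(xhtml_text):
    """Collect one dict per top-level list item in the outline."""
    root = ET.fromstring(xhtml_text)
    items = []
    for li in root.findall(q("li")):
        item = {}
        # Only links carrying one of the two agreed rel values count.
        for a in li.iter(q("a")):
            rel = a.get("rel")
            if rel in ("website", "webfeed"):
                item[rel] = a.get("href")
        dl = li.find(q("dl"))
        if dl is not None:
            # Pair each dt with the dd that follows it.
            children = list(dl)
            for dt, dd in zip(children[::2], children[1::2]):
                item[dt.text.strip()] = dd.text.strip()
        items.append(item)
    return items


items = read_items(SAMPLE)
```

The code is short, but every string it tests for is a private convention. A tool written against either of the earlier variants would find nothing here.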

I could go on with possible variations. I do like the way that I can simply refer to the XHTML Hypertext Attributes Module to get some general ideas about semantics, but that's not really good enough because I have a fairly specific need.

I imagine someone will say that XOXO is just a general outlining format, and can't specify such things because it's all about being micro. But in that case why do people put forward XOXO itself as a solution for Webfeed corpus exchange? I can't see how XOXO can do the job without overlaying yet another microformat on it. And if we need to stack microformat on microformat to address such a simple need, what's wrong with good old macroformats: you know: a real, specialized XML format.

I've really only spent an hour or two exploring XOXO (although according to the microformats hype I shouldn't expect to need more time than that), so maybe I'm missing something. If so, I'd be grateful for any enlightening comments.

[Uche Ogbuji]

via Copia

7 responses

Perhaps you can enumerate the poor XML design behind OPML. My guess is that the looseness and the openness of the 1.0 spec is not comforting to most strongly typed people.

— Bryan Wilhite

Enumerate the poor XML design behind OPML? I could list several really evil touches, but the stuffing of human-readable text into "text" attributes is more than excrescence enough.

For the record, I'm all for open content models, as long as the extensibility mechanism is clearly specified, which in OPML it is not.

As for "strongly typed people": wow. In a million years I wouldn't have expected anyone to cast me in that mold. I suspect the strong-typing advocates whom I've been debating for years are having a good belly laugh at my expense now. :-)

— Uche

Bryan: Charles Miller has a great “What’s wrong with OPML” writeup – the most lucid, plainspoken roasting I’ve ever seen:

http://fishbowl.pastiche.org/2005/10/02/whats_wrong_with_opml

Uche: you know, framing the problem that way makes me think “that’s a feed.” Look at it: title, link, description… so long as a flat list without nesting is all you need, that’s exactly what a feed is. I was thinking of an Atom feed with one entry per linked feed, where the permalink is a homepage link and there’s an atom:link[@rel='related'] that points at the feed. Add an atom:summary with an HTML rendition of the feed link so unaware tools can do something useful, and presto.

Sound good, or am I way off the mark?
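A minimal sketch of one such entry, built with Python's standard ElementTree, might look like this. The function name, the example.org URLs, the timestamp and the choice of the homepage URL as atom:id are all assumptions for illustration; only the rel="related" link and the HTML summary come from the suggestion above.

```python
import xml.etree.ElementTree as ET
from html import escape

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace


def atom(tag):
    """Return the namespace-qualified Atom element name."""
    return "{%s}%s" % (ATOM_NS, tag)


def subscription_entry(title, site_url, feed_url, updated):
    """Build one atom:entry describing a subscribed feed."""
    entry = ET.Element(atom("entry"))
    ET.SubElement(entry, atom("title")).text = title
    # Assumption: reuse the homepage URL as the entry id.
    ET.SubElement(entry, atom("id")).text = site_url
    ET.SubElement(entry, atom("updated")).text = updated
    # The permalink is the homepage; the feed hangs off rel="related".
    ET.SubElement(entry, atom("link"), rel="alternate", href=site_url)
    ET.SubElement(entry, atom("link"), rel="related", href=feed_url)
    # HTML rendition of the feed link for tools unaware of the convention.
    summary = ET.SubElement(entry, atom("summary"), type="html")
    summary.text = '<a href="%s">%s feed</a>' % (
        escape(feed_url, quote=True), escape(title))
    return entry


entry = subscription_entry(
    "That good ole Weblog",
    "http://example.org/weblog/",             # hypothetical homepage
    "http://example.org/weblog/index.atom",   # hypothetical feed
    "2005-12-01T00:00:00Z",
)
print(ET.tostring(entry, encoding="unicode"))
```

A consumer still has to know that rel="related" means "the feed" in this kind of document, which is where a profile marker of some sort would help.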

That reminds me that James Snell was bouncing around ideas for a “profile” extension for Atom, with which you could say “this is a weblog feed” or “…an event log” or “…a webservice directory” – or, just as well, “…a blogroll/directory”. That way you could also annotate such a feed more precisely for processing.

— Aristotle Pagaltzis

"And if we need to stack microformat on microformat to address such a simple need, what's wrong with good old macroformats: you know: a real, specialized XML format."

Well, with respect to OPML, I think there are at least two things going on with it: 1) It's used in aggregators to manage subscriptions and 2) It's used to manage blogrolls in sidebars. There's lots, lots more going on with it, but I'll ignore that for now.

For the case of blogrolls, I'd say that if you came up with a quick "hBlogroll" spec to throw into the nodes of an XOXO outline, you'd be set. Most blogrolls seem to consist of what you've got over in the "folks" sidebar: A name, an HTML link, a feed link. Not too macro.

But, if you were to encode and manage that blogroll in a microformat, 1) it would already be in XHTML suitable for inclusion in your page, styled by CSS and 2) it would already be in a machine-friendly form with a predictable and pre-established structure for scooping out of your page. No intermediary XML and accompanying XSL transformation needed, necessarily.

Now... I think this whole scenario starts to fall apart if you think about using XOXO / "hBlogroll" for feed subscriptions in an aggregator (or some similar role). If it's not necessarily meant to appear at a URL at some point, it's not necessarily appropriate for a microformat.

— l.m.orchard

Hey Uche,

[BTW... Great article!] I think I have come to the conclusion that the best possible tool for bookmarks is an Atom feed. While at first it might seem a bit strange, if all that is being represented per entry is one URI, then using two links, one with rel="self" and one with rel="alternate", you can combine the title, summary, content, and category elements to provide all of the above information (and even a bit more).

Of course you're not limited to these elements, and in fact you would be required, obviously, to add the id and updated elements as child elements of both the feed and each entry. But I don't see any real downside to such requirements, and I see plenty of potential opportunities, such as building a quick and dirty polling system. It could either check a feed for a change in the updated value or, in the case of a non-feed format such as an HTML page, simply send a HEAD request and use the returned headers as a clue as to if and when the link has been updated, then update the value of the updated element accordingly and automagically tell you which content apparently has something new since your last visit. (It would be nice if we could promote a more MD5-type system that validated only the content itself, ignoring the fact that a new ad may have been added since your last visit, but I have my doubts we could get too many people to put forth the effort... would be nice though :)

The secret then to using the format for bookmarks is to either use your standard path-based directory structure to categorize your bookmarks, recursively selecting which directories contain Atom feeds that need to be parsed (which, of course, would give you a clue as to whether the directory structure goes any deeper), or, better yet, just use one feed (or several feeds in the same directory, each with a different name for each top-level categorization) and then sort the feed(s) (combining them into one feed for processing if several feeds are used) based on the path value of the rel="alternate" link. Once sorted into whatever path-based directory "schema" you might use, you could further use the category elements to fine-tune your searches and combine this with running a "return everything that is less than 2 weeks old" type query against all of the bookmarks whose category/@label="python" (sorry, couldn't think of anything more creative right at the moment).

The real beauty behind something like this is that it then becomes quite easy to "feed" (sorry :) off of each other's bookmark feeds, sharing them, mashing them up into various groups that can be published and labeled "the top 100 songs of 2005 based on total bookmarks for each particular track within such and such a community" (obviously this suggests something more towards the other unannounced project) and archived as such.
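The "less than 2 weeks old, labeled python" query can be sketched with nothing more than Python's standard library. The sample feed, the function names and the fixed "now" are assumptions for illustration, and the timestamp parser only handles the UTC "Z" form used in the sample.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical bookmark feed, trimmed to the elements the query needs.
FEED = """\
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Recent Python bookmark</title>
    <updated>2005-12-10T12:00:00Z</updated>
    <category term="python" label="python"/>
  </entry>
  <entry>
    <title>Old Python bookmark</title>
    <updated>2005-11-01T12:00:00Z</updated>
    <category term="python" label="python"/>
  </entry>
  <entry>
    <title>Recent XML bookmark</title>
    <updated>2005-12-12T12:00:00Z</updated>
    <category term="xml" label="xml"/>
  </entry>
</feed>
"""


def parse_updated(text):
    """Parse a UTC timestamp such as 2005-12-10T12:00:00Z."""
    parsed = datetime.strptime(text.strip(), "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def recent_by_label(feed_text, label, now, max_age=timedelta(weeks=2)):
    """Return titles of entries with the given category label
    whose updated value is less than max_age before now."""
    root = ET.fromstring(feed_text)
    hits = []
    for entry in root.findall(ATOM + "entry"):
        labels = {c.get("label") for c in entry.findall(ATOM + "category")}
        updated = parse_updated(entry.findtext(ATOM + "updated"))
        if label in labels and now - updated < max_age:
            hits.append(entry.findtext(ATOM + "title"))
    return hits


NOW = datetime(2005, 12, 15, tzinfo=timezone.utc)  # fixed for repeatability
hits = recent_by_label(FEED, "python", NOW)
```

The same filter is a one-line predicate in XPath or XQuery; the Python version just shows that no special tooling is needed.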

Going about things this way could lend itself very well to a quick adoption rate given the fact that, obviously, using an existing standard wouldn't require any further standardization. Furthermore, with the effort to bring forth the publishing protocol well underway, we can quite easily rely on the idea that the same system we use to build our blog Atom feeds can also be used to build our bookmark files.

Thoughts?

— M. David Peterson

One quick addition... you could take these same feeds and throw them into a local instance of an XQuery-enabled DB. I can think of five off the top of my head that are either open-source, free, or both, including MS SQL Server Express 2005, which, while obviously nothing of interest to you, would mean that in the not too distant future it could easily be assumed that a majority of desktops will have built-in capabilities to manage these types of systems. You could then write these systems in the same XPath, XQuery, XSLT, or variety of cross-platform supported scripting and have them, for the most part, work across the XQuery DB board.

Would definitely help in keeping things managed in a much "cleaner and clutter-free" sort of way while making advanced queries for data really quite simple (using reusable XQuery files to make and keep things even simpler)...

Hmmm... I'm going back up and reading some of the other responses and just noticed Aristotle's post (I thought I had skimmed the other responses well enough to gain a feel for what others thought... guess I missed his).

Aristotle, I promise I didn't just plagiarize your idea and extend from it! :D It's one thing to steal an idea and post it somewhere where the originator is not likely to stumble across it; it's quite another to "copy-and-paste" two comments down from the one you "copy-and-pasted" from :)

What this does do, however (and basing this extension on Aristotle's further insight into James Snell's line of thought), is suggest that the community as a whole is collectively realizing that the need for a separate spec altogether is, more than likely, completely unnecessary.

If interested, while it's fairly surface-level code at the moment, I would be happy to make available the work I have been doing (note: it's not ALL about this general idea, but partially) to act as a proof of concept of just how simple it is to implement such a system. This is all part of another project that's being built; around the time I started it, I presented the general idea in email to Edd, soon after he wrote his piece on OPML a while back... I wonder if he has had any time to mull the general idea over, as his general feel for such things could prove to be quite helpful.

While I'm here, I should add one last area to think about... It might even be possible to skip the link/@rel="alternate" element/attribute/value altogether and use the scheme attribute of category for the same purpose.