Tagging meets hierarchies: XBELicious

The indefatigable John L. Clark recently announced another very useful effort, the start of a system for managing your del.icio.us bookmarks as XBEL files. Of course not everyone might be as keen on XBEL as I am, but even if you aren't, there is a reason for more general interest in the project. It uses a very sensible set of heuristics for mapping tagged metadata to hierarchical metadata. del.icio.us is all Web 2.0-ish and thus uses tagging for organization. XBEL is all XML-ish and thus uses hierarchy for same. I've long wanted to document simple common-sense rules for mapping between the two, and John's approach is very similar to sketches I had in my mind. Read section 5 ("Templates") of the XBELicious Installation and User's Guide for an overview. Here is a key snippet:

For example, if your XBEL template has a hierarchy of folders like "Computers → linux → news" and you have a bookmark tagged with all three of these tags, then it will be placed under the "news" folder because it has tags corresponding to each level in this hierarchy. Note, however, that this bookmark will not be placed in either of the two higher directories, because it fits best in the news category. A bookmark tagged with "Computers" and "news" would only be placed under "Computers" because it doesn't have the "linux" tag, and a bookmark tagged with "linux" and "news" would not be stored in any of these three folders.
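
If it helps to see that rule in code form, here is a rough Python sketch of how I read it (the nested-dict template representation and the function name are mine, not XBELicious's, and the real tool may differ in details):

def place(tags, template, path=()):
    """Return the deepest folder paths whose tags, at every level,
    all appear in the bookmark's tag set (my reading of the rule
    quoted above; XBELicious itself may handle edge cases differently)."""
    placements = []
    for folder, children in template.items():
        if folder not in tags:
            continue  # can't descend into a folder whose tag is missing
        deeper = place(tags, children, path + (folder,))
        # the bookmark lands only in the deepest matching folder on a branch
        placements.extend(deeper or [path + (folder,)])
    return placements

template = {"Computers": {"linux": {"news": {}}}}
print(place({"Computers", "linux", "news"}, template))  # [('Computers', 'linux', 'news')]
print(place({"Computers", "news"}, template))           # [('Computers',)]
print(place({"linux", "news"}, template))               # []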

XBELicious is work in progress, but worthy work for a variety of reasons. I hope I have some time to lend a hand soon.

[Uche Ogbuji]

via Copia

AJAX and the Back button

Sylvain and I have recently discussed his discomfort with the state of the art in Web browsers in the age of AJAX (to use a grand term, even though I strongly believe that AJAX is nothing but an incremental gathering of conventions rather than anything new and special). He has gathered his thoughts in a blog posting, "The chicken and egg problem". I posted a comment, but I thought I might copy the comment here as well.

[Let me summarize] in brief my reasons for thinking that the current system is not broken, and that we do not need to change anything fundamental about browsers.

First of all, the basic semantic of "link history" in a Web browser has not changed since the Mosaic days for a very good reason: it is integral to HTTP, REST and all that. At each point a browser is at a particular resource, and it moves from one resource to another according to actuation of simple REST verbs. Within each resource the browser can do all sorts of complex things, including showing animations (Shockwave, SVG, etc.), providing mini-applications to the user (Java applets, Flash, AJAX, etc.) and more, but the resource has not changed. The boundary of a resource is defined by the service provider, and the browser simply reflects that in the history, URL bar and other features. I don't think the back and forward buttons should be overloaded for any operation within a resource. They should not be used as hot buttons in Flash apps or in AJAX apps. This violates the layering that is so important to the success of the Web.

If service providers want to provide navigation within a particular resource, they should do so within the application, and not at the REST level. I want my Front office app to have an "Undo" button (which makes much more sense than "Back"). [Why do I need chameleon browser chrome when I can just do <xforms:button id="undo"><xforms:caption>Undo</xforms:caption>...</xforms:button>?] When I click browser "Back" I want that to exit the application and go to the previous resource.

IMO, people think they have trouble with the Back button and AJAX because they do not appreciate protocol layering very well, and because the AJAX tools do not yet help in this understanding. I think a better understanding of this layering and better tools are what's needed, not a major redesign of the browser idiom.

[Uche Ogbuji]

via Copia

Agile Web #2: "Handling Atom Text and Content Constructs"

"Handling Atom Text and Content Constructs"

Uche Ogbuji's Agile Web column returns with a look at handling some of the trickier issues in the Atom Syndication Format, which has recently become RFC 4287, an internet standard.

The second article in my new column is out. In this one I focus on Atom text and content constructs. I spent more time on the Atom examples and less on the sample processing code, since I thought more of the former would be especially useful. I've been working with and writing about Atom a lot lately, and in fact I have an IBM developerWorks tutorial on Atom processing in XSLT in production. It should be live some time today.
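
To give a flavor of why the text constructs need care (this is just a rough sketch of mine, not the column's sample code; the function name and the fallback choices are my own), here's how one might reduce an atom:title to plain text in Python with ElementTree:

import html
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
XHTML = "http://www.w3.org/1999/xhtml"

def text_construct_as_plain(elem):
    """Reduce an Atom text construct (e.g. atom:title) to plain text.
    type="text" (or absent): the content is already plain text.
    type="html": the content is escaped HTML; unescape it, and in a real
        tool strip or sanitize the markup (not done here).
    type="xhtml": the content is a single xhtml:div; gather its text."""
    ttype = elem.get("type", "text")
    if ttype == "xhtml":
        div = elem.find("{%s}div" % XHTML)
        return "".join(div.itertext()) if div is not None else ""
    if ttype == "html":
        return html.unescape(elem.text or "")
    return elem.text or ""

doc = ET.fromstring(
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    '<title type="html">Four &amp;lt; Five</title></feed>')
print(text_construct_as_plain(doc.find("{%s}title" % ATOM)))  # Four < Five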

Joe Gregorio has been working the other half of the Atom pie (old joke for folks who've been following Atom), and he has a very timely new article out: "Catching Up with the Atom Publishing Protocol".

And once again, if you'd like to discuss Atom (syntax or publishing protocol), please do join us on the #atom channel on irc.freenode.net.

[Uche Ogbuji]

via Copia

XSLT for converting from OPML to XBEL and XOXO

In all this Web feed hacking I've been working with my list originally exported from Lektora in OPML format. I wrote XSLT to convert from OPML to XBEL and XOXO. In the case of XOXO I really couldn't figure out any common conventions for Web feeds, so I made up my own for now. The resulting XBEL looks a lot easier to work with, so I'll propose extensions for feed URL/site URL coupling in the renewal of XBEL. I figured my XSLT might be useful to others, so here are the links:

Going from XBEL to OPML, I've been using Dan MacTough's XSLT. (He also has an XBEL to XHTML transform). I sometimes have to tweak the resulting attributes to deal with xmlUrl/url and title/text type OPML madness.
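
Since I can't reproduce the XSLT here, here's roughly what the OPML-to-XBEL direction looks like in Python terms, as a sketch of mine rather than the transforms linked from this entry. It tolerates the xmlUrl/url and title/text variants just mentioned; "feeds.opml" is a stand-in file name and the mapping choices are my own:

import xml.etree.ElementTree as ET

def opml_to_xbel(opml_root):
    """Sketch: map OPML outline nesting to XBEL folders and bookmarks."""
    xbel = ET.Element("xbel", version="1.0")
    def convert(outline, parent):
        title = outline.get("title") or outline.get("text") or ""
        url = outline.get("xmlUrl") or outline.get("url")
        if url:  # an outline with a feed URL becomes a bookmark
            node = ET.SubElement(parent, "bookmark", href=url)
            ET.SubElement(node, "title").text = title
        else:    # a grouping outline becomes a folder
            node = ET.SubElement(parent, "folder")
            ET.SubElement(node, "title").text = title
            for child in outline.findall("outline"):
                convert(child, node)
    for outline in opml_root.findall("body/outline"):
        convert(outline, xbel)
    return xbel

opml = ET.parse("feeds.opml").getroot()  # stand-in file name
print(ET.tostring(opml_to_xbel(opml), encoding="unicode"))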

I've also posted my Web feed list in XBEL form. It uses old school XBEL 1.0, and not any of the metadata additions I'm hoping to see in 1.2. As such, it's only a list of Web feeds and doesn't include the corresponding Weblog home pages.

[Uche Ogbuji]

via Copia

XML Bookmark Exchange Language (XBEL) gets a proper home

XML Bookmark Exchange Language (XBEL)

The Python XML SIG has had some really great times in its history. One of the highlights is the development of XML Bookmark Exchange Language (XBEL). In September of 1998, just as I was joining the group, they were developing this bookmarks exchange language that's still used in more browsers and bookmark management projects than any other particular format. The XML-SIG has fallen on quiet times, and one of the side effects of this is that additional work on XBEL has been neglected.

Earlier this year we agreed on the SIG list to give XBEL its own home on SourceForge, but no one stepped up to make it happen until John L. Clark got to it last week (thanks, John).

XBEL's new home is http://sourceforge.net/projects/xbel/. The old home is still up, but I think we should move it to http://xbel.sourceforge.net/, with some content updates and maybe a redesign (perhaps making the page XHTML). We'll be discussing such things on the new XBEL mailing list, so please come join us. The main goal is to add more features to XBEL needed for its original role in browser bookmarks exchange, but I'm also interested in making it a useful format for general Web resource lists such as feed lists (e.g. a superior alternative to OPML).

John wrote up a good summary of recent discussions of XBEL.

I'll have more on our efforts summarized here on Copia as we progress.

[Uche Ogbuji]

via Copia

Ouch. I feel your pain, Sam

Seems Sam Ruby's presentation suffered a bit in the spectacle. Unfortunately, I'm no stranger to presentation set-up problems. I've also been lucky enough to have patient audiences. Maybe conference organizers will someday factor Linux A/V support into consideration when choosing venues (I can dream, eh?). I can almost always use projectors and the like with no problem in usual business settings, and I can only guess that conference venues tend to have archaic A/V technology that doesn't like Linux.

As for the presentation itself, based on the slides much of it is an accumulation of issues probably well known to, say, a long-time XML-DEV reader, but useful to collect in one place. It looks like a much-needed presentation, and I hope Sam gets to present it again, with better luck with the facilities. Here follow a few reactions I had to stuff in the slides.

expat only understands utf-8

This hasn't been true for ages. Expat currently understands UTF-8, UTF-16, US-ASCII and ISO-8859-1 out of the box, and the user can add to this list by registering an "unknown encoding" event handler.
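
That's easy to verify from Python, since pyexpat wraps Expat (the sample document here is mine):

import xml.parsers.expat

# An ISO-8859-1 document; Expat handles this encoding out of the box,
# no "unknown encoding" handler required.
doc = '<?xml version="1.0" encoding="iso-8859-1"?><p>caf\xe9</p>'.encode("iso-8859-1")

parser = xml.parsers.expat.ParserCreate()
parser.CharacterDataHandler = lambda data: print(repr(data))
parser.Parse(doc, True)  # prints 'café'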

Encoding was routinely ignored by most of the initial RSS parsers and even the initial UserLand RSS validator. “Aggregators” did the equivalent of strcat from various sources and left the results for the browser

Yuck. Unfortunately, I worry that Mark Pilgrim's Universal Feed Parser might not help the situation with its current practice of returning some character data as strings without even guessed encoding information (that I could find, anyway). I found it very hard to build a character-correct aggregator around the Feed Parser 4.0 alpha version. Then again, I understand it's a hard problem with all the character soup ("char soup"?) Web feeds out there.

[Buried] in a non-normative appendix, there is an indication that the encoding specified in an XML document may not be authoritative.

Nope. There is no burial going on. As I thought I'd pointed out on Copia before (though I can't find the entry now), section "4.3.3 Character Encoding in Entities" of XML 1.0 says:

In the absence of information provided by an external transport protocol (e.g. HTTP or MIME), it is a fatal error for an entity including an encoding declaration to be presented to the XML processor in an encoding other than that named in the declaration, or for an entity which begins with neither a Byte Order Mark nor an encoding declaration to use an encoding other than UTF-8. Note that since ASCII is a subset of UTF-8, ordinary ASCII entities do not strictly need an encoding declaration.

So the normative part of the spec also makes it quite clear that an externally specified encoding can trump what's in the XML or text declaration.
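
In code terms the precedence comes down to something like this (a sketch, not a full RFC 3023 implementation; the helper is mine):

from email.message import Message

def effective_encoding(content_type):
    """Sketch of the precedence rule: a charset supplied by the transport
    (e.g. the HTTP Content-Type header) trumps whatever the XML or text
    declaration says; only in its absence does the document get to speak."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_param("charset")
    if charset:
        return charset  # externally specified encoding wins
    return None         # fall back to the XML declaration / BOM / UTF-8 default

print(effective_encoding('application/xml; charset="iso-8859-1"'))
# -> iso-8859-1, even if the document itself declares encoding="utf-8"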

The accuracy of metadata is inversely proportional to the square of the distance between the data and the metadata.

Very apt. I think that's why XML's attributes work as well as they do (despite the fact that they are so inexplicably maligned in some quarters).

In fact, Microsoft’s new policy is that they will always completely ignore [HTTP Content-Type] for feeds—even when the charset is explicitly present

XML of course doesn't force anyone to conform to RFC 3023, but Microsoft could prove itself a really good Web citizen by adopting it. Maybe they could lead the way to reducing the confusion I mention in this entry.

I consider Ruby's section on the WS-* mess an excellent indictment of the silly idea of universal and strong data typing.

In general, programming XML is hard.

Indeed it is. Some people seem to think this is a product of architecture astronautics. They are laughably mistaken. XML is hard because managing data is hard. Programmers have developed terrible habits through long years of just throwing their data over the wall at a SQL DBMS and hoping all goes OK in the end. The Web is ruthless in punishing such diffidence.

XML is the first technology that has forced mainstream programmers to truly have to think hard about data. This is a boundlessly good thing. Let the annoyances proliferate (that's your cue, Micah).

[Uche Ogbuji]

via Copia

Suspicions confirmed, so I'm ditching Rojo

Rojo has a pretty nice UI, but I always had a nagging suspicion it was showing me only a small number of my feeds. I finally got time to check on that this morning. I loaded Liferea (Aristotle's suggestion) with the same feed list I used for Rojo. I found numerous feeds with recent entries that Rojo wasn't showing me. The worst part of it is that Rojo gave no indication that it was having trouble with some feeds (I did notice that the unread count in the left-hand bar didn't match the number in the folder newspaper view, but I could never figure out exactly what that meant). Liferea is a bit clunky, but at least I don't miss entries using it. I guess I'll use it until I find something better.

[Uche Ogbuji]

via Copia

I must be missing something about XOXO (and maybe microformats in general)

As I mentioned, I started working with XBEL as a way to manage my Web feeds. It occurred to me that I should consider XOXO for this purpose, since it has more traditionally been put up in opposition to OPML.

Well, I don't get it. Sure, it's simple XHTML with some conventions for overlaid semantics, but how does that do anything for interoperability of Webfeed subscription lists? I've taken a look at attention.xml, and that seems more thoroughly specified, but it's way overkill for my needs.

Look, all I need to do is represent a categorized structure of feed items with the following information per item (a minimal sketch of such a record follows the list):

  1. Web feed URL (e.g. RSS or Atom link)
  2. title
  3. optional description or notes
  4. optional Web site URL (e.g. link to HTML or XHTML page for Weblog)
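
In Python terms, just to pin the model down (the names are mine):

from dataclasses import dataclass
from typing import Optional

@dataclass
class FeedItem:
    feed_url: str                      # RSS or Atom link
    title: str
    description: Optional[str] = None  # optional notes
    site_url: Optional[str] = None     # optional Weblog home page (HTML/XHTML)

# The "categorized structure" is then just nesting, e.g. a mapping of
# category name to sub-categories and lists of FeedItem records.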

The trouble with OPML is that there are dozens of ways to encode these four bits of information, and as I've found, tools pretty much range all across those dozens. Besides, OPML is really poor XML design. That's a practical and not just aesthetic concern, because I expect to manage this information partly by hand.

XBEL is much better markup design, but I don't know that it has a natural way to represent the distinction between feed and content URL for the same item ("bookmark").

Everything I heard about XOXO led me to believe that this would be a slam dunk, but hardly. The XOXO "spec" is not all that illuminating, and from what I can gather there or elsewhere, there are also a dozen ways I could encode the above information. Perhaps:

<li>
  <a href="http://example.org/">Weblog home</a>
  <a href="http://example.org/feed.atom" type="application/atom+xml">Weblog feed</a>
  <dl>
    <dt>description</dt>
    <dd>That good ole Weblog</dd>
  </dl>
</li>

Perhaps (so that the likely HTML rendering is not jumbled):

<li>
  <ul>
    <li><a href="http://example.org/">Weblog home</a></li>
    <li><a href="http://example.org/feed.atom" type="application/atom+xml">Weblog feed</a></li>
  </ul>
  <dl>
    <dt>description</dt>
    <dd>That good ole Weblog</dd>
  </dl>
</li>

But since Weblog contents could be XML, is it really safe to use media type as the distinguishing mark between Web site and Web feed links? OK, so perhaps:

<li>
  <ul>
    <li><a rel="website" href="http://example.org/">Weblog home</a></li>
    <li><a rel="webfeed" href="http://example.org/feed.atom">Weblog feed</a></li>
  </ul>
  <dl>
    <dt>description</dt>
    <dd>That good ole Weblog</dd>
  </dl>
</li>

But now I've invented a relationship vocabulary (I guess this is technically my own microformat) and why would I expect another XOXO tool to use rel="website" and rel="webfeed"?

I could go on with possible variations. I do like the way that I can simply refer to the XHTML Hypertext Attributes Module to get some general ideas about semantics, but that's not really good enough because I have a fairly specific need.

I imagine someone will say that XOXO is just a general outlining format, and can't specify such things because it's all about being micro. But in that case why do people put XOXO itself forward as a solution for Webfeed corpus exchange? I can't see how XOXO can do the job without overlaying yet another microformat on it. And if we need to stack microformat on microformat to address such a simple need, what's wrong with good old macroformats: you know, a real, specialized XML format?

I've really only spent an hour or two exploring XOXO (although according to the microformats hype I shouldn't expect to need more time than that), so maybe I'm missing something. If so, I'd be grateful for any enlightening comments.

[Uche Ogbuji]

via Copia

I already said OPML is crap, right? I had to hack through another reminder today.

So today I tried to import OPML (yeah, that very OPML) into Findory (see last entry). The OPML is based on what I originally exported from Lektora and has been through all my feed experiments. A sample entry:

<outline url="http://www.parand.com/say/index.php/feed/" text="Parand Tony Darugar" type="link"/>

What does Findory tell me? 97 feeds rejected for "invalid source". Great. Now I actually have to get my hands dirty in OPML again. I check the spec. Of course there's no useful information there. I eventually found this Wiki of OPML conventions. I saw the type='rss' convention, but that didn't seem to make a difference. I also tried xmlUrl rather than url, like so:

<outline xmlUrl="http://www.parand.com/say/index.php/feed/" text="Parand Tony Darugar" type="link"/>

This time the Findory import works.

Not only do several of the feed readers I use have url rather than xmlUrl, but the XBEL to OPML XSLT I've found assumes that name as well. The conventions page also mentions title versus text as a way to provide formatting in some vague way, but I've seen OPML files that use only title, with nary a text to be seen anywhere. Besides, what's wrong with the XML way of allowing formatting: elements rather than attributes? It's enough to boil the brain.
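
The workaround is simple enough once you know which shade of OPML a tool wants; here's a sketch of the attribute rewrite I ended up needing (the file names are stand-ins, and duplicating title/text in both directions is my own defensive choice):

import xml.etree.ElementTree as ET

def normalize_opml(path_in, path_out):
    """Sketch: rewrite the outline attribute variants that trip up imports."""
    tree = ET.parse(path_in)
    for outline in tree.iter("outline"):
        if "xmlUrl" not in outline.attrib and "url" in outline.attrib:
            outline.set("xmlUrl", outline.attrib.pop("url"))  # url -> xmlUrl
        if "title" not in outline.attrib and "text" in outline.attrib:
            outline.set("title", outline.get("text"))
        if "text" not in outline.attrib and "title" in outline.attrib:
            outline.set("text", outline.get("title"))
    tree.write(path_out, encoding="utf-8", xml_declaration=True)

normalize_opml("feeds.opml", "feeds-findory.opml")  # stand-in file names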

Speaking of XBEL, that's actually how I'm managing my feeds now, as I'll discuss further in the next entry. Now that Web feeds have become important to me I'll be using a sane format to manage them, thank you very much. I'll do the XSLT thing to export OPML for all the different tools that insist on tag soup. That is, of course, if I can figure out what precise shade of OPML each tool insists on. Today's adventure with feed URL attributes makes me wonder whether there is any escaping the chaos.

[Uche Ogbuji]

via Copia

Still looking for a feed reader, perhaps

Earlier I trawled around for a new way of reading my Web feeds. Readers were kind enough to mention Rojo in comments, and I've been using it ever since, but I'm not so sure anymore. It's very nice, but there are a couple of UI nits, and I have a sneaking suspicion it's not showing all new stories. I'm not set on ditching it yet, but I'm looking around again. I found the very useful resources "1 week comparison: SearchFox, Feedster, Pluck, Bloglines, Rojo, and NewsGator" and the follow-up "3 week shakedown, 2 RSS readers remain." He's leaning towards SearchFox (which a Copia reader had mentioned earlier) and Rojo. Someone in his comments mentioned Findory as well. I went to www.searchfox.com and signed up, but I guess that's the wrong site. I went to rss.searchfox.com, but you have to e-mail them to get a beta account. I'll give my initial impressions of SearchFox if and when I get an account. As for Findory, the first issue is that I couldn't figure out how to import OPML. I went back to the comment from which I learned about it and found that the key URL is http://findory.com/s/, but I don't think this is made very clear on their Web site.

After some OPML silliness (more on that later) I completed the import and found that clicking the "favorites" link in the upper right hand side is the key to focusing on your own feeds, and not all the other stuff Findory wants to show you. I don't think it will work for me though, because it's not the newspaper style aggregator that I prefer.

[Uche Ogbuji]

via Copia