“Real Web 2.0: Bookmarks? Tagging? Delicious!”

Subtitle: Learn how real-world developers and users gain value from a classic Web 2.0 site
Synopsis: In this article, you'll learn how to work with del.icio.us, one of the classic Web 2.0 sites, using XML Web feeds and JSON, in Python and ECMAScript. When you think of Web 2.0 technology, you might think of the latest Ajax tricks, but that is just a small part of the picture. More fundamental concerns are open data, simple APIs, and features that encourage users to form social networks. These are also what make Web 2.0 a compelling problem for Web architects. This column will look more than skin deep at important real-world Web 2.0 sites and demonstrate how Web architects can incorporate the best from the Web into their own Web sites.

This is the first installment of a new column, Real Web 2.0. Of course "Web 2.0" is a hype term, and, as has been argued to the point of tedium, it offers nothing beyond the most incremental advances. Still, in keeping with my tendency toward mildness about buzzwords, I think anything that helps focus Web developers on the collaborative features of Web sites is a good thing. And that's what this column is about. It's not about the Miss AJAX pageant, but rather about open data for users and developers. From the article:

The substance of an effective Web 2.0 site, and the points of interest for Web architects (as opposed to, say, Web designers), lie in how readily real developers and users can take advantage of open data features. From widgets that users can use to customize their bits of territory on a social site to mashups that developers can use to create offspring from Web 2.0 parents, there are ways to understand what leads to success for such sites, and how you can emulate such success in your own work. This column, Real Web 2.0, will cut through the hype to focus on the most valuable features of actual sites from the perspective of the Web architect. In this first installment, I'll begin with one of the ancestors of the genre, del.icio.us.

And I still don't want that monkey-ass Web 1.0. Anyway, as usual, there's lots of code here: Python, Amara, ECMAScript, JSON, and more. That will be the recipe (mixing up the ingredients a bit each time) as I journey through the poster-child sites of open data.

[Uche Ogbuji]

via Copia

Discussion group for Atom protocol implementations in Python

I've discussed implementing the Atom protocol in Python with many colleagues, so I decided to create a proper forum for discussion, and thus the Google group atom-protocol-python was born.

A group dedicated to discussion among developers and users of Python libraries and tools for processing the Atom protocol, either as client or server.

Honestly, the ideas of an Atom store and of an Atom client are so broad that I expect there to be several implementations in Python. This group is meant to be very open, and I'd love for even folks working on competing implementations to join up, so we can at least discuss interoperability.

And don't forget there is also an Atom IRC channel where we can discuss Atom syntax and Atom protocol. And while I'm plugging stuff, I shan't forget Planet Atom.

[Uche Ogbuji]

via Copia

Adding feeds to Liferea on the command line

Despite the kind help of the Rojo people, I still can't get the service to import my updated feed lists ('An error has occurred...Failed to import: null...We apologize for the inconvenience.'), so I'm still reading my Web feeds in Liferea for now. One nice bonus with Liferea is the ability to add feeds from the command line (or, really, from any program) courtesy of D-BUS. Thanks to Aristotle for the tip, pointing me to 'a key message on liferea-help'. I've never used D-BUS before, so I may be sketchy on some details, but I got it to work pretty easily.

I start with a simple script that reports which feeds have been added. It automatically handles feed lists in OPML or XBEL (I use the latter for managing my feed lists, and Liferea uses the former to manage its feed list).

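import amara

old_feedlist = '/home/uogbuji/.liferea/feedlist.opml'
new_feedlist = '/home/uogbuji/devel/uogbuji/webfeeds.xbel'

def get_feeds(feedlist):
    """Return the feed URLs in an OPML or XBEL feed list."""
    doc = amara.parse(feedlist)
    # Try OPML first: each feed is an outline element with an xmlUrl attribute
    feeds = [ unicode(f) for f in doc.xml_xpath(u'//outline/@xmlUrl') ]
    if not feeds:
        # Then try XBEL: each feed is a bookmark element with an href attribute
        feeds = [ unicode(f) for f in doc.xml_xpath(u'//bookmark/@href') ]
    return feeds

# Built-in sets (Python 2.4+) replace the deprecated sets module
old_feeds = set(get_feeds(old_feedlist))
new_feeds = set(get_feeds(new_feedlist))

# Report feeds in the new list that Liferea doesn't know about yet
added = new_feeds - old_feeds
for a in sorted(added): print a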
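For anyone without Amara installed, roughly the same check can be done with the standard library alone. This is a Python 3 sketch using xml.etree.ElementTree, assuming (as in the feed lists above) that the OPML and XBEL documents are not in any XML namespace; the added_feeds helper is a hypothetical name, not part of any tool mentioned here.

```python
import xml.etree.ElementTree as ET

def get_feeds(source):
    """Return feed URLs from an OPML or XBEL feed list.

    `source` may be a file path or a file-like object.
    """
    root = ET.parse(source).getroot()
    # Try OPML first: feeds are outline elements carrying an xmlUrl attribute
    # (folder outlines without xmlUrl are skipped)
    feeds = [o.get('xmlUrl') for o in root.iter('outline') if o.get('xmlUrl')]
    if not feeds:
        # Then try XBEL: feeds are bookmark elements carrying an href attribute
        feeds = [b.get('href') for b in root.iter('bookmark') if b.get('href')]
    return feeds

def added_feeds(old_source, new_source):
    """Feeds in the new list but not the old one, sorted for stable output."""
    return sorted(set(get_feeds(new_source)) - set(get_feeds(old_source)))
```

Called as `added_feeds(old_feedlist, new_feedlist)`, it returns a list of URL strings ready to hand to the subscription step.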

I then send a subscription request for each new item as follows:

$ dbus-send   --dest=org.gnome.feed.Reader /org/gnome/feed/Reader \
  org.gnome.feed.Reader.Subscribe \
  "string:http://feeds.feedburner.com/DrMacrosXmlRants"

The first time, I got the error "Failed to open connection to session message bus: Unable to determine the address of the message bus". I ran apropos dbus, found dbus-launch, and added the suggested stanza to my .bash_profile:

if test -z "$DBUS_SESSION_BUS_ADDRESS" ; then
    ## if not found, launch a new one
    eval `dbus-launch --sh-syntax --exit-with-session`
    echo "D-BUS per-session daemon address is: $DBUS_SESSION_BUS_ADDRESS"
fi

After I ran dbus-launch, dbus-send worked: Liferea immediately popped up a properties dialog box for the added feed and stuck it into the feeds tree wherever I had last been browsing in Liferea (I'm not sure I like that choice of location). A simple drag and drop puts it where I want it. Neat.
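To subscribe to every feed the report turns up, the dbus-send command can be driven from Python instead of typed by hand. A minimal Python 3 sketch, assuming dbus-send is on the PATH and a session bus is available; subscribe_command and subscribe_all are hypothetical helper names, not part of Liferea or D-BUS.

```python
import subprocess

def subscribe_command(url):
    """Build the argument list for the dbus-send subscription call."""
    return [
        'dbus-send',
        '--dest=org.gnome.feed.Reader',
        '/org/gnome/feed/Reader',
        'org.gnome.feed.Reader.Subscribe',
        'string:' + url,
    ]

def subscribe_all(urls, run=subprocess.call):
    """Ask Liferea to subscribe to each URL; return the URLs whose call failed.

    `run` takes an argument list and returns an exit status. It defaults to
    subprocess.call but can be swapped out to inspect commands without a bus.
    """
    failed = []
    for url in urls:
        if run(subscribe_command(url)) != 0:
            failed.append(url)
    return failed
```

Passing an argument list rather than a shell string avoids quoting trouble with feed URLs that contain characters such as & or ?. The calls will still fail unless DBUS_SESSION_BUS_ADDRESS points at a running session bus.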

[Uche Ogbuji]

via Copia

Small fix to atom.rnc, and what about xml:space?

RobertBachmann stopped by #atom to mention that he'd tried to validate an Atom file against the non-normative RELAX NG schema for the Atom RFC draft (I haven't seen an RNC for the final RFC itself). Validation failed because he used xml:lang on an atom:name child of atom:author. The schema's rejection contradicts the Atom spec, which says:

Any element defined by this specification MAY have an xml:lang attribute, whose content indicates the natural language for the element and its descendents.

The RNC did not specify this attribute in a couple of cases. The RNC is non-normative, but in this case there is no reason for divergence from the spec. I whipped up an atom.rnc that fixes the bug. Here's the diff from the version I found on-line.

This sparked a discussion between Anne van Kesteren and me. I feel that xml:lang only makes sense for some Atom elements, and that allowing it on all of them could be confusing. What, for example, does it mean to have xml:lang on the atom:uri child of atom:author? I suppose an outlandish (pun intended) interpretation could be references to localized sites, but that's really the province of the likes of XHTML's hreflang attribute. Moreover, I'm a bit puzzled by a passage in the Atom spec that seems to support my leaning:

The language context is only significant for elements and attributes declared to be "Language-Sensitive" by this specification.

So if it's not significant, why allow it? Perhaps there should have been a split between two attribute sets, atomCommonAttributes and atomCommonLanguageSensitiveAttributes, where the former would omit xml:lang.

Also, I'm used to the convention where xml:lang is used with content models that allow a language-sensitive element to be repeated, providing for multiple language versions in the same document. There are many cases in Atom where this is not possible. For example, you cannot have both an English and a French atom:title within the same atom:entry element. You could get tricky by using a single atom:title with type="xhtml" and multiple language versions within the xhtml:div, but this feels a bit constricting.

Anne doesn't mind xml:lang everywhere, and pointed out that xml:lang="" is an option for specifying no language context (rather than the context inherited from the parent). In the end, I think I could go either way on xml:lang everywhere.
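The inheritance rule, including the empty-string reset, is easy to make concrete. This Python 3 sketch walks a parsed document with the standard library's ElementTree and computes each element's language context. language_contexts is a hypothetical helper, and representing "no language" as None is an arbitrary choice for the sketch.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace
XML_LANG = '{http://www.w3.org/XML/1998/namespace}lang'

def language_contexts(elem, inherited=None):
    """Yield (element, language) pairs, inheriting language from ancestors.

    An explicit xml:lang="" resets the context to "no language" (None).
    """
    lang = elem.get(XML_LANG)
    if lang is None:
        lang = inherited      # no attribute: inherit the parent's context
    elif lang == '':
        lang = None           # empty attribute: explicitly no language
    yield elem, lang
    for child in elem:
        yield from language_contexts(child, lang)
```

Given a feed with xml:lang="en", an atom:uri carrying xml:lang="" comes out with no language, while an unmarked atom:title keeps "en".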

This discussion also made me think of xml:space. This special attribute is defined right in the XML spec, but that doesn't mean XML applications can ignore it. Even in the case of DTDs, the spec says

In valid documents, this attribute, like any other, must be declared if it is used.

The same goes for RELAX NG, the conventional schema language for Atom. There is no xml:space to be found in either the normative RFC or the non-normative schema, but the rules for Atom's undefinedAttribute pattern do allow this attribute (as well as xml:id and just about any other XML or 'global' attribute). I assume the intention is for applications to treat this attribute according to the semantics suggested in the XML 1.0 spec. I do wish Atom had been explicit about this, as the XSLT 1.0 spec is, for example.
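Here's one reading of what honoring those suggested semantics might look like. In XML 1.0, the value "preserve" asks applications to keep all white space, "default" leaves them free to use their default handling, and the value applies to descendants until another xml:space overrides it. In this Python 3 ElementTree sketch, the application's "default" handling is assumed to be collapsing runs of whitespace to a single space; that choice and the helper name apply_xml_space are illustrative assumptions.

```python
import re
import xml.etree.ElementTree as ET

XML_SPACE = '{http://www.w3.org/XML/1998/namespace}space'

def _collapse(s):
    """Collapse runs of whitespace to one space (None passes through)."""
    return re.sub(r'\s+', ' ', s) if s is not None else None

def apply_xml_space(elem, preserve=False):
    """Collapse whitespace in text, except where xml:space="preserve" is in scope."""
    value = elem.get(XML_SPACE)
    if value == 'preserve':
        preserve = True
    elif value == 'default':
        preserve = False
    if not preserve:
        elem.text = _collapse(elem.text)
    for child in elem:
        apply_xml_space(child, preserve)
        # A child's tail text is part of this element's content
        if not preserve:
            child.tail = _collapse(child.tail)
    return elem
```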

[Uche Ogbuji]

via Copia

Merging Atom 1.0 feeds with Python

mergeatom.py

At the heart of Planet Atom is the mergeatom module. I've updated mergeatom a lot since I first released it. It's still a simple Python utility for merging multiple Atom 1.0 feeds into an aggregated feed. Some of the features:

  • Reads in a list of Atom URLs, files or content strings to be merged into a given target document
  • Puts out a complete, merged Atom document (duplicates by atom:id are suppressed)
  • Collates the entries according to date, allowing you to limit the total. WARNING: Entries from the original Atom feeds may be dropped by duplicate-ID removal or by entry-count limits.
  • Allows you to set the sort order of resulting entries
  • Uses atom:source elements, according to the spec, to retain key metadata from the originating feeds
  • Normalizes XML namespace prefixes for output Atom elements (uses atom:*)
  • Allows you to limit contained entries to a date range
  • Handles base URI fixup intelligently: base URIs on feed elements are migrated onto copied entries so that contained relative links remain correct

It requires atomixlib 0.3.0 or later, and Amara 1.1.6 or later.
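mergeatom itself is built on Amara and atomixlib. As a rough illustration of a few ideas from the feature list (duplicate suppression by atom:id, date collation with a limit, atom:source, and base URI migration), here is a standard-library-only Python 3 sketch. It is not mergeatom's code or API; merge_feeds and its parameters are hypothetical. It compares atom:updated values as strings, which is only safe when all timestamps share a UTC offset, and a complete output feed would also need its own atom:id, atom:title and atom:updated.

```python
import copy
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

ATOM = '{http://www.w3.org/2005/Atom}'
XML_BASE = '{http://www.w3.org/XML/1998/namespace}base'

def merge_feeds(feeds, limit=None, newest_first=True):
    """Merge Atom 1.0 feeds into a new atom:feed element.

    `feeds` is a list of (feed_element, feed_url) pairs. Entries are
    modified in place (xml:base and atom:source are added to them).
    """
    seen = set()
    entries = []
    for feed, url in feeds:
        # Resolve the feed's own xml:base against the URL it came from
        feed_base = urljoin(url, feed.get(XML_BASE, ''))
        for entry in feed.findall(ATOM + 'entry'):
            entry_id = entry.findtext(ATOM + 'id')
            if entry_id in seen:
                continue  # suppress duplicates by atom:id (first one wins)
            seen.add(entry_id)
            # Migrate the base URI onto the entry so relative links stay correct
            entry_base = urljoin(feed_base, entry.get(XML_BASE, ''))
            if entry_base:
                entry.set(XML_BASE, entry_base)
            # Record key metadata from the originating feed in atom:source
            if entry.find(ATOM + 'source') is None:
                source = ET.SubElement(entry, ATOM + 'source')
                for name in ('id', 'title', 'updated'):
                    meta = feed.find(ATOM + name)
                    if meta is not None:
                        source.append(copy.deepcopy(meta))
            entries.append(entry)
    # Collate by atom:updated (string comparison; see caveat above)
    entries.sort(key=lambda e: e.findtext(ATOM + 'updated', ''),
                 reverse=newest_first)
    merged = ET.Element(ATOM + 'feed')
    merged.extend(entries[:limit])  # limit=None keeps every entry
    return merged
```

A production version would parse the dates properly; Planet Atom uses dateutil for that kind of date wrangling.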

[Uche Ogbuji]

via Copia

Planet Atom

Planet Atom is now live.

Planet Atom aggregates Atom feeds from authors with an affinity for syndication and Atom-specific issues. This site was developed by Sylvain Hellegouarch, Uche Ogbuji, and John L. Clark. Please visit the Planet Atom development site if you are interested in the source code. The complete list of sources is maintained in XBEL format (with some experimental extensions); please contact one of the site developers if you want to suggest a modification to this list.

John, Sylvain and I have been working on this on and off for over a month now (we've all been swamped with other things; the actual development of the site was fairly straightforward). Planet Atom works by aggregating Atom 1.0 feeds into one larger feed (with entries collated, trimmed, etc.). It's built on 4Suite (for XSLT processing), CherryPy (for Web serving), Amara (for Atom feed slicing and dicing), atomixlib (for building the aggregate feed) and dateutil (for date wrangling), with Python and XML as the twin foundations, of course. Thanks to folks on the #atom and #swhack IRC channels for review and feedback.

[Uche Ogbuji]

via Copia

Embedded markup

In an article I'm working on, I refer to Norm Walsh's piece "Embedded Markup Considered Harmful" and his follow-up "Escaped Markup: What To Do Instead". I've always urged people to use type="xhtml" in Atom rather than type="html", and to do the tidying to XHTML at the aggregation stage; my arguments largely line up with Norm's.
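A first step toward tidying at aggregation time can be sketched in Python 3 with ElementTree. This sketch handles only the easy case: if the escaped HTML inside an atom:content element happens to be well-formed XML, it is re-parsed into a single xhtml:div and the type is switched to "xhtml"; anything else is left as type="html". promote_to_xhtml is a hypothetical helper. Real-world HTML usually needs a proper tidying tool first, and HTML named entities such as &nbsp; will make this simple parse fail.

```python
import xml.etree.ElementTree as ET

XHTML = '{http://www.w3.org/1999/xhtml}'

def promote_to_xhtml(content):
    """Turn an atom:content/@type="html" element into type="xhtml" in place,
    if its (already unescaped) markup is well-formed. Return True on success."""
    if content.get('type') != 'html':
        return False
    # The parser has already unescaped &lt;p&gt; into <p> in .text
    markup = content.text or ''
    try:
        div = ET.fromstring(
            '<div xmlns="http://www.w3.org/1999/xhtml">' + markup + '</div>')
    except ET.ParseError:
        return False  # not well-formed: leave it as escaped HTML
    content.text = None
    content.set('type', 'xhtml')
    content.append(div)  # Atom requires a single xhtml:div child
    return True
```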

From a comment by Dan Connolly I also found this older XML.com article with the same title as Norm's: "Embedded Markup Considered Harmful" by Theodor Holm Nelson. I can't make much sense of what Mr. Nelson is saying. Can you?

[Uche Ogbuji]

via Copia