Adding feeds to Liferea on the command line

Despite the kind help of the Rojo people I still can't get the service to import my updated feed lists ('An error has occurred...Failed to import: null...We apologize for the inconvenience.'), so I'm still reading my Web feeds on Liferea for now. One nice bonus with Liferea is the ability to add feeds from the command line (or really, any program) courtesy GNOME's DBUS. Thanks to Aristotle for the tip, pointing me to 'a key message on liferea-help'. I've never used DBUS before, so I may be sketchy on some details, but I got it to work for me pretty easily.

I start with a simple script to report on added feed entries. It automatically handles feed lists in OPML or XBEL (I use the latter for managing my feed lists, and Liferea uses the former to manage its feed list).

import amara
import sets

# Liferea's own subscription list (OPML) and my master feed list (XBEL)
old_feedlist = '/home/uogbuji/.liferea/feedlist.opml'
new_feedlist = '/home/uogbuji/devel/uogbuji/webfeeds.xbel'

def get_feeds(feedlist):
    # Parse the feed list and return its feed URLs as unicode strings
    doc = amara.parse(feedlist)
    #try OPML first
    feeds = [ unicode(f) for f in doc.xml_xpath(u'//outline/@xmlUrl') ]
    if not feeds:
        #then try XBEL
        feeds = [ unicode(f) for f in doc.xml_xpath(u'//bookmark/@href') ]
    return feeds

old_feeds = sets.Set(get_feeds(old_feedlist))
new_feeds = sets.Set(get_feeds(new_feedlist))

# Feeds in the new list that Liferea does not have yet
added = new_feeds.difference(old_feeds)
for a in added: print a
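
For readers without Amara to hand, the same comparison can be sketched with nothing but the standard library. This is an approximation in Python 3 using xml.etree.ElementTree, not the script above; the helper name added_feeds and the inline sample documents are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def get_feeds(source):
    """Return the set of feed URLs found in an OPML or XBEL document."""
    root = ET.parse(source).getroot()
    # Try OPML first: feed URLs live in outline/@xmlUrl
    feeds = {o.get("xmlUrl") for o in root.iter("outline") if o.get("xmlUrl")}
    if not feeds:
        # Then try XBEL: feed URLs live in bookmark/@href
        feeds = {b.get("href") for b in root.iter("bookmark") if b.get("href")}
    return feeds


def added_feeds(old_source, new_source):
    """Feeds present in the new list but not in the old one, sorted."""
    return sorted(get_feeds(new_source) - get_feeds(old_source))


# Tiny made-up samples standing in for the real feed list files
OPML = """<opml version="1.0"><body>
  <outline text="A" xmlUrl="http://example.org/a.atom"/>
</body></opml>"""
XBEL = """<xbel><folder>
  <bookmark href="http://example.org/a.atom"><title>A</title></bookmark>
  <bookmark href="http://example.org/b.atom"><title>B</title></bookmark>
</folder></xbel>"""

if __name__ == "__main__":
    for url in added_feeds(io.StringIO(OPML), io.StringIO(XBEL)):
        print(url)
```

get_feeds accepts a file name as well as a file object, so the two paths from the original script can be passed in directly.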

I then send a subscription request for each new item as follows:

$ dbus-send   --dest=org.gnome.feed.Reader /org/gnome/feed/Reader \
  org.gnome.feed.Reader.Subscribe \
  "string:http://feeds.feedburner.com/DrMacrosXmlRants"

The first time I got an error "Failed to open connection to session message bus: Unable to determine the address of the message bus". I did an apropos dbus and found dbus-launch. I added the suggested stanza to my .bash_profile:

if test -z "$DBUS_SESSION_BUS_ADDRESS" ; then
    ## if not found, launch a new one
    eval `dbus-launch --sh-syntax --exit-with-session`
    echo "D-BUS per-session daemon address is: $DBUS_SESSION_BUS_ADDRESS"
fi

After running dbus-launch the dbus-send worked and Liferea immediately popped up a properties dialog box for the added feed, and stuck it into the feeds tree at the point I happened to last be browsing in Liferea (not sure I like that choice of location). Simple drag&drop to put it where I want. Neat.
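
The two halves can be joined up so that each added feed triggers its own subscription request. The following is a rough Python 3 sketch, assuming the destination, object path and method name are exactly those in the dbus-send command above; the function names and the dry_run switch are invented for illustration. It defaults to a dry run that only builds the commands.

```python
import os
import subprocess

# Names taken from the dbus-send invocation shown earlier
DEST = "org.gnome.feed.Reader"
PATH = "/org/gnome/feed/Reader"
METHOD = "org.gnome.feed.Reader.Subscribe"


def session_bus_available(environ=None):
    """True if the session bus address that dbus-send needs is set."""
    environ = os.environ if environ is None else environ
    return bool(environ.get("DBUS_SESSION_BUS_ADDRESS"))


def subscribe_command(feed_url):
    """Build the dbus-send argument list for one feed URL."""
    return ["dbus-send", "--dest=" + DEST, PATH, METHOD, "string:" + feed_url]


def subscribe_all(feed_urls, dry_run=True):
    """Build one command per feed; run them only when dry_run is False."""
    commands = [subscribe_command(url) for url in feed_urls]
    if not dry_run:
        if not session_bus_available():
            raise RuntimeError(
                "DBUS_SESSION_BUS_ADDRESS is not set; run dbus-launch first")
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for cmd in subscribe_all(["http://feeds.feedburner.com/DrMacrosXmlRants"]):
        print(" ".join(cmd))
```

Since Liferea pops up a properties dialog for every subscription, a long list of new feeds is probably best sent in small batches.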

[Uche Ogbuji]

via Copia

Recipe for freezing 4Suite or Amara apps (cross-platform)

Updated based on user experience.

Recently a user mentioned having trouble freezing an Amara app. This question comes up every six months or so, it seems, so I decided to make sure I have a recipe for easy reference. I also wanted to make sure that successful freezing would not require any changes in 4Suite before the next release. I started with the most recent success report I could find, by Roman Yakovenko. Actually, his recipe ran perfectly well as is. All I'm doing here is expanding on it.

Recipe: freezing 4Suite or Amara apps

Grab cxFreeze. (I used the 3.0.1 release, which I built from source on Fedora Core 4 Linux and Python 2.4.1.) Updated: I've updated freezehack.py to work with cxFreeze 3.0.2, thanks to Luis Miguel Morillas.

Grab freezehack.py, which was originally put together by Roman. Add it to your PYTHONPATH.

Add import freezehack to your main Python module for the executable to be created. Update: actually, based on Mike Powers' experience, you might have to add this import to every module that imports amara or Ft.

Freeze your program as usual. Run FreezePython.exe (or FreezePython on UNIX).

See the following sample session:

$ cat main.py
import freezehack
import amara
diggfeed = amara.parse("http://www.digg.com/rss/index.xml")
print diggfeed.rss.channel.item.title

$ FreezePython --install-dir dist --target-name testexe main.py
[SNIP]
Frozen binary dist/testexe created.

$ ./dist/testexe
Guess-the-Google - Simple but addictive game

In order to share the executable you have to copy the whole dist directory to the target machine, but that's all you should need to do. Python, 4Suite, Amara and any other such dependencies are bundled automatically.

Now back to the release.


Agile Web #3: "Scripting Flickr with Python and REST"

"Scripting Flickr with Python and REST"

In his latest Agile Web column, Uche Ogbuji shows us how to use Python to interact with Flickr as a lightweight web service.

This Agile Web installment is fairly straightforward. I look at the several Python libraries for accessing Flickr from programs. They range from low level, thin veneers over the official Flickr API to the one higher level, more Pythonic library. And of course there's the obligatory package I just can't get to work.


Merging Atom 1.0 feeds with Python

mergeatom.py

At the heart of Planet Atom is the mergeatom module. I've updated mergeatom a lot since I first released it. It's still a simple Python utility for merging multiple Atom 1.0 feeds into an aggregated feed. Some of the features:

  • Reads in a list of Atom URLs, files or content strings to be merged into a given target document
  • Puts out a complete, merged Atom document (duplicates by atom:id are suppressed)
  • Collates the entries according to date, allowing you to limit the total. WARNING: Entries from the original Atom feed may be deleted according to ID duplicate removal or entry count limits.
  • Allows you to set the sort order of resulting entries
  • Uses atom:source elements, according to the spec, to retain key metadata from the originating feeds
  • Normalizes XML namespace prefixes for output Atom elements (uses atom:*)
  • Allows you to limit contained entries to a date range
  • Handles base URI fixup intelligently (base URIs on feed elements are migrated on to copied entries so that contained relative links remain correct)

It requires atomixlib 0.3.0 or more recent, and Amara 1.1.6 or more recent.
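
To give a feel for the core of the merge, here is a minimal sketch of the duplicate suppression, date collation and entry limit described above. It is not mergeatom's actual code or API: it uses Python 3's standard xml.etree.ElementTree instead of Amara and atomixlib, skips atom:source handling and base URI fixup entirely, and assumes every atom:updated value is a UTC timestamp in the same format so that plain string comparison sorts correctly. The function name merge_feeds and the sample feeds are made up.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("atom", ATOM)  # serialize with atom:* prefixes


def q(name):
    """Qualified ElementTree name for an Atom element."""
    return "{%s}%s" % (ATOM, name)


def merge_feeds(feed_texts, limit=None, newest_first=True):
    """Merge entries from several Atom 1.0 documents into one atom:feed."""
    seen = set()
    entries = []
    for text in feed_texts:
        feed = ET.fromstring(text)
        for entry in feed.findall(q("entry")):
            entry_id = entry.findtext(q("id"))
            if entry_id in seen:
                continue  # suppress duplicates by atom:id
            seen.add(entry_id)
            entries.append(entry)
    # Collate by atom:updated (assumes uniform UTC timestamps, see above)
    entries.sort(key=lambda e: e.findtext(q("updated"), ""),
                 reverse=newest_first)
    merged = ET.Element(q("feed"))
    merged.extend(entries[:limit])  # limit=None keeps every entry
    return merged


# Made-up sample feeds; urn:x:2 appears in both
FEED_A = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><id>urn:x:1</id><title>One</title><updated>2006-01-01T00:00:00Z</updated></entry>
  <entry><id>urn:x:2</id><title>Two</title><updated>2006-01-03T00:00:00Z</updated></entry>
</feed>"""
FEED_B = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><id>urn:x:2</id><title>Two again</title><updated>2006-01-03T00:00:00Z</updated></entry>
  <entry><id>urn:x:3</id><title>Three</title><updated>2006-01-02T00:00:00Z</updated></entry>
</feed>"""

if __name__ == "__main__":
    merged = merge_feeds([FEED_A, FEED_B], limit=2)
    print(ET.tostring(merged, encoding="unicode"))
```

When a duplicate turns up, the first copy encountered wins, so the order of the input feeds matters.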


CherryPy QOTW

OK, there are many fun zingers from the latest CherryPy/WSGI/Paste tiff, but this one resonated soundly with me:

In times BC (Before CherryPy ;) I would simply write stuff in PHP because it was easier than fitting my square mind in some Python framework's tetrahegon shaped hole.

—Christian Dowski, Captain Buffet

Such an L7. :-) And I don't know why that puts in my head the nonce construction "techno-hadron".


Planet Atom

Planet Atom is now live.

Planet Atom focuses Atom streams from authors with an affinity for syndication and Atom-specific issues. This site was developed by Sylvain Hellegouarch, Uche Ogbuji, and John L. Clark. Please visit the Planet Atom development site if you are interested in the source code. The complete list of sources is maintained in XBEL format (with some experimental extensions); please contact one of the site developers if you want to suggest a modification to this list.

John, Sylvain and I have been working at this on and off for over a month now (we've all been swamped with other things; the actual development of the site was fairly straightforward). Planet Atom is built on an aggregation from Atom 1.0 feeds into one larger feed (with entries collated, trimmed, etc.). It's built on 4Suite (for XSLT processing), CherryPy (for Web serving), Amara (for Atom feed slicing and dicing), atomixlib (for building the aggregate feed) and dateutil (for date wrangling), with Python and XML as the twin foundations, of course. Thanks to folks on the #atom and #swhack IRC channels for review and feedback.


ElementTree in Python stdlib

The big news in Python/XML in December was the checking in of ElementTree into Python's standard library, which means the package will be part of Python 2.5. The python-dev summary has a good account of the move. I'm surprised it took so long. Even though I've been involved in at least one ElementTree versus 4Suite battle (I also wrote the earliest and possibly only comprehensive treatment of ElementTree), I've always appreciated that ElementTree's relative weightlessness makes it a decent option for Pythoneers who really don't care much about all the intricacies of XML and just have to reluctantly deal with a mess of angle brackets. That's the sort of library that ends up in the stdlib, and for good reason. And for sure, developers have needed a stdlib alternative to SAX and DOM for ages.
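
For a taste of that light weight, here is a short sketch using the module under the name it takes in the standard library, xml.etree.ElementTree. The sample document is made up for illustration, and the code is written for Python 3.

```python
import xml.etree.ElementTree as ET

# Made-up sample document
DOC = """<labels>
  <label added="2003-06-20">
    <name>Thomas Eliot</name>
    <city>Stamford</city>
  </label>
  <label added="2003-06-10">
    <name>Ezra Pound</name>
    <city>Hailey</city>
  </label>
</labels>"""

root = ET.fromstring(DOC)

# Elements behave like lists of children with dict-like attributes
names = [label.findtext("name") for label in root.findall("label")]
dates = [label.get("added") for label in root]

# Building new markup is just as terse
extra = ET.SubElement(root, "label", added="2006-01-01")
ET.SubElement(extra, "name").text = "Example Person"

if __name__ == "__main__":
    print(names)
    print(ET.tostring(root, encoding="unicode"))
```

There are no node types, document objects or event handlers to juggle here, which is much of the appeal for people who only have to deal with XML reluctantly.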

A very useful side effect of this move is a step towards merging PyXML into Python SVN, and eliminating the _xmlplus hack that was used to have PyXML installs shadow the stdlib originals. XML-SIG has argued about this for years, but there was never a stimulus for anyone to actually get something done about it. To quote from my article "Gems from the Mines: 2002 to 2003":

[In early 2003] Martijn Faassen kicked off a long discussion on the future of PyXML by complaining about the "_xmlplus hack" that PyXML uses to serve as a drop-in replacement for the Python built-in XML libraries. After he reiterated the complaint the discussion turned to a very serious one that underscored the odd in-between status of PyXML, and in what ways the PyXML package continued to be relevant as a stand-alone. Most of these issues are still not resolved, so this thread is an important airing of considerations that affect many Python/XML users to this day.

Congrats to Fredrik on this culmination of his hard work. And here's to the continued diversity of developer options in the very complex field of XML processing.


Updates to Python style guidelines (PEP 8)

There has been a lot of discussion about updates to PEP 8. The contributing discussion is summarized in this python-dev summary, although the link to the work-in-progress summary is broken. I use instead the subversion snapshot for the PEP. A lot of the discussion makes sense to me, although I personally am in the "deprecate leading double underscore as private" camp, despite Tim Peters's sensible arguments. I think the Timbot's arguments point instead to a need for a clearer declarative mechanism for private variables. I'll be burned at the stake for making a suggestion that could whiff of C++ or Java envy, but I'll counter that even if the language has such a mechanism, I'll likely never use it.


Expat 2.0 (featured in 4Suite CVS)

Expat 2.0 has made it to the world, after a long incubation. Expat is, of course, the very popular XML parser in C originally developed by James Clark. The first I learned of this development was an announcement by Jeremy Kloth. His announcement also mentioned that current 4Suite CVS includes Expat 2.0, and that it's probably the first outside project to do so. In fact, the most recent 4Suite beta release included an Expat CVS snapshot that was for all practical purposes 2.0.

This is a very important milestone as it will allow Expat development to move on to more innovative pastures. And of course it adds the essential—support for AmigaOS (I'm going to get hate mail from my Amiga booster friends from college).


Learn how to invent XML languages, then do so

There has been a lot of chatter about Tim Bray's piece "Don’t Invent XML Languages". Good. I'm all for anything that makes people think carefully about XML language design and problems of semantic transparency (communicated meaning of XML structure). I'm all for it even though I generally disagree with Tim's conclusions. Here are some quick thoughts on Tim's essay, and some of the responses I've seen.

Here’s a radical idea: don’t even think of making your own language until you’re sure that you can’t do the job using one of the Big Five: XHTML, DocBook, ODF, UBL, and Atom.—Bray

This is a pretty biased list, and happens to make sense for the circles in which he moves. Even though I happen to move in much the same circles, the first thing I'd say is that there could hardly ever be an authoritative "big 5" list of XML vocabs. There is too much debate and diversity, and that's too good a thing to sweep under the rug. MS Office XML or ODF? OAGIS or UBL? RSS 2.0 or Atom? Sure I happen to plump for the latter three, as Tim does, but things are not so clear cut for the average punter. (I didn't mention TEI or DocBook because it's much less of a head-to-head battle.)

I made my own list in "A survey of XML standards: Part 3—The most important vocabularies" (IBM developerWorks, 2004). It goes:

  • XHTML
  • DocBook
  • XSL-FO
  • SVG
  • VoiceXML
  • MathML
  • SMIL
  • RDF
  • XML Topic Maps

And in that article I admit I'm "just scratching the surface". The list predates first full releases of Atom and ODF, or they would have been on it. I should also mention XBEL, which is, I think, not as widely trumpeted, but just about as important as those other entrants. BTW, see the full cross-reference of my survey of XML standards.

Designing XML Languages is hard. It’s boring, political, time-consuming, unglamorous, irritating work. It always takes longer than you think it will, and when you’re finished, there’s always this feeling that you could have done more or should have done less or got some detail essentially wrong.—Bray

This is true. It's easy to be flip and say "sure, that's true of programming, but we're not being advised to write no more programs". But then I think this difficulty is even more true of XML design than of programming, and it's worth reminding people that a useful XML vocabulary is not something you toss off in the spare hour. Simon St.Laurent has always been a sound analyst of the harm done by programmers who take shortcuts and abuse markup in order to suit their conventions. The lesson, however, should be to learn best practices of markup design rather than to become a helpless spectator.

If you’re going to design a new language, you’re committing to a major investment in software development. First, you’ll need a validator. Don’t kid yourself that writing a schema will do the trick; any nontrivial language will have a whole lot of constraints that you can’t check in a schema language, and any nontrivial language needs an automated validator if you’re going to get software to interoperate.

If people would just use decent schema technology, this point would be very much weakened. Schema designers rarely see beyond plain W3C XML Schema or RELAX NG. Too bad. RELAX NG plus Schematron (with XPath 1.0/XSLT 1.0 drivers) covers a huge number of constraints. Add in EXSLT 1.0 drivers for Schematron and you can cover probably 95+% of Atom's constraints (probably more, actually). Throw in user-defined extensions and you have a very powerful and mostly declarative validation engine. We should do a better job of rendering such goodness to XML developers, rather than scaring them away with duct-tape-validator bogeymen.
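
To make concrete the kind of constraint at issue: as I read the Atom spec, an entry must have an atom:author unless its atom:source carries one or the containing feed does. That co-occurrence rule is awkward for a grammar-based schema but is a one-line XPath assertion in Schematron. The Python 3 sketch below only emulates that single assertion with the standard library so the logic can be seen running; it is not a Schematron engine, and the function name and sample feed are made up.

```python
import xml.etree.ElementTree as ET

NS = {"atom": "http://www.w3.org/2005/Atom"}

# As a Schematron rule, the same check would read roughly:
#   <sch:rule context="atom:entry">
#     <sch:assert test="atom:author or atom:source/atom:author or ../atom:author">
#       An entry needs an author of its own, from its source, or from the feed.
#     </sch:assert>
#   </sch:rule>


def entries_missing_author(feed_text):
    """Return the atom:id of each entry with no author from any permitted place."""
    feed = ET.fromstring(feed_text)
    feed_has_author = feed.find("atom:author", NS) is not None
    failures = []
    for entry in feed.findall("atom:entry", NS):
        ok = (
            entry.find("atom:author", NS) is not None
            or entry.find("atom:source/atom:author", NS) is not None
            or feed_has_author
        )
        if not ok:
            failures.append(
                entry.findtext("atom:id", default="(no id)", namespaces=NS))
    return failures


# Made-up sample: the second entry has no author anywhere
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Sample</title>
  <entry><id>urn:x:1</id><author><name>A. Writer</name></author></entry>
  <entry><id>urn:x:2</id><title>No author anywhere</title></entry>
</feed>"""

if __name__ == "__main__":
    print(entries_missing_author(SAMPLE))
```

The declarative rule in the comment says everything the function does, which is the argument for reaching for Schematron before hand-rolling a validator like this one.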

Yes, XHTML is semantically weak and doesn’t really grok hierarchy and has a bunch of other problems. That’s OK, because it has a general-purpose class attribute and ignores markup it doesn’t know about and you can bastardize it eight ways from center without anything breaking. The Kool Kids call this “Microformats”...

This understated bit is, I think, the heart of Tim's argument. The problem is that I still haven't been able to figure out why Microformats have any advantage in semantic transparency over new vocabularies. Despite the fuzzy claims of μFormatters, a microformat requires just as much specification as a small, standalone format to be useful. It didn't take me long kicking around XOXO to solve a real-world problem before this became apparent to me.

Some interesting reactions to the piece

Dare Obasanjo. Dare indirectly brought up that Ian Hickson had argued against inventing XML vocabularies in 2003. I remember violently and negatively reacting to the idea that everyone should stick to XHTML and its elite companions. Certainly such limitations make sense for some, but the general case is more nuanced (thank goodness). Side note: another pioneer of the pessimistic side of this argument is Mark Pilgrim (http://www.xml.com/pub/au/164). Needless to say I disagree with many of his points as well.

I've always considered it a gross hack to think that instead of having an HTML web page for my blog and an Atom/RSS feed, instead I should have a single HTML page with <div class="rss:item"> or <h3 class="atom:title"> embedded in it instead. However given that one of the inventors of XML (Tim Bray) is now advocating this approach, I wonder if I'm simply clinging to old ways and have become the kind of intellectual dinosaur I bemoan.—Obasanjo

Dare is, I think, about as stubborn and tart as I am, so I'm amazed to see him doubting his convictions in this way. Please don't, Dare. You're quite correct. Microformats are just a hair away from my pet reductio ad absurdum: <tag type="title"> rather than just <title>. I still haven't heard a decent argument for such periphrasis. And I don't see how the fact that tag is semantically anchored does anything special for the stepchild identifier title in the microformats scenario.

BTW, there is a priceless quote in comments to Dare:

OK, so they're saying: don't create new XML languages - instead, create new HTML languages. Because if you can't get people to [separate presentation from data], hijack the presentation!—"Steve"

Wot he said. With bells on.

Danny Ayers.

I think most XML languages have been created by one of three processes - translating from a legacy format; mapping directly from the domain entities to the syntax; creating an abstract model from the domain, then mapping from that to the XML. The latter two of these are really on a greyscale: a language designer probably has the abstract entities and relationships in mind when creating the format, whether or not they have been expressed formally.—Ayers

I've had my tiffs with RDF gurus lately, but this is the sort of point you can trust an RDF guru to nail, and Danny does so. XML languages are, like all languages, about expression. The farther the expression lies from the abstraction being expressed (the model), the more expensive the maintenance. Punting to an existing format that might have some vague ties to the problem space is a much worse economic bet than the effort of designing a sound and true format for that problem space.

To slightly repurpose another Danny quote towards XML,

...in most cases it’s probably best to initially make up afresh a new representation that matches the domain model as closely as possible(/appropriate). Only then start looking to replacing the new terms with established ones with matching semantics. But don’t see reusing things as more important than getting an (appropriately) accurate model.—Ayers

Ned Batchelder. He correctly identifies that Tim Bray's points tend to be most applicable to document-style XML. I've long since come to the conclusion (again with a lot of influence from Simon St.Laurent) that XML is too often the wrong solution for programmer-data-focused formats (including software configuration formats). Yeah, of course I've already elaborated in the Python context.
