Python/XML column #37 (and out): Processing Atom 1.0

In his final Python-XML column, Uche Ogbuji shows us three ways to process Atom 1.0 feeds in Python. [Sep. 14, 2005]

I show how to parse Atom 1.0 using minidom (for those who want no additional dependencies), Amara Bindery (for those who want an easier API) and Universal Feed Parser (with a quick hack to bring the support in UFP 3.3 up to Atom 1.0). I also show how to use DateUtil and Python 2.3's datetime to process Atom dates.
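For readers who want a feel for the dependency-free approach, here is a minimal sketch of reading Atom 1.0 entries with minidom. It is not the code from the article: the sample feed and the helper names (`child_text`, `parse_atom_date`, `read_entries`) are made up for illustration, and it assumes a modern Python 3, using `datetime.fromisoformat` for the dates rather than DateUtil.

```python
from datetime import datetime
from xml.dom import minidom

# Namespace of Atom 1.0 elements (Atom 0.3 used a different one).
ATOM_NS = "http://www.w3.org/2005/Atom"

# Hypothetical sample feed, for illustration only.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <id>urn:example:feed</id>
  <updated>2005-09-14T12:00:00Z</updated>
  <entry>
    <title>First entry</title>
    <id>urn:example:entry-1</id>
    <link rel="alternate" href="http://example.org/entry-1"/>
    <updated>2005-09-14T12:00:00Z</updated>
  </entry>
</feed>"""


def child_text(node, name):
    """Return the text of the first direct Atom child named `name`, or None."""
    for child in node.childNodes:
        if (child.nodeType == child.ELEMENT_NODE
                and child.namespaceURI == ATOM_NS
                and child.localName == name):
            return "".join(
                t.data for t in child.childNodes
                if t.nodeType in (t.TEXT_NODE, t.CDATA_SECTION_NODE)
            )
    return None


def parse_atom_date(value):
    """Parse an Atom date string into a timezone-aware datetime."""
    # Older versions of fromisoformat reject a trailing "Z", so spell out UTC.
    return datetime.fromisoformat(value.strip().replace("Z", "+00:00"))


def read_entries(xml_text):
    """Collect title, id, link hrefs, and updated date for each entry."""
    doc = minidom.parseString(xml_text)
    entries = []
    for entry in doc.getElementsByTagNameNS(ATOM_NS, "entry"):
        links = [
            link.getAttribute("href")
            for link in entry.getElementsByTagNameNS(ATOM_NS, "link")
        ]
        entries.append({
            "title": child_text(entry, "title"),
            "id": child_text(entry, "id"),
            "links": links,
            "updated": parse_atom_date(child_text(entry, "updated")),
        })
    return entries


if __name__ == "__main__":
    for item in read_entries(SAMPLE_FEED):
        print(item["title"], item["links"], item["updated"].isoformat())
```

The sketch only reads plain-text titles; Atom 1.0 titles and content can also carry `type="html"` or `type="xhtml"`, which would need extra handling.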

As the teaser says, we've come to the end of the column in its present form, but it's more of a transition than a termination. From the article:

And with this month's exploration, the Python-XML column has come to an end. After discussions with my editor, I'll replace this column with one with a broader focus. It will cover the intersection of Agile Languages and Web 2.0 technologies. The primary language focus will still be Python, but there will sometimes be coverage of other languages such as Ruby and ECMAScript. I think many of the topics will continue to be of interest to readers of the present column. I look forward to continuing my relationship with the XML.com audience.

It is too bad that I don't get to some of the articles that I had in the queue, including coverage of lxml, pygenx, XSLT processing from Python, the role of PEP 342 in XML processing, and more. I can still squeeze some of these topics into the new column, I think, as long as I keep an emphasis on the Web. I'll also try to keep up my coverage of news in the Python/XML community here on Copia.

Speaking of such news, I forgot to mention in the column that I'd found an interesting resource from John Shipman.

[F]or my relatively modest needs, I've written a more Pythonic module that uses minidom. Complete documentation, including the code of the module in 'literate programming' style, is at:

http://www.nmt.edu/tcc/help/pubs/pyxml/

The relevant sections start with section 7, "xmlcreate.py".

[Uche Ogbuji]

via Copia

3 responses

That FeedParser hack will only allow very rudimentary consumption of Atom feeds, because several elements were renamed since 0.3 and the content model has completely changed. Only titles and links will work well; there is, however, a FeedParser Atom 1.0 support patch with decent standards compliance.

I already mentioned this over at XML.com (and blogged about this before), but I repeat it here for some more Googlejuice -- that patch is still difficult to find.

— Aristotle Pagaltzis

Aristotle,

Thanks for the plug. FuCoder.com was a new site that I made. It only went live near the end of August, and hasn't really been graced by Google. You probably found the patch from a link from my personal blogsite.

I have also posted the patch (which was actually quite crude) to Mark and SourceForge. Haven't got anything back yet. What I like about feedparser is that the parsing is actually quite forgiving. After all it is not a validator, but a parser that needs to work with millions of malformed feeds out there.

— Scott Yang

Yeah, I wondered about that. Google quarantines new domains for up to six months to fend off spam these days. Plug given gladly; there’s a bunch of stuff that relies on FeedParser, and until FeedParser gets fixed none of it will be able to consume Atom 1.0, so the patch is greatly needed. I did indeed find it via the trackback at scott.yang.id.au – pretty much by chance.

Uche: btw, you write in the article that you have serious trouble with some design decisions. What are they? I’d be very interested to hear more about that.