Fancy Web

OK, so in the past week Dare has flipped the bozo bit on the "Web 2.0" term. Chime has put the term in front of the firing squad. I get the point, already. I've always thought that it's a silly term, but beyond the hype I do think it's useful to have a term for all those emerging aspects of the Web that we did not see a lot of, say, three years ago. Let's be real, a lot about the Web is changing, not necessarily all for the better, but the phenomenon does still need to be described. So for my part, I'll ditch the "Web 2.0" moniker and adopt something more appropriately tongue-in-cheek—"Fancy Web". (I have to stop myself from extrapolating to "Antsy Web", "Chancy Web", "Dancy Web" or the not-very-P.C. "Nancy Web").

BTW, the first person to suggest that it should instead be "FancyWeb" gets the gas-face (n/m).

[Uche Ogbuji]

via Copia

500 Web feed readers, and none dead on

I started out using straw for reading Web feeds. I found it a bit cumbersome in terms of death by keystroke, having to tap through each entry in each feed. I figured there had to be a better way. Narval (whatever happened to Narval, anyways?) was combining news sources into a virtual newspaper five years ago. I found something the like in Lektora. I quickly found an assortment of annoyances, and I now use a combo of Straw and Lektora. I see that Lektora still barely supports Linux (they have an October 13th release for Windows, June 7th for Mac and the same old March 18th for Linux that I downloaded earlier this year), so it's time to move on.

Parand Tony Darugar suggested Bloglines, and I had a look. In UI, it's just like Lektora, but implemented over the Web rather than as a browser extension. I think it would be perfect except that the over-the-Web functionality makes it rather slow. I've read of a lot of folks who started on Bloglines jumped for Firefox's Sage extension. I'd tried it before and found it to be just Straw in the Web browser, and when I checked again, it's still just that. I'd rather not go back to death by keystroke or mouse click. OK, to be fair, Sage is much less clicky than Straw. It's actually quite clever in how it lays out all the entries for each feed. I just wish it could do that for groups of feeds rather than individual ones.

So it looks as if I'm headed for Bloglines, but first of all I'd like to throw out a lazy-Web check for any other suggestions. I'm up for any browser-based or Linux tool. I'm willing to pay (within reason) for a really solid tool. I prefer the newspaper-like format (if you couldn't tell), where I can just group my Web feeds and then open each group together and mostly just have to scroll down to read all updated entries in that group. I have scrolled through the bewildering array in the RSS Compendium and tried some of them, but the ones I tried didn't really impress me.

Any ideas, friends? TIA.

[Uche Ogbuji]

via Copia

Processing "Web 2.0" using XSLT document() variants? No thanks.

Mark Nottingham has written an intriguing piece "XSLT for the Rest of the Web". It's drummed up some interest, some of which has even leaked into the 4Suite mailing list thanks to the energetic Sylvain Hellegouarch. Mark says:

I’ve raved before about how useful the XSLT document() function is, once you get used to it. However, the stars have to be aligned just so to use it; the Web site can’t use cookies for anything important, and the content you’re interested in has to be available in well-formed XML.

He goes on to present a set of extension functions he's created for libxslt. They are basically smarter document() functions that can do fancy Web things, including HTTP POST, and using HTML Tidy to grab tag soup HTML as XHTML.

As I read through it, I must say my strong impression was "been there, done that, probably never looking back". Certainly no diss of Mark intended there. He's one of the sharper hackers I know. I guess we're just at different points in our thinking of where XSLT fits into the Web-savvy apps toolkit.

First of all, I think the Web has more dragons than you could easily tame with even the mightiest XSLT extension hackery. I think you need general-purpose programming language to wrangle "Web 2.0" without drowning in tears.

More importantly, if I ever needed XSLT's document() function to process anything more than it's spec'ed to, I would consider that a pretty strong indicator that it's time to rethink part of my application architecture.

You see, I used to be a devotee of XSLT all over the place, and XSLT extensions for just about every limitation of the language. Heck, I wrote a whole framework of such things into 4Suite Repository. I've since reformed. These days I take the pipeline approach to such processing, and I keep XSLT firmly in the narrow niche for which it was designed. I have more on this evolution of thinking in "Lifting XSLT into application domain with extension functions?".

But back to Mark's idea. I actually implemented 4Suite XSLT extensions to use HTTP POST and to tidy tag soup HTML into XHTML, but I wouldn't dream of using these extensions any more. Nowadays, I use Python to gather and prepare data into a model representation that I then hand over to XSLT for pure presentation processing. Complex logical tasks such as accessing Web data beyond trivially fetched XML are matters for the model layer, and not the presentation logic. For example, if I need to tidy something, I tidy it at the Python level and put what I need of the resulting XHTML into the model XML before passing it to XSLT. I use Amara XML Toolkit with John Cowan's TagSoup for my tidying needs. I prefer TagSoup rather than tidy because I find it's faster and more robust.

Even if you use the libxml2 family of tools, I still think it's better to use libxml, and perhaps the libxml HTML parser to do the model processing and hand over resulting XML to libxslt in a separate step.

XSLT is pretty cool, but these days rather than reproduce all of Python's dozens of Web processing libraries therein, I plump for Python itself.

[Uche Ogbuji]

via Copia

More on the PyBlosxom del.icio.us plug-in, and introducing task_control.py, a a pseudo cron plug-in for PyBlosxom

Micah put my del.icio.us daily links tool to immediate use on his blog. He uncovered a bug in the character handling, which is now fixed in the posted amara_delicious.py file.

I usually invoke the script from cron, but Micah asked if there was an alternative. I've been meaning to hack up a poor man's cron for PyBlosxom and this gave me an additional push. The result is task_control.py.

A sort of poor man's cron for PyBlosxom, this plug-in allows you to specify tasks (as Python scripts) to be run only at certain intervals Each time the plug-in is invoked it checks a set of tasks and the last time they were run. It runs only those that have not been run within the specified interval.

To run the Amara del.icio.us daily links script once a day, you would add the following to your config file:

py["tasks"] = {"/usr/local/bin/amara_delicious.py": 24*60*60}
py["task_control_file"] = py['datadir'] + "/task_control.dat"

You could of course have multiple script/interval mappings in the "tasks" dict. The scripts are run with variables request and config set, so, for example, if running from task_control.py, you could change the line of amara_delicious.py from

BASEDIR = '/srv/www/ogbuji.net/copia/pyblosxom/datadir'

to

BASEDIR = config['datadir']

[Uche Ogbuji]

via Copia

del.icio.us daily links, using Amara

I added a new feature on Copia: Every day there will be an automated posting with mine and Chime's del.icio.us links from the previous day. You can see, in the previous Copia entry to this one, an example of the results.

What I think most cool is how easy it was to write, and how easy the resulting code is to understand. It's just 35 lines (including 7 lines of imports) , and in that it packs some useful features I haven't found in other such scripts, including:

  • Full Unicode safety (naturally, I wouldn't have it any other way)
  • support for multiple del.icio.us feeds, with tag by author
  • tagging the PyBlosxom entry with the aggregated/unique tags from the del.icio.us entries

Here's the code. The only external requirement is Amara:

import os
import sets
import time
import codecs
import itertools
from datetime import date, timedelta

from amara import binderytools

TAGBASE = 'http://del.icio.us/tag/'

#Change BASEDIR and FEEDS to customize
BASEDIR = '/srv/www/ogbuji.net/copia/pyblosxom/datadir'
FEEDS = ['http://del.icio.us/rss/uche', 'http://del.icio.us/rss/chimezie']

now = time.gmtime()
timestamp = unicode(time.strftime('%Y-%m-%dT%H:%M:%SZ', now))
targetdate = (date(*now[:3]) - timedelta(1)).isoformat()

#Using Amara.  Easy to just grab the RSS feed
docs = map(binderytools.bind_uri, FEEDS)
items = itertools.chain(*[ doc.RDF.item for doc in docs ])
current_items = [ item for item in items
                       if unicode(item.date).startswith(targetdate) ]
if current_items:
    # Create a Markdown page with the daily bookmarks.
    dir = '%s/%s' % (BASEDIR, targetdate)
    if not os.path.isdir(dir):
        os.makedirs(dir)
    f = codecs.open('%s/%s/del.icio.us.links.txt' % (BASEDIR, targetdate), 'w', 'utf-8')

    # Pyblosxom Title
    f.write(u'del.icio.us bookmarks for %s\n' % targetdate)

    tags = sets.Set()
    for item in current_items:
        tags.update([ li.resource[len(TAGBASE):] for li in item.topics.Bag.li ])
    f.write(u'#post_time %s\n'%(timestamp))
    f.write(u'<!--keywords: del.icio.us,%s -->\n'%(u','.join(tags)))

    for item in current_items:
        # List of links in Markdown.
        title = getattr(item, 'title', u'')
        href = getattr(item, 'link', u'')
        desc = getattr(item, 'description', u'')
        creator = getattr(item, 'creator', u'')
        f.write(u'* "[%s](%s)": %s *(from %s)*\n' % (title, href, desc, creator))

    f.close()

Or download amara_delicious.py.

You can see how easily you can process RSS 1.0 in Amara. I don't think actual RDF parsing/processing is a bit necessary. That extra layer is the first thing that decided me against Matt Biddulph's module, in addition to his use of libxml for XML processing, which is also used in Roberto De Almeida's.

[Uche Ogbuji]

via Copia

Nofollow-free Copia

I read this interesting post by Darryl VanDorp:

Hey people new to this whole blog thing. If you want me to actually leave a comment on that new blog of yours. Turn off that damn default nofollow in your shiny wordpress installation ... If I take the time to leave a comment give me some juice!

I quite agree. Yes, I know nofollow came about as a way to discourage Weblog spammers, but I don't believe it will ever work. It is just too easy to write robots that leave comment spam, and unless 100% of systems use nofollow, which is never likely, it will always be worth the spammers' while to keep trying on the off chance that they find sites without nofollow. Also, spammers are probably happy to have their messages on Weblogs, google juice or no.

Look, we don't want comment spam on Copia, period. It's not enough to deprive them of google juice. We work to keep the site spam-free. All comments go in an approval queue, and we have a lot of handy little tools to help eliminate comment spam, so we can batch approve the remaining goodness. I think this takes a lot less effort than the actual work of composing entries on the blog (and I'm a fast writer). What's more relevant, it's less effort than it takes for visitors to compose their comments. As a result of this effort there is no need to use nofollow on Copia, and we don't do so. I don't know whether commenters get significant juice from Copia, but what juice we do have to give we shall not stint our correspondents. (Not even if they are here to engage in the heresy of disputing our positions.)

I can't help thinking of the gardening analogy. If you want a nice garden, you have to roll up your sleeves and pull up your weeds. There is no quick fix to preventing weeds: the conditions that make your soil attractive for germination of desirable flowers also make it attractive to Uncle Darwin's rogues.

[Uche Ogbuji]

via Copia

XHTML tutorial pubbed

"XHTML, step-by-step"

Start working with Extensible Hypertext Markup Language. In this tutorial, author Uche Ogbuji shows you how to use XHTML in practical Web sites.

Get started working with Extensible Hypertext Markup Language. XHTML is a language based on HTML, but expressed in well-formed XML. But XHTML is much more than just regularizing tags and characters -- XHTML can alter the way you approach Web design. This tutorial gives step-by-step instruction for developers familiar with HTML who want to learn how to use XHTML in practical Web sites.

In this tutorial

  • Tutorial introduction
  • Anatomy of an XHTML Web page
  • Understand the ground rules
  • Replace common HTML idioms
  • Some practical considerations
  • Wrap up

[Uche Ogbuji]

via Copia

Kumite! Python vs. Javascript! script vs. XForms! declarativity vs. wizards!

Sylvain's question about Javascript-without-tears certainly kicked off a chain of interesting dialogue, for me. It also happens to dovetail with some interesting dialog elsewhere that started independently.

Kurt had an interesting take in his follow-up "Javascript and Python"

To be blunt, I would prefer to see a Python interpreter being standard issue in all browsers in addition to a Javascript one, as I believe the language to be much more expressive, more object oriented, more secure, easier to write, and in general better suited to contemporary needs. Unfortunately, browsers in particular are informed as much by business and political decisions, some, perhaps even most, based less upon the best technology for a problem and more based upon what will provide the best backward compatibility to insure that existing websites do not break or that download sizes remain under some critical value.

I like Python as well, but boy do I shudder to imagine the political conflagration that would ensue if Python were elevated in Web programming over peers such as Perl and Ruby. Javascript appears to have carved out a niche as the Switzerland of dynamic languages.

I would like to see diversity in Web scripting languages, so that others have choices besides Javascript, and as Kurt says, Mozilla does seem to be taking the lead in this. It's really cool to see the Mozilla Wiki entry "Breaking the grip JS has on the DOM". And you gotta love the opening lines:

We want to change the grip JS has on the DOM and on XUL. We will do this in 2 steps:

Ideally, the first step could be done without consideration for the second, in the assumption at the implementation should be truly language neutral.

But regardless of scripting language, there is the fact that XForms is out there looking to take on the very need for scripting in many of the browser use cases. Kurt says:

I am a major advocate of XML, precisely because it is much more difficult to isolate a language when a mapping is essentially an XSLT transformation away. For this reason, XForms is a very attractive model to be moving towards, certainly, and I look forward to the day that I can build XForms applications that work in all browsers equally. However, XForms is not even remotely widely implemented yet nor are there standard forms of declarative binding languages along the lines of XBL (sXBL is getting there, but so far there are perhaps two still very much test implementations in existence).

Chime predictably takes exception to this (in a response to Kurt). He's been a very involved early adopter of XForms, and I've been amazed to see how productive XForms has made him, so I take his point very seriously when he says:

I beg to differ. Though it may be true that it doesn't quite have the traction that Flash has at the moment, I wouldn't go as far as saying it isn't remotely, widely implemented. There are several very mature implementations...
[...] [re:eliminating the need for an imperative language] Once again, I have yet find myself in a situation where I needed javascript for UI-related capabilities that weren't covered by XForms event processing, instance binding, and other such [programmatic] components. The only time I did was when I had to encode XML content as base64 encoded binary (see: http://copia.ogbuji.net/blog/2005-08-19/BinaryEncodingAndXMLRPCs) and had little to do with XForms but more with the means of remote communication (SOAP). I'm not suggesting that frameworks such as XForms will eliminate the need for an imperative language, but rather that the need will be more like the reverse of your 80/20 proposition.

To be fair, I think Kurt was saying that script is only needed for the 20% case, and not the 80% case. He just felt that declarative solutions architecture "is hideously inappropriate for the remaining 20%." I do agree with that, but I think that people (not Kurt) tend to exaggerate this fact as an argument against declarative programming.

Chime wraps up:

I must give the disclaimer that I'm not suggesting XForms will be a user interface / browser-based application building [panacea], but rather that the potential it has to eliminate the unportable, architecturally unsound code that often drive DHTML web-sites with minimal complexity is very much overlooked.

Based on what I've seen of XForms, I tend to agree. I actually think Chime and Kurt are more in agreement than they sound, except maybe on the matter of the maturity of XForms engines.

At almost the same time another script versus XML exchange was going on in Mark Birbeck's blog, and in particular "On Adobe and XForms via Declarative Programming, Wizards and Aspects". That article is well worth reading in its entirety, but I'll highlight what he says about Wizards:

...in nearly all cases I find the 'wizard approach' is great to get you started, but then very quickly gets complex again. Anyone who uses Microsoft's Visual Studio, for example, will know that getting a C++ application up and running quickly, with support for multiple windows, toolbars, printing and file saving, is a snip. But then when you want to modify that code and move away from the wizard, you are very soon into normal C++ territory.

I tend to put this point even more strongly, as I do in my article "The worry about program wizards", but I'm always happy to have it reinforced.

In some ways I see wizards as the shoddy high street knock-off of declarative systems. Well designed declarative systems fully encapsulate modal aspects of the application in development, and they expose slots for ready extension in imperative implementation, if needed. Wizards, on the other hand, do focus on parameters in a way tantalizingly like declarative systems, but then ruin the entire plot by handing the programmer a hairball of imperative code that they have to hack at arbitrarily in order to complete the application. It's the difference between just plugging a device into a USB port to add capability to your PC, rather than having the motherboard thrust in your face so that you can find the right place to solder in the leads.

Chimezie Ogbuji

via Copia