A few of my friends have launched the beta of a Web site from which you can send informal notes to track debts (in cash, goods or deeds). Say a business colleague buys you lunch and you want to combine a thank-you note with an offer to reciprocate. IOU Note makes it easy to do so. Interesting idea, and I think they've executed it well, although I'm of course biased. I did send them a bunch of suggestions and I know they're furiously working to improve the service. It will be fun to watch their progress.
“Real Web 2.0: Bookmarks? Tagging? Delicious!”
Subtitle: Learn how real-world developers and users gain value from a classic Web 2.0 site
Synopsis: In this article, you'll learn how to work with del.icio.us, one of the classic Web 2.0 sites, using Web XML feeds and JSON, in Python and ECMAScript. When you think of Web 2.0 technology, you might think of the latest Ajax tricks, but that is just a small part of the picture. More fundamental concerns are open data, simple APIs, and features that encourage users to form social networks. These are also what make Web 2.0 a compelling problem for Web architects. This column will look more than skin deep at important real-world Web 2.0 sites and demonstrate how Web architects can incorporate the best from the Web into their own Web sites.
This is the first installment of a new column, Real Web 2.0. Of course "Web 2.0" is a hype term, and as has been argued to sheer tedium, it doesn't offer anything but the most incremental advances, but in keeping with my tendency toward mildness about buzzwords, I think that anything that helps focus Web developers on the collaborative features of Web sites is a good thing. And that's what this column is about. It's not about the Miss AJAX pageant, but rather about open data for users and developers. From the article:
The substance of an effective Web 2.0 site, and the points of interest for Web architects (as opposed to, say, Web designers), lie in how readily real developers and users can take advantage of open data features. From widgets that users can use to customize their bits of territory on a social site to mashups that developers can use to create offspring from Web 2.0 parents, there are ways to understand what leads to success for such sites, and how you can emulate such success in your own work. This column, Real Web 2.0, will cut through the hype to focus on the most valuable features of actual sites from the perspective of the Web architect. In this first installment, I'll begin with one of the ancestors of the genre, del.icio.us.
And I still don't want that monkey-ass Web 1.0. Anyway, as usual, there's lots of code here. Python, Amara, ECMAScript, JSON, and more. That will be the recipe (mixing up the ingredients a bit each time) as I journey along the poster child sites for open data.
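To give a taste of the sort of thing the article covers, here's a quick sketch (not from the article itself) of pulling a user's del.icio.us bookmarks out of the site's RSS feed with Amara. The feed URL pattern is from memory and the user name is a placeholder, so check the article for the real details:

import amara

RSS10_NS = u'http://purl.org/rss/1.0/'
# del.icio.us publishes each user's bookmarks as an RSS 1.0 feed;
# the user name here is just a placeholder
FEED = 'http://del.icio.us/rss/exampleuser'

# Each item element is one bookmark: print its title and target URL
for item in amara.pushbind(FEED, u'rss:item', prefixes={u'rss': RSS10_NS}):
    print unicode(item.title), unicode(item.link)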
I found Ryan Tomayko's How I Explained REST to My Wife very clever and amusing, but one bit left me begging for more imagination.
Ryan: Actually, representations is one of these things that doesn't get used a lot. In most cases, a resource has only a single representation. But we're hoping that representations will be used more in the future because there's a bunch of new formats popping up all over the place.
Wife: Like what?
Ryan: Hmm. Well, there's this concept that people are calling Web Services. It means a lot of different things to a lot of different people but the basic concept is that machines could use the web just like people do.
Ay-ay-ay. I don't know any non-techie who would be satisfied with such an explanation. And I think the idea of alternative representations is one of the least geeky ideas Ryan is trying to communicate to his wife, so it seems a slam-dunk for a more interesting example.
My wife knows that if she goes to http://tvguide.com on her computer she gets a listing of the TV channels in a table on the Web page. What if our TV suddenly gained Web capabilities? When you went to http://tvguide.com on the TV you should probably get the live feed for the TV Guide preview channel, the one with a scrolling list of what's on each channel of the boob tube. Of course, since it's a Web-stylie TV, forget channel numbers. Each item in the list should be a link you can just actuate with your remote. Click to jump to, say, http://www.vh1.com/channels/vh1_soul/channel.jhtml when you saw that the new John Legend video was playing. And you'd get the live channel there, since it's a Web-stylie TV, rather than the Web page bragging about the video. So there you go: live TV as an alternative representation of the resource. And you know that wifey's going to get any example that includes John Legend (OK, for some wives substitute Justin Timberlake).
If your phone were a web-stylie phone, you could go to http://tvguide.com (better have that on speed dial) and have some robot voice reciting the "what's on now" for the channels. And so the audio message is an alternative representation of the resource. Oh, and going a bit Jetsons with the whole thing, if the TV Guide site started throwing up 404s on your web-stylie TV or phone, and you wanted to go to the office to give them a piece of your mind in person, you could hail a web-stylie taxi cab and tell it http://tvguide.com and, you guessed it, you'll be whisked to headquarters. But pay attention, now: the representation of the resource is probably not the destination building, in this case, but rather the street address or directions to that location as retrieved from the URL http://tvguide.com in the web-stylie cab. Or something like that.
Now we're talking alternative representations. Rather than remembering the channel number, hot-line phone number and physical location for TV Guide, it's all available web-style from the URL http://tvguide.com. Of course, as long as anyone needs to deal with "aitch-tee-tee-pee-colon-slash-slashes" ain't no way this scenario is playing out in real life. But then again neither is Ryan's example--Web services. Oooooh!
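If you'd rather see the idea in code than in Jetsons terms, the machinery behind alternative representations is mostly plain old HTTP content negotiation: one URL, many representations, selected by what the client says it can accept. A toy sketch in Python (tvguide.com doesn't actually do any of this, of course):

import urllib2

def get_representation(url, media_type):
    # Ask the server for a particular representation of the resource
    # by setting the HTTP Accept header
    request = urllib2.Request(url, headers={'Accept': media_type})
    return urllib2.urlopen(request).read()

# Same resource, different representations depending on the client
listings_page = get_representation('http://tvguide.com', 'text/html')
listings_audio = get_representation('http://tvguide.com', 'audio/mpeg')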
I recently needed some code to quickly scrape the metadata from XHTML Web pages, so I kicked up the following code:
import amara

XHTML1_NS = u'http://www.w3.org/1999/xhtml'
PREFIXES = { u'xh': XHTML1_NS }

def get_xhtml_metadata(source):
    md = {}
    # Pull just the children of the XHTML head element, streaming
    for node in amara.pushbind(source, u'/xh:html/xh:head/*', prefixes=PREFIXES):
        if node.localName == u'title':
            md[u'title'] = unicode(node)
        if node.localName == u'link':
            # Capture all attributes of each link element (rel, href, type, ...)
            linkinfo = dict([ (attr.name, unicode(attr))
                              for attr in node.xml_xpath(u'@*') ])
            md.setdefault(u'links', []).append(linkinfo)
        elif node.xml_xpath(u'self::xh:meta[@name]'):
            # Named meta elements become simple name/content pairs
            md[node.name] = unicode(node.content)
    return md

if __name__ == "__main__":
    import sys, pprint
    source = sys.argv[1]
    pprint.pprint(get_xhtml_metadata(source))
So, for example, scraping planet XML:
$ python xhtml-metadata.py http://planet.xmlhack.com/
{u'links': [{u'href': u'planet.css',
             u'media': u'screen',
             u'rel': u'stylesheet',
             u'title': u'Default',
             u'type': u'text/css'},
            {u'href': u'/index.rdf',
             u'rel': u'alternate',
             u'title': u'RSS',
             u'type': u'application/rss+xml'}],
 u'title': u'Planet XMLhack: Aggregated weblogs from XML hackers and commentators'}
...simple - just don't use script in XSLT unless you really really really have to. Especially on the server side - XSLT script and ASP.NET should never meet. Use XSLT extension objects instead. As simple as it is.
—Oleg Tkachenko, "XSLT scripting (msxsl:script) in .NET - pure evil"
Amen, f'real. When XSLT 1.1 first emerged the first thing that jumped out from the spec and punched me in the face was the embedded script facility. I made a fuss about it:
In general, I think the re-introduction of xsl:script is execrable. XSLT 1.0 had perhaps the most elegant extension model possible, and xsl:script ruins this by destroying the opacity of extensions to XSLT processors. Language bindings may make sense in the realm of CORBA or DOM, where the actual expression of the program is done in the bound language, but XSLT is XSLT, and introducing the need for language bindings only reduces general interoperability while giving a small boost to interoperability between small axes of implementations.
I even worked with some like-minded folk to put together a petition. I have no idea whether that was instrumental in any way, but soon enough XSLT 1.1 was dead and replaced with XSLT 2.0, which was built on XPath 2.0 and thus had other big problems, but at least no xsl:script.
xsl:script does live on in some implementations, notably MSXML, as you can see from Oleg's post. You can also see some of the problems. XSLT and more general-purpose languages make for an uncomfortable fit, and it can be hard for platform developers and users to make things work smoothly and reliably. More important than memory leaks, script-in-xsl is a huge leak of XSLT's neat abstraction, and I think this makes XSLT much less effective. For one thing, users are tempted to take XSLT to places where it does not fit. XSLT is not a general-purpose language. At the same time, users tend not to learn good XSLT design and techniques because scripting becomes an escape hatch. So a script user in XSLT generally cripples the language at the same time he is over-using it. An unfortunate combination indeed.
Oleg advocates XSLT extensions rather than scripting, which is correct, but I do want to mention that once you get used to writing extensions, it can be easy to slip into habits as bad as scripting. I've never been tempted to implement a Python scripting extension in 4XSLT, which would be easy, but that didn't stop me from going through a phase of overusing extensions. I think I've fully recovered, and the usage pattern I definitely recommend is to write the general-purpose code in a general-purpose language (Python, C#, whatever) and then call XSLT for the special and narrow purpose of transforming XML, usually for the last mile of presentation. It seems obvious, and yet it's a lesson that seems to require constant repetition.
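For what it's worth, the shape of that pattern in code is tiny. Here's a minimal sketch, using lxml just because it's the shortest thing to show (file names are placeholders): the general-purpose logic stays in Python, and XSLT is invoked only at the end to transform the XML.

from lxml import etree

# General-purpose work (fetching, business rules, etc.) stays in Python;
# XSLT is called only for the narrow job of transforming the XML.
# File names are placeholders.
transform = etree.XSLT(etree.parse('presentation.xsl'))
doc = etree.parse('data.xml')
print str(transform(doc))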
Bruce D'Arcus commented on my entry "Creating JSON from XML using XSLT 1.0 + EXSLT", and following up on his reply put me on a bit of a journey. Enough so that the twists merit an entry of their own.
Bruce pointed out that libxslt2 does not support the str:replace function. This recently came up in the EXSLT mailing list, but I'd forgotten. I went through this thread. Using Jim's suggestion for listing libxslt2-supported extensions (we should implement something like that in 4XSLT), I discovered that it doesn't support regex:replace either. This is a serious pain, and I hope the libxslt guys can be persuaded to add implementations of these two very useful functions (and others I noticed missing).
That same thread led me to a workaround, though. EXSLT provides a bootstrap implementation of str:replace, as it does for many functions. Since libxslt2 does support the EXSLT functions module, it's pretty easy to alter the EXSLT bootstrap implementation to take advantage of this, and I did so, creating an updated replace.xsl for processors that support the Functions module and exsl:node-set. Therefore a version of the JSON converter that does work in libxslt2 (I checked) is:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
  xmlns:func="http://exslt.org/functions"
  xmlns:str="http://exslt.org/strings"
  xmlns:js="http://muttmansion.com"
  extension-element-prefixes="func">
  <xsl:import href="http://copia.ogbuji.net/files/code/replace.xsl"/>
  <xsl:output method="text"/>

  <func:function name="js:escape">
    <xsl:param name="text"/>
    <func:result select='str:replace($text, "&apos;", "\&apos;")'/>
  </func:function>

  <xsl:template match="/">
var g_books = [
<xsl:apply-templates/>
];
  </xsl:template>

  <xsl:template match="book">
    <xsl:if test="position() > 1">,</xsl:if>
{
id: <xsl:value-of select="@id"/>,
name: '<xsl:value-of select="js:escape(title)"/>',
first: '<xsl:value-of select="js:escape(author/first)"/>',
last: '<xsl:value-of select="js:escape(author/last)"/>',
publisher: '<xsl:value-of select="js:escape(publisher)"/>'
}
  </xsl:template>
</xsl:transform>
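If you want to check it against libxslt yourself, the command line is the quickest route; something like the following should do it (the file names are just placeholders for the stylesheet and the books document):

$ xsltproc books2json.xsl books.xml > books.js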
One more thing I wanted to mention is that there was actually a bug in 4XSLT's str:replace implementation. I missed that fact because I had actually tested a variation of the posted code that uses regex:replace. Just before I posted the entry I decided that the Regex module was overkill, since the String module version would do the trick just fine. I just neglected to test that final version. I have since fixed the bug in 4Suite CVS, and you can now use either str:replace or regex:replace just fine. Just for completeness, the following is a version of the code using the latter function:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
  xmlns:func="http://exslt.org/functions"
  xmlns:regex="http://exslt.org/regular-expressions"
  xmlns:js="http://muttmansion.com"
  extension-element-prefixes="func">
  <xsl:output method="text"/>

  <func:function name="js:escape">
    <xsl:param name="text"/>
    <func:result select='regex:replace($text, "&apos;", "g", "\&apos;")'/>
  </func:function>

  <xsl:template match="/">
var g_books = [
<xsl:apply-templates/>
];
  </xsl:template>

  <xsl:template match="book">
    <xsl:if test="position() > 1">,</xsl:if>
{
id: <xsl:value-of select="@id"/>,
name: '<xsl:value-of select="js:escape(title)"/>',
first: '<xsl:value-of select="js:escape(author/first)"/>',
last: '<xsl:value-of select="js:escape(author/last)"/>',
publisher: '<xsl:value-of select="js:escape(publisher)"/>'
}
  </xsl:template>
</xsl:transform>
The article “Generate JSON from XML to use with Ajax”, by Jack D Herrington, is a useful guide to managing data in XML on the server side while using JSON for Ajax transport, for better performance among other reasons. The main problem with the article is that it uses XSLT 2.0. As in most cases I've seen where people reach for XSLT 2.0, there is no reason why XSLT 1.0 plus EXSLT can't do the trick just fine. One practical reason to prefer the EXSLT approach is that you get the support of many more XSLT processors than just Saxon.
Anyway, it took me all of 10 minutes to cook up an EXSLT version of the code in the article. The following is listing 3, but the same technique works for all the XSLT examples.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
  xmlns:func="http://exslt.org/functions"
  xmlns:str="http://exslt.org/strings"
  xmlns:js="http://muttmansion.com"
  extension-element-prefixes="func">
  <xsl:output method="text"/>

  <func:function name="js:escape">
    <xsl:param name="text"/>
    <func:result select='str:replace($text, "&apos;", "\&apos;")'/>
  </func:function>

  <xsl:template match="/">
var g_books = [
<xsl:apply-templates/>
];
  </xsl:template>

  <xsl:template match="book">
    <xsl:if test="position() > 1">,</xsl:if>
{
id: <xsl:value-of select="@id"/>,
name: '<xsl:value-of select="js:escape(title)"/>',
first: '<xsl:value-of select="js:escape(author/first)"/>',
last: '<xsl:value-of select="js:escape(author/last)"/>',
publisher: '<xsl:value-of select="js:escape(publisher)"/>'
}
  </xsl:template>
</xsl:transform>
I also converted the code to a cleaner, push style from what's in the article.
I updated my old overview page of XSLT processor APIs in Python with an MSXML 4.0 example I found on comp.lang.python today.
For the past few months in my day job (consulting for Sun Microsystems) I've been working on what you can call a really big (and hairy) enterprise mashup. I'm in charge of the kit that actually does the mashing-up. It's an XML pipeline that drives merging, processing and correction of data streams. There are a lot of very intricately intersecting business rules and without the ability to make very quick ad-hoc reports from arbitrary data streams, there is no way we could get it all sorted out given our aggressive deadlines.
This project benefits greatly from a side task I had sitting on my hard drive, and that I've since polished and worked into the Amara 1.1.9 release. It's a command-line tool called trimxml, which is basically a reporting tool for XML. You just point it at some XML data source and give it an XSLT pattern for the bits of interest, and optionally some XPath to tune the report and the display. It's designed to read only as much of the file as needed, which helps with performance. In the project I discussed above, the XML files of interest range from 3MB to 100MB.
Just to provide a taste using Ovidiu Predescu's old Docbook example, you could get the title as follows:
trimxml http://xslt-process.sourceforge.net/docbook-example.xml book/bookinfo/title
Since you know there's just one title you care about, you can make sure trimxml stops looking after it finds it:
trimxml -c 1 http://xslt-process.sourceforge.net/docbook-example.xml book/bookinfo/title
-c is a count of results, and you can set it to something other than 1, of course.
You can get all titles in the document, regardless of location:
trimxml http://xslt-process.sourceforge.net/docbook-example.xml title
Or just the titles that contain the string "DocBook":
trimxml http://xslt-process.sourceforge.net/docbook-example.xml title "contains(., 'DocBook')"
The second argument is a filtering XPath expression. Only nodes that satisfy that condition are reported.
By default each entire matching node is reported, so you get an output such as "". You can specify something different to display for each match using the -d flag. For example, to just print the first 10 characters of each title, and not the title tags themselves, use:
trimxml -d "substring(., 1, 10)" http://xslt-process.sourceforge.net/docbook-example.xml title
There are other options and features, and of course you can use the tool on local files as well as Web-based files.
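trimxml ships as part of Amara, and you can get much the same effect in a few lines of Python with Amara's pushbind (this isn't necessarily how trimxml is implemented internally, just a rough equivalent of the first example above):

import amara

SOURCE = 'http://xslt-process.sourceforge.net/docbook-example.xml'
# Report each node matching the pattern, much as trimxml does by default
for node in amara.pushbind(SOURCE, u'book/bookinfo/title'):
    print node.xml()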
In another useful development in the 4Suite/Amara world, we now have a Wiki.
With 4Suite, Amara, WSGI.xml, Bright Content and the day job I have no idea when I'll be able to get back to working on Akara, so I finally set up some Wikis for 4Suite.org. The main starting point is:
Some other useful starting points are
http://notes.4suite.org/AmaraXmlToolkit
http://notes.4suite.org/WsgiXml

As a bit of an extra anti-vandalism measure I have set the above 3 entry pages for editing only by 4Suite developers. [...] Of course you can edit and add other pages in usual Wiki fashion. You might want to start with http://notes.4suite.org/4SuiteFaq, which is a collaborative addendum to the official FAQ.
Earlier this year I posted an off-hand entry about a scam call I received. I guess it soon got a plum Google spot for the query "Government grants scam" and it's been getting almost one comment a day ever since. Today I came across a comment whose author was requesting permission to use the posting and sibling comments in a book.
I have written a book on Winning Grants, titled "The Grant Authority," which includes a chapter on "Avoiding Grant Scams." It is in final stages of being (self)- published. I want to include comments and complaints about government grant scams on this Copia blog. I think the book's readers will learn alot from them.
How can I get permission to include written comments on this blog site in this book?
I'd never really thought about such a matter before. I e-mailed the correspondent permission, based on Copia's Creative Commons Attribution licensing, but considering he seemed especially interested in the comments, I started wondering. I don't have some warning on the comment form that submitted comments become the copyright of Copia's owners and all that, as I've seen on some sites. If I really start to think about things I also realize that our moderating comments (strictly to eliminate spam) might leave us liable for what others say. It all makes me wonder whether someone has come up with a helpful (and concise) guide to IP and tort concerns for Webloggers. Of course, I imagine such a read might leave my hair standing on end so starkly that I'd never venture near the 21st century diarist's pen again.
BTW, for a fun battle scene viewed in the cold, claret light of pedantry, inquire as to the correct plural of "conundrum".