Mark Nottingham has written an intriguing piece, "XSLT for the Rest of the Web". It's drummed up some interest, some of which has even leaked into the 4Suite mailing list thanks to the energetic Sylvain Hellegouarch. Mark says:

I’ve raved before about how useful the XSLT document() function is, once you get used to it. However, the stars have to be aligned just so to use it; the Web site can’t use cookies for anything important, and the content you’re interested in has to be available in well-formed XML.

He goes on to present a set of extension functions he's created for libxslt. They are basically smarter document() functions that can do fancy Web things, including making HTTP POST requests and using HTML Tidy to grab tag soup HTML as XHTML.

As I read through it, I must say my strong impression was "been there, done that, probably never looking back". No diss of Mark intended there, certainly. He's one of the sharper hackers I know. I guess we're just at different points in our thinking about where XSLT fits into the Web-savvy app toolkit.

First of all, I think the Web has more dragons than you could easily tame with even the mightiest XSLT extension hackery. I think you need a general-purpose programming language to wrangle "Web 2.0" without drowning in tears.

More importantly, if I ever needed XSLT's document() function to process anything more than it's spec'ed to, I would consider that a pretty strong indicator that it's time to rethink part of my application architecture.

You see, I used to be a devotee of using XSLT all over the place, with XSLT extensions to work around just about every limitation of the language. Heck, I wrote a whole framework of such things into 4Suite Repository. I've since reformed. These days I take the pipeline approach to such processing, and I keep XSLT firmly in the narrow niche for which it was designed. I have more on this evolution of thinking in "Lifting XSLT into application domain with extension functions?".

But back to Mark's idea. I actually implemented 4Suite XSLT extensions to use HTTP POST and to tidy tag soup HTML into XHTML, but I wouldn't dream of using these extensions any more. Nowadays, I use Python to gather and prepare data into a model representation that I then hand over to XSLT for pure presentation processing. Complex logical tasks such as accessing Web data beyond trivially fetched XML are matters for the model layer, not the presentation logic. For example, if I need to tidy something, I tidy it at the Python level and put what I need of the resulting XHTML into the model XML before passing it to XSLT. I use Amara XML Toolkit with John Cowan's TagSoup for my tidying needs. I prefer TagSoup to Tidy because I find it faster and more robust.

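To make the model-layer step a bit more concrete, here is a minimal sketch. It is not the Amara/TagSoup setup described above: as an assumption it swaps in Python's standard-library html.parser, which tolerates unclosed tags but only reports tags and text rather than building a repaired tree the way TagSoup does. The names LinkCollector and build_model, and the <model>/<link> vocabulary, are hypothetical. The sketch pulls the links out of tag soup and writes only those into a small model XML document, which is what would then be handed to an XSLT processor for presentation.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs from possibly malformed HTML.

    Stand-in (assumption) for a real tag-soup cleaner such as TagSoup:
    html.parser does not rebuild a tree, it just reports tags and text.
    """

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default
        self.links = []
        self._href = None   # href of the <a> we are currently inside
        self._text = []     # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def build_model(tag_soup):
    """Reduce tag soup to a small model document for the XSLT layer."""
    collector = LinkCollector()
    collector.feed(tag_soup)
    collector.close()
    model = ET.Element("model")
    for href, text in collector.links:
        ET.SubElement(model, "link", href=href).text = text
    return ET.tostring(model, encoding="unicode")


if __name__ == "__main__":
    # Unclosed <p> and <li> tags: typical tag soup.
    soup = "<p>Links:<ul><li><a href='/a'>Alpha</a><li><a href=\"/b\">Beta</a></ul>"
    # The returned string is what would be passed on to the XSLT processor.
    print(build_model(soup))
```

The design point is that the stylesheet only ever sees the clean <model> document; all the tolerance for broken markup lives on the Python side.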
Even if you use the libxml2 family of tools, I still think it's better to use libxml (and perhaps the libxml HTML parser) to do the model processing, and then hand the resulting XML over to libxslt in a separate step.

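Here is a rough sketch of that two-step split. It assumes the lxml package, a Python binding to libxml2 and libxslt, rather than the raw libxml2 bindings: lxml.html.fromstring uses libxml2's HTML parser for the model step, and etree.XSLT compiles a stylesheet with libxslt for the separate presentation step. The stylesheet, the <model>/<item> vocabulary and the helper functions are made up for illustration.

```python
from lxml import etree, html

# Hypothetical presentation stylesheet: it only knows the model vocabulary.
STYLESHEET = etree.XML("""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/model">
    <ul><xsl:apply-templates select="item"/></ul>
  </xsl:template>
  <xsl:template match="item">
    <li><xsl:value-of select="."/></li>
  </xsl:template>
</xsl:stylesheet>""")
TRANSFORM = etree.XSLT(STYLESHEET)


def build_model(tag_soup):
    """Model step: parse tag soup with libxml2's HTML parser, keep only what we need."""
    doc = html.fromstring(tag_soup)
    model = etree.Element("model")
    for li in doc.iter("li"):
        etree.SubElement(model, "item").text = li.text_content().strip()
    return etree.ElementTree(model)


def render(tag_soup):
    """Presentation step: a separate libxslt pass over the clean model."""
    return str(TRANSFORM(build_model(tag_soup)))


if __name__ == "__main__":
    # Unclosed <li> tags are repaired by the HTML parser, not by XSLT.
    print(render("<ul><li>Alpha<li>Beta</ul>"))
```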
XSLT is pretty cool, but these days, rather than reproduce all of Python's dozens of Web processing libraries therein, I plump for Python itself.

[Uche Ogbuji]