LazyWeb Ho! Detecting whether a browser supports XML+XSLT

I'm wrapping up applyxslt, a WSGI middleware module to serve separate XML and XSLT to browsers that can handle it (using the xml-stylesheet PI). For browsers that can't, it intercepts the response, performs the XSLT transform on the browser's behalf, and sends on the result. BTW, for more on WSGI middleware, see “Mix and match Web components with Python WSGI”.

My biggest uncertainty is the best way to determine whether a browser can handle XML+XSLT. I doubt anything in the Accept header would help, so I'm left having to list all User-Agent strings for browsers that I know can handle this (basically Firefox, Opera, and recent Mozilla, Safari and MSIE).

So far I'm deriving my User-Agent list from several sources, including

Really the Wikipedia list is all I needed, but I found and worked with the other ones first.

So based on that, here is the list of User-Agent string patterns I am treating as evidence that the browser does understand XML+XSLT (Python/Perl regex):

.*MSIE 5.5.*
.*MSIE 6.0.*
.*MSIE 7.0.*
.*Gecko/2005.*
.*Gecko/2006.*
.*Opera/9.*
.*AppleWebKit/31.*
.*AppleWebKit/4.*

Note: this hoovers up a few browser versions I'm not entirely sure of: Minimo, AOL Explorer and OmniWeb. I'm fine with some such uncertainty, but if anyone has any suggestions for further refinement of this list, let me know. I'd like to keep it updated.
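For concreteness, here is a minimal sketch of how the patterns above might be applied in Python. This is an illustrative helper, not the actual applyxslt code, and the dots are escaped here for precision (the list above uses bare dots, which also match any character):

```python
import re

# User-Agent patterns from the list above, treated as evidence that the
# browser can apply XSLT via the xml-stylesheet PI. Hypothetical helper;
# not the real applyxslt implementation.
XSLT_CAPABLE_PATTERNS = [
    r".*MSIE 5\.5.*",
    r".*MSIE 6\.0.*",
    r".*MSIE 7\.0.*",
    r".*Gecko/2005.*",
    r".*Gecko/2006.*",
    r".*Opera/9.*",
    r".*AppleWebKit/31.*",
    r".*AppleWebKit/4.*",
]

# Combine into a single alternation so the check is one regex match.
_XSLT_CAPABLE = re.compile("|".join("(?:%s)" % p for p in XSLT_CAPABLE_PATTERNS))

def browser_handles_xslt(user_agent):
    """Return True if the User-Agent string matches a known XSLT-capable browser."""
    return bool(_XSLT_CAPABLE.match(user_agent or ""))
```

A WSGI middleware would read the string from `environ["HTTP_USER_AGENT"]` and branch on the result.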

[Uche Ogbuji]

via Copia
4 responses
Uche,

It's even easier than this... no browser sniffing is required.

I'll put together a sample and send it to you in private email a bit later this morning. The project I am currently working on is something I will send you a link to as well. Though not directly related, I think you'll like what you see nonetheless :D
As per my email, I am copying this over in the hope of generating some good community conversation about the cheapest, most efficient way to handle all of this, so that the least amount of server-side work is needed and fewer resources can serve a greater number of requests.

From the email:
Quick description: using an XHTML-based XML document, and extending the code base from this <http://www.oreillynet.com/xml/blog/2006/07/no_sign_of_document_function_b.html> post (the code of which I originally published to my personal blog last December): if client-side XSLT is present, the browser will catch the PI and begin processing. If it's not, the browser will (obviously) continue parsing the document. Depending on the browser, you can do one of two things at this stage: meta-refresh to a static, pre-rendered web page, or use JavaScript to write out the page, using either inline JavaScript or an imported JavaScript file.
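As a sketch of the document being described, held in a Python string so it can be served by WSGI middleware. The stylesheet and fallback file names here are hypothetical, not from the post; the key point is that an XSLT-capable browser renders the transform result (so the meta refresh in the source never fires), while a browser that ignores the PI falls through to the refresh:

```python
# Sketch of the comment's technique: one XHTML document served to everyone.
# "page.xsl" and "fallback.html" are placeholder names for illustration.
XHTML_WITH_FALLBACK = """<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="page.xsl"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Example</title>
    <!-- Only reached by browsers that do not apply the xml-stylesheet PI -->
    <meta http-equiv="refresh" content="0;url=fallback.html" />
  </head>
  <body>
    <p>Redirecting to the pre-rendered page...</p>
  </body>
</html>
"""
```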

A meta-refresh adds an extra GET to the mix, but one extra GET for 1 out of every 20-30 visitors is not a huge thing.

The only part of this that hasn't been thoroughly tested is the content type. Some of the older browsers will ignore the content type in the header as long as they understand what to do with the content. However, this obviously could present a fairly major flaw in this method. text/xml is the content type you have to use for IE; Fx/Moz will also accept application/xml, but, as you probably already know, they will process a PI in a document served as text/xml as well, and the same is true of Safari and Opera, so text/xml is the choice for cross-platform compatibility. The trouble is that text/xml will immediately throw some browsers for a loop, and they won't render anything. Of course testing this proves to be a real pain in the a$$, as getting all versions of all browsers tested in-house is not exactly easy, though not impossible either: http://browsers.evolt.org/

It seems that the ultimate solution would be one that does the lightest amount of server-side processing possible. If the same base XHTML document is sent to all browsers, then setting the content type to text/html for those whose User-Agent string doesn't match one of the known XSLT browsers would be cheaper than transforming the document in real time. Of course, in this scenario it might be easier to redirect to the proper pre-transformed HTML file as part of the header.
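The cheap approach described above, leaving the body untouched and only varying the Content-Type by User-Agent, can be sketched as WSGI middleware. This is an illustration of the idea, not the real applyxslt; the `detect` predicate here is a trivial placeholder for a real pattern list:

```python
# Sketch: serve text/xml to XSLT-capable browsers so the xml-stylesheet PI
# fires; serve text/html to everything else so the document is treated as
# plain tag soup. `detect` is a placeholder predicate for illustration.
def content_type_switcher(app, detect=lambda ua: "Gecko" in ua or "MSIE" in ua):
    def middleware(environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "")
        ctype = "text/xml" if detect(ua) else "text/html"

        def _start_response(status, headers, exc_info=None):
            # Replace whatever Content-Type the wrapped app set.
            headers = [(k, v) for k, v in headers
                       if k.lower() != "content-type"]
            headers.append(("Content-Type", ctype))
            return start_response(status, headers, exc_info)

        return app(environ, _start_response)
    return middleware
```

Note this only works if the XHTML is written so that both parse modes render acceptably; redirecting to a pre-transformed file, as suggested above, avoids that constraint.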

---

Not in the email, but something that just occurred to me: there is one other gotcha that the above method hasn't been tested against. (I have no idea whether the old MSXML 2.0 processor looks for the PI, but I assume it does, and therefore this could present a problem.) See this <http://www.oreillynet.com/xml/blog/2006/06/opera_90_final_released.html#comment-49383> comment from a while back for more detail.

---

Comments?
It just occurred to me that I might be mistaken about the content type. It's been a while since I last played with that part of the code base the above solution extends from, though I do know there are some peculiarities about which content types will work with IE and XML PIs and which won't. I'll verify and report back.
I think we're talking about completely different things.  In my case I'm writing server-side middleware that is independent of any trickery on the client side.  The idea is to use the basic header declarations to decide whether or not it should take care of XSLT processing for the client.

You are talking (I think) about client-side logic for cross-platform XSLT application, which is very different.  A robust server-side solution should be independent of this.  I suspect that someone might be able to combine both approaches in a webapp, but that's at the integration phase, and is not relevant to the implementation of the separate client or server capabilities.

If I'm misunderstanding you, please let me know.