This unnecessary screw-up comes from the Mozilla project, of all places.
Mozilla's XML support is improving all the time, as I discuss in my
article on XML in Firefox, but the developer resources seem to lag the
implementation, and this often leads to needless confusion. One case I ran
into recently could be summarized as: "not everything in the Mozilla FAQ is
accurate". From the Mozilla FAQ:
In older versions of Mozilla as well as in old Mozilla-based products,
there is no pseudo-DTD catalog and the use of entities (other than the
five pre-defined ones) leads to an XML parsing error. There are also
other XHTML user agents that do not support entities (other than the
five pre-defined ones). Since non-validating XML processors are not
required to support entities (other than the five pre-defined ones),
the use of entities (other than the five pre-defined ones) is inherently
unsafe in XML documents intended for the Web. The best practice is to
use straight UTF-8 instead of entities. (Numeric character references
are safe, too.)
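See the sentence claiming that non-validating XML processors are not
required to support entities. Someone either didn't read the spec, or is
intentionally throwing up a spec distortion field. The XML 1.0 spec
provides a table in section 4.4, "XML Processor Treatment of Entities
and References", which tells you how parsers are allowed to treat
entities, and it flatly contradicts the bogus Mozilla FAQ statement above.
Per that table, a reference to an internal general entity in content is
included whether or not the processor validates; it is only external
entities that a non-validating processor may skip.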
The main reason for the "WTF" is the fact that the Mozilla
implementation actually gets it right. As it should: it's based
on Expat. AFAIK Expat has always got this right (I've been using Expat
about as long as the Mozilla project has), so I'm not sure what inspired
the above error. Mozilla should be touting its correct and useful
behavior, rather than giving bogus excuses to its competitors.
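Expat's behavior is easy to check from Python, whose standard library ElementTree parser sits on top of Expat and does not validate. What follows is only a minimal sketch: the `doc` element, the `DOC` constant and the `expand_internal_entity` helper are names I made up for the illustration, and it says nothing about how Mozilla itself drives Expat.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: the internal DTD subset declares a single
# internal entity, referenced once in content and once in an attribute value.
DOC = '''<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE doc [
<!ENTITY internal "This is text placed as internal entity">
]>
<doc title="&internal;">&internal;</doc>
'''


def expand_internal_entity(source=DOC):
    """Parse with ElementTree (non-validating, Expat underneath) and return
    the root element's text and its title attribute after parsing."""
    root = ET.fromstring(source.encode("utf-8"))
    return root.text, root.get("title")
```

Calling `expand_internal_entity()` should hand back the replacement text in both positions, with no validation and no external DTD involved.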
This came up last week in the IBM developerWorks forum where a user was
having problems with internal entities in XHTML. It turns out that he was
missing an XHTML namespace (and, based on my experimentation, was probably
serving up XHTML as text/html, which is generally a no-no). It should have
been a clear case of "Mozilla gets this right, and can we please get other
browsers to fix their bugs?" but he found that FAQ entry and we both ended
up victims of the red herring for a little while.
I didn't realize that the Mozilla implementation was right until I wrote
a careful test case in preparation for my next Firefox/XML article. The
following CherryPy code is a test server
set-up for browser rendering of XHTML.
import cherrypy

# Test page: one internal entity declared in the internal DTD subset and
# referenced both in element content and in an attribute value.
INTENTITYXHTML = '''\
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml/DTD/xhtml1-strict.dtd" [
<!ENTITY internal "This is text placed as internal entity">
]>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US">
<head>
<title>Using Entity in xhtml</title>
</head>
<body>
<p>This is text placed inline</p>
<p>&internal;</p>
<abbr title="&internal;">Titpaie</abbr>
</body>
</html>
'''


class root:
    # Each exposed method serves the same document under a different
    # Content-Type; the method name becomes the URL path.
    @cherrypy.expose
    def text_html(self):
        cherrypy.response.headerMap['Content-Type'] = "text/html; charset=utf-8"
        return INTENTITYXHTML

    @cherrypy.expose
    def text_xml(self):
        cherrypy.response.headerMap['Content-Type'] = "text/xml; charset=utf-8"
        return INTENTITYXHTML

    @cherrypy.expose
    def app_xml(self):
        cherrypy.response.headerMap['Content-Type'] = "application/xml; charset=utf-8"
        return INTENTITYXHTML

    @cherrypy.expose
    def app_xhtml(self):
        cherrypy.response.headerMap['Content-Type'] = "application/xhtml+xml; charset=utf-8"
        return INTENTITYXHTML


# Mount the handlers and listen on port 9999 with debug logging off.
cherrypy.root = root()
cherrypy.config.update({'server.socketPort': 9999})
cherrypy.config.update({'logDebugInfoFilter.on': False})
cherrypy.server.start()
As an example, this code serves up a content type text/html when
accessed through a URL such as http://localhost:9999/text_html. You
should be able to work out the other URL to content type mappings from
the code, even if you're not familiar with CherryPy or Python.
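If you don't have CherryPy handy, the same URL to content type mapping can be approximated with nothing but the Python standard library. This is a rough sketch rather than a port of the code above: `CONTENT_TYPES`, `PAGE`, `EntityTestHandler` and `make_server` are names invented for the illustration, and `PAGE` is a trimmed stand-in for the full test document.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# URL path -> Content-Type, mirroring the four handlers in the CherryPy code.
CONTENT_TYPES = {
    "/text_html": "text/html; charset=utf-8",
    "/text_xml": "text/xml; charset=utf-8",
    "/app_xml": "application/xml; charset=utf-8",
    "/app_xhtml": "application/xhtml+xml; charset=utf-8",
}

# Stand-in body (an assumption): swap in the full test document when
# actually experimenting with browsers.
PAGE = (
    b'<?xml version="1.0" encoding="utf-8"?>\n'
    b'<html xmlns="http://www.w3.org/1999/xhtml">'
    b'<head><title>test</title></head><body><p>test</p></body></html>\n'
)


class EntityTestHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Same body for every known path; only the Content-Type differs.
        content_type = CONTENT_TYPES.get(self.path)
        if content_type is None:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)


def make_server(port=9999):
    """Build (but do not start) a server bound to localhost."""
    return HTTPServer(("localhost", port), EntityTestHandler)
```

To try it, call `make_server().serve_forever()` and point a browser at the same four paths as before.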
Firefox 1.0.7 handles all this very nicely. For text_xml, app_xml
and app_xhtml you get just the XHTML rendering you'd expect, including
the correct text in the attribute value with the mouse hovered over
"Titpaie".
IE6 (Windows) and Safari 1.3.1 (OS X Panther) both have a lot of trouble
with this.
IE6 in the text_xml and app_xml cases complains that it can't find
http://www.w3.org/TR/xhtml/DTD/xhtml1-strict.dtd. In the app_xhtml
case it treats the page as a download, which is reasonable, if not
convenient.
Safari in the text_xml, app_xml and app_xhtml cases complains that
the entity internal is undefined (??!!).
IE6, Safari and Mozilla in the text_html case all show the same output
(looking, as it should, like busted HTML). That's just what you'd
expect for a tag soup mode, and emphasizes that you should leave
text_html out of your XHTML vocabulary.
All this confusion and implementation difference illustrates the
difficulty for folks trying to deploy XHTML, and why it's probably not
yet realistic to deploy XHTML without some sort of browser sniffing
(perhaps by checking the Accept header, though it's well known that
browsers are sometimes dishonest with this header). I understand that
the MSIE7 team hopes to address such problems. I don't know whether to
expect the same from Safari. My focus in research and experimentation
has been on Firefox.
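To make the Accept header idea concrete, here is one way such sniffing might look on the server side. It is a sketch under assumptions: `parse_accept` and `choose_content_type` are hypothetical helpers, the parsing handles only the media type and its `q` parameter, and falling back to text/html presumes you have an HTML-compatible variant to send to browsers that don't ask for XHTML.

```python
def parse_accept(header):
    """Return {media_type: q} for an HTTP Accept header value.
    Only the q parameter is interpreted; other parameters are ignored."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when absent
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs[media_type] = q
    return prefs


def choose_content_type(accept_header):
    """Serve application/xhtml+xml only to clients that explicitly list it
    with a nonzero q; a bare */* is deliberately not treated as a request."""
    prefs = parse_accept(accept_header or "")
    if prefs.get("application/xhtml+xml", 0.0) > 0.0:
        return "application/xhtml+xml; charset=utf-8"
    return "text/html; charset=utf-8"
```

Ignoring `*/*` is a design choice: a browser that sends only a wildcard has not actually claimed XHTML support, which is exactly the kind of dishonesty the header is known for.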
One final note is that Mozilla does not support external parsed
entities. This is legal (and some security experts claim even prudent).
The relevant part of the XML 1.0 spec is section 4.4.3:
When an XML processor recognizes a reference to a parsed entity, in
order to validate the document, the processor MUST include its
replacement text. If the entity is external, and the processor is not
attempting to validate the XML document, the processor MAY, but need
not, include the entity's replacement text. If a non-validating
processor does not include the replacement text, it MUST inform the
application that it recognized, but did not read, the entity.
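The "recognized, but did not read" case can be seen directly with Expat through Python's `xml.parsers.expat` module. The sketch below is illustrative only: the `chapter` entity, the `chapter1.xml` system identifier and the `find_unread_external_entities` helper are made up, and the handler merely records the reference instead of fetching anything.

```python
from xml.parsers import expat

# Hypothetical document that declares and references an external parsed entity.
DOC = b'''<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE doc [
<!ENTITY chapter SYSTEM "chapter1.xml">
]>
<doc>before &chapter; after</doc>
'''


def find_unread_external_entities(source=DOC):
    """Parse without reading external entities; return the system identifiers
    that were referenced, plus the character data that was actually seen."""
    seen = []
    text = []

    def on_external_ref(context, base, system_id, public_id):
        # Expat reports the reference here; we note it and fetch nothing.
        seen.append(system_id)
        return 1  # nonzero tells Expat to keep parsing

    parser = expat.ParserCreate()
    parser.ExternalEntityRefHandler = on_external_ref
    parser.CharacterDataHandler = text.append
    parser.Parse(source, True)
    return seen, "".join(text)
```

The application learns that `chapter1.xml` was referenced, while the surrounding text arrives with nothing substituted for the entity.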
I would love Mozilla to adopt the idea in the next spec paragraph:
Browsers, for example, when encountering an external parsed entity
reference, might choose to provide a visual indication of the entity's
presence and retrieve it for display only on demand.
That would be very useful. I wonder whether it would be possible
through a Firefox plug-in (probably not: I guess it would require very
tight Expat integration for plug-ins).
[Uche Ogbuji]