This unnecessary screw-up comes from the Mozilla project, of all places.
Mozilla's XML support is improving all the time, as I discuss in my
article on XML in Firefox, but the developer resources seem to lag the
implementation, and this often leads to needless confusion. One case I
ran into recently could be summed up as "not everything in the Mozilla
FAQ is accurate". From the Mozilla FAQ:
In older versions of Mozilla as well as in old Mozilla-based products,
there is no pseudo-DTD catalog and the use of entities (other than the
five pre-defined ones) leads to an XML parsing error. There are also
other XHTML user agents that do not support entities (other than the
five pre-defined ones). Since non-validating XML processors are not
required to support entities (other than the five pre-defined ones),
the use of entities (other than the five pre-defined ones) is inherently
unsafe in XML documents intended for the Web. The best practice is to
use straight UTF-8 instead of entities. (Numeric character references
are safe, too.)
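Note the sentence beginning "Since non-validating XML processors are
not required to support entities". Someone either didn't read the spec,
or is intentionally throwing up a spec distortion field. The XML 1.0
spec provides a table in section 4.4, "XML Processor Treatment of
Entities and References", which tells you how parsers are allowed to
treat entities, and it flatly contradicts the bogus Mozilla FAQ
statement above. In particular, section 5.1 requires even non-validating
processors to read the internal DTD subset and include the replacement
text of internal entities declared there.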
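To see what the spec table means in practice, here is a minimal check using Python's xml.etree.ElementTree, which sits on top of Expat, a non-validating parser. The tiny document below is made up for the example: an internal entity declared in the internal subset and referenced both in element content and in an attribute value. Per section 4.4, such a reference is "Included" in content and "Included in literal" in attribute values, so the replacement text should appear in both places.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal document: an internal general entity declared in the
# internal DTD subset, referenced in element content and in an attribute value
DOC = """<?xml version="1.0"?>
<!DOCTYPE p [
<!ENTITY internal "This is text placed as internal entity">
]>
<p title="&internal;">&internal;</p>"""

def expand_internal_entity(xml_text):
    """Parse with Expat (via ElementTree); return (element text, title attribute)."""
    root = ET.fromstring(xml_text)
    return root.text, root.get("title")

content, title = expand_internal_entity(DOC)
print(content)  # This is text placed as internal entity
print(title)    # This is text placed as internal entity
```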
The main reason for the "WTF" is that the Mozilla implementation
actually gets it right, as it should: it's based on Expat. AFAIK Expat
has always gotten this right (I've been using Expat about as long as the
Mozilla project has been around), so I'm not sure what inspired the
above error. Mozilla should be touting its correct and useful behavior
rather than giving bogus excuses to its competitors.
This came up last week in the IBM developerWorks forum, where a user was
having problems with internal entities in XHTML. It turned out that he
was missing the XHTML namespace declaration (and, based on my
experimentation, was probably serving up XHTML as text/html, which is
generally a no-no). It should have been a clear case of "Mozilla gets
this right, and can we please get other browsers to fix their bugs?",
but he found that FAQ entry and we both fell for the red herring for a
little while.
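The missing namespace matters because, as far as I can tell, when Firefox handles a page as XML it recognizes XHTML elements by their namespace, not by their names alone; an html element with no xmlns is just some unknown XML element. Here is a rough illustration of that difference using Python's ElementTree rather than a browser. The two tiny documents and the is_xhtml_root helper are made up for the example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

def is_xhtml_root(xml_text):
    """Return True if the root element is html in the XHTML namespace."""
    root = ET.fromstring(xml_text)
    # ElementTree reports namespaced names in "{namespace}localname" form
    return root.tag == "{%s}html" % XHTML_NS

with_ns = '<html xmlns="http://www.w3.org/1999/xhtml"><body/></html>'
without_ns = '<html><body/></html>'
print(is_xhtml_root(with_ns))     # True
print(is_xhtml_root(without_ns))  # False
```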
I didn't realize that the Mozilla implementation was right until I wrote
a careful test case in preparation for my next Firefox/XML article. The
following CherryPy code sets up a test server for checking how browsers
render XHTML.
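import cherrypy

INTENTITYXHTML = '''\
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html
  PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" [
<!ENTITY internal "This is text placed as internal entity">
]>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US">
<head><title>Using Entity in xhtml</title></head>
<body><p>This is text placed inline</p>
<p title="&internal;">&internal;</p></body>
</html>'''

def serve_as(content_type):
    # Exposed CherryPy 2.x handler serving the page with the given Content-Type
    def handler(self):
        cherrypy.response.headerMap['Content-Type'] = content_type
        return INTENTITYXHTML
    handler.exposed = True
    return handler

class root:
    text_html = serve_as("text/html; charset=utf-8")
    text_xml = serve_as("text/xml; charset=utf-8")
    app_xml = serve_as("application/xml; charset=utf-8")
    app_xhtml = serve_as("application/xhtml+xml; charset=utf-8")

cherrypy.root = root()
cherrypy.server.start()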
For example, this code serves up the content type
application/xhtml+xml for a URL path such as /app_xhtml. You should be
able to work out the other URL to content type mappings from the code,
even if you're not familiar with CherryPy or Python.
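If you don't have CherryPy handy, the same four-way content type test can be approximated with Python's standard library. This is a rough sketch, not the original server: a WSGI application that maps each URL path to one Content-Type, served with wsgiref on localhost port 8080 (an arbitrary choice). PAGE is a placeholder that you would replace with the full test document.

```python
from wsgiref.simple_server import make_server

# Placeholder body (hypothetical); substitute the full XHTML test document
PAGE = b'<?xml version="1.0" encoding="utf-8"?>\n<html xmlns="http://www.w3.org/1999/xhtml"/>'

# URL path -> Content-Type, mirroring the four cases in the CherryPy server
CONTENT_TYPES = {
    "/text_html": "text/html; charset=utf-8",
    "/text_xml": "text/xml; charset=utf-8",
    "/app_xml": "application/xml; charset=utf-8",
    "/app_xhtml": "application/xhtml+xml; charset=utf-8",
}

def app(environ, start_response):
    """Serve PAGE under the Content-Type chosen by the request path."""
    content_type = CONTENT_TYPES.get(environ.get("PATH_INFO", ""))
    if content_type is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    start_response("200 OK", [("Content-Type", content_type),
                              ("Content-Length", str(len(PAGE)))])
    return [PAGE]

def serve(host="localhost", port=8080):
    """Serve the test pages until interrupted."""
    with make_server(host, port, app) as httpd:
        httpd.serve_forever()
```

Call serve() and point each browser at http://localhost:8080/app_xhtml and the other three paths.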
Firefox 1.0.7 handles all this very nicely. For app_xhtml you get just
the XHTML rendering you'd expect, including the correct entity text in
the attribute value when you hover the mouse over the element that
carries it.
IE6 (Windows) and Safari 1.3.1 (OS X Panther) both have a lot of
trouble. IE6 in the text_xml and app_xml cases complains that it can't
find http://www.w3.org/TR/xhtml/DTD/xhtml1-strict.dtd. In the
app_xhtml case it treats the page as a download, which is at least
reasonable. Safari, in the app_xhtml case among others, complains that
internal is undefined (??!!).
IE6, Safari and Mozilla in the text_html case all show the same output
(looking, as it should, like busted HTML). That's just what you'd
expect from a tag soup mode, and it emphasizes that you should leave
text_html out of your XHTML vocabulary.
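Safari's "internal is undefined" complaint is the error an XML parser raises when it meets a reference to an entity whose declaration it never processed. One plausible explanation, and it is only a guess, is that Safari ignored the internal subset. The sketch below reproduces the error with Python's ElementTree by parsing the same reference with and without a declaration; DECLARED, UNDECLARED and try_parse are names made up for the example.

```python
import xml.etree.ElementTree as ET

DECLARED = """<!DOCTYPE p [
<!ENTITY internal "This is text placed as internal entity">
]>
<p>&internal;</p>"""

UNDECLARED = "<p>&internal;</p>"  # same reference, but no declaration anywhere

def try_parse(xml_text):
    """Return the element text, or the parser's error message on failure."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError as err:
        return "error: %s" % err

print(try_parse(DECLARED))    # the entity's replacement text
print(try_parse(UNDECLARED))  # an "undefined entity" parse error
```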
All this confusion and implementation difference illustrates the
difficulty for folks trying to deploy XHTML, and why it's probably not
yet realistic to deploy XHTML without some sort of browser sniffing
(perhaps by checking the
Accept header, though it's well known that
browsers are sometimes dishonest with this header). I understand that
the MSIE7 team hopes to address such problems. I don't know whether to
expect the same from Safari. My focus in research and experimentation
has been on Firefox.
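A minimal sketch of the Accept-header check, with hypothetical accepts_xhtml and choose_content_type helpers: serve application/xhtml+xml only when the client lists that type explicitly with a nonzero q value, and otherwise fall back to text/html (meaning an HTML rendition, not the XHTML page). It deliberately ignores wildcards such as */*, since IE6, for one, sends */* yet cannot render application/xhtml+xml, and it can do nothing about browsers that misreport what they accept.

```python
def accepts_xhtml(accept_header):
    """Return True if the Accept header explicitly allows application/xhtml+xml."""
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "application/xhtml+xml":
            continue
        q = 1.0  # quality defaults to 1 when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        return q > 0
    return False

def choose_content_type(accept_header):
    """Pick the Content-Type to send, falling back to HTML."""
    if accepts_xhtml(accept_header):
        return "application/xhtml+xml; charset=utf-8"
    return "text/html; charset=utf-8"
```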
One final note: Mozilla does not support external parsed entities. This
is legal (and, some security experts claim, even prudent). The relevant
part of the XML 1.0 spec is section 4.4.3, "Included If Validating":
When an XML processor recognizes a reference to a parsed entity, in
order to validate the document, the processor MUST include its
replacement text. If the entity is external, and the processor is not
attempting to validate the XML document, the processor MAY, but need
not, include the entity's replacement text. If a non-validating
processor does not include the replacement text, it MUST inform the
application that it recognized, but did not read, the entity.
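Expat works this way: it never fetches external entities itself, and instead calls the application's external entity handler, which is how it "informs the application". A minimal sketch using Python's pyexpat binding to Expat, where the document and the chapter1.xml system identifier are invented for illustration, that notes each external reference and leaves its content out:

```python
from xml.parsers import expat

# Hypothetical document with an external parsed entity declared in the internal subset
DOC = """<?xml version="1.0"?>
<!DOCTYPE doc [
<!ENTITY chapter SYSTEM "chapter1.xml">
]>
<doc>Before &chapter; after</doc>"""

def parse_without_external_entities(xml_text):
    """Parse xml_text, recording (but not reading) external entity references."""
    skipped = []  # system identifiers of entities recognized but not read
    chunks = []   # character data actually delivered by the parser
    parser = expat.ParserCreate()

    def on_external_ref(context, base, system_id, public_id):
        skipped.append(system_id)
        return 1  # nonzero: keep parsing without including the entity

    parser.ExternalEntityRefHandler = on_external_ref
    parser.CharacterDataHandler = chunks.append
    parser.Parse(xml_text, True)
    return "".join(chunks), skipped

text, skipped = parse_without_external_entities(DOC)
print(repr(text))  # 'Before  after'
print(skipped)     # ['chapter1.xml']
```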
I would love Mozilla to adopt the idea in the next spec paragraph:
Browsers, for example, when encountering an external parsed entity
reference, might choose to provide a visual indication of the entity's
presence and retrieve it for display only on demand.
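That idea can at least be prototyped outside the browser. The sketch below uses Python's pyexpat, with hypothetical names throughout (render_with_placeholders, load_on_demand, and a caller-supplied fetch function): it puts a visible marker where each external parsed entity would go and retrieves an entity only when asked.

```python
from xml.parsers import expat

def render_with_placeholders(xml_text):
    """Return (display text, entity list), showing external entities as markers."""
    pieces = []    # text to display, with a marker where each entity would go
    entities = []  # system identifiers, indexed by marker number
    parser = expat.ParserCreate()

    def on_external_ref(context, base, system_id, public_id):
        pieces.append("[external entity #%d: %s]" % (len(entities), system_id))
        entities.append(system_id)
        return 1  # do not read the entity now

    parser.ExternalEntityRefHandler = on_external_ref
    parser.CharacterDataHandler = pieces.append
    parser.Parse(xml_text, True)
    return "".join(pieces), entities

def load_on_demand(entities, index, fetch):
    """Retrieve entity number `index` only when asked, via a caller-supplied fetch."""
    return fetch(entities[index])

DOC = """<!DOCTYPE doc [
<!ENTITY chapter SYSTEM "chapter1.xml">
]>
<doc>Before &chapter; after</doc>"""

display, entities = render_with_placeholders(DOC)
print(display)  # Before [external entity #0: chapter1.xml] after
```

In a real browser, fetch would resolve the system identifier against the document's base URI and go out over the network; here it is just a parameter, so the example stays offline.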
That would be very useful. I wonder whether it would be possible
through a Firefox plug-in (probably not: I guess it would require very
tight Expat integration for plug-ins).