Yes! Markdown needs attributes, not footnotes

I tried to post this to the Markdown mailing list, but they have a policy of rejecting (not just moderating) non-member mailings. Seems a bit harsh to me, considering that a combination of moderation and spam filtering is not really that much of a burden (as I know from copious experience), but whatevah. I really don't have room to join yet another mailing list.

In response to a thread that started out discussing internal links and then fell back to footnotes, Aristotle Pagaltzis made this brilliant suggestion:

Lately I’ve increasingly been thinking that it would be nice, perfectly sufficient for footnotes, but also useful for many other uses, to have a {@attr=value} syntax (or something similar) which attaches the given attribute to its surrounding tag. So this example attribute {@id=foo} would be ganked from where it is and attached to the tag for this paragraph: <p id="foo">; whereas *this {@class=shout}* would attach to the emphasis tag: <em class="shout">
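To make the mechanics of Aristotle's proposal concrete, here's a minimal Python sketch of how a processor might honor it for a single paragraph: markers inside *...* are claimed by the <em> element, and every other marker goes to the enclosing <p>. It assumes values contain no spaces or braces, and the function names are hypothetical, not part of any existing Markdown implementation.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Assumed marker form: {@name=value}, with no spaces or braces in the value.
# Leading whitespace is swallowed so "word {@id=foo} word" leaves one space.
ATTR_MARKER = re.compile(r'\s*\{@([A-Za-z_][\w.-]*)=([^\s{}]+)\}')
EMPHASIS = re.compile(r'\*([^*]+)\*')

def extract_attributes(text):
    """Strip {@attr=value} markers from text; return (text, attrs dict)."""
    attrs = {}
    def grab(match):
        attrs[match.group(1)] = match.group(2)
        return ''
    return ATTR_MARKER.sub(grab, text), attrs

def format_attrs(attrs):
    """Serialize attrs as ' name="value"' pairs in sorted order."""
    return ''.join(' %s=%s' % (name, quoteattr(value))
                   for name, value in sorted(attrs.items()))

def render_paragraph(text):
    """Render one paragraph; markers inside *...* go to that <em>,
    all other markers go to the enclosing <p>."""
    p_attrs, out, pos = {}, [], 0
    for m in EMPHASIS.finditer(text):
        plain, attrs = extract_attributes(text[pos:m.start()])
        p_attrs.update(attrs)
        out.append(escape(plain))
        inner, em_attrs = extract_attributes(m.group(1))
        out.append('<em%s>%s</em>' % (format_attrs(em_attrs), escape(inner)))
        pos = m.end()
    plain, attrs = extract_attributes(text[pos:])
    p_attrs.update(attrs)
    out.append(escape(plain))
    return '<p%s>%s</p>' % (format_attrs(p_attrs), ''.join(out).strip())

print(render_paragraph("whereas *this {@class=shout}* would attach"))
# <p>whereas <em class="shout">this</em> would attach</p>
```

A real implementation would have to run this for every block and inline construct, not just paragraphs and emphasis, but the principle of "innermost enclosing tag wins" stays the same.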

I want to say to the Markdown folks: please, please listen to Aristotle's suggestion. It is exactly what Markdown needs now. It would add much of the power and flexibility that is currently lacking when I consider Markdown against reStructuredText, and it would do so without the welter of punctuation in ReST. A complete win. He also gives an example of how this would enable useful image elements for Markdown:

![{@width=200} Or image metadata.](bar.png) {@align=center}

Currently it is not possible to use Markdown syntax to produce images that meet general usability guidelines. You have to embed a full img tag. Aristotle's suggestion elegantly solves that problem as well.

I think that this is far more important than such minutiae as footnotes, which are all the current rage on the Markdown list. This is not to say that the need for footnotes is minute, but rather that the need for footnotes can easily be seen as one aspect of a more general problem in expression. If Markdown had a way of asserting ID attributes, there would be flexible support for the many different ways people want to express footnotes, margin notes, biblio refs, etc. But by solving footnotes as a complete problem in themselves, Markdown is just heading down the path to punctuation madness.

I suggest starting with flexible attribute syntax such as Aristotle suggests, using this for footnotes as far as it goes, and then, if usage experience shows that something yet more convenient is needed, reconsider specialized footnote syntax.

I also think it's worth explicitly supporting the attribute syntax at the top of a markdown document as a way to express overall document metadata, which is another gap I see in Markdown.
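As a rough illustration of the document-metadata idea, here's a Python sketch under an assumed convention: lines at the very top of the source consisting solely of {@name=value} are metadata, and the first line that isn't such a marker starts the body. Unlike inline markers, values here may contain spaces. The convention and the split_metadata name are hypothetical.

```python
import re

# Hypothetical convention: a whole line of the form {@name=value} at the
# top of the document is metadata; values may contain spaces here.
META_LINE = re.compile(r'^\{@([A-Za-z_][\w.-]*)=([^{}]*)\}\s*$')

def split_metadata(source):
    """Return (metadata dict, remaining Markdown body)."""
    lines = source.splitlines()
    metadata = {}
    index = 0
    while index < len(lines):
        match = META_LINE.match(lines[index])
        if not match:
            break  # first non-marker line ends the metadata block
        metadata[match.group(1)] = match.group(2).strip()
        index += 1
    body = '\n'.join(lines[index:]).lstrip('\n')
    return metadata, body
```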

In his concurrence Jelks Cabaniss suggests:

The only thing I might add are "shortcuts" when the attributes are ID and CLASS: {#foo} as an alias of {@id=foo}, and {.bar} for {@class=bar}.

But I think these should be left off for the moment, and maybe added later if they seem especially useful. Again I'm just wary of increasing the amount of punctuation one has to remember. Jelks does wrap up with what I consider the knockout point:

BTW, take a look at http://daringfireball.net/projects/markdown/syntax.text. It's sprinkled full of "real HTML" precisely because of this deficiency. (There aren't any CLASS attributes in that particular document, but even there, if there were Markdownish equivalents of ID and TITLE attributes, that would all go away.)

I'm pretty sure I'll implement this in my Python Markdown tools even if it doesn't become part of the "official" Markdown spec. I need it too badly (I really don't want to have to move to ReST just yet).

[Uche Ogbuji]

via Copia

Domlette and Saxlette: huge performance boosts for 4Suite (and derived code)

For a while 4Suite has had an 80/20 DOM implementation completely in C: Domlette (formerly named cDomlette). Jeremy has been making a lot of performance tweaks to the C code, and current CVS is already 3-4 times faster than Domlette in 4Suite 1.0a4.

In addition, Jeremy stealthily introduced a new feature to 4Suite, Saxlette. Saxlette uses the same Expat C code Domlette uses, but exposes it as SAX, so we get SAX implemented completely in C. It follows the usual Python/SAX API, so, for example, the following code uses Saxlette to count the elements:

from xml import sax

furi = "file:ot.xml"

class element_counter(sax.ContentHandler):
    def startDocument(self):
        self.ecount = 0

    def startElementNS(self, name, qname, attribs):
        self.ecount += 1

parser = sax.make_parser(['Ft.Xml.Sax'])
handler = element_counter()
parser.setContentHandler(handler)
parser.parse(furi)
print "Elements counted:", handler.ecount

If you don't care about PySax compatibility, you can use the more specialized API, which involves the following lines in place of the equivalents above:

from Ft.Xml import Sax
...
class element_counter:
...
parser = Sax.CreateParser()

The code changes needed from the first listing above to regular PySax are minimal. As Jeremy puts it:

Unlike the distributed PySax drivers, Saxlette follows the SAX2 spec and defaults feature_namespaces to True and feature_namespace_prefixes to False, neither of which may be changed (which is exactly what SAX2 says is required). Python/SAX defaults to SAX1 behavior and Saxlette defaults to SAX2 behavior.

The following is a PySax example:

from xml import sax

furi = "file:ot.xml"

# Handler has to derive from sax.ContentHandler,
# or, in practice, implement all interfaces
class element_counter(sax.ContentHandler):
    def startDocument(self):
        self.ecount = 0

    #SAX1 startElement by default, rather than SAX2 startElementNS
    def startElement(self, name, attribs):
        self.ecount += 1

parser = sax.make_parser()
handler = element_counter()
parser.setContentHandler(handler)
parser.parse(furi)
print "Elements counted:", handler.ecount

The speed difference is huge. Jeremy did some testing with timeit.py (using more involved test code than the above), and in those limited tests Saxlette proved as fast as, and in some cases a bit faster than, cElementTree and libxml/Python (and much, much faster than xml.sax in all cases). Interestingly, Domlette is now within 30%-40% of Saxlette in raw speed, which is impressive considering that it builds a fully functional DOM. As I've said in the past, I'm done with the silly benchmarks game, so someone else will have to pursue the matter in more detail if they really can't do without their hot dog eating contests.
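Incidentally, the standard-library driver used in the PySax listing can be switched into the SAX2 namespace mode that Jeremy describes by setting feature_namespaces explicitly. This sketch uses only xml.sax (no 4Suite), is written for a current Python 3, and parses an in-memory byte string instead of ot.xml.

```python
import io
import xml.sax
from xml.sax.handler import feature_namespaces

class ElementCounter(xml.sax.ContentHandler):
    """Counts elements through the SAX2 namespace-aware callback."""
    def startDocument(self):
        self.ecount = 0

    def startElementNS(self, name, qname, attrs):
        # name is a (namespace URI, local name) pair
        self.ecount += 1

def count_elements(xml_bytes):
    parser = xml.sax.make_parser()
    # Opt in to SAX2-style namespace processing; without this the
    # stdlib driver calls startElement (SAX1 style) instead.
    parser.setFeature(feature_namespaces, True)
    handler = ElementCounter()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.ecount

print(count_elements(b'<doc xmlns="urn:example"><a/><b><c/></b></doc>'))  # 4
```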

In another exciting development Saxlette has gained a generator mode using Expat's suspend/resume capability. This means you can have a Saxlette handler yield results from the SAX callbacks. It will allow me, for example, to have Amara's pushdom and pushbind work without threads, eliminating a huge drag on their performance (context switching is basically punishment). I'm working this capability into the code in the Amara 1.2 branch. So far the effects are dramatic.
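Saxlette's generator mode relies on Expat's suspend/resume, which, as far as I know, the standard-library xml.sax driver doesn't expose. As a swapped-in technique, here's a sketch of the same thread-free idea using the incremental feed() interface instead: the callbacks only queue events, and a generator drains the queue between chunks. EventCollector and iter_events are hypothetical names; this is not Saxlette's or Amara's API.

```python
import xml.sax
from collections import deque

class EventCollector(xml.sax.ContentHandler):
    """Queues (event, name) pairs from SAX callbacks instead of acting on them."""
    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.pending = deque()

    def startElement(self, name, attrs):
        self.pending.append(('start', name))

    def endElement(self, name):
        self.pending.append(('end', name))

def iter_events(chunks):
    """Yield SAX events while feeding the document chunk by chunk, no threads."""
    parser = xml.sax.make_parser()  # the stdlib Expat reader supports feed()
    handler = EventCollector()
    parser.setContentHandler(handler)
    for chunk in chunks:
        parser.feed(chunk)
        while handler.pending:  # drain whatever this chunk produced
            yield handler.pending.popleft()
    parser.close()  # flush any events Expat held back
    while handler.pending:
        yield handler.pending.popleft()
```

The consumer pulls events at its own pace, and control only passes back to the parser when the generator is resumed, which is the property a thread-free pushdom/pushbind needs. Expat may hold back a few events until more input arrives, so events don't always line up exactly with chunk boundaries.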

[Uche Ogbuji]

via Copia

Xampl, re: "XML data bindings, static languages, dynamic languages"

In response to "XML data bindings, static languages, dynamic languages", Bob Hutchison posted some thoughts. Just as I used Amara as the kernel of my demonstrations, Bob used his project xampl as the kernel of his. He introduces xampl in another entry, which was inspired by my own article on EaseXML.

Xampl is an XML data binding. As Bob writes:

Secondly, there are versions of xampl for Java and Common Lisp. I’ve got an old (summer 2002) version for Ruby that needs updating (I wrote the xampl-pp pull parser to support this experiment).

Bob says that Xampl also deals with things that Elliotte Harold mentions as usual scourges for Java data bindings: mixed content, repeated elements, omitted elements, and element order. Of course these things should be food and drink to any XML tool, and I'm glad folks are finally plugging such gaping holes. Eric van der Vlist is also in the game with TreeBind, and it seems some Java tools try to wriggle out of the pinch by using XQuery.

Based on Bob's snippets, Xampl looks handy. It's rather verbose, but no more so than Java pretty much requires. One thing that strikes me in Bob's examples is that Xampl appears to require and create a bogon namespace (http://www.xampl.com/labelsExample). Perhaps it has something to do with Java packaging, but regardless of the role of this fake namespace, the XML represented by Xampl in Bob's snippets is not the same as the XML in the original source examples. An unprefixed element in a namespace is of course not the same thing as an element in the null namespace. I would not accept any tool that involves such a mix-up. It's quite possible that Xampl does not do so, and I'm just misunderstanding Bob's examples.
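To make the namespace point concrete, here's a small sketch using Python's standard ElementTree, which writes namespaced names in {uri}local form. The labels/label element names are made up for illustration; only the namespace URI comes from Bob's snippets.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; the namespace URI is the one from Bob's snippets.
plain = ET.fromstring('<labels><label>Home</label></labels>')
namespaced = ET.fromstring(
    '<labels xmlns="http://www.xampl.com/labelsExample">'
    '<label>Home</label></labels>')

print(plain.tag)       # labels
print(namespaced.tag)  # {http://www.xampl.com/labelsExample}labels

# A lookup by the null-namespace name finds nothing in the namespaced tree
print(plain.find('label') is not None)       # True
print(namespaced.find('label') is not None)  # False
```

The markup looks nearly identical, yet to any namespace-aware tool these are different elements, which is why silently adding a default namespace changes the document.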

Bob provides Xampl code to match the EaseXML snippets in my article. Just as EaseXML requires Python framework code, Xampl requires XML framework code. Since "XML situps" have been on the wires lately, they come to mind for a moment, but hey, if you're already processing XML with Xampl, I suppose you might not flinch at one more XML file. I will point out that Amara does not require any framework code whatsoever besides the XML itself, not even an XML schema. It effectively provides dynamic code generation.

Xampl turns XML constructs into Java getters, e.g. html.getHead(). Amara uses the Python convention of properties rather than getters and setters, so you have html.head, and you can even assign to this property in order to mutate the XML. Xampl looks neat. The things that turn me off are largely things that are pretty much inevitable in Java, not least the very large amount of code generated by the binding. It supports XPath, as Amara does, and provides a "rhino" option to expose XML objects through Javascript, which offers you a bit more of the flexibility of Python (I don't know how much overhead to expect from Javascript through Java through XML, but it's a question I'd be quick to ask as a user).
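The property-style idiom can be sketched in a few lines of Python over the standard ElementTree. This is a hypothetical Node wrapper written only to illustrate the getter-free style; it is not Amara's API and ignores namespaces, repeated elements and mixed content.

```python
import xml.etree.ElementTree as ET

class Node(object):
    """Hypothetical property-style view of an ElementTree element.

    node.head returns the first child named 'head'; assigning a string
    to node.title sets (or creates) that child's text.  Not Amara's API.
    """
    def __init__(self, element):
        object.__setattr__(self, '_element', element)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails
        child = self._element.find(name)
        if child is None:
            raise AttributeError(name)
        return Node(child)

    def __setattr__(self, name, value):
        child = self._element.find(name)
        if child is None:
            child = ET.SubElement(self._element, name)
        child.text = value

    def __str__(self):
        return self._element.text or ''

root = ET.fromstring('<html><head><title>Old</title></head></html>')
html = Node(root)
html.head.title = 'New'           # mutates the underlying XML
print(str(html.head.title))       # New
print(ET.tostring(root).decode()) # <html><head><title>New</title></head></html>
```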

It's good to have projects such as Xampl, TreeBind and Nux. I'd rather use Python tools such as Amara, Gnosis and GenerateDS, but Java has the visibility, and it's good for people to be aware that XML does not necessarily require greater imprisonment of expression than what comes with the application language. You don't need to accept crazy idioms and stifling limitations in matters as fundamental as mixed content and element ordering. XML and sanity can coexist.

[Uche Ogbuji]

via Copia

Python + XML = wary coexistence

There has been quite a bit of discussion triggered by my article "Python and XML: Should Python and XML Coexist?". This sort of thing always surprises me more than it should. I like to post code-heavy articles and leave the philosophy to the occasional entry, or to this very Weblog, but it seems that people respond more vocally to philosophy than to code. Perhaps I'll discuss with Kendall, my editor, what this suggests in terms of future directions for my Python/XML column.

Anyway, first I point to PJE's response. I used quotes from his Weblog as jumping-off points for my article.

Uche Ogbuji liberally quotes from and analyzes two of my XML-v.-Python rants, and actually gets it completely right. Since at least one of those rants has been cited as meaning I think XML is the spawn of Satan, I'm glad Uche read closely enough to get the context and nuance, without projecting things into it that I didn't say. Kudos!

I don't claim to know whom PJE speaks of when he refers to other commentary on his rant, but Martijn Faassen indicated his own response. I do think that Martijn missed some of PJE's intended nuance, but to be fair, it took me more than one reading to catch that nuance. I think that PJE could have saved himself a lot of misunderstanding, but hell, I've had my turn at thickly nuanced rant myself, so I see both sides. Looking more broadly at the landscape, Martijn puts succinctly what I've said in the past.

This disdain for XML technologies is very common among Python programmers.

But maybe that means something greater than petty rivalry. Mike Champion brought up my article on XML-DEV:

For some time now we've seen the JSON "fat-free alternative to XML" direction that some in the AJAX world are taking to address both XML's inefficiency and the mismatch with programming languages. Now I see that many in the Python community have a similar attitude toward XML and encourage its use only when necessary to exchange data with non-Python apps.

He followed with a list of thoughts, touching on the likely roles of JSON, Python, XML, and more, and I responded. There's too much to quote from the exchange, so read the originals yourself if you like. I will mention the final thought in my response:

In many ways I think a vicious backlash from programming languages against XML is just what XML needs right now.

In saying that, I had in mind some of my other prosaic articles about the direction of XML, including:

I think that many XML folks have been working to encroach on the territory of languages such as Python, even if Python folks aren't always clear on this fact while complaining about XML. We'll just have to see how it all shakes out. I know what pattern of tool usage I'll stick to for now. Speaking of omni-tools, Dimitre Novatchev put in a plug for XSLT as a general-purpose programming language, which he's also done here in Copia comments. I still think it's a bad idea to treat XSLT as anything other than a template language. XSLT in its place, Python (or Javascript, Ruby, or whatever) in its place.

In the comments on my article there are some interesting bits, including one correspondent's mention of the importance of open file formats, and XML's role in this, followed bewilderingly by:

C++ is so powerful that with the right classes, many of the advantages of a scripting language are attainable.

Sounds like someone who badly needs to actually try Python.

[Uche Ogbuji]

via Copia

Windows prebuilt binary package for Amara

Several Amara users have mentioned trying to build for Windows but running into problems, or a requirement for .NET. Sylvain Hellegouarch comes to the rescue with a binary package he built using Amara 1.0, latest 4Suite CVS, Python 2.4.1 and .NET 1.1 (latest patches). Windows users can install this package without having to worry about compilers or any other such hassles. Sylvain posted it under the default name generated by distutils, but I renamed it as appropriate and I'm hosting it on the Amara home page, and in the Amara contrib FTP area. Please let Sylvain and me know (by posting to the 4Suite mailing list with "[Amara]" in the subject) if you have any trouble with this package.

Meanwhile, work on the 1.2 branch continues. I've achieved up to a 30% improvement in speed and memory usage over Amara Bindery 1.0, in large part by moving from PySax to 4Suite's undocumented Saxlette library (all in highly optimized C). Jeremy is also working on suspend/resume for 4Suite's parser, which would allow for a huge performance boost for pushbind. I'll try to start working on RELAX NG features if I get a chance this week.

[Uche Ogbuji]

via Copia

Off the line by Gary Speed

So Newcastle United are pressing Bolton Wanderers hard early at The Reebok, Trotter fans in a hush, Toon Army baying for away blood, and all that, and all of a sudden, off runs Gary Speed to his right post, as if whispered a message by wing-sandaled Hermes himself. Tactically, he has no reason to be at that spot, but before I (and probably millions of other viewers) can open our mouths to say "what the hell are you doing loitering on that post, you goat?", Lee Bowyer gets a cock eye at the ball and shapes in a wicked shot past Jaaskelainen and right onto the head of, yeah, you guessed it, Gary Speed. Cleared off the line by the lost boy. One of the wackiest sequences I've seen in soccer this year, I must say. I find it easier to believe that Gary Speed was visited by sudden clairvoyance than to believe that anyone is that stupid-lucky.

[Uche Ogbuji]

via Copia

Python/XML column #36: Should Python and XML Coexist?

"Python and XML: Should Python and XML Coexist?"

In his latest Python and XML column, Uche Ogbuji claims that the costs of using XML as a little language in a Python application may outweigh the benefits of doing so. [Aug. 25, 2005]

In this article I discuss some of the recent round of complaints about XML in the Python community, offering the perspective that Python and XML should serve very different domains. Treating them as competitors for any particular task usually reflects a more general misunderstanding of the basic nature of one technology or the other, and it often leads to overstated complaints.

A correspondent already asked one good follow-up question:

My question is simply: do you have a recommendation for an alternative language (or other protocol) that is more suitable for expressing data structures, preferably one that can be coded for reasonably quickly in Python?

YAML seems to be the front-runner for a cross-language data structure format, although JSON is hot these days, courtesy of the AJAX hype. I tend to point to Paul Tchistopolskii's "Alternatives to XML" for a more comprehensive list.
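For a sense of how little code a data-structure format needs in Python, here's a sketch using the json module, which joined the standard library in Python 2.6, after this column appeared. YAML would need the third-party PyYAML package, so JSON keeps the example self-contained; the record contents are made up for illustration.

```python
import json

# Illustrative record; field names and values are made up
record = {
    'series': 'Python and XML',
    'number': 36,
    'topics': ['JSON', 'YAML'],
    'draft': False,
    'errata': None,
}

text = json.dumps(record, sort_keys=True, indent=2)  # Python -> JSON text
restored = json.loads(text)                          # JSON text -> Python
print(text)
```

The round trip is only exact for dicts, lists, strings, numbers, booleans and None. Tuples come back as lists, and types such as dates need an agreed-upon encoding.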

[Uche Ogbuji]

via Copia

What Are You Doing, Dave?

I just updated the 4Suite Repository Ontology (as an OWL instance). Specifically, I added documentation for most of the major components and added rdfs:subPropertyOf, rdfs:subClassOf and rdfs:seeAlso relationships to related vocabularies (WordNet, FOAF, Dublin Core, Wikipedia). In addition, where appropriate, I've added links to 4Suite literature (currently scattered between IBM developerWorks articles/tutorials and Uche's Akara sections).

There are some benefits:

  • It can serve as a framework for documenting the 4Suite repository (to augment the very sparse documentation that does exist)
  • It provides a formal model for the underlying RDF graph that 'drives' the repository

This latter benefit might not be so obvious, but imagine being able to write rules whose implications identify certain repository containers as RSS channels (and their child XML and RDF documents as the corresponding RSS items) and associate FOAF metadata with repository users.
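The kind of rule described above can be sketched as plain forward chaining over (subject, predicate, object) tuples. This is not 4Suite's API or a real RDF reasoner: the ex:publishAsFeed and ex:child predicates and the resource paths are hypothetical, and rss:channel / rss:item stand in for the RSS 1.0 classes.

```python
# Triples are (subject, predicate, object) strings.  The 'ex:' predicates
# and the resource paths are hypothetical.
RDF_TYPE = 'rdf:type'

def infer_rss(triples):
    """Apply one rule: a container flagged ex:publishAsFeed is an
    rss:channel, and each of its ex:child resources is an rss:item."""
    inferred = set(triples)
    channels = {s for (s, p, o) in triples
                if p == 'ex:publishAsFeed' and o == 'true'}
    for channel in channels:
        inferred.add((channel, RDF_TYPE, 'rss:channel'))
    for (s, p, o) in triples:
        if p == 'ex:child' and s in channels:
            inferred.add((o, RDF_TYPE, 'rss:item'))
    return inferred

graph = {
    ('/blog', 'ex:publishAsFeed', 'true'),
    ('/blog', 'ex:child', '/blog/post1.xml'),
    ('/docs', 'ex:child', '/docs/readme.xml'),
}
for triple in sorted(infer_rss(graph) - graph):
    print(triple)
```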

Some of the more powerful hooks into the System RDF graph (which the above ontology models), such as the starting and stopping of servers (currently triggered by the fchema:server.running property on fchema:server instances), the purging of resources marked as temporary (by the fchema:time_to_live property), and the triggering of an XSLT transform (by the fchema:run_on_strobe property), can be further augmented by other associations in the graph, resulting in an almost 'sentient' content/application server. A little far-fetched?

[Uche Ogbuji]

via Copia