Itinerant Binds - Better Software Documentation

It was brought to my attention that my recent entry about Sparta/Versa/rdflib possibilities was a little vague and unclear. This tends to happen when I get caught up in an interest. Anyway, I renamed the module to Itinerant Binds (I liked the term) and created a page on Metacognition for the recent rdflib/4Suite RDF work I've been doing, with some more details on how the components work. I added an example that better demonstrates isolating RDF resources through targeted Versa queries and using the bound Python result objects to modify / extend the underlying graph.

Chimezie Ogbuji

via Copia

PyBlosxom...CherryPy...hmmm

I have so much hacking to do on Copia's engine that it puts me off doing anything at all. The atom rendering bug I stumbled upon yesterday especially needs a look-in, but I suspect it would take a move from flavor to plug-in to fix it. I'm just honestly not all that bullish about hacking PyBlosxom right now. Not while I've been having so much fun with CherryPy lately.

A while ago Bill Mill said in a comment here:

I'm about halfway done with a "pyblosxom in cherrypy" thing I've been working on. It works, reads all my pyblosxom blogs, and can leave comments. I just need to refactor it to be more sensible and plugin-oriented.

That's the sort of sign I need in order to hang on, but I do hope Bill works his way through the remaining half soon enough.

Speaking of CherryPy, here is a recently spotted Cookbook entry: "A simple integration of a CherryPy web server, using Quixote template publishing, managed in its own thread."

[Uche Ogbuji]

via Copia

Convenience APIs for 4Suite Domlette parsing

I added some functions to make a baby step into Domlette parsing. I call these the brain-dead APIs (though we use more decorous terms officially). You can now get yourself a crisp DOM with as little effort as:

>>> from Ft.Xml import Parse
>>> doc = Parse("<hello>world</hello>")

And thence on to the usual fun stuff:

>>> print doc.xpath(u"string(hello)")
world

Parse also knows how to handle streams (file-like objects):

>>> doc = Parse(open("hello.xml"))

Do a help(Parse) to get a warning about using that function with XML that is not self-contained, and an example of how to parse such XML properly in 4Suite.

There is also ParsePath, which handles file paths and URLs:

>>> from Ft.Xml import ParsePath
>>> doc = ParsePath("hello.xml")
>>> doc = ParsePath("http://copia.ogbuji.net/blog/index.atom")

And what do you know? That last URL does not parse. My feed is ill-formed. ☠☠☠☠☠ (I love cursing in Unicode) Sigh. Gotta fix the PyBlosxom Atom generation. Maybe we need a touch of 4Suite (Amara, actually) there, as well.

[Uche Ogbuji]

via Copia

Kumite! Python vs. Javascript! script vs. XForms! declarativity vs. wizards!

Sylvain's question about Javascript-without-tears certainly kicked off a chain of interesting dialogue, for me. It also happens to dovetail with some interesting dialog elsewhere that started independently.

Kurt had an interesting take in his follow-up, "Javascript and Python":

To be blunt, I would prefer to see a Python interpreter being standard issue in all browsers in addition to a Javascript one, as I believe the language to be much more expressive, more object oriented, more secure, easier to write, and in general better suited to contemporary needs. Unfortunately, browsers in particular are informed as much by business and political decisions, some, perhaps even most, based less upon the best technology for a problem and more based upon what will provide the best backward compatibility to insure that existing websites do not break or that download sizes remain under some critical value.

I like Python as well, but boy do I shudder to imagine the political conflagration that would ensue if Python were elevated in Web programming over peers such as Perl and Ruby. Javascript appears to have carved out a niche as the Switzerland of dynamic languages.

I would like to see diversity in Web scripting languages, so that others have choices besides Javascript, and as Kurt says, Mozilla does seem to be taking the lead in this. It's really cool to see the Mozilla Wiki entry "Breaking the grip JS has on the DOM". And you gotta love the opening lines:

We want to change the grip JS has on the DOM and on XUL. We will do this in 2 steps:

Ideally, the first step could be done without consideration for the second, in the assumption that the implementation should be truly language neutral.

But regardless of scripting language, there is the fact that XForms is out there looking to take on the very need for scripting in many of the browser use cases. Kurt says:

I am a major advocate of XML, precisely because it is much more difficult to isolate a language when a mapping is essentially an XSLT transformation away. For this reason, XForms is a very attractive model to be moving towards, certainly, and I look forward to the day that I can build XForms applications that work in all browsers equally. However, XForms is not even remotely widely implemented yet nor are there standard forms of declarative binding languages along the lines of XBL (sXBL is getting there, but so far there are perhaps two still very much test implementations in existence).

Chime predictably takes exception to this (in a response to Kurt). He's been a very involved early adopter of XForms, and I've been amazed to see how productive XForms has made him, so I take his point very seriously when he says:

I beg to differ. Though it may be true that it doesn't quite have the traction that Flash has at the moment, I wouldn't go as far as saying it isn't remotely, widely implemented. There are several very mature implementations...
[...] [re:eliminating the need for an imperative language] Once again, I have yet find myself in a situation where I needed javascript for UI-related capabilities that weren't covered by XForms event processing, instance binding, and other such [programmatic] components. The only time I did was when I had to encode XML content as base64 encoded binary (see: http://copia.ogbuji.net/blog/2005-08-19/BinaryEncodingAndXMLRPCs) and had little to do with XForms but more with the means of remote communication (SOAP). I'm not suggesting that frameworks such as XForms will eliminate the need for an imperative language, but rather that the need will be more like the reverse of your 80/20 proposition.

To be fair, I think Kurt was saying that script is only needed for the 20% case, and not the 80% case. He just felt that declarative solutions architecture "is hideously inappropriate for the remaining 20%." I do agree with that, but I think that people (not Kurt) tend to exaggerate this fact as an argument against declarative programming.

Chime wraps up:

I must give the disclaimer that I'm not suggesting XForms will be a user interface / browser-based application building [panacea], but rather that the potential it has to eliminate the unportable, architecturally unsound code that often drive DHTML web-sites with minimal complexity is very much overlooked.

Based on what I've seen of XForms, I tend to agree. I actually think Chime and Kurt are more in agreement than they sound, except maybe on the matter of the maturity of XForms engines.

At almost the same time another script versus XML exchange was going on in Mark Birbeck's blog, and in particular "On Adobe and XForms via Declarative Programming, Wizards and Aspects". That article is well worth reading in its entirety, but I'll highlight what he says about Wizards:

...in nearly all cases I find the 'wizard approach' is great to get you started, but then very quickly gets complex again. Anyone who uses Microsoft's Visual Studio, for example, will know that getting a C++ application up and running quickly, with support for multiple windows, toolbars, printing and file saving, is a snip. But then when you want to modify that code and move away from the wizard, you are very soon into normal C++ territory.

I tend to put this point even more strongly, as I do in my article "The worry about program wizards", but I'm always happy to have it reinforced.

In some ways I see wizards as the shoddy high street knock-off of declarative systems. Well designed declarative systems fully encapsulate modal aspects of the application in development, and they expose slots for ready extension in imperative implementation, if needed. Wizards, on the other hand, do focus on parameters in a way tantalizingly like declarative systems, but then ruin the entire plot by handing the programmer a hairball of imperative code that they have to hack at arbitrarily in order to complete the application. It's the difference between just plugging a device into a USB port to add capability to your PC, rather than having the motherboard thrust in your face so that you can find the right place to solder in the leads.

Chimezie Ogbuji

via Copia

Wrapping rdflib's Graph around a 4RDF Model

Well, for some time I had pondered what it would take to provide SPARQL support in 4Suite RDF. I came upon sparql-p earlier and noticed it was essentially a SPARQL query processor without a parser to drive it. It works over a deprecated rdflib interface: TripleStore. The newly suggested interface is Graph, which is as solid a suggestion for a generic RDF:API as any. So, I wrote a 4Suite RDF model backend for rdflib that allows the wrapping of Graph around a live 4Suite RDF model. Finally, I used this backend to execute a sparql-p query over http://del.icio.us/rss/chimezie:

SELECT
  ?title
WHERE {
  ?item rdf:type rss:item;
        dc:subject ?subj;
        rss:title ?title.
        FILTER (REGEX(?subj,".*rdf")).
}

The corresponding python code:

#Setup FtRDF Model
Memory.InitializeModule()   
db = Memory.GetDb('rules', 'test')
db.begin()
model = Model.Model(db)

#Parse my del.icio.us rss feed
szr = Dom.Serializer()
domStr=urllib2.urlopen('http://del.icio.us/rss/chimezie').read()        
dom = Domlette.NonvalidatingReader.parseString(domStr,'http://del.icio.us/rss/chimezie')
szr.deserialize(model,dom,scope='http://del.icio.us/rss/chimezie')

#Setup rdflib.Graph with FtRDF Model as Backend, using FtRdf driver
g=Graph(FtRdf(model))

#Setup sparql-p query processor engine
select = ("?title")

#Setup term
copia = URIRef('http://del.icio.us/chimezie')
rssTitle = URIRef('http://purl.org/rss/1.0/title')
versaWiki = URIRef('http://en.wikipedia.org/wiki/Versa')
dc_subject=URIRef("http://purl.org/dc/elements/1.1/subject")

#Filter on objects of statements (dc:subject values) - keep only those containing the string 'rdf'
def rdfSubFilter(subj,pred,obj):
    return 'rdf' in obj

#Execute query
where = GraphPattern([("?item",rdf_type,URIRef('http://purl.org/rss/1.0/item')),
                       ("?item",dc_subject,"?subj",rdfSubFilter),
                       ("?item",rssTitle,"?title")])    
tStore = myTripleStore(FtRdf(model))
result = tStore.query(select,where)
pprint(result)

The result (which will change daily as my links shift through my del.icio.us channel queue):

[chimezie@Zion RDF-API]$ python FtRdfBackend.py
[u'rdflibUtils',
 u'Representing Specified Values in OWL: "value partitions" and "value sets"',
 u'Sparta',
 u'planner-rdf',
 u'RDF Template Language 1.0',
 u'SIOC Vocabulary Specification',
 u'SPARQL in RDFLib',
 u'MeetingRecords - ESW Wiki',
 u'Enumerated datatypes (OWL)',
 u'Defining N-ary Relations on the Semantic Web: Use With Individuals']
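
For readers without sparql-p at hand, the way a graph pattern with a per-statement filter narrows down variable bindings can be sketched in plain Python. This is a toy stand-in, not the actual machinery of sparql-p or rdflib: the `match` and `query` helpers, the plain string terms, and the sample triples are all made up for illustration, and the code targets current Python 3.

```python
def match(pattern, triple, bindings):
    """Return bindings extended so that pattern matches triple, or None."""
    new = dict(bindings)
    for term, value in zip(pattern[:3], triple):
        if isinstance(term, str) and term.startswith("?"):
            # A variable: bind it, or check it against an earlier binding
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            # A constant term that does not match the statement
            return None
    # An optional fourth item is a filter called with the concrete statement
    if len(pattern) == 4 and not pattern[3](*triple):
        return None
    return new


def query(select, patterns, triples):
    """Join the patterns over the triples, return values of one variable."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for old in solutions
            for triple in triples
            for extended in [match(pattern, triple, old)]
            if extended is not None
        ]
    return [solution[select] for solution in solutions]


# Hypothetical sample data, with strings standing in for URIRefs and literals
TYPE, ITEM, SUBJECT, TITLE = "rdf:type", "rss:item", "dc:subject", "rss:title"
triples = [
    ("item1", TYPE, ITEM),
    ("item1", SUBJECT, "python rdf"),
    ("item1", TITLE, "Sparta"),
    ("item2", TYPE, ITEM),
    ("item2", SUBJECT, "xml"),
    ("item2", TITLE, "Amara"),
]

where = [
    ("?item", TYPE, ITEM),
    ("?item", SUBJECT, "?subj", lambda s, p, o: "rdf" in o),
    ("?item", TITLE, "?title"),
]
titles = query("?title", where, triples)
print(titles)
```

Only the item whose dc:subject passes the filter survives the join, which is the same narrowing the listing above asks sparql-p to do against the live 4Suite model.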

Chimezie Ogbuji

via Copia

Thinking XML #33: Serving up WordNet as XML

"Thinking XML: Serving up WordNet as XML"

Subtitle: Build the basic WordNet/XML facilities into a Web server framework
Synopsis: A few articles back, Uche Ogbuji discussed WordNet 2.0, a Princeton University project that aims to build a database of English words and lexical relationships between them. He showed how to extract XML serializations from the word database. In this article he continues the exploration, demonstrating code to serve up these WordNet/XML documents over Web protocols and showing you how to access these from XSLT.

This is the second part of a mini-series within the column. The previous article is "Querying WordNet as XML," in which I present Python code for processing WordNet 2.0 into XML. This time I use CherryPy to expose the XML on the Web, either in human-readable or in raw form. This seems to be part of a nice trend of CherryPy on developerWorks. I hope people see this as yet another example of how easy and clean CherryPy is.

See other articles in the column. Comments here on Copia or on the column's official discussion forum. Next up in Thinking XML, RDF equivalents for the WordNet/XML.

[Uche Ogbuji]

via Copia

Yes! Markdown needs attributes, not footnotes

I tried to post this to the Markdown mailing list, but they have a policy of rejecting (not just moderating) non-member mailings. Seems a bit harsh to me, considering that a combination of moderation and spam filtering is not really that much of a burden (as I know from copious experience), but whatevah. I really don't have room to join yet another mailing list.

In response to a thread that started out discussing internal links, and then fell back to footnotes, Aristotle Pagaltzis made this brilliant suggestion:

Lately I’ve increasingly been thinking that it would be nice, perfectly sufficient for footnotes, but also useful for many other uses, to have a {@attr=value} syntax (or something similar) which attaches the given attribute to its surrounding tag. So this example attribute {@id=foo} would be ganked from where it is and attached to the tag for this paragraph: <p id="foo">; whereas *this {@class=shout}* would attach to the emphasis tag: <em class="shout">

I want to say to the Markdown folks: please, please listen to Aristotle's suggestion. It is exactly what Markdown needs now. It would add much of the power and flexibility that is currently lacking when I consider Markdown against reStructuredText, and it would do so without the welter of punctuation in ReST. A complete win. He also gives an example of how this would enable useful image elements for Markdown:

![{@width=200} Or image metadata.](bar.png) {@align=center}

Currently it is not possible to use Markdown syntax to produce images that meet general usability guidelines. You have to embed a full img tag. Aristotle's suggestion elegantly solves that problem as well.

I think that this is far more important than such minutiae as footnotes, which are all the current rage on the Markdown list. This is not to say that the need for footnotes is minute, but rather that the need for footnotes can easily be seen as one aspect of a more general problem in expression. If Markdown had a way of asserting ID attributes, there would be flexible support for the many different ways people want to express footnotes, margin notes, biblio refs, etc. But by solving footnotes as a complete problem in themselves, Markdown is just heading down the path to punctuation madness.

I suggest starting with flexible attribute syntax such as Aristotle suggests, using this for footnotes as far as it goes, and then, if usage experience shows that something yet more convenient is needed, reconsider specialized footnote syntax.

I also think it's worth explicitly supporting the attribute syntax at the top of a markdown document as a way to express overall document metadata, which is another gap I see in Markdown.

In his concurrence Jelks Cabaniss suggests:

The only thing I might add are "shortcuts" when the attributes are ID and CLASS: {#foo} as an alias of {@id=foo}, and {.bar} for {@class=bar}.

But I think these should be left off for the moment, and maybe added later if they seem especially useful. Again I'm just wary of increasing the amount of punctuation one has to remember. Jelks does wrap up with what I consider the knockout point:

BTW, take a look at http://daringfireball.net/projects/markdown/syntax.text. It's sprinkled full of "real HTML" precisely because of this deficiency. (There aren't any CLASS attributes in that particular document, but even there, if there were Markdownish equivalents of ID and TITLE attributes, that would all go away.)

I'm pretty sure I'll implement this into my Python Markdown tools even if it doesn't become part of the "official" Markdown spec. I need it too badly (I really don't want to have to move to ReST just yet).
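
As a rough idea of what such an implementation might involve, here is a minimal sketch of a post-processing pass over the HTML that a Markdown converter has already produced. Everything in it is an assumption for illustration: the function name, the exact marker grammar (no whitespace or braces in values), and the premise that the input is well-formed, XHTML-style markup where every non-empty element is closed. It is not part of any Markdown tool, and it is written for current Python 3.

```python
import re
from html import escape

# A tag is anything between angle brackets; a marker is {@name=value}
TAG = re.compile(r"(<[^<>]+>)")
MARKER = re.compile(r"\s*\{@([A-Za-z_][\w-]*)=([^{}\s]+)\}")


def apply_attribute_markers(html):
    """Move {@name=value} markers onto the nearest enclosing start tag."""
    # Splitting on a capturing group puts the tags at the odd indices
    parts = TAG.split(html)
    open_tags = []  # indices into parts of the currently open start tags
    for i, part in enumerate(parts):
        if i % 2 == 1:
            if part.startswith("</"):
                if open_tags:
                    open_tags.pop()
            elif not part.endswith("/>"):
                open_tags.append(i)
        elif open_tags:
            for name, value in MARKER.findall(part):
                start = parts[open_tags[-1]]
                parts[open_tags[-1]] = '%s %s="%s">' % (
                    start[:-1], name, escape(value, quote=True))
            # Drop the markers (and the space before them) from the text
            parts[i] = MARKER.sub("", part)
    return "".join(parts)


sample = "<p>So this {@id=foo} goes up, whereas <em>this {@class=shout}</em> stays.</p>"
print(apply_attribute_markers(sample))
```

Working on the converter's output rather than on the Markdown source keeps the sketch short, at the cost of not handling markers that sit outside any element, such as the document-level metadata idea mentioned above.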

[Uche Ogbuji]

via Copia

Domlette and Saxlette: huge performance boosts for 4Suite (and derived code)

For a while 4Suite has had an 80/20 DOM implementation completely in C: Domlette (formerly named cDomlette). Jeremy has been making a lot of performance tweaks to the C code, and current CVS is already 3-4 times faster than Domlette in 4Suite 1.0a4.

In addition, Jeremy stealthily introduced a new feature to 4Suite, Saxlette. Saxlette uses the same Expat C code Domlette uses, but exposes it as SAX. So we get SAX implemented completely in C. It follows the Python/SAX API normally, so for example the following code uses Saxlette to count the elements:

from xml import sax

furi = "file:ot.xml"

class element_counter(sax.ContentHandler):
    def startDocument(self):
        self.ecount = 0

    def startElementNS(self, name, qname, attribs):
        self.ecount += 1

parser = sax.make_parser(['Ft.Xml.Sax'])
handler = element_counter()
parser.setContentHandler(handler)
parser.parse(furi)
print "Elements counted:", handler.ecount

If you don't care about PySax compatibility, you can use the more specialized API, which involves the following lines in place of the equivalents above:

from Ft.Xml import Sax
...
class element_counter():
....
parser = Sax.CreateParser()

The code changes needed from the first listing above to regular PySax are minimal. As Jeremy puts it:

Unlike the distributed PySax drivers, Saxlette follows the SAX2 spec and defaults feature_namespaces to True and feature_namespace_prefixes to False both of which are not allowed to be changed (which is exactly what SAX2 says is required). Python/SAX defaults to SAX1 behavior and Saxlette defaults to SAX2 behavior.

The following is a PySax example:

from xml import sax

furi = "file:ot.xml"

#Handler has to derive from sax.ContentHandler,
#or, in practice, implement all interfaces
class element_counter(sax.ContentHandler):
    def startDocument(self):
        self.ecount = 0

    #SAX1 startElement by default, rather than SAX2 startElementNS
    def startElement(self, name, attribs):
        self.ecount += 1

parser = sax.make_parser()
handler = element_counter()
parser.setContentHandler(handler)
parser.parse(furi)
print "Elements counted:", handler.ecount
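
Jeremy's point about defaults can be seen from the other direction with the stock Python SAX driver: turning on the namespaces feature is what switches it from startElement to startElementNS callbacks. The following is a minimal sketch written for current Python 3 rather than the Python 2 of the listings above; the class name, function name, and sample document are placeholders, and it parses from an in-memory stream so that no ot.xml file is needed.

```python
import io
from xml import sax
from xml.sax.handler import feature_namespaces


class ElementCounter(sax.ContentHandler):
    def startDocument(self):
        self.ecount = 0

    def startElementNS(self, name, qname, attrs):
        # name is a (namespace URI, local name) tuple in namespace mode
        self.ecount += 1


def count_elements(xml_bytes):
    parser = sax.make_parser()
    # Opt in to SAX2-style namespace processing, off by default in xml.sax
    parser.setFeature(feature_namespaces, True)
    handler = ElementCounter()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.ecount


print("Elements counted:", count_elements(b"<a xmlns='urn:x'><b/><b/></a>"))
```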

The speed difference is huge. Jeremy did some testing with timeit.py (using more involved test code than the above), and in those limited tests Saxlette showed up as fast as, and in some cases a bit faster than cElementTree and libxml/Python (much, much faster than xml.sax in all cases). Interestingly, Domlette is now within 30%-40% of Saxlette in raw speed, which is impressive considering that it is building a fully functional DOM. As I've said in the past, I'm done with the silly benchmarks game, so someone else will have to pursue matters to further detail if they really can't do without their hot dog eating contests.

In another exciting development Saxlette has gained a generator mode using Expat's suspend/resume capability. This means you can have a Saxlette handler yield results from the SAX callbacks. It will allow me, for example, to have Amara's pushdom and pushbind work without threads, eliminating a huge drag on their performance (context switching is basically punishment). I'm working this capability into the code in the Amara 1.2 branch. So far the effects are dramatic.
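
The general shape of pulling results out of SAX callbacks without threads can be approximated with the stock Python SAX driver, which supports incremental feeding. To be clear, this is not Saxlette's generator mode and it does not use Expat's suspend/resume; it only sketches the pull-style consumption described above, in current Python 3, with made-up names throughout.

```python
from xml import sax


class _Collector(sax.ContentHandler):
    """Queue up element names as the SAX callbacks fire."""

    def __init__(self):
        super().__init__()
        self.pending = []

    def startElement(self, name, attrs):
        self.pending.append(name)


def iter_element_names(chunks):
    """Yield element names as the chunks of a document are fed in."""
    parser = sax.make_parser()
    handler = _Collector()
    parser.setContentHandler(handler)
    for chunk in chunks:
        parser.feed(chunk)
        # Hand over whatever the callbacks produced for the data so far
        while handler.pending:
            yield handler.pending.pop(0)
    parser.close()
    # Anything the parser held back until the end of the document
    while handler.pending:
        yield handler.pending.pop(0)


for name in iter_element_names(["<a><b/>", "<c/></a>"]):
    print(name)
```

The caller drives the parse by iterating, so no second thread is needed to turn push-style callbacks into a pull-style stream. The granularity is one chunk at a time rather than one callback at a time, which is where a true suspend/resume mechanism would do better.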

[Uche Ogbuji]

via Copia

Xampl, re: "XML data bindings, static languages, dynamic languages"

In response to "XML data bindings, static languages, dynamic languages," Bob Hutchison posted some thoughts. As I used Amara as the kernel of my demonstrations, Bob used his project xampl as the kernel of his. He introduces xampl in another entry which was inspired by my own article on EaseXML.

Xampl is an XML data binding. As Bob writes:

Secondly, there are versions of xampl for Java and Common Lisp. I’ve got an old (summer 2002) version for Ruby that needs updating (I wrote the xampl-pp pull parser to support this experiment).

Bob says that Xampl also deals with things that Elliotte Harold mentions as usual scourges for Java data bindings: mixed content, repeated elements, omitted elements, and element order. Of course these things should be food and drink to any XML tool, and I'm glad folks are finally plugging such gaping holes. Eric van der Vlist is also in the game with TreeBind, and it seems some Java tools try to wriggle out of the pinch by using XQuery.

Based on Bob's snippets, Xampl looks handy. Rather verbose, but no more so than Java pretty much requires. One thing that strikes me in Bob's examples is that Xampl appears to require and create a bogon namespace (http://www.xampl.com/labelsExample). It seems maybe it has something to do with Java packaging or something, but regardless of the role of this fake namespace, the XML represented by Xampl in Bob's snippets is not the same as the XML in the original source examples. An unprefixed element in a namespace is of course not the same thing as an element in the null namespace. I would not accept any tool that involves such a mix-up. It's quite possible that Xampl does not do so, and I'm just misunderstanding Bob's examples.

Bob provides Xampl code to match the EaseXML snippets in my article. Similarly to how EaseXML requires Python framework code, Xampl requires XML framework code. Since "XML situps" have been on the wires lately, they come to mind for a moment, but hey, if you're already processing XML with Xampl, I suppose you might not flinch at one more XML. I will point out that Amara does not require any framework code whatsoever besides the XML itself, not even an XML schema. It effectively provides dynamic code generation.

Xampl turns XML constructs into Java getters, e.g. html.getHead(). Amara uses the Python convention of properties rather than getters and setters, so you have html.head, and you can even assign to this property in order to mutate the XML. Xampl looks neat. The things that turn me off are largely things that are pretty much inevitable in Java, not least the very large amount of code generated by the binding. It supports XPath, as Amara does, and provides a "rhino" option to expose XML objects through Javascript, which offers you a bit more of the flexibility of Python (I don't know how much overhead to expect from Javascript through Java through XML, but it's a question I'd be quick to ask as a user).
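
The property-style access is easy to illustrate with a toy wrapper over the standard library's ElementTree. This is emphatically not Amara's implementation or API; the `Bound` class is a hypothetical stand-in, in current Python 3, that only shows how attribute lookup and assignment can be mapped onto child elements without any generated code.

```python
import xml.etree.ElementTree as ET


class Bound:
    """Expose the child elements of an XML element as Python attributes."""

    def __init__(self, element):
        # Bypass our own __setattr__ when storing the wrapped element
        object.__setattr__(self, "_element", element)

    def __getattr__(self, name):
        child = self._element.find(name)
        if child is None:
            raise AttributeError(name)
        return Bound(child)

    def __setattr__(self, name, value):
        # Assigning sets the text of the child, creating it if needed
        child = self._element.find(name)
        if child is None:
            child = ET.SubElement(self._element, name)
        child.text = str(value)

    def __str__(self):
        return self._element.text or ""


doc = Bound(ET.fromstring("<html><head><title>Old</title></head></html>"))
doc.head.title = "New"
print(doc.head.title)
print(ET.tostring(doc._element, encoding="unicode"))
```

The binding is resolved at lookup time from the XML itself, which is the sense in which no framework code or schema is needed up front.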

It's good to have projects such as Xampl and TreeBind and Nux. I'd rather use Python tools such as Amara, Gnosis and GenerateDS, but Java has the visibility and it's good for people to be aware that XML does not necessarily require greater imprisonment of expression than what comes with the application language. You don't need to accept crazy idioms and stifling limitations in matters as fundamental as mixed content and element ordering. XML and sanity can coexist.

[Uche Ogbuji]

via Copia

Python + XML = wary coexistence

There has been quite a bit of discussion triggered by my article "Python and XML: Should Python and XML Coexist?". This sort of thing always surprises me more than it should. I like to post code-heavy articles and leave the philosophy to the occasional entry, or to this very Weblog, but it seems that people respond more vocally to philosophy than to code. Perhaps I'll discuss with Kendall, my editor, what this suggests in terms of future directions for my Python/XML column.

Anyway, first I point to PJE's response. I used quotes from his Weblog as jumping-off points for my article.

Uche Ogbuji liberally quotes from and analyzes two of my XML-v.-Python rants, and actually gets it completely right. Since at least one of those rants has been cited as meaning I think XML is the spawn of Satan, I'm glad Uche read closely enough to get the context and nuance, without projecting things into it that I didn't say. Kudos!

I don't claim to know whom PJE speaks of when he refers to other commentary on his rant, but Martijn Faassen indicated his own response. I do think that Martijn missed some of PJE's intended nuance, but to be fair, it took me more than one reading to catch that nuance. I think that PJE could have saved himself a lot of misunderstanding, but hell, I've had my turn at thickly nuanced rant myself, so I see both sides. Looking more broadly at the landscape, Martijn puts succinctly what I've said in the past.

This disdain for XML technologies is very common among Python programmers.

But maybe that means something greater than petty rivalry. Mike Champion brought up my article on XML-DEV:

For some time now we've seen the JSON "fat-free alternative to XML" direction that some in the AJAX world are taking to address both XML's inefficiency and the mismatch with programming languages. Now I see that many in the Python community have a similar attitude toward XML and encourage its use only when necessary to exchange data with non-Python apps.

He followed with a list of thoughts, touching on the likely roles of JSON, Python, XML, and more, and I responded. Too much to quote from the exchange. Read the originals yourself, if you like. I will mention the final thought in my response:

In many ways I think a vicious backlash from programming languages against XML is just what XML needs right now.

In saying that, I had in mind some of my other prosaic articles about the direction of XML, including:

I think that many XML folks have been working to encroach on the territory of languages such as Python, even if Python folks aren't always clear on this fact while complaining about XML. We'll just have to see how it all shakes out. I know what pattern of tool usage I'll stick to for now. Speaking of omni-tools, Dimitre Novatchev put in a plug for XSLT as general-purpose programming language, which he's also done here in Copia comments. I still think it's a bad idea to treat XSLT as anything other than a template language. XSLT in its place, Python (or Javascript, Ruby, or whatever) in its place.

In the comments on my article there are some interesting bits, including one correspondent's mention of the importance of open file formats, and XML's role in this, followed bewilderingly by:

C++ is so powerful that with the right classes, many of the advantages of a scripting language are attainable.

Sounds like someone who badly needs to actually try Python.

[Uche Ogbuji]

via Copia