Amara 1.2 goes alpha, and other developments

First of all 4Suite went 1.0 rather quietly because the day job schedule has left room for very little besides quiet releases. It's probably just as well because by common standards 4Suite has been 1.0 grade for years. Under any less conservative version numbering scheme it would be 4Suite 3.0 by now.

I'm pushing Amara to 1.2 (a more typical progression of version numbers in that case) and after a developers-only alpha, we've released alpha 1.2a2 publicly, but quietly. As I've hinted before I have a lot of ideas for Amara post 1.2. The next major branch will be a full rewrite, probably to be released as Amara 2.0. Anyway, see the draft for the 1.2 full release announcement.

I also put together a quick start recipe for Amara on Ubuntu, and Luis Miguel Morillas has one for Windows users in Spanish. He says he'll be translating it to English soon, and when he does, I'm sure he'll link it from his "Amara Installers for Windows Users" page.

[Uche Ogbuji]

via Copia

“Real Web 2.0: Bookmarks? Tagging? Delicious!”

“Real Web 2.0: Bookmarks? Tagging? Delicious!”

Subtitle: Learn how real-world developers and users gain value from a classic Web 2.0 site
Synopsis: In this article, you'll learn how to work with del.icio.us, one of the classic Web 2.0 sites, using Web XML feeds and JSON, in Python and ECMAScript. When you think of Web 2.0 technology, you might think of the latest Ajax tricks, but that is just a small part of the picture. More fundamental concerns are open data, simple APIs, and features that encourage users to form social networks. These are also what make Web 2.0 a compelling problem for Web architects. This column will look more than skin deep at important real-world Web 2.0 sites and demonstrate how Web architects can incorporate the best from the Web into their own Web sites.

This is the first installment of a new column, Real Web 2.0. Of course "Web 2.0" is a hype term, and as has been argued to sheer tedium, it offers little beyond incremental advances. But in keeping with my tendency to be mild about buzzwords, I think anything that helps focus Web developers on the collaborative features of Web sites is a good thing. And that's what this column is about. It's not about the Miss AJAX pageant, but rather about open data for users and developers. From the article:

The substance of an effective Web 2.0 site, and the points of interest for Web architects (as opposed to, say, Web designers), lie in how readily real developers and users can take advantage of open data features. From widgets that users can use to customize their bits of territory on a social site to mashups that developers can use to create offspring from Web 2.0 parents, there are ways to understand what leads to success for such sites, and how you can emulate such success in your own work. This column, Real Web 2.0, will cut through the hype to focus on the most valuable features of actual sites from the perspective of the Web architect. In this first installment, I'll begin with one of the ancestors of the genre, del.icio.us.

And I still don't want that monkey-ass Web 1.0. Anyway, as usual, there's lots of code here: Python, Amara, ECMAScript, JSON, and more. That will be the recipe (mixing up the ingredients a bit each time) as I journey through the poster-child sites for open data.
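
To give a flavor of what the article walks through, here is a minimal sketch of pulling a user's del.icio.us bookmarks as JSON in Python. The feed URL, the raw parameter and the key names in each entry are assumptions from memory rather than verified endpoints; the article has the exact details.

import urllib2
import simplejson   # the usual JSON library for Python at the time

# Assumed del.icio.us JSON feed URL pattern; check the article for the real endpoint
FEED_URL = 'http://del.icio.us/feeds/json/%s?raw'

def get_bookmarks(username):
    # Each entry is a dict; keys such as 'u' (link) and 'd' (description)
    # are assumptions about the feed's field names
    return simplejson.load(urllib2.urlopen(FEED_URL % username))

for post in get_bookmarks('someuser'):
    print post.get(u'u'), post.get(u'd')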

[Uche Ogbuji]

via Copia

Quick grab of XHTML metadata

I recently needed some code to quickly scrape the metadata from XHTML Web pages, so I kicked up the following code:

import amara

XHTML1_NS = u'http://www.w3.org/1999/xhtml'
PREFIXES = { u'xh': XHTML1_NS }

def get_xhtml_metadata(source):
    md = {}
    # Iterate over the child elements of the XHTML head
    for node in amara.pushbind(source, u'/xh:html/xh:head/*', prefixes=PREFIXES):
        if node.localName == u'title':
            md[u'title'] = unicode(node)
        elif node.localName == u'link':
            # Gather all attributes of each link element into a dict
            linkinfo = dict([ (attr.name, unicode(attr))
                              for attr in node.xml_xpath(u'@*') ])
            md.setdefault(u'links', []).append(linkinfo)
        elif node.xml_xpath(u'self::xh:meta[@name]'):
            # Map each named meta element's name to its content
            md[node.name] = unicode(node.content)
    return md

if __name__ == "__main__":
    import sys, pprint
    source = sys.argv[1]
    pprint.pprint(get_xhtml_metadata(source))

So, for example, scraping Planet XMLhack:

$ python xhtml-metadata.py http://planet.xmlhack.com/
{u'links': [{u'href': u'planet.css',
             u'media': u'screen',
             u'rel': u'stylesheet',
             u'title': u'Default',
             u'type': u'text/css'},
            {u'href': u'/index.rdf',
             u'rel': u'alternate',
             u'title': u'RSS',
             u'type': u'application/rss+xml'}],
 u'title': u'Planet XMLhack: Aggregated weblogs from XML hackers and commentators'}

[Uche Ogbuji]

via Copia

Amara trimxml: an XML reporting tool

For the past few months in my day job (consulting for Sun Microsystems) I've been working on what you could call a really big (and hairy) enterprise mashup. I'm in charge of the kit that actually does the mashing-up: an XML pipeline that drives the merging, processing and correction of data streams. There are a lot of intricately intersecting business rules, and without the ability to make very quick ad-hoc reports from arbitrary data streams there is no way we could get it all sorted out on our aggressive deadlines.

This project benefits greatly from a side task I had sitting on my hard drive, and that I've since polished and worked into the Amara 1.1.9 release. It's a command-line tool called trimxml which is basically a reporting tool for XML. You just point it at some XML data source and give it an XSLT pattern for the bits of interest and optionally some XPath to tune the report and the display. It's designed to only read as much of the file as needed, which helps with performance. In the project I discussed above the XML files of interest range from 3-100MB.

Just to provide a taste using Ovidiu Predescu's old Docbook example, you could get the title as follows:

trimxml http://xslt-process.sourceforge.net/docbook-example.xml book/bookinfo/title

Since you know there's just one title you care about, you can make sure trimxml stops looking after it finds it:

trimxml -c 1 http://xslt-process.sourceforge.net/docbook-example.xml book/bookinfo/title

-c sets the maximum count of results; you can of course set it to values other than 1.

You can get all titles in the document, regardless of location:

trimxml http://xslt-process.sourceforge.net/docbook-example.xml title

Or just the titles that contain the string "DocBook":

trimxml http://xslt-process.sourceforge.net/docbook-example.xml title "contains(., 'DocBook')"

The final argument is a filtering XPath expression; only nodes that satisfy that condition are reported.

By default each entire matching node is reported, tags and all. You can specify something different to display for each match using the -d flag. For example, to print just the first 10 characters of each title, and not the title tags themselves, use:

trimxml -d "substring(., 1, 10)" http://xslt-process.sourceforge.net/docbook-example.xml title

There are other options and features, and of course you can use the tool on local files as well as Web-based files.

In another useful development in the 4Suite/Amara world, we now have a Wiki.

With 4Suite, Amara, WSGI.xml, Bright Content and the day job I have no idea when I'll be able to get back to working on Akara, so I finally set up some Wikis for 4Suite.org. The main starting point is:

http://notes.4suite.org/

Some other useful starting points are

http://notes.4suite.org/AmaraXmlToolkit
http://notes.4suite.org/WsgiXml

As a bit of an extra anti-vandalism measure I have set the above three entry pages to be editable only by 4Suite developers. [...] Of course you can edit and add other pages in usual Wiki fashion. You might want to start with http://notes.4suite.org/4SuiteFaq which is a collaborative addendum to the official FAQ.

[Uche Ogbuji]

via Copia

Rete-inspired N3 Rule Network Finished

See: previous

I called it quits (for now) in trying to retrofit the RETE-based algorithm to handle N3 functions & filters. I've checked in a rewritten Rete module with tests for SHIOF Description Logic axioms.

Tracking dependencies between filter and function variables became hairy.

An SVG diagram of these rules compiled into a RETE-like network was generated using an included Boost Graph Library utility function.

Its unit test output:

testRules (__main__.TestEvaluateNetwork) ... Time to build production rule (RDFLib): 0.0118989944458 seconds
Time to calculate closure on working memory: 0.478726148605 seconds
ok
----------------------------------------------------------------------
Ran 1 test in 0.751s
OK

I also integrated a Python iterator-algebra implementation of a relational hash join (for the beta nodes).
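
To illustrate the general technique (this is just a sketch of a relational hash join over variable bindings, not the FuXi code itself): one side of the join is indexed by the shared variables, and the other side streams through and probes the index.

def hash_join(left, right, join_vars):
    # Sketch of an iterator-algebra style relational hash join over
    # dicts of variable bindings; not the FuXi implementation
    index = {}
    # Build phase: index the left-hand bindings by their join-variable values
    for bindings in left:
        key = tuple([ bindings[v] for v in join_vars ])
        index.setdefault(key, []).append(bindings)
    # Probe phase: stream the right-hand bindings and emit merged matches
    for bindings in right:
        key = tuple([ bindings[v] for v in join_vars ])
        for match in index.get(key, []):
            merged = dict(match)
            merged.update(bindings)
            yield merged

# Join two binding streams on the shared variable 'x'
left = [{'x': 1, 'y': 'a'}, {'x': 2, 'y': 'b'}]
right = [{'x': 1, 'z': 'c'}]
print list(hash_join(left, right, ['x']))   # [{'x': 1, 'y': 'a', 'z': 'c'}]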

It works with RDFLib, so I'd eventually like to integrate the ability to dispatch SPARQL queries over an N3 closure graph generated from the network. The speed with which it builds rule networks makes serializing a compiled network to persistent storage a non-issue.

From the README.txt:

Fu Xi (pronounced foo-see) is a forward-chaining inference expert system for RDFLib Notation 3 graphs. It is named after the first mythical sovereign of ancient China, who supposedly 'looked upward and contemplated the images in the heavens, and looked downward and contemplated the occurrences on earth.'

Originally, it was an idea to express the formalisms of the Yi Jing / I Ching in Description & First Order Logic.

It relies on Charles Forgy's Rete algorithm for the many pattern/many object match problem, building a triple-pattern reasoning network. It can be used as a reasoner with capabilities for certain expressive Description Logics (via OWL/RDFS axioms in N3) or as a general N3 production/expert system. It uses Python hash/set/list idioms to maximize the matching efficiency of the network. In particular, it uses an iterator algebra implementation for the join activation mechanism.

An example of its use:

from FuXi.Rete.Util import generateTokenSet
from FuXi.Rete import *
from FuXi.Rete.RuleStore import N3RuleStore
from rdflib.Graph import Graph, ReadOnlyGraphAggregate
from rdflib import plugin
from rdflib.store import Store

# Set up an in-memory store for the facts and the inferred triples
store = plugin.get('IOMemory', Store)()
store.open('')
# Parse the rules into an N3RuleStore via a dedicated rule graph
ruleStore = N3RuleStore()
ruleGraph = Graph(ruleStore)
ruleGraph.parse(open('some-rule-file.n3'), format='n3')
# Parse the facts
factGraph = Graph(store)
factGraph.parse(open('fact-file'), format='..')  # use the fact file's format, e.g. 'xml' or 'n3'
# The network writes inferred triples into the closure delta graph
closureDeltaGraph = Graph(store)
network = ReteNetwork(ruleStore,
                      initialWorkingMemory=generateTokenSet(factGraph),
                      inferredTarget=closureDeltaGraph)
for s, p, o in network.closureGraph(factGraph):
    pass  # .. do something with triple ..

To check out from cvs:

cvs -d:pserver:anonymous@cvs.4suite.org:/var/local/cvsroot login
cvs -d:pserver:anonymous@cvs.4suite.org:/var/local/cvsroot get Fuxi

[Chimezie Ogbuji]

via Copia

Amara en Español

What open-source development time I've had has been focused on pushing 4Suite XML to 1.0 (and we're on the very final leg of that journey). I'm still putting a bit of time into Amara, but I should have even more time for it soon, and I have many ideas for what to do with that time.

Others have been up to fun stuff with Amara as well, and no more so, it seems, than Spanish speakers. Luis Miguel Morillas has been putting Amara through its paces in his LivingPyXML project. César Cárdenas Desales has contributed a nice intro, "Procesamiento fácil de XML con Python y Amara":

Although the Python standard library includes tools and modules for processing XML with SAX and DOM, many programmers have felt there must be simpler ways of working with XML. Amara is a set of tools that make XML processing with Python easier. This tutorial gives a brief introduction to using Amara for those tasks.

Yep. That was pretty much the entire idea.
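
For anyone who hasn't tried Amara, here's a minimal sketch of the style of processing that tutorial introduces, using Amara 1.x bindery-style access (the little document here is just an illustration):

import amara

DOC = '<monty><python spam="eggs">A silly fragment</python></monty>'

# Parse and then address the document as plain Python objects
doc = amara.parse(DOC)
print doc.monty.python.spam        # attribute access: prints "eggs"
print unicode(doc.monty.python)    # element content as a Unicode string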

Original link (not as up-to-date): "Procesamiento fácil de XML con Python y Amara"

[Uche Ogbuji]

via Copia

LazyWeb Ho! Detecting whether a browser supports XML+XSLT

I'm wrapping up applyxslt, a WSGI middleware module to serve separate XML and XSLT to browsers that can handle them (using the xml-stylesheet PI). For browsers that can't, it intercepts the response, performs the XSLT transform on the server and sends on the result. BTW, for more on WSGI middleware, see “Mix and match Web components with Python WSGI”.

My biggest uncertainty is the best way to determine whether a browser can handle XML+XSLT. I doubt anything in the Accept header would help, so I'm left having to list all User-Agent strings for browsers that I know can handle this (basically Firefox, Opera, and recent Mozilla, Safari and MSIE).

So far I've derived my User-Agent list from several sources. Really the Wikipedia list of user agents is all I needed, but I found and worked with the other ones first.

So, based on that, here is the list of User-Agent string patterns I'm treating as evidence that the browser understands XML+XSLT (Python/Perl regex):

.*MSIE 5.5.*
.*MSIE 6.0.*
.*MSIE 7.0.*
.*Gecko/2005.*
.*Gecko/2006.*
.*Opera/9.*
.*AppleWebKit/31.*
.*AppleWebKit/4.*

Note: this hoovers up a few browser versions I'm not entirely sure of: Minimo, AOL Explorer and OmniWeb. I'm fine with some such uncertainty, but if anyone has any suggestions for further refinement of this list, let me know. I'd like to keep it updated.
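
For what it's worth, here's a rough sketch of how such a check might look inside WSGI middleware. This is not the actual applyxslt code, just the obvious way to match the patterns above against the request's User-Agent header:

import re

# The patterns listed above, compiled once (with the version dots escaped)
XSLT_CAPABLE_PATTERNS = [ re.compile(p) for p in (
    r'.*MSIE 5\.5.*', r'.*MSIE 6\.0.*', r'.*MSIE 7\.0.*',
    r'.*Gecko/2005.*', r'.*Gecko/2006.*', r'.*Opera/9.*',
    r'.*AppleWebKit/31.*', r'.*AppleWebKit/4.*',
) ]

def browser_handles_xslt(environ):
    # WSGI exposes the User-Agent header as HTTP_USER_AGENT
    ua = environ.get('HTTP_USER_AGENT', '')
    for pattern in XSLT_CAPABLE_PATTERNS:
        if pattern.match(ua):
            return True
    return False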

[Uche Ogbuji]

via Copia

The BDFL's boundary

I think my response to the recent news that Guido had "pronounced" Django as the Python Web framework was "so wot's 'e think 'e's doing, anyway?"

First of all, I don't think it's a big deal. Every few months something happens to get the Python world all abuzz about the number of Python Web frameworks. It might be the announcement of a new one (Django, TurboGears, web.py, Clever Harold, etc.). It may be some bit of Ruby on Rails news, which for some reason seems to strike fear into the depths of Pythoneers' consciousness. (I think O'Reilly amuses themselves by publishing book sales figures just to see how high they can get some in the Python community to jump). Whatever the stimulus, people blog back and forth about it, a handful of folks switch frameworks, just to be safe, but in the end it all dies down with the status quo still pretty much as it is. When Python Web frameworks die (which seems to be a rare event), they die with a whimper, and not a bang.

Some might think that a BDFL "pronouncement" is no ordinary stimulus, but I'd bet good money that this case is indeed quite ordinary. You see, that's where we have these lovely things called boundaries and it seems one has unmistakably been exceeded in this case. When the BDFL pronounces on a matter, it has always been with the intention, and effect, of settling a troubling argument once and for all. That's not possible in this case. It's not as with Python decorators where Guido's decision to include them, and his choice of syntax, became part of the language proper, and the only thing you could do if you disagreed was take the enormous step of forking the language. In this case, if you disagree with Guido, what do you do? You ignore him and keep using (or developing) your favorite framework. Nothing in the evolution of Python impedes you the slightest bit. In the end, I think this is what each individual developer will end up doing. A few will indeed give Django a new look, and some will convert, but in the main, all will be back to normal in a few weeks.

For my part every sniff I've had of Django makes me think it's way too large and monolithic for my taste. In other words, it's far too much like Zope. (Speaking of Zope, if it were ever appropriate to declare one Python framework to rule them all, Zope is the one with the popularity and maturity for such a role—which is why I'm glad such a declaration is not appropriate). I understand they're getting rid of a lot of the magic, which is another thing that gave me flashbacks of Zope, but I doubt that would be nearly enough for me. For now, I'll continue to work with my favorite three frameworks: CherryPy, RhubarbTart and Pylons, although I do carry some hope that the latter two will merge, and leave me with just two favorites. I'll also focus as much Web development work as I can in creating WSGI middleware, which provides the greatest flexibility.

See also:

[Uche Ogbuji]

via Copia

Discussion group for Atom protocol implementations in Python

I've had discussions with many colleagues about implementing the Atom protocol in Python, so I decided to create a proper forum for discussion, and thus the Google group atom-protocol-python was born.

A group dedicated to discussion among developers and users of Python libraries and tools for processing the Atom protocol, either as client or server.

Honestly, the ideas of an Atom store and of an Atom client are so broad that I expect there to be several implementations in Python. This group is meant to be very open, and I'd love for even folks working on competing implementations to join up, so we can at least discuss interoperability.

And don't forget there is also an Atom IRC channel where we can discuss Atom syntax and Atom protocol. And while I'm plugging stuff, I shan't forget Planet Atom.

[Uche Ogbuji]

via Copia