XsltRenderer for PyBlosxom

XsltRenderer.py

The XSLT renderer takes any output from the default renderer (i.e. after processing using flavors) and applies an XSLT, using the result as the final output. If you have handy XSLTs that start with XML vocabularies such as Atom or XHTML, you can create such output just as you currently do in PyBlosxom, by applying your flavor of choice. You can then use the XSLTs you have to produce additional types of output quite easily.

I used this to add an RSS 1.0 feed here on Copia. It starts with the Atom flavor, and uses Antonio Cavedoni's XSLT to translate the Atom to RSS 1.0. Taking this approach in general made things a lot easier for me. I went from nothing to RSS 1.0 feed in about two hours of hacking, which included a lot of learning about the plug-in API. Based on the hassles that other PyBlosxom feed projects seemed to be working through, I think the detour through XSLT was quite valuable. It made it easier to lean on the shoulders of others outside the PyBlosxom community.

Anyway some other sample uses:

  • A very simple "plain text" feed. Start with the Atom flavor and then use a tiny (5 lines or so) XSLT to strip all tags
  • An alternative, image-free view. Start with a flavor that generates XHTML, then apply a tiny (5 lines or so) XSLT to strip all images
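
The "plain text" case above amounts to keeping the text nodes and dropping everything else. As a rough stand-in, here is the same effect in plain Python using the standard library's ElementTree rather than XSLT (this is not the plug-in's code). The Atom fragment is invented for illustration and heavily trimmed.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed Atom output; a real flavor emits much more.
ATOM = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Copia</title>
  <entry><title>Of BisonGen</title><summary>A hidden gem.</summary></entry>
</feed>"""


def strip_tags(xml_text):
    """Return only the text content of the document, one non-empty chunk per line."""
    root = ET.fromstring(xml_text)
    # itertext() walks every text node in document order, ignoring the markup.
    chunks = (text.strip() for text in root.itertext())
    return "\n".join(chunk for chunk in chunks if chunk)


print(strip_tags(ATOM))
```

In XSLT the equivalent is tiny because the built-in template rules already copy text nodes through and discard element markup.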

There is one configuration option for this plug-in: xslt_trigger_suffix. See the module header doc for more info.

[Uche Ogbuji]

via Copia

Notes on Porting an XSLT/HTML Application to XForms/SOAP

Motivation

Several years ago, I wrote a front-end to 4Suite that fulfilled the following requirements:

  • Default Content management
  • System management
  • Ad hoc RDF Query interface

It was to be written as a set of XSLT stylesheets which generated HTML pages composed mostly of HTML forms. The 4Suite repository consists of user-specified content (XML/RDF and otherwise) as well as system resources: XML documents that provide repository functionality such as user management, servers, XML-to-RDF mapping, etc. The idea was to build a special user interface for managing these system documents. The less satisfactory alternative is to modify the raw XML. This requires an understanding of the structure of these documents as they are modified, and introduces the possibility of creating invalid documents that break their expected functionality.

At the time, I thought that XSLT alone was the perfect means to do this because of the whole slew of extensions available for managing resources in the underlying 4Suite repository. Mike Brown wrote a very concise overview of what the 4Suite repository is, available here. There is also a useful overview on the architecture of 4Suite's XSLT-based web application framework.

In the end, this project (called the 4Suite Dashboard) became very difficult to maintain because of the spaghetti-like nature of the XSLT. There are two factors in this:

  • At the time when I wrote it, I was less adept at using XSLT to its strengths
  • The cumbersome, ad-hoc processing of form data - which was the primary component of the user interface

As a result, it has slowly lagged behind the rest of 4Suite and is essentially unusable because of the inordinate amount of effort that would be necessary to refactor it to a more maintainable state. Motivated mostly by the great success we have had at the Cleveland Clinic Foundation with research regarding the use of XForms as the primary means of serving user interfaces to a semi-structured, metadata-driven database, I decided to port the old Dashboard code to an XForms/SOAP-based solution.

The XForm 'sweet-spot'

The primary motivating factor was the idea that with XForms you kill several birds with one stone:

  1. You move a majority of XML processing to the client
  2. You reduce the complexity of request processing by piggy-backing on the XForms approach to Web applications
  3. You get an overall cleaner architecture by separating what the user sees from the actual processing (by the application) of the user's actions within the user interface
  4. Session management: SOAP messages can be submitted within an existing session.

The result was a cleaner, leaner application that was much easier to implement, given my better appreciation for XSLT as the framework for an application as well as my familiarity with XForms. Below is a high-level diagram of the main components:

XDashboard Architecture

Secure, remote service invocations

One of the goals of the port was to demonstrate the submission of session-managed SOAP messages. By having a session created at the server when a request to manage a resource arrives, the session id can be passed along with the resulting XForm, so all subsequent service requests authenticate at the repository using this session id (generated at the server). Since the session is specific to the 4Suite user who requested the XDashboard screen (an HTTP authentication request is sent when the application is originally loaded, requiring a valid 4Suite user to enter their credentials), service requests on resources not available to the user will fail with an appropriate SOAP Fault detailing the server-side security violation.

Base64 encoded XML content from an XForm

The other interesting thing I was able to demonstrate was the usefulness of submitting XML strings as base64 encoded content via SOAP. One of the primary arguments against SOAP as a remote procedure protocol is its use of a verbose syntax (XML) as the medium for communication. Now imagine a SOAP message whose purpose is to modify the content of an existing XML resource. The instinctive first solution would probably be to submit the XML document as a fragment within the SOAP envelope like so:

[SOAP:Envelope]
   [SOAP:Body]
     [foo:setContent]
       [path] .. path to document [/path]
       [src]... new document as a fragment ...[/src]
     [/foo:setContent]
   [/SOAP:Body]
[/SOAP:Envelope]

But imagine the extra processing that the SOAP endpoint must contend with when you consider the SOAP message as a whole. 4Suite's SOAP server allows content to be submitted as plain text or as Base64 encoded content. In addition, the XForm's upload component is restricted to nodes with the following datatypes:

  • xsd:anyURI
  • xsd:base64Binary
  • xsd:hexBinary

The result of this is that for nodes bound to xsd:base64Binary, an XForms processor is responsible for Base64 encoding data selected via the xforms:upload component for submission, which simplifies the problem for the case where the XML content you wish to submit is uploaded from a file on the local filesystem of the client. However, the previous dashboard allowed XML content for an arbitrary resource in the repository to be submitted from a textarea. In the XForms scenario, this required a JavaScript function to do the encoding explicitly, applied to the text collected from a textarea.
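
To make the encoding step concrete, here is a minimal sketch in Python (the XDashboard itself did this in the browser). It follows the bracketed setContent message shown earlier, not 4Suite's actual wire format; the foo namespace URI and the document path are placeholders made up for the example.

```python
import base64
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
FOO_NS = "http://example.org/foo"  # hypothetical namespace for the setContent service


def build_set_content(path, xml_text):
    """Build a setContent request whose src is Base64 text rather than an XML fragment."""
    envelope = ET.Element("{%s}Envelope" % SOAP_NS)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_NS)
    call = ET.SubElement(body, "{%s}setContent" % FOO_NS)
    ET.SubElement(call, "path").text = path
    # The new document travels as opaque text, so the endpoint's XML parser
    # never has to walk its markup while reading the envelope.
    encoded = base64.b64encode(xml_text.encode("utf-8")).decode("ascii")
    ET.SubElement(call, "src").text = encoded
    return ET.tostring(envelope, encoding="unicode")


new_document = "<doc><p>1 &lt; 2</p></doc>"
message = build_set_content("/docs/example.xml", new_document)
print(message)
```

The receiving side only has to Base64-decode the src text to recover the document byte for byte, with no worries about escaping or namespace leakage from the envelope.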

Ironically, at the time when I was dealing with this problem there was an ongoing thread in the W3C's www-forms list about the ins/outs of encoding XML content as strings for submission from an XForm.

The XDashboard Services

The following is the list of services set up and used by the application (with an accompanying description of what each does):

  • addAcl (Add an acl identifier to the acl key. If the specified key does not exist on the resource this function has the same functionality as setAcl)
  • createContainer (Creates a 4Suite repository Container)
  • createDocument (Creates a document with the given document definition name, path, type, and source. If the type is not specified it will attempt to infer the type based on IMT and the src)
  • createRawFile (Creates a raw file resource, using the specified path, internet media type, and source string.)
  • delete (Delete this object from the repository)
  • fetchResource (Fetch a resource from the system. path is a string)
  • getContent (Get the string content of this resource)
  • removeAcl (Remove the acl identifier from the acl key. If the specified aclKey is not present on the resource or the specified aclIdent is not in the key, then do nothing. Remember that entries inherited from the parent are not included in the acl on the actual resource. In order to override inherited permissions, set to no access rather than trying to remove)
  • setAcl (Replace the aclKey with a single entry of aclIdent)
  • setContent (Set the string content of this resource)
  • setImt (Sets the Internet Media Type of the raw file resource)
  • setPassword (Change the password of the specified user to the SHA-1 hashed value)
  • setServerRunning (Change the state of a 4Suite repository server to running/stopped. This operation is executed as an XUpdate at the server-side modifying the Server document to reflect the correct state. The XUpdate document is serialized from the client and submitted to the repository).
  • xUpdate (Allows XML content to be updated with the XUpdate protocol. updateSrc is a string that represents the XUpdate to be applied to the resource. extraNss is a dict of additional namespace bindings to pass to the XUpdate processor, if necessary)

Compliance Notes

As FormsPlayer is probably the most mature of all the XForms processors available (the list is growing), it was the targeted XForms processor for this application. For the most part, this doesn't introduce any issues of non-compliance with XForms as everything was done using mostly XForms 1.0, a little XForms 1.1 (xforms:duplicate action, primarily), and 2 FormsPlayer specific capabilities.

It must be mentioned, however, that FormsPlayer is an Internet Explorer plugin solution to XForms. The tradeoff, essentially, is browser compatibility for the full complement of XForms functionality that comes with FormsPlayer. Below is a briefing of the deviations from pure XForms 1.0:

XForms 1.1 constructs

xforms:duplicate was used for the copying of nodes from a source to an origin

Forms Player specific capabilities

  • xforms:setvalue was used for deep-copying nodes to a target location without destroying existing child nodes (the limitation of its XForms 1.0 counterpart: xforms:copy)
  • fp:serialise function was used to facilitate the retrieval of a node's XML representation in order to Base64 encode for submission
  • fp:HTMLserialise was used for debugging purposes
  • Forms Player's inline capability was used in order to access external JavaScript functions from within XPath expressions

Resources

This application relies on the most recent version of 4Suite's SOAP server (can be retrieved from CVS). A listing of the most recent version of this SOAP server can be found here. The XDashboard is bundled as a 4Suite repository application and so must be installed to a running repository using the 4ss install command. It should be sufficient to unpack the tar / zip ball and run 4ss install against the setup.xml file.

The XDashboard application can be downloaded as a tgz archive or zip archive from:

Bear in mind, this application is a proof of concept / demo, so it's likely to have undiscovered typos/bugs.

This demo/application makes use of and refers to the following third-party resources:

  • Mike Brown and Jeni Tennison's tree-view.xslt - For rendering a decorated view of an XML document
  • Micah Dubinko's XForms Relax NG schema
  • MSXML's default IE XML DHTML stylesheet [see]: - For rendering a decorated view of an XML document
  • xslt2xform's 'Powered by XForm' logo - Pending a winner to Micah's logo contest

[Chimezie Ogbuji]

Copia gets RSS 1.0, courtesy XSLT

I added RSS 1.0 feeds to Copia yesterday. It was a fun Sunday evening hack project, and even though parts of PyBlosxom still make me scratch my head, I'm in even more awe of its hackability.

It seems RSS 1.0 has always been a sketchy area for PyBlosxom. Because RSS 1.0 essentially needs 2 modes, the item list and the item details, you can't emit it linearly, and so you can't use a PyBlosxom flavor, as with RSS 0.91 or Atom. I found some discussion of RSS 1.0 from time to time in the archives, including this Perl port, but nothing readily usable, so I had to roll my own.
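
The two modes can be sketched in a few lines of Python. This is only an illustration of the document shape, not code from the plug-in: the helper names, the rss prefix and the sample entries are made up, and required channel details such as the description are left out for brevity.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("rss", RSS)  # prefix chosen for this sketch only


def q(ns, name):
    """Build a namespace-qualified name in ElementTree's {uri}local form."""
    return "{%s}%s" % (ns, name)


def build_rss10(channel_url, title, entries):
    root = ET.Element(q(RDF, "RDF"))
    channel = ET.SubElement(root, q(RSS, "channel"), {q(RDF, "about"): channel_url})
    ET.SubElement(channel, q(RSS, "title")).text = title
    ET.SubElement(channel, q(RSS, "link")).text = channel_url
    # Mode 1: the item list, a table of contents inside the channel.
    seq = ET.SubElement(ET.SubElement(channel, q(RSS, "items")), q(RDF, "Seq"))
    for entry in entries:
        ET.SubElement(seq, q(RDF, "li"), {q(RDF, "resource"): entry["link"]})
    # Mode 2: the item details, as siblings of the channel.
    for entry in entries:
        item = ET.SubElement(root, q(RSS, "item"), {q(RDF, "about"): entry["link"]})
        ET.SubElement(item, q(RSS, "title")).text = entry["title"]
        ET.SubElement(item, q(RSS, "link")).text = entry["link"]
    return root


entries = [
    {"title": "First post", "link": "http://example.org/first"},
    {"title": "Second post", "link": "http://example.org/second"},
]
feed = build_rss10("http://example.org/", "Example weblog", entries)
print(ET.tostring(feed, encoding="unicode"))
```

Every entry is visited twice, once per loop, which is exactly what a template applied once per entry in sequence cannot do.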

But what does an XML head do when faced with such a task? He runs a simple equation: existing atom flavor for PyBlosxom + plenty of Atom to RSS 1.0 XSLT transforms out there = decision to implement an XsltRenderer for PyBlosxom. The XsltRenderer can take a flavor's output and run it through an XSLT to produce the final output.

More on that in a follow-up item, but for now, I've added the whole site feed to the feed discovery convention in the HTML headers, and also to the right hand listing. There are also topic-specific feeds for all keywords.

So, for example, here is the RSS 1.0 Python feed, and here is the XML Atom feed. Note: we use lowercase for keywords on Copia.

[Uche Ogbuji]

TreeBind, and incidentally Nux

In my previous entry I said "When you compare the weary nature of, say Java XML data bindings, Amara is a nice advertisement for Python's dynamicism." Interestingly, Eric van der Vlist recently mentioned to me a project in which he attempts to address some of these deficiencies within Java. TreeBind is "yet another XML <-> Java binding API." The TreeBind page says:

"The difference between TreeBind and most of the other Java binding APIs is that we've tried to minimize the need for any type of schema or configuration file and to maximize the usage of introspection of Java classes in order to facilitate the integration with existing classes."

It's about time, but is Java's introspection really enough? It doesn't save you from welding to Java's type rigidity.

It makes me recall Wolfgang Hoschek's response to one of my Amara announcements on XML-DEV. The Amara example was:

The following is complete code for iterating through address labels in an XML document, while never loading more memory than needed to hold one label element:

from amara import binderytools
for subtree in binderytools.pushbind('/labels/label',
                                     source='labels.xml'):
    print subtree.label.name, 'of', subtree.label.address.city

And Wolfgang responded:

Very handy!

FYI, analog example Java code for the current Nux version reads as follows:

StreamingTransform myTransform = new StreamingTransform() {
    public Nodes transform(Element subtree) {
        System.out.println(XQueryUtil.xquery(subtree, "name").get(0).getValue());
        System.out.println("of");
        System.out.println(XQueryUtil.xquery(subtree, "address/city").get(0).getValue());
        return new Nodes(); // mark current element as subject to garbage collection
    }
};
Builder builder = new Builder(new StreamingPathFilter("/labels/label", null).
    createNodeFactory(null, myTransform)); //[Line split by Uche for formatting reasons]
builder.build(new File("labels.xml"));

It's not as compact as with Amara, but still quite handy...

This is certainly a great leap forward from Java/XML APIs I've seen, even from plain XOM. I'd expect a similar leap, albeit in a different direction, in TreeBind. But in my biased opinion, even such impressive leaps lose a lot of luster when compared to the gains in expressiveness provided by the Python example. Line count is just a bit of the picture. For overall idiom, and the amount of conceptual load buried in each construct, it's hard to even place the Python and Java examples on the same scale.
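
As one more point of comparison, the same bounded-memory iteration can be sketched with nothing but the Python standard library. This uses ElementTree's iterparse, not Amara or Nux, and the sample label data is invented for the example.

```python
import io
import xml.etree.ElementTree as ET

# Invented sample data with the /labels/label structure used in the examples above.
SAMPLE = b"""<labels>
  <label><name>Thomas Eliot</name><address><city>Stamford</city></address></label>
  <label><name>Ezra Pound</name><address><city>Long Lake</city></address></label>
</labels>"""


def iter_labels(source):
    """Yield (name, city) per label, discarding each subtree once it has been used."""
    events = ET.iterparse(source, events=("start", "end"))
    _, root = next(events)  # the first event is the start of the document element
    for event, elem in events:
        # Unlike the '/labels/label' pattern, this matches a label at any depth.
        if event == "end" and elem.tag == "label":
            yield elem.findtext("name"), elem.findtext("address/city")
            root.clear()  # drop finished subtrees so memory use stays flat


for name, city in iter_labels(io.BytesIO(SAMPLE)):
    print(name, "of", city)
```

It works, but the event bookkeeping and the manual clearing are exactly the kind of conceptual load that the pushbind one-liner hides.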

Here, I think, is the crux of why dynamic language advocates are unimpressed by the productivity gains claimed by XQuery advocates. Productivity gained through declarativity should complement rather than interfere with productivity gained through natural, expressive idiom. XQuery does not meet this test. By imposing a ponderous type framework over XML it provides productivity on one hand while stifling the power of dynamic languages. This is why in Amara I seek to harness the declarative power of unencumbered little languages such as XPath and XSLT patterns to the expressive power of Python. I think this gives us a huge head start over systems using XQuery and Java introspection to tame the chore of XML processing.

See also:

[Uche Ogbuji]

Another on Amara

"XML Parsing with Python"--Derek Willis

Let’s face it, relational database types don’t like XML files. They’re structured, sure, but not in quite the way we’re used to. So pulling them apart is a chore for which there are many tools but few that seem to fit easily into the CAR [Computer-Assisted Reporting] mindset. Enter Python and the Amara toolkit. Amara builds on 4Suite, which processes XML and RDF, and it works in a very Pythonic way by essentially turning XML data into Python objects. If I have to parse XML into a relational database, Amara is my tool of choice.

One thing that I've especially appreciated about feedback on Amara is the way users cite it as an example of the essential power of Python, and why Python is a draw even for those outside of it. This has always been my aim, more conventionally with 4Suite, and more subversively with Amara. When you compare the weary nature of, say, Java XML data bindings, Amara is a nice advertisement for Python's dynamicism.

Later on Willis concludes:

CAR folks can think of [XML as processed through Amara] as calling field names, and instead of printing out elements you can insert them into a database. Nice and easy - the way everybody says XML should be.

And just the way I intended. Nice.

[Uche Ogbuji]

Amara Appreciations

Last week I mentioned a kind message Sanjay Velamparambil sent to the 4Suite list. As I said, "it's always pleasing to me as a developer to hear a voice ring out from the noise clearly appreciating the value of the work we've done." This week, Amara gets some love.

In a message to the 4Suite list Wednesday Tom Lazar said:

i just wanted to chime in that just yesterday I had an urgent, real-world problem in where I needed to manipulate an XML Document
programmatically - grep/sed/awk on the textfile would have been too
difficult ("now you've got two problems"[tm]) and an XSLT alone
wouldn't have done it either.

using Amara i hacked a python script that did the whole job in
(literally) ten minutes. as I started the script I was a bit
apprehensive: afterall our script would pick out certain nodes and
assign new values to them (or delete them, depending) and then write
it back to the file - but it all worked without a hitch.

and looking at the script, you'd never think it was handling XML at
all ;-)

so thanks for making Amara and keep up the good work!

You're very welcome, Tom, and thanks for all the help with improving the bindery mutation API.

As if that wasn't enough, the same day I got a private message from another user. I haven't asked his permission so I won't identify him at the moment, but he actually put together a video clip of himself demonstrating Amara. In the clip he shows the eight or so custom Python modules for XML processing that were replaced with a one-liner using Amara. My only regret is that he and his team had to write all that other code in the first place, before finding Amara, but at least they don't have to maintain it any more.

Feedback like that makes all the long hours worthwhile.

[Uche Ogbuji]

Of BisonGen

For a while now we've been hosting what I consider to be quite the hidden gem in the infrastructure of the 4Suite project: BisonGen.

BisonGen is a Python tool that reads in an input file in a simple XML format based on Bison's text format, and creates LALR parsers in both pure Python and as a Python/C extension. This way the resulting parsers have a fast version and a more portable version, for maximum flexibility.

In this article I provide information on BisonGen, in preparation for more complete packaging to come later (probably by the next 4Suite release).

The latest version of BisonGen can always be downloaded from the FTP site. The most recent release was 0.8.0b1 in mid-April. See Jeremy's announcement.

See the simple, built-in example to get a picture of what BisonGen expects for input. More sophisticated examples are in 4Suite: in Ft/Xml/XPath, Ft/Xml/XSLT and Ft/Rdf/Parsers/Versa. Martin v. Löwis presented "Towards a Standard Parser Generator" at IPC10. His overview of BisonGen is very useful. He did note the big performance advantage of BisonGen parsers over pure Python counterparts (assuming, of course, that you use the resulting C parser from BisonGen).

Some other useful resources on BisonGen:

[Uche Ogbuji]

One Binary XML to rue them all

"Re: [xml-dev] XML-enabled databases, XQuery APIs"
"An Evaluation of Binary XML Encoding Optimizations for Fast Stream Based XML Processing"

"Our results have shown that with the exception of trivial binary encoding strategies, most binary encoding optimizations yield performance improvements in only limited applications or situations, and/or restrict the ability for pipelined XML processing. This supports the contention in [22] that there is not one binary encoding standard that could satisfy the needs of all applications. On the contrary, however, a trivial binary encoding standard would appear to at least provide performance benefits to most applications, without any significant drawbacks other than compromising the view-source principle."

Interesting paper. I think this is what those of us who oppose binary XML understand intuitively, but it's nice to see the emergence of research along those lines.

[Uche Ogbuji]

Python/XML community: XIST, XSV and a Wiki français

XIST 2.9
XSV 2.10-1
Wiki : Python et XML

XIST 2.9. XIST (simple, free license) is a very capable package for XML and HTML processing and generation. I covered it recently in "Writing and Reading XML with XIST". The long list of changes is given in Walter Dörwald's announcement.

XSV 2.10-1. XML Schema Validator (XSV) is a GPLed WXS validator written in Python. I didn't see any announcement of the XSV release except on Cafe con Leche, but there it is. The Web page has a host of changes listed for this one as well.

Wiki : Python et XML. Rémi a écrit: «Je me suis permis d'ouvrir une nouvelle page sur le Wiki avec mes, maigres, connaissances acquises sur Python et XML. »

Rémi wrote: "I took the liberty of opening a new page on the Wiki with the meager knowledge I've acquired of Python and XML."

[Uche Ogbuji]
