Dare on "contract-first"

Contract-First XML Web Service Design is No Panacea

Dare Obasanjo has had a lot of good comments lately on the whole REST/Web Services thing, and here he argues against claims that writing the interface definition first (the "contract") is the key to getting Web services right. I first heard the idea that WSDL-first is the magic from Tony Hong years ago at one of the old SOAP interoperability derbies, and I'd figured it was actually written into the WS stone tablets somewhere.

Dare dwells on the mismatch between WXS schema types and programming platforms, and gets at topics I discussed in "XML class warfare" and "The worry about program wizards" [both Application Development Trends], as well as other outlets. I think the mismatch alone isn't a problem: all modeling involves approximation. The problem is that people want it all to work without leaving the safety of their wizards, and I'm sorry for managers who don't like the smell of engineers, but all modeling also requires expertise. Wizards save some time cosmetically in the cheapest phase of development (implementation) just to charge very usurious interest on that savings throughout all the other phases. Customer problems such as Dare mentions are classic surfeit of the poisoned Kool-Aid, dating back to the 4GL heyday.

Model-first is still the key to solving such problems, I think. WXS and WSDL are too low-level to qualify. The model uses a specification language that expresses the solution in terms that are both formal and close to the conception of the problem space. That takes collaboration between the subject matter experts and modeling experts during analysis and high-level design. It then requires further effort from a modeling expert to define the mappings between the model and artifacts of the programming environment such as IDL, WSDL and WXS during the low-level design, which flows into implementation. No need for waterfalls or analysis-paralysis: these can all be rapid, iterative steps. But you can't just make a magic leap to low-level problem solving and expect not to pay the penalty in maintenance. This is the oldest advice in software engineering. It's amazing how few pay it the slightest attention.

[Uche Ogbuji]

via Copia

A 4Suite Appreciation

Sanjay Velamparambil's message

The best part of building 4Suite has always been the community. That goes way back to when Mike Olson and I were chuffed to hear from folks who'd stumbled across our inchoate DOM implementation (about 6 generations of the code ago), or our initial stabs at XPath and XSLT (about 4 generations in that case). Now the 4Suite community is a loud, thriving bazaar (100 messages to the user's list in a slow month), with all timbres of voices and all sorts of agendas. It's always pleasing to me as a developer to hear a voice ring out clearly from the noise, appreciating the value of the work we've done. Thanks, Sanjay, for a very nice note.

[Uche Ogbuji]

via Copia

Deletion added to friendlier Amara mutation

As I've mentioned, I added a friendlier mutation API to Amara. Deletion didn't come up in the original discussion, but I just got around to addressing that as well. Now checked in are enhancements that support the following use cases:

Use case 10:

Source doc: <a><b>spam</b><b>eggs</b></a>
Code: del doc.a.b
Result: doc mutated to <a><b>eggs</b></a>

Use case 11:

Source doc: <a><b>spam</b><b>eggs</b></a>
Code: del doc.a.b[0]
Result: doc mutated to <a><b>eggs</b></a>

Use case 12:

Source doc: <a><b>spam</b><b>eggs</b></a>
Code: del doc.a.b[1]
Result: doc mutated to <a><b>spam</b></a>

Use case 13:

Source doc: <a><b>spam</b><b>eggs</b></a>
Code: del doc.a.b[2]
Result: IndexError

Use case 14:

Source doc: spam
Code: del doc.a.b
Result: doc mutated to spam

Of course there are oddities to go with the new convenience. Check out the following:

>>> from amara import binderytools
>>> doc = binderytools.bind_string("<a><b>spam</b><b>spam</b></a>")
>>> unicode(doc.a.b)
u'spam'
>>> doc.a.b
<amara.bindery.b object at 0x685b2c>
>>> del doc.a.b
>>> unicode(doc.a.b)
u'spam'
>>> #Eh?  Still there, are ye?
...
>>> doc.a.b
<amara.bindery.b object at 0x685b8c>

Perfectly consistent with what the users seem to be saying, I think, but I'll be amazed if this doesn't trip up the odd fellow.
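
The oddity above comes from deleting only the first of several same-named children, so the next one slides into its place. Here is a minimal sketch of that behavior. It uses the standard library's xml.etree.ElementTree rather than Amara, and delete_child is a hypothetical helper for illustration, not Amara's implementation.

```python
import xml.etree.ElementTree as ET


def delete_child(parent, name, index=0):
    """Remove the index-th child element called `name` (hypothetical helper)."""
    matches = parent.findall(name)
    # Indexing raises IndexError when there is no such child
    parent.remove(matches[index])


# `doc` here is the root element <a>, standing in for doc.a
doc = ET.fromstring("<a><b>spam</b><b>spam</b></a>")
first = doc.find("b")

delete_child(doc, "b")        # plays the role of `del doc.a.b`
survivor = doc.find("b")      # the former second <b> is now the first

print(survivor.text)              # spam
print(survivor is first)          # False
print(ET.tostring(doc).decode())  # <a><b>spam</b></a>
```

The name still resolves after deletion, but to a different object, which matches the two different object addresses in the session above.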

[Uche Ogbuji]

via Copia

Elements versus attributes in Amara

In the previous entry I discussed changes to Amara's mutation API. In the original discussion one of the things that came up was the old element/attribute conundrum. Take a document in which the element a has both an attribute named b and a child element named b.

Users like to be able to access both elements and attributes using friendly Python idiom, but here we have a name clash on the resulting a object. Right now Amara exposes the attribute as a.b and the element as a.b_, using name mangling to disambiguate.

The important thing to remember, however, is that such clashes are quite rare in practice, even when you throw in namespaces, so such mangling is rarely necessary, and I personally think Amara's current behavior makes sense. But I may just have a blind spot, so I've been paying attention to suggestions from others.

Jeremy Kloth suggested just always using different idioms: a[u"b"] for the attribute and a.b for the element. This is not a bad idea, but I feel that, given that clashes are rare, it complicates the common case just to aid the rare one.

Luis Miguel Morillas had an idea I consider almost the opposite. Rather than completely separate element/attribute idioms, Luis suggests embracing how Amara has unified them. Right now Amara rolls up multiple elements of the same name in a convenient way:

<a><b>y</b><b>z</b></a>

Here a.b or a.b[0] yields the element with y and a.b[1] yields the element with z. Luis thinks that the following case should just be an extension of this:

<a b="x"><b>y</b><b>z</b></a>

Then a.b or a.b[0] would yield the attribute value (u"x"), a.b[1] would yield the element with y, and a.b[2] would yield the element with z. I kinda think of this idea as "so crazy it almost makes perfect sense", but it's way too big a change to introduce before Amara 1.0. I'd be curious to hear what others think of it. Luis actually brings it up in the context of mutation--see his original post (scroll to the bottom)--but I figure that the mutation API will follow naturally from the access API, so I'm focusing my thoughts a bit.
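
To make Luis's proposal concrete, here is a rough sketch of the unified ordering: the attribute value first, then the same-named child elements. It is written against the standard library's xml.etree.ElementTree, and unified is a hypothetical function name. Nothing like it exists in Amara as far as this entry says.

```python
import xml.etree.ElementTree as ET


def unified(parent, name):
    """Return the attribute value (if any) followed by same-named children."""
    items = []
    if name in parent.attrib:
        items.append(parent.attrib[name])
    items.extend(parent.findall(name))
    return items


a = ET.fromstring('<a b="x"><b>y</b><b>z</b></a>')
b = unified(a, "b")

print(b[0])       # x  (the attribute value)
print(b[1].text)  # y
print(b[2].text)  # z
```

One consequence the sketch makes visible: the items in the sequence are of mixed type (a string, then elements), and every index shifts by one whenever the attribute is added or removed.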

[Uche Ogbuji]

via Copia

Amara gets friendlier mutation

Tom Lazar asked for a friendlier idiom for mutating elements in Amara. I was reluctant at first because the simpler-on-the-surface idioms he wanted would require rather untidy idioms in the code. I relented to the argument that user convenience comes even before clean code. I finally got around to making and committing the changes today. I'd planned to release Amara 1.0b2 as soon as I'd made these changes, and the timing seems perfect since we've just released 4Suite 1.0b1, but the changes are intrusive enough that I think I'll give folks a chance to try things out from CVS and first see whether it craters for anyone. Please give it a go and give me feedback here or on the mailing list. Thanks.

Here are use cases illustrating the new idioms for Amara. I have added them to the test file mutation.py:

Use case 1:

Source doc: <a><b>spam</b></a>
Code: doc.a.b = u"eggs"
Result: doc mutated to <a><b>eggs</b></a>

Use case 2:

Source doc: <a><b>spam</b></a>
Code: doc.a.b[0] = u"eggs"
Result: doc mutated to <a><b>eggs</b></a>

Use case 3:

Source doc:
Code: doc.a.b = u"eggs"
Result: doc mutated to

Use case 4:

Source doc: <a><b>spam</b><b>spam</b></a>
Code: doc.a.b = u"eggs"
Result: doc mutated to <a><b>eggs</b><b>spam</b></a>

Use case 5:

Source doc: <a><b>spam</b><b>spam</b></a>
Code: doc.a.b[0] = u"eggs"
Result: doc mutated to <a><b>eggs</b><b>spam</b></a>

Use case 6:

Source doc: <a><b>spam</b><b>spam</b></a>
Code: doc.a.b[1] = u"eggs"
Result: doc mutated to <a><b>spam</b><b>eggs</b></a>

Use case 7:

Source doc: <a><b>spam</b><b>spam</b></a>
Code: doc.a.b[2] = u"eggs"
Result: IndexError

Use case 8:

Source doc: spam
Code: doc.a.b = u"eggs"
Result: doc mutated to spam

Note: attributes take precedence over same name elements in binding. See next use case.

Use case 9:

Source doc: spam
Code: doc.a.b_ = u"eggs"
Result: doc mutated to eggs
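
The assignment rules in these use cases can be summarized in a short sketch. This is not Amara's code: it uses the standard library's xml.etree.ElementTree, set_value is a hypothetical helper, and the attribute value "tea" is made up for the demonstration. It also simplifies by treating any trailing underscore as the mangled element name.

```python
import xml.etree.ElementTree as ET


def set_value(parent, name, value, index=0):
    """Assign `value` following the precedence rules described above."""
    if name.endswith("_"):
        # Mangled name: bypass the attribute and target the element
        name, use_attribute = name[:-1], False
    else:
        # Attributes take precedence over same-name elements
        use_attribute = name in parent.attrib
    if use_attribute:
        parent.set(name, value)
    else:
        # Indexing raises IndexError when there is no such element
        parent.findall(name)[index].text = value


doc = ET.fromstring("<a><b>spam</b><b>spam</b></a>")
set_value(doc, "b", "eggs", index=1)
print(ET.tostring(doc).decode())    # <a><b>spam</b><b>eggs</b></a>

clash = ET.fromstring('<a b="tea"><b>spam</b></a>')
set_value(clash, "b", "eggs")       # the attribute wins
set_value(clash, "b_", "ham")       # the mangled name reaches the element
print(ET.tostring(clash).decode())  # <a b="eggs"><b>ham</b></a>
```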

In a follow-up entry I'll talk about some other suggestions I've received on this matter.

[Uche Ogbuji]

via Copia

Elliotte Rusty Harold on "Managing XML data"

Managing XML data: A look ahead

IBM developerWorks has done well to corral Elliotte Rusty Harold for a column, Managing XML data, on tools and techniques for working with large collections of XML documents. The first article prepares the overall topic, summarizing likely subtopics and examining some history.

It's as good an opening as any for me to plug my own developerWorks article "Manage XML collections with XAPI". XAPI is a simple but well-considered community spec for an XML database collection API.

Elliotte is one of the best writers for explaining XML technologies, right up there with Jeni Tennison. I think developerWorks is a greatly overlooked source of great materials on XML and other technologies. Maybe I'm biased because I write a great deal for them, but they also feature a lot of other very good writers and you'd do well to check the site regularly, especially the XML zone, run very diligently by John Swanson.

[Uche Ogbuji]

via Copia

Installing 4Suite 1.0b1 as non-root

Update: How could I have forgotten --enable-unicode=ucs4 in the Python build instructions?

Just gathering up some details on how to install 4Suite as non-root (i.e. in a user's home directory). This is based on experience installing on Red Hat and Fedora Core, but should work for most POSIX environments.

If you don't have Python installed (or want your own copy):

Grab Python-2.3.x.tgz or Python-2.4.x.tgz and unpack:

tar zxvf ~/dl/Python-2.3.5.tgz
cd Python-2.3.5/
./configure --prefix=$HOME/lib --enable-unicode=ucs4

Pick whatever prefix works for you. --enable-unicode=ucs4 is essential IMO if you're doing XML processing.

make && make install
ln -s $HOME/lib/bin/python $HOME/bin

The last step is to put the Python exe you just built into your $PATH, presumably before any other Python exe in the system.
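
A quick way to confirm that the interpreter on your $PATH is the one you just built, and that it is a UCS4 build, is to ask Python itself. This is a small sketch, and unicode_width is a hypothetical helper name. On Python 2 builds, sys.maxunicode is 65535 for a narrow (UCS2) build and 1114111 for a wide (UCS4) one. Python 3.3 and later always report the full range.

```python
import sys


def unicode_width(maxunicode=sys.maxunicode):
    """Return 'ucs4' for a wide Unicode build, 'ucs2' for a narrow one."""
    if maxunicode > 0xFFFF:
        return "ucs4"
    return "ucs2"


# Path of the interpreter actually running this script
print(sys.executable)
print(unicode_width())
```

If the first line printed is not under the prefix you chose, another Python is ahead of yours on $PATH.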

Now for 4Suite

Grab 4Suite 1.0b1

cd $DOWNLOADS
tar zxvf 4Suite-1.0b1.tar.gz
cd 4Suite-1.0b1
python setup.py config --prefix=$HOME/lib
python setup.py install

Notice the extra "setup.py config" step. This is the key to the whole thing. The "setup.py config" sets the location for all the files installed by 4Suite except for the Python library files, which are installed to the location determined by the Python executable used to invoke the setup script. For more on where 4Suite puts things, see Mike Brown's excellent document "4Suite Installation Locations".
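
Since the Python library files go wherever the invoking interpreter says they go, it can help to check that location before installing. The sketch below assumes a modern Python with the sysconfig module (2.7 or 3.2 and later). On the 2.3 and 2.4 interpreters discussed here, distutils.sysconfig.get_python_lib() gives the equivalent answer.

```python
import sys
import sysconfig

paths = sysconfig.get_paths()

# Installation prefix of the running interpreter
print(sys.prefix)
# Directory for pure-Python packages
print(paths["purelib"])
# Directory for packages with compiled extension modules
print(paths["platlib"])
```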

There is also a --home option to setup.py config, but do not use this unless you really know what you're doing. Stick to --prefix.

Finally, you may want to link all the 4Suite commands into your home's bin directory:

ln -s $HOME/lib/bin/4* $HOME/bin

Now you can run the tests.

cd $HOME/lib/lib/4Suite

Remember that this is beta software, and some test failures are to be expected (heck, I'd be amazed if there weren't some test failures with the full 1.0 release).

[Uche Ogbuji]

via Copia

UBL 1.0 International Data Dictionary, First Edition

UBL home page

The First Edition of the UBL 1.0 International Data Dictionary (IDD) has been approved as an OASIS Committee Draft by the OASIS Universal Business Language Technical Committee and is now available for general use.

See my articles on UBL, such as:

Thinking XML: Universal Business Language (UBL)
Thinking XML: UBL 1.0 (plus ebXML Core Components and more)

UBL builds full document schemata from discrete data elements called business information entities (BIEs) in a process I describe in those articles. This translation makes BIEs more useful to Chinese (Traditional and Simplified), Japanese, Korean, and Spanish speakers, as well as English.

[Uche Ogbuji]

via Copia

4Suite 1.0b1

The announcement

Yaaaaay! This is nominally the feature freeze for the way, way overdue 4Suite 1.0. Let's hope this freeze accelerates progress towards full 1.0. Thanks to all our patient users. The focus of this release is probably performance. Jeremy Kloth, one of the best programming minds I've encountered, threw himself into the challenge of squeezing waste out of Domlette, without losing its great functional benefits. Some of the resulting gains are amazing. There are a lot of other fixes and enhancements, and I think it's a very solid release.

My next step is to release Amara 1.0b2 and kick off a branch to make better use of some of Jeremy's enhancements, including a super-efficient mini-SAX for Domlette.

4Suite home page

[Uche Ogbuji]

via Copia

4Suite for RDF

RDF hacking for fun and profit -- Bill de hÓra

"I find 4Suite to be stable software (tho' I'm not sure the RDF stuff is active anymore"

The main limitation with RDF in 4Suite is that it has not been tracking the latest specs. This sucks, but it reflects the reality of "it works, and grand updates don't scratch anyone's itch". 4Suite's RDF library is actually very stable, and has been accumulating bug fixes, performance fixes and new drivers.

I'd say that 4RDF is fine if you don't need all the nuances of the new specs (which are modest enough). It is heavily used, which is one nice test of its suitability. We do have grand post-1.0 plans, but they are not yet set in stone. My guess is the following:

  • We'll abandon our own parser for rdflib. That parser is SAX-based, has been tracking the latest specs, and is very well tested. This is actually something we and the rdflib folks have been discussing near forever. We just haven't got around to the actual work (itch scratching need and all that).
  • We'll make the low-level API more Pythonesque. Developments such as iterators and generators have come since the original 4RDF effort, and we want to put them to good use.
  • We'll work in a Versa 2.0 (RDF query language). SPARQL is not doing it for me, and for a lot of my colleagues and correspondents. OK. I'll be blunt. I think SPARQL sucks, and I'm likely to support W3C XML Schema before I support it (hint: earthworms will fly of their own locomotion before either event).
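
As a rough idea of what a more Pythonesque low-level API built on generators might look like, here is a sketch of lazy triple matching. It is purely illustrative: match and the sample triples are hypothetical, and nothing here is the current or planned 4RDF interface.

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Lazily yield triples matching the pattern; None acts as a wildcard."""
    for s, p, o in triples:
        if subject is not None and s != subject:
            continue
        if predicate is not None and p != predicate:
            continue
        if obj is not None and o != obj:
            continue
        yield (s, p, o)


# Made-up statements, for demonstration only
triples = [
    ("urn:x-example:post1", "dc:title", "4Suite for RDF"),
    ("urn:x-example:post1", "dc:creator", "Uche Ogbuji"),
    ("urn:x-example:post2", "dc:creator", "Uche Ogbuji"),
]

for s, p, o in match(triples, predicate="dc:creator"):
    print(s, o)
```

Because match is a generator, callers can stop iterating early without the store having to build a complete result list first.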

"Uche et al have been working on anobind most recently)..."

Well, that's just me, no et alii so far. And Anobind is no more. It has been absorbed into Amara XML Toolkit. I'm developing Amara in order to complement 4Suite, not to supplant it in any way. It's an add-on to 4Suite that gives Pythoneers the super-friendly idioms they like. I still put into 4Suite about as much effort as I do Amara.

One shouldn't make any assumptions on 4Suite development based on Amara.

[Uche Ogbuji]

via Copia