XHTML to Excel as well as OOo

In my article "Use XSLT to prepare XML for import into OpenOffice Calc" I discussed how to use XSLT to format XML for import into OpenOffice Calc.

The popular open source office suite OpenOffice.org is XML-savvy at its core. It uses XML in its file formats and offers several XML-processing plug-ins, so you might expect it to have nice tools built in for importing XML data. Unfortunately, things are not so simple, and a bit of work is required to manipulate general XML into delimited text format in order to import the data into its spreadsheet component, Calc. This article offers a quick XSLT tool for this purpose and demonstrates the Calc import of records-oriented XML. In addition to learning a practical trick for working with Calc, you might also learn a few handy XSLT techniques for using dynamic criteria to transform XML.
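
To make the trick concrete here, below is a minimal sketch of the general technique, not the article's actual stylesheet: it flattens a made-up records-oriented document into comma-separated text ready for Calc's delimited-text import. I'm driving it from Python with lxml for brevity (the article uses 4Suite), and real data would also need quoting for commas inside field values.

from lxml import etree

# records-to-CSV stylesheet: one row per record, one column per child element
CSV_XSLT = etree.XSLT(etree.XML('''\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/records">
    <xsl:for-each select="record">
      <xsl:for-each select="*">
        <xsl:value-of select="."/>
        <xsl:if test="position() != last()">,</xsl:if>
      </xsl:for-each>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
'''))

doc = etree.XML(
    '<records>'
    '<record><name>Alice</name><score>12</score></record>'
    '<record><name>Bola</name><score>15</score></record>'
    '</records>')

print(str(CSV_XSLT(doc)))  # name,score rows, ready for Calc's delimited-text import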

I do wonder about formulae, though. Via Parand Tony Darugar I found "Excel Reports with Apache Cocoon and POI", which (in the section starting with the header "Rendering Machinery") shows that you can just as easily use such a technique for MS Excel. Good to know. I've recently had reason to figure out a system for aggregating reports to and from spreadsheets and XML, and I may have to deal with simple formulae. I guess I'll cross that bridge if and when I get to it, and the full OpenDocument saved-file format is always an option, but if anyone happens to have any quick pointers, I'd be grateful.

[Uche Ogbuji]

via Copia

From Fourthought to Kadomo

I founded Fourthought in June 1998 with three friends from college. Eight and a half years doesn't sound that long when I say it, but the near-decade fills my rear-view mirror so completely that I can scarcely remember having done anything before it. That's probably a good thing, as it means I don't much remember the years of perfunctory consulting at places such as IBM Global Services and Sabre Decision Technologies before making the leap to relative independence. It was in part the typical entrepreneurial yen of the immigrant and in part the urge to chart my own high-tech career course that drove me to take the risk and endure the ups and downs of running a consultancy.

And I did say Fourthought is in the rear-view mirror. Last week I accepted a position at The Kadomo Group, a very young solutions company focused on the semantic Web space. Kadomo was founded by Eric Miller, former Semantic Web Activity Lead at the W3C. Eric and I have always looked for ways we could work together, considering our shared interest in how strategic elements of the semantic Web vision can be brought to bear in practice. He and the other bright and energetic folks coming together under the Kadomo banner were a major part of my decision to join. It was also made clear to me that I would have a sizeable role in shaping all aspects of the company, and that I would be able, and in fact encouraged, to continue my leadership in open source projects and community specification development. Last but not least, the culture of the company is set up to suit my lifestyle very well, which was always one tremendous benefit of Fourthought.

Without a doubt we have the seeds at Kadomo to grow something much greater than Fourthought was ever likely to be. The company has not neglected resources for high-caliber business development, operations, or marketing. Committing to these resources was something we always had a hard time doing at Fourthought, and it meant that even though we had brilliant personnel, strong client references, and a market profile disproportionate to the resources we devoted to marketing, we never grew to more than a fraction of our potential. I've learned many of these lessons the hard way, and it seems clear to me that Kadomo is born to greater ambition. One good sign is that I'll just be Chief Technical Architect, allowed to focus primarily on the company's technology strategy, not stranded juggling primary sales, operations, and lead-consultant responsibilities all at once. Another good sign is that product development is woven into the company's foundation, so I can look forward to greater leverage of small-company resources.

Considering my primary responsibility for technology strategy, it may seem strange to some that I'd join a semantic Web company, knowing that I have expressed such skepticism of the direction core semantic Web technology has taken lately. I soured on the heaping helping of gobbledygook that was ladled onto RDF in the post-2000 round of specs, and I soured on SPARQL as a query language when it became clear that it was to be as ugly and inelegant as XQuery. There have been some bright spots of lightweight goodness, such as GRDDL and SKOS, but overall I've found myself more and more focused on XML schema and transform technology. My point of departure for the past few years has been that a well-annotated syntactic Web can meet all the goals I personally have for the semantic Web. I've always been pretty modest in what I want from semantics on the Web. To put it bluntly, what interests me most is reducing the cost of screen-scraping. Of course, as I prove every day in my day job, even such an unfashionable goal leads to the sorts of valuable techniques that people prefer to buzz about using terms such as "enterprise mashups". Not that I begrudge folks their buzzwords, mind you.

I still think some simplified version or profile of RDF can be very useful, and I'll be doing what I can to promote a pragmatic approach to the semantic Web at Kadomo, building on the mountains of XML that vendors have winked and nodded into IT and the Web, much of it a hopeless congeries. There are tons of problems in this space, and I believe, accordingly, tons of opportunity. I think mixing in my somewhat diffractive view of the semantic Web will make for interesting discussion at Kadomo, and a lot of that will be reflected here on Copia, which, after all, I share with Chimezie, one of the most accomplished users of semantic Web technology to solve real-world problems.

One ongoing opportunity I don't plan to leave behind is my strong working relationship with the Web Platform Engineering group at Sun. With recent, hard-earned success in hand, and much yet to accomplish, we're navigating the paper trail to allow for a smooth transition from my services as a Fourthought representative to those as a Kadomo representative.

I hope some of you will consider contacting Kadomo to learn more about our services and solutions. We're just getting off the ground, but we have a surprising amount of structure in place for bringing focus to our service offerings, and we have some exciting products in development of which you'll soon be hearing more. If you've found my writings useful or examples of my work agreeable, do keep me in mind as I plough into my new role. Keep in touch.

Updated to reflect the final settling into Zepheira. Most other bits are still relevant.

[Uche Ogbuji]

via Copia

Atom Feed Semantics

Not a lot of people outside the core Semantic Web community actually want to create RDF, but extracting it from what's already there can be useful for a wide variety of projects. (RSS and Atom are first and relatively easy steps in that direction.)

Terminal dump

chimezie@Zion:~/devel/grddl-hg$ python GRDDL.py --debug --output-format=n3 --zone=https: --ns=aowl=http://bblfish.net/work/atom-owl/2006-06-06/# --ns=iana=http://www.iana.org/assignments/relation/ --ns=some-blog=http://example.org/2003/12/13/  https://sommer.dev.java.net/atom/2006-06-06/transform/atom-grddl.xml
binding foaf to http://xmlns.com/foaf/0.1/
binding owl to http://www.w3.org/2002/07/owl#
binding iana to http://www.iana.org/assignments/relation/
binding rdfs to http://www.w3.org/2000/01/rdf-schema#
binding wot to http://xmlns.com/wot/0.1/
binding dc to http://purl.org/dc/elements/1.1/
binding aowl to http://bblfish.net/work/atom-owl/2006-06-06/#
binding rdf to http://www.w3.org/1999/02/22-rdf-syntax-ns#
binding some-blog to http://example.org/2003/12/13/
Attempting a comprehensive glean of  https://sommer.dev.java.net/atom/2006-06-06/transform/atom-grddl.xml
@@fetching:  https://sommer.dev.java.net/atom/2006-06-06/transform/atom-grddl.xml
@@ignoring types: ('application/rdf+xml', 'application/xml', 'text/xml', 'application/xhtml+xml', 'text/html')
applying transformation https://sommer.dev.java.net/atom/2006-06-06/transform/atom2turtle_xslt-1.0.xsl
@@fetching:  https://sommer.dev.java.net/atom/2006-06-06/transform/atom2turtle_xslt-1.0.xsl
@@ignoring types: ('application/xml',)
Parsed 22 triples as Notation 3
Attempting a comprehensive glean of  http://www.w3.org/2005/Atom

Via atom2turtle_xslt-1.0.xsl and Atom OWL, the GRDDL result document:

@prefix aowl: <http://bblfish.net/work/atom-owl/2006-06-06/#>.
@prefix iana: <http://www.iana.org/assignments/relation/>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix some-blog: <http://example.org/2003/12/13/>.
[ a aowl:Feed;
     aowl:author [ a aowl:Person;
             aowl:name "John Doe"];
     aowl:entry [ a aowl:Entry;
             aowl:id "urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a"^^<http://www.w3.org/2001/XMLSchema#anyURI>;
             aowl:link [ a aowl:Link;
                     aowl:rel iana:alternate;
                     aowl:to [ aowl:src some-blog:atom03]];
             aowl:title "Atom-Powered Robots Run Amok";
             aowl:updated "2003-12-13T18:30:02Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>];
     aowl:id "urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6"^^<http://www.w3.org/2001/XMLSchema#anyURI>;
     aowl:link [ a aowl:Link;
             aowl:rel iana:alternate;
             aowl:to [ aowl:src <http://example.org/>]];
     aowl:title "Example Feed";
     aowl:updated "2003-12-13T18:30:02Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>].

Planet Atom's feed

@prefix : <http://bblfish.net/work/atom-owl/2006-06-06/#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix iana: <http://www.iana.org/assignments/relation/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
[] a :Feed ;
    :id "http://planetatom.net/"^^xsd:anyURI ;
    :title "Planet Atom" ;
    :updated "2006-12-10T06:57:54.166890Z"^^xsd:dateTime ;
    :generator [ a :Generator ;
            :uri <> ;
            :generatorVersion "" ;
            :name """atomixlib""" ] ;
    :entry [ a :Entry ;
            :title "The Darfur Wall" ;
            :author [ a :Person ; :name "James Tauber" ] ;
            :link [ a :Link ;
                    :rel iana:alternate ;
                    :to [ :src <http://jtauber.com/blog/2006/12/10/the_darfur_wall> ] ] ;
            :updated "2006-12-10T00:13:34Z"^^xsd:dateTime ;
            :published "2006-12-10T00:13:34Z"^^xsd:dateTime ;
            :id "http://jtauber.com/blog/2006/12/10/the_darfur_wall"^^xsd:anyURI ] .
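
If you want to consume such a GRDDL result programmatically, a minimal rdflib sketch (the saved file name is hypothetical) might look like this:

from rdflib import Graph, Namespace

AOWL = Namespace("http://bblfish.net/work/atom-owl/2006-06-06/#")

g = Graph()
# hypothetical file: one of the GRDDL results above, saved locally
g.parse("planet-atom-glean.n3", format="n3")

# walk aowl:entry links and print each entry's title
for entry in g.objects(predicate=AOWL.entry):
    for title in g.objects(entry, AOWL.title):
        print(title)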

[Uche Ogbuji]

via Copia

XML 2006 Synopsis: Are we there yet?

Well, XML 2006 came and went with a rather busy bang. My presentation on using XSLT to generate XForms (from XUL/XHTML) was well attended, and I hope it helped increase awareness of the importance and value of XForms, (perhaps) the only comprehensive vehicle by which XML can be brought to the web in the way XML proponents have had in mind for some time. As Simon St. Laurent puts it:

XML pretty much completely missed its original target market. SGML culture and web developer culture seemed like a poor fit on many levels, and I can't say I even remember a concerted effort to explain what XML might mean to web developers, or to ask them whether this new vision of the Web had much relationship to what they were doing or what they had planned. SGML/XML culture and web culture never really meshed.

Most of the questions I received had to do with our particular choice of FormsPlayer (an Internet Explorer plugin) over alternatives such as Orbeon, Mozilla, Chiba, etc. This was a bit unfortunate, and an indication of a much larger problem in this particular area of innovation we lovingly call 'web 2.0'. I will get back to this later.

I was glad to hear John Boyer tell me he was pleasantly surprised to see mention of the Rich Web Application Backplane W3C Note. Mark Birbeck and Micah Dubinko (fellow XForms gurus and visionaries in their own right) didn't let it slip under their radar, either.

I believe the vision outlined in that note is much more lucid than a lot of the hype-centered notions of 'web 2.0', which seem more focused on painting a picture of scattered buzzwords ('mash-ups', AJAX, etc.) than on commonalities between concrete architectures.

Though this architectural style accommodates solutions based on scripting (AJAX) as well as more declarative approaches, I believe the primary value is in freeing web developers from the 80% of scripting that results from not having an alternative (READ: browser vendor monopolies) rather than from scripting being the appropriate solution for the job. I've jousted with Kurt Cagle before on this topic, and Mark Birbeck has written extensively on this as well.

In writing the presentation, I sort of stumbled upon some interesting observations about XUL and XForms:

  • XUL relies on a static, inarticulate means of binding components to their behavior
  • XForms relies on XPath for doing the same
  • XUL relies completely on JavaScript to define the behavior of its widgets / components
  • A more complete mapping from XUL to XForms (than the one I composed for my talk) could be valuable to those more familiar with XUL as a bridge to XForms.

At the very least, it was a great way to familiarize myself with XUL.

In all, I left Boston feeling like I had experienced a very subtle anti-climax as far as innovation was concerned.
If I were to plot a graph of innovative progression over time, it would seem to me that the XML space has plateaued of late, with political in-fighting and spec proliferation overtaking truly innovative ideas. I asked Harry Halpin about this, and his take was that perhaps "XML has won". I think there is some truth to this, though I don't think XML has necessarily made the advances that were hoped for in the web space (as Simon St. Laurent put it earlier).

There were a few exceptions, however.

XML Pipelines

I really enjoyed Norm Walsh's presentation on XProc, which was an example of scratching a very real itch: consensus on a vocabulary for XML processing workflows. Ironically, it probably wouldn't take much to implement in 4Suite, as support for most (if not all) of the pipeline operations is already there.

I did ask Norm if XProc would support setting up XPath variables for operations that relied on them, and was pleased to hear that they had that in mind. I also asked about support for non-standard XML operations such as XUpdate, and was pleased to hear that they had that covered as well. It is worth noting that XUpdate by itself could make the viewport operation rather redundant.
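
To give a sense of what XProc is standardizing, here's a rough Python/lxml analogy of a two-step chain, an XSLT step followed by an XPath selection loosely in the spirit of the viewport/filter operations. The file names are hypothetical; XProc's real contribution is making such chains declarative and interoperable rather than ad-hoc code like this:

from lxml import etree

def mini_pipeline(source, stylesheet, select):
    doc = etree.parse(source)
    # step 1: an XSLT step, as in XProc's p:xslt
    transformed = etree.XSLT(etree.parse(stylesheet))(doc)
    # step 2: narrow to a subtree by XPath, loosely like a viewport/filter step
    return transformed.xpath(select)

# e.g. mini_pipeline("feed.xml", "normalize.xsl", "//entry[@status='draft']")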

The Semantic Web Contingent

There was noticeable representation by semantic web enthusiasts (myself, Harry Halpin, Bob DuCharme, Norm Walsh, Elias Torres, Eric Prud'hommeaux, Ralph Hodgson, etc.), and their presentations had somewhat subdued tones, (perhaps) so as not to incite ravenous bickering from narrow-minded enthusiasts. There was still some of that, however, as I was asked by someone why RDF couldn't be persisted natively as XML, queried via XQuery, and inferred over via extension functions! Um... right... There is some irony in that, as I have yet to find a legitimate reason myself to even use XQuery in the first place.

The common scenario is when you need to query across a collection of XML documents, but I've personally preferred to index XML documents with RDF content (extracted from a subset of the documents), match the documents via RDF, then isolate a document and evaluate an XPath against it, essentially bypassing the collection extension to XPath with a 'semantic' index. Of course, this only makes sense where there is a viable mapping from XML to RDF, but where there is one, I've preferred this approach. But to each his/her own.
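
A bare-bones sketch of that two-phase approach, with rdflib and lxml; the index file, the dc:subject metadata, and the invoice XPath are all invented for illustration:

from lxml import etree
from rdflib import Graph, Namespace, Literal

DC = Namespace("http://purl.org/dc/elements/1.1/")

# hypothetical index: RDF metadata previously extracted from the documents,
# with each document URI carrying a dc:subject
index = Graph()
index.parse("document-index.rdf")

# 1. 'semantic' narrowing: match candidate documents in the RDF index
for doc_uri in index.subjects(DC.subject, Literal("invoices")):
    # 2. syntactic drill-down: evaluate an XPath against the one document
    tree = etree.parse(str(doc_uri))
    print(tree.xpath("string(/invoice/total)"))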

Content Management APIs

I was pleasantly surprised to learn from Joel Amousou that there is a standard (a datastore- and language-agnostic standard, no less) for CMS APIs, called JSR-170. The 4Suite repository is the only content management system / API with a well-thought-out architecture for integrating XML & RDF persistence and processing in a way that emphasizes their strengths with regard to content management. Perhaps there is some merit in investigating the possibility of porting (or wrapping) the 4Suite repository API as JSR-170? Joel seems to think so.

Meta-stylesheets

Michael Kay had a nice synopsis of the value of generating XSLT from XSLT (a novel mechanism, though one I've been using for some time), and it was interesting to note that one of his prior client projects involved a pipeline that started with an XForm, was post-processed by XSLT, and was aggregated with results from an XQuery (also generated from XSLT).

Code generation is a valuable pattern with plenty of unrecognized value in the XML space, and I was glad to see Michael Kay highlight this. He had some choice words on when to use XSLT and when to use XQuery that I thought were on point: use XSLT for re-purposing and formatting, and use XQuery for querying your database.
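
For a taste of the meta-stylesheet idiom, here's a toy example of mine (not one of Kay's): a stylesheet that reads a list of XPaths and generates a second stylesheet to extract them, with xsl:namespace-alias keeping the generating and generated XSLT from colliding. lxml just drives the two transforms:

from lxml import etree

META = etree.XSLT(etree.XML('''\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:out="urn:x-output-alias">
  <xsl:namespace-alias stylesheet-prefix="out" result-prefix="xsl"/>
  <xsl:template match="/fields">
    <out:stylesheet version="1.0">
      <out:output method="text"/>
      <out:template match="/">
        <xsl:for-each select="field">
          <!-- each listed XPath becomes an xsl:value-of in the generated sheet -->
          <out:value-of>
            <xsl:attribute name="select"><xsl:value-of select="."/></xsl:attribute>
          </out:value-of>
          <out:text>&#10;</out:text>
        </xsl:for-each>
      </out:template>
    </out:stylesheet>
  </xsl:template>
</xsl:stylesheet>
'''))

generated = META(etree.XML(
    '<fields><field>/doc/title</field><field>/doc/author</field></fields>'))
extractor = etree.XSLT(generated)  # compile the stylesheet we just generated
print(str(extractor(etree.XML('<doc><title>Copia</title><author>Uche</author></doc>'))))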

GRDDL

Finally, I spent quite some time with Harry Halpin (chair of the GRDDL Working Group) helping him install and use the 4Suite / RDFLib client I recently wrote for use with the GRDDL test suite. You can take what I say with a grain of salt (as I am a member and loud, vocal supporter), but I think that GRDDL will end up having a more influential impact on the semantic web vision (which I believe is much less important than the technological components it relies on to fulfill that vision) and on XML adoption on the web than any other technology, primarily because it allows content publishers to leverage the full spectrum of both XML and RDF technologies.
Within my presentation, I mention an architectural style I call 'modality segregation' that captures the value proposition of XSLT for drawing sharp, distinguishable boundaries (where there were once none) between:

  • content
  • presentation
  • meaning (semantics)
  • application behavior

I believe it's a powerful idiom for managing, publishing, and consuming data & knowledge (especially over the web).

Harry demonstrated how easy it is to extract review data, vocabulary mappings, and social networks (the primary topic of his talk) from XHTML that would ordinarily be dormant with regard to everything other than presentation. We ran into a few snafus with 4Suite when we tried to run Norm Walsh's hCard2RDF.xslt against Dan Connolly's web site and Harry's home page. We also ran into problems with the client (which is mostly compliant with the Working Draft).

I also had the chance to set Harry up with my blazingly fast RETE-based N3 reasoner, which we used to test GRDDL-based identity consolidation by piping multiple GRDDL results (from XHTML with embedded XFN) into the reasoner, performing an OWL DL closure, and identifying duplicate identities via inverse functional properties (smushing).
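
The smushing step reduces to something like the following rdflib sketch; the input file is hypothetical, and a fuller pass would also rewrite triples where a duplicate appears as the object:

from rdflib import Graph, Namespace

FOAF = Namespace("http://xmlns.com/foaf/0.1/")

g = Graph()
g.parse("merged-gleans.n3", format="n3")  # the piped-together GRDDL results

# foaf:mbox is inverse functional: one mailbox implies one person
canonical = {}  # mbox -> first node seen with that mbox
aliases = {}    # duplicate node -> canonical node
for person, mbox in g.subject_objects(FOAF.mbox):
    if mbox in canonical and canonical[mbox] != person:
        aliases[person] = canonical[mbox]
    else:
        canonical[mbox] = person

# fold each duplicate's statements onto its canonical node
for dup, canon in aliases.items():
    for p, o in list(g.predicate_objects(dup)):
        g.remove((dup, p, o))
        g.add((canon, p, o))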

As a result of our 5+ hour hackathon, I ended up writing 3 utilities that I hope to release once I find a proper place for them:

  • FOAFVisualizer - A command-line tool for merging and rendering FOAF networks in a 'controlled' and parameterized manner
  • RDFPiedPipe - A command-line tool for converting between the syntaxes that RDFLib supports: N3, Ntriples, RDF/XML
  • Kaleidos - A library used by FOAFVisualizer to control every aspect of how an RDF graph (or any other network structure) is exported to a graphviz diagram via BGL-Python bindings.

In the final analysis, I feel as if we have reached a climax in innovation only to face a bigger challenge from politics than anything else:

  • RDFa versus eRDF
  • SPARQL without entailment versus SPARQL with OWL entailment
  • XHTML versus HTML5
  • Web Forms versus XForms
  • Web 2.0 versus Web 3.0
  • AJAX versus XForms
  • XQuery versus XSLT
  • XQuery over RDF/XML versus SPARQL over abstract RDF
  • XML 1.0 specifications versus the new 2.0 specifications

The list goes on. I expressed my concerns about the danger of technological camp warfare to Liam Quin (XML Activity Lead), and he concurred. We should spend less time arguing over whether or not my spec is more l33t than yours, and more time asking the more pragmatic question of what solutions work best for the problem(s) at hand.

[Uche Ogbuji]

via Copia

Today's XML “wot he said”

Is it just me, but isn't XQuery just a tardy ugly solution looking for a problem? And thinking it's going to excite people who write Mashups is hopeful, but possible if we see it supported in the browser, I guess. But the syntax. Ugh!
Paul Downey

I wasn't able to attend XML 2006 because I knew the e-commerce launch I've been working on at Sun was too close. This week I've been in marathon sessions planning the next phase of the Sun project. I'm in the company of clever, engaging people who've traveled from various U.S. and European spots, so I get a little conference flavor in a much more focused setting. But I'm still getting a bit of the XML 2006 fix through attendee Weblogs.

The first thing that came to mind was "WTF! That's a hella lotta XQuery at the conference". I certainly don't miss that part. And now they have programming extensions of some sort? XQueryP? My stars! What a yucky idea. The enterprisey set seem to like XQuery, and I grant there's more substance to XQuery than some enterprisey fads such as WS-Jigsaw. So FLWOR and friends are here to stay, whether I like it or not. No worries. I'll just arrange my affairs so that I have to deal with the muck as little as possible. But that doesn't mean I can't enjoy the occasional barb from a fellow nay-sayer. Wot Paul said.

[Uche Ogbuji]

via Copia

New shopping cart features for Sun.com

Weblogging has been pretty thin for me lately, as has everything else. For the past few months now I've been working on a large XML-driven integration project at Sun Microsystems. I consult as a data architect to the group that drives the main www.sun.com Web site, as well as product marketing pages and other data-driven venues. That's been a large part of my day job for the last four years, and in the most recent project Sun is working a versatile new e-commerce engine into the site. They put a lot of care into analysis for integrating this into existing product pages, so I found myself waist deep in XML pipeline architecture and data flows from numerous source systems (some XML, some ERP, some CMS, and every other TLA you can fathom). The XML pipeline aggregates the sources according to a formal data model, the result of which feeds normalized XML data into the commerce back end. A veritable enterprise mash-up. It's been a lot of work, leavened by collaboration with a top-notch team, and with the launch last week of the new system I've found palpable reward.

Web Usability guru Martin Hardee, whose team put together the stringent design parameters for the project, mentioned the new feature this week.

We're already off building on this success, and it's more enterprise-grade (yeah, buzzword, sue me) XML modeling and pipeline-driven architecture with a global flavor for a good while to come, I expect. And probably not all that much time for Weblogging.

[Uche Ogbuji]

via Copia

Today's (non-XML) WTF, or high stakes in the SOA sweeps

So here's InfoWorld's daily nugget of wisdom, posed as the provocation: "Should you fire your enterprise architect in 2007? Take the test."

The largest and most disturbing issue ... is the fact that there seems to be a huge chasm between the traditional enterprise architecture crowd, and those looking at the value of SOA. Indeed, enterprise architecture, as a notion, has morphed from an approach for the betterment of corporate IT to a management practice, at least for some. Thus, the person that is needed to understand and implement the value of SOA is sometimes not the current enterprise architect in charge. -- David Linthicum.

So the SOA wars are heating up. More and more smart people are pointing out that the emperor has no clothes; but stakes is still crazy high. Some folks haven't yet made all their money from SOA. So how do the stakeholders respond? With cold-blooded threats.

"So your architect isn't all bought into SOA, eh? Well fire him, dammit."

And oh, isn't it delicious irony that this dude is claiming it's the experienced architects cautious on SOA who are establishing a pet management practice within IT. Oh, there's no way the SOA sellers could be guilty of that. Noooo. Never. Never. Never. Neeeever!

[Uche Ogbuji]

via Copia

Thinking “Thinking XML: The XML decade”

I've been very pleased at the response to my article “Thinking XML: The XML decade”. I like to write two types of articles: straightforward expositions of running code, and pieces providing more general perspective on some aspect of technology. I've felt some pressure in the publishing world lately to dispense with some of the latter in favor of puff pieces on the hottest technology fad; right now, having "SOA", "AJAX" or "podcast" in the title is the formula for selling a book or article. I've been lucky throughout my career to build relationships with editors who trust my judgment, even if my writing is so often out of the mainstream. As such, whenever an article of no obvious appeal touches a chord and provokes intelligent response, I welcome it as more ammunition for following the road less trampled in my future writing.

Karl Dubost of the W3C especially appreciated my effort to bring perspective to the inevitable tensions that emerged as XML took hold in various communities.

Uche Ogbuji has written an excellent article The XML Decade. The author is going through XML development history as well as tensions between technological choices in XML communities. One of the fundamental choices of creating XML was to remove the data jail created by some application or programming languages.

Mike Champion, with whom I've often jousted (we have very different ideas of what's practical), was very kind in "People are reflecting on XML after 10 years".

A more balanced assessment of the special [IBM Systems Journal issue] is from Uche Ogbuji. There is quite a nice summary of the very different points of view about what XML is good for and how it can be used. He reminds us that today's blog debates about simplicity/complexity, tight/loose coupling, static/dynamic typing, etc. reflect debates that go back to the very beginning. I particularly like his pushback on one article's assertion that XML leverages the value of "information hiding" in OO design.

It was a really big leap for me from OO (and structured programming/abstract data type) orthodoxy to embrace of XML's open data philosophy (more on that in “Objects. Encapsulation. XML?”). It did help that I'd watched application interface wars from numerous angles in my career: RPC granularity, mixins and implementation versus interface inheritance, SQL/C interface coupling, etc. It started to become apparent to me that something was wrong when we were translating natural business domain problems into such patently artificial forms. RDBMS purists have been making a similar point for ages, but in my opinion they just want to replace N-tier application artifice with their own brand of artifice. XML is far from some magic mix for natural expression of the business domain, but I believe that XML artifacts tend to be fundamentally more transparent than other approaches to computing. In my experience, it's easier to maintain even a poorly-designed XML-driven system than a well-designed system where the programming is preeminent.

Mike's entry goes on to analyze the usefulness, in perspective, of the 10 guiding principles for XML. When he speaks of "a more balanced view" he's contrasting the Slashdot thread on the article, which is mostly filled with the sort of half-educated nonsense that drove me from that site a few years ago (these days I find the most respectable discussion on reddit.com). Poor Liam Quin spent an awful lot of patience on a gang of inveterate flame-throwers. Besides his calm explanations, the best bits in that thread were on HL7 and ASN.1.

Supposedly the new version 3 [HL7] standard (which uses the "modeling approach") will be much more firm with the implementors, which will hopefully mean that every now and then one implementation will actually be compatible with another implementation. I've looked over their "models" and they've modelled a lot of the business use-case stuff for patient data, but not a lot of the actual data itself. Hopefully when it's done, it'll come out a bit better baked than previous versions.

That does not sound encouraging. I know Chimezie, my brother and Copia co-conspirator, is doing some really exciting work on patient data records. More RDF than XML, but I know he has a good head for the structured/unstructured data continuum, so I hope his work propagates more widely than just the Cleveland Clinic Foundation.

J. Andrew Rogers made the point that Simon St. Laurent and I (among others) have made about the many cases where people misuse XML rather than something more suitable, such as ASN.1.

The "slow processing" is caused by more than taking a lot of space. XML is basically a document markup but is frequently and regularly used as a wire protocol, which has very different design requirements if you want a good standard. And in fact we already have a good standard for this kind of thing called "ASN.1", which was actually engineered to be extremely efficient as a wire protocol standard. (There is also an ITU standard for encoding XML as ASN.1 called XER, which solves many of the performance problems.)

Of course, I think he goes a bit too far.

The only real advantage XML has is that it is (sort of) human readable. Raw TLV formatted documents are a bit opaque, but they can be trivially converted into an XML-like format with no loss (and back) without giving software parsers headaches. There is buckets of irony that the deficiencies of XML are being fixed by essentially converting it to ASN.1 style formats so that machines can parse them with maximum efficiency. Yet another case of computer science history repeating itself. XML is not useful for much more than a presentation layer, and the fact that it is often treated as far more is ridiculous.

I'd actually argue that XML is suited for a (semi-structured) model layer, not a presentation layer. For one thing, wire efficiency often counts in presentation as well. But his essential point is correct that XML is an awful substitute for ASN.1 as a wire protocol. By the same token, the Web services stack is an awful substitute for CORBA/OMA and even Microsoft's answers to same. It seems the industry is slowly beginning to realize this. I love all the many articles I see with titles such as "We know your SOA investment is stuck firmly in the toilet, but honest, believe us, there is an effective way to use this stuff. Really."

Anyway, later on in that sub-thread:

The company I work for has had a lot of success with XML, and is planning to move the internal data structure for our application from maps to XML. There is one simple reason for our success with it: XSLT. A customer asks for output in a specific format? Write a template. Want to display the data on a web page? Write a template that converts to HTML. Want to print to PDF? Write a template that converts to XSL, and use one of many available XSL->PDF processors. Want to use PDF forms to input data? Write a template to convert XFDF to our format. Want to import data from a competitor and steal their customer? You get the picture.

Bingo! The secret to XML's value is transformation. Pass it on.
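
That workflow in miniature, with invented stylesheet names; each target format is one small, separately maintained template:

from lxml import etree

STYLESHEETS = {
    "html": "order2html.xsl",  # display on a web page
    "csv": "order2csv.xsl",    # customer-specific delimited output
    "fo": "order2fo.xsl",      # XSL-FO, en route to PDF
}

def render(doc, target):
    # a new output format costs one new entry and one new template
    transform = etree.XSLT(etree.parse(STYLESHEETS[target]))
    return str(transform(doc))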

In "XML at 10: Big or Little?" Mark writes:

What the article ultimately ends up being about is the "Big" idea of XML vs. the oftentimes "Little" implementation of it. The Big idea is that XML can be used in a bottom-up fashion to model the grammar of a particular problem domain in an application- and context-independent manner. The little implementation is when XML is essentially used as a more verbose protocol for data interchange between existing applications. I would guess that is 90+ percent of what it is currently used for.

He's right, and I think this overuse is one of the reasons XML so often elicits reflex hostility. Then again, anything that doesn't elicit such hostility in some quarters is, of course, entirely irrelevant. I think it's a good thing that cannot be said of XML.

[Uche Ogbuji]

via Copia

“Thinking XML: The XML decade”

Subtitle: Thoughts on IBM Systems Journal's retrospective of XML at ten years (or so)
Synopsis: IBM Systems Journal recently published an issue dedicated to XML's 10th anniversary. It is primarily a collection of interesting papers for XML application techniques, but some of its articles offer general discussion of the technical, economic and even cultural effects of XML. There is a lot in these papers to draw from in thinking about why XML has been successful, and what it would take for XML to continue its success. This article expands on some of these topics that are especially relevant to readers of this column.

In this article I touch on points ranging from what the XML community can learn from the COBOL boom of the '90s to why it's dangerous to use XML as a basis for traditional application-modeling systems. It's a bit of a gestalt approach to analyzing some of the key issues facing XML technology at this milestone, and what it might take to ensure XML is still relevant and valuable after another decade. And by that I mean valuable in itself, and not just as a legacy format with valuable data.

[Uche Ogbuji]

via Copia

Amara 1.2 goes alpha, and other developments

First of all, 4Suite went 1.0, rather quietly, because the day-job schedule has left room for very little besides quiet releases. It's probably just as well, because by common standards 4Suite has been 1.0 grade for years. Under any less conservative version-numbering scheme it would be 4Suite 3.0 by now.

I'm pushing Amara to 1.2 (a more typical progression of version numbers, in that case), and after a developers-only alpha we've released alpha 1.2a2 publicly, but quietly. As I've hinted before, I have a lot of ideas for Amara post-1.2. The next major branch will be a full rewrite, probably to be released as Amara 2.0. Anyway, see the draft of the 1.2 full release announcement.

I also put together a quick start recipe for Amara on Ubuntu, and Luis Miguel Morillas has one for Windows users in Spanish. He says he'll be translating it to English soon, and when he does, I'm sure he'll link it from his "Amara Installers for Windows Users" page.
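
For anyone who hasn't tried Amara, the core 1.x bindery idiom looks about like this (a from-memory, Python 2-era sketch; 'labels.xml' and its shape are invented), which is the flavor those quick-start recipes get you to:

import amara

doc = amara.parse("labels.xml")  # <labels><label><name>...</name></label>...</labels>
for label in doc.labels.label:   # repeated elements come back iterable
    print(unicode(label.name))   # element content as Unicode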

[Uche Ogbuji]

via Copia