Tagging meets hierarchies: XBELicious

The indefatigable John L. Clark recently announced another very useful effort, the start of a system for managing your del.icio.us bookmarks as XBEL files. Of course not everyone might be as keen on XBEL as I am, but even if you aren't, there is a reason for more general interest in the project. It uses a very sensible set of heuristics for mapping tagged metadata to hierarchical metadata. del.icio.us is all Web 2.0-ish and thus uses tagging for organization. XBEL is all XML-ish and thus uses hierarchy for same. I've long wanted to document simple common-sense rules for mapping one scenario to another, and John's approach is very similar to sketches I had in my mind. Read section 5 ("Templates") of the XBELicious Installation and User's Guide for an overview. Here is a key snippet:

For example, if your XBEL template has a hierarchy of folders like "Computers → linux → news" and you have a bookmark tagged with all three of these tags, then it will be placed under the "news" folder because it has tags corresponding to each level in this hierarchy. Note, however, that this bookmark will not be placed in either of the two higher directories, because it fits best in the news category. A bookmark tagged with "Computers" and "news" would only be placed under "Computers" because it doesn't have the "linux" tag, and a bookmark tagged with "linux" and "news" would not be stored in any of these three folders.
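The rule in that snippet can be sketched in a few lines of Python. This is only an illustration of the heuristic as quoted, not XBELicious's actual code: the nested-dict template, the function name `place_bookmark`, and the exact, case-sensitive tag matching are all assumptions made for the example.

```python
# Hypothetical folder template: each key is a folder name, each value
# holds that folder's subfolders (an empty dict means no subfolders).
TEMPLATE = {"Computers": {"linux": {"news": {}}}}


def place_bookmark(tags, folders, path=()):
    """Return the deepest folder paths whose every level matches a tag.

    A folder is only considered when all of its ancestors also matched,
    and a bookmark that fits a subfolder is not repeated in the parent.
    """
    placements = []
    for name, subfolders in folders.items():
        if name not in tags:
            # No tag for this level, so nothing beneath it can match.
            continue
        here = path + (name,)
        deeper = place_bookmark(tags, subfolders, here)
        # Prefer the best (deepest) fit; fall back to this folder.
        placements.extend(deeper if deeper else [here])
    return placements


print(place_bookmark({"Computers", "linux", "news"}, TEMPLATE))
print(place_bookmark({"Computers", "news"}, TEMPLATE))
print(place_bookmark({"linux", "news"}, TEMPLATE))
```

The three calls correspond to the three cases in the guide: the fully tagged bookmark lands under "news", the one missing "linux" stops at "Computers", and the one missing "Computers" is placed nowhere in this branch.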

XBELicious is work in progress, but worthy work for a variety of reasons. I hope I have some time to lend a hand soon.

[Uche Ogbuji]

via Copia

"Semantic Transparency" by any other name

Responding to "Semantic hairball, y'all", Paul Downey approved of my skewering of some of the technologies I see dominating the semantics space, but did say:

..."semantic transparency" in "XML Schema" sounds just a little too scary for my tastes....

This could mean that the names sound scary, or that his interpretation of the idea itself sounds scary. If it's the latter, I'll try to show soon that the idea is very simple and shouldn't be at all scary. If it's the former, the man has a point. "Semantic Transparency" is a very ungainly name. As far as I can tell, it was coined by Robin Cover, and I'm sure it was quite suitable at the time, but for sure right now it's a bit of a liability in the pursuit that most interests me.

The pursuit is of ways to build on the prodigious success of XML to make truly revolutionary changes in data architecture within and across organizations. Not revolutionary in terms of the technology to be used. In fact, as I said in "No one ever got fired for...", the trick is to finally give age-old and well proven Web architecture more than a peripheral role in enterprise architecture. The revolution would be in the effects on corporate culture that could come from the increased openness and collaboration being fostered in current Web trends.

XML ushered in a small revolution by at least codifying a syntactic basis for general purpose information exchange. A common alphabet for sharing. Much is made of the division between data and documents in XML (more accurate terms have been proposed, including records versus narrative, but I think people understand quite well what's meant by the data/documents divide, and those terms are just fine). The key to XML is that even though it's much more suited to documents, it adapts fairly well to data applications. Technologies born in the data world such as relational and object databases have never been nearly as suitable for document applications, despite shrill claims of relational fundamentalists. XML's syntax mini-revolution means that for once those trying to make enterprise information systems more transparent by consolidating departmental databases into massive stores (call them the data warehouse empire), and those trying to consolidate documents into titanic content management stores (call them the CMS empire) can use the same alphabet (note: the latter group is usually closely allied with those who want to make all that intellectual capital extremely easy to exchange under contract terms. Call them the EDI empire). The common alphabet might not be ideal for any one side at the table, but it's a place to start building interoperability, and along with that the next generation of applications.

All over the place I find in my consulting and speaking that people have embraced XML just to run into the inevitable limitations of its syntactic interoperability and scratch their heads wondering, "OK, what's the next stop on this bus route?" People who know how to make money have latched onto the suspense, largely as a way of re-emphasizing the relevance of their traditional products and services, rather than as a way to push for further evolution. A few more idealistic visionaries are pushing such further evolution, some rallying under the banner of the "Semantic Web". I think this term is, unfortunately, tainted. Too much of the 70s AI ambition has been cooked into recent iterations of Semantic Web technologies, and these technologies are completely glazing over the eyes of the folks who really matter: the non-Ph.Ds (for the most part) who generate the vast bodies of public and private documents and data that are to drive the burgeoning information economy.

Some people building on XML are looking for a sort of mindshare arbitrage between the sharp vendors and the polyester hippies, touting sloppy, bottom-up initiatives such as microformats and folksonomies. These are cheap, and don't make the head spin to contemplate, but it seems clear to anyone paying attention that they don't scale as a way to consolidate knowledge any more than the original Web does.

I believe all these forces will offer significant juice to next generation information systems, and that the debate really is just how the success will be apportioned. As such, we still need an umbrella term for what it means to build on a syntactic foundation by sharing context as well. To start sharing glossaries as well as alphabets. The fate (IMO) of the term "Semantic Web" is instructive. I often joke about the curse of the s-word. It's a joke I picked up from elsewhere (I forget where) to the effect that certain words starting with "s", and "semantic" in particular, are doomed to sound imposing yet impossibly vague. My first thought was: rather than "semantic transparency", how about just "transparency"? The problem is that it's a bit too much of a hijack of the generic. A data architect probably will get the right picture from the term, but we need to communicate to the wider world.

Other ideas that occur to me are:

  • "information transparency"
  • "shared context" or "context sharing"
  • "merged context"
  • "context framing"
  • "Web reference"

The last idea comes from my favorite metaphor for these XML++ technologies: that they are the reference section (plus card catalog) of the library (see "Managing XML libraries"). They are what makes it possible to find, cross-reference and understand all the information in the actual books themselves. I'm still fishing for good terms, and would welcome any suggestions.

[Uche Ogbuji]

via Copia

Mi...cro...for...mats...sis...boom...BLAH!

Mike Linksvayer had a nice comment on my recent talk at the Semantic Technology Conference.

I think Uche Ogbuji's Microformats: Partial Bridge from XML to the Semantic Web is the first talk I've heard on that I've heard from a non-cheerleader and was a pretty good introduction to the upsides and downsides of microformats and how can leverage microformats for officious Semantic Web purposes. My opinion is that the value in microformats hype is in encouraging people to take advantage of XHTML semantics in however a conventional in non-rigorous fashion they may. It is a pipe dream to think that most pages containing microformats will include the correct profile references to allow a spec-following crawler to extract much useful data via GRDDL. Add some convention-following heuristics a crawler may get lots of interesting data from microformatted pages. The big search engines are great at tolerating ambiguity and non-conformance, as they must.

Yeah, I'm no cheerleader (or even follower) for Microformats. Certainly I've been skeptical of Microformats here on Copia (1, 2, 3). I think that the problem with Microformats is that value is tied very closely to hype. I think that as long as they're a hot technology they can be a useful technology. I do think, however, that they have very little intrinsic technological value. I guess one could say this about many technologies, but Microformats perhaps annoy me a bit more because given XML as a base, we could do so much better.

Mike is also right to be skeptical that GRDDL will succeed if, as it presently does, it relies on people putting profile information into Web documents that use Microformats.

My experience at the conference, some very trenchant questions from the audience, a very interesting talk by Ben Adida right after my own, and more matters have got me thinking a lot about Microformats and what those of us whose information consolidation goals are more ambitious might be able to make of them. Watch this space. More to come.

[Uche Ogbuji]

via Copia

"XML in Firefox 1.5, Part 2: Basic XML processing"

"XML in Firefox 1.5, Part 2: Basic XML processing"

Subtitle: Do a lot with XML in Firefox, but watch out for some basic limitations

Synopsis: This second article in the series, "XML in Firefox 1.5," focuses on basic XML processing. Firefox supports XML parsing, Cascading Stylesheets (CSS), and XSLT stylesheets. You also want to be aware of some limitations. In the first article of this series, "XML in Firefox 1.5, Part 1: Overview of XML features," Uche Ogbuji looked briefly at the different XML-related facilities in Firefox.

I also updated part 1 to reflect the Firefox 1.5 final release.

This article is written at an introductory level. The next articles in the series will be more technically in-depth, as I move from plain old generic XML to fancy stuff such as SVG and E4X.

[Uche Ogbuji]

via Copia

Semantic hairball, y'all

I'm in San Jose and the Semantic Technology Conference 2006 has just wrapped up. A good time, as always, and very well attended (way up from even last year. This is an extraordinarily well organized conference). But I did want to throw up one impression I got from one of the first talks I went to.

The talk discussed an effort in "convergence" of MDA/UML, RDF/OWL, Web Services and Topic Maps. Apparently all the big committees are involved, from OMG, W3C, ISO, etc. Having been an enthusiastic early adopter of the first three technologies, I was violently struck by the casually side-stepped enormousness of this undertaking. In my view, all four projects had promising roots and were all eventually buried under the weight of their own complexity. And yet the convergence effort that's being touted seems little more sophisticated than balling all these behemoths together. I wonder what's the purpose. I can't imagine the result will be greater adoption for these technologies taken together. Many potential users already ignore them because of the barrier of impenetrable mumbo-jumbo. I can't imagine there would be much cross-pollination within these technologies because, without brutal simplification and profiling, model mismatches would make it impractical for an application to efficiently cross the bridge from one semantic modeling technology to the other.

I came to this conference to talk about how Microformats might present a slender opportunity for semantic folks to harness the volume of raw material being generated in the Web 2.0 craze. The trade-off is that the Web 2.0 craze produces a huge amount of crap metadata, and someone will have to clean up the mess in the resulting RDF models even if GRDDL is ever deployed widely enough to generate models worth the effort. And let's not even start on the inevitable meltdown of "folksonomies" (I predict formation of a black hole of fundamental crapitational force). I replaced my previous year's talk about how managers of controlled information systems could harness XML schemata for semantic transparency. I think next year I should go back to that. It's quite practical, as I've determined in my consulting experience. I'm not sure hitching information pipelines to Web 2.0 is the least bit practical.

I'm struck by the appearance of two extremes in popular fields of distributed information management (and all you Semantic Technology pooh-pooh-ers would be gob-smacked if you had any idea how deadly seriously Big Business is taking this stuff: it's popular in terms of dollars and cents, even if it's not the gleam in your favorite blogger's eye). On one hand we have the Daedalos committee fastening labyrinth to labyrinth. On the other hand we have the tower of Web 2.0 Babel. We need a mob in the middle to burn 80% of the AI-one-more-time-for-your-mind-magic off of RDF, 80% of the chicago-cluster-consultant-diesel off of MDA, 80% of the toolkit-vendor-flypaper off of Web services. Once the ashes clear, we need folks to build lightweight tools that actually would help with extracting value from distributed information systems without scaring off the non-Ph.D.s. I still think XML is the key, and that XML schema systems should have been addressing semantic transparency from the start, rather than getting tied up in static typing bondage and discipline.

I have no idea whether I can do anything about the cluster-fuck besides ranting, but I'll be squeezing neurons hard until XTech, which does have the eminent advantage of being an in-person meeting of the semantic, XML and Web 2.0 crowds.

Let's dance in Amsterdam, potnas.

[Uche Ogbuji]

via Copia

Schematron creeping on the come-up (again)

Schematron is the boss XML schema language your boss has never heard of. Unfortunately it's had some slow times of late, but it's surged back with a vengeance thanks to honcho Rick Jelliffe with logistical support from Betty Harvey. There's now a working mailing list and a Wiki. Rick says that Schematron is slated to become an ISO Standard next month.

The text for the Final Draft International Standard for Schematron has now been approved by multi-national voting. It is copyright ISO, but it is basically identical to the draft at www.schematron.com

The standard is 30 pages. 21 are normative, including schema listings and a characterization of Schematron semantics in predicate logic. Appendixes show how to use other query language bindings (than XSLT1), how to use Schematron as a vocabulary, how to express multi-lingual diagnostics, and a simple description of the design requirements for ISO Schematron.

Congrats to Rick. Here's to the most important schema language of them all (yes, I do mean that). I guess I'll have to check Scimitar, Amara's fast Schematron processor, for compliance to the updated draft standard.

[Uche Ogbuji]

via Copia

Mystery of Google index drop solved?

Update. Corrected Christian's surname. Sorry, man.

A while ago I complained that uche.ogbuji.net disappeared from Google search results soon after I went to a CherryPy-based server. I'm here to say that I'm a goof, but I hope that admitting my silly error might save someone else some head-scratching (maybe even this gentleman).

I'm at least not alone in my error. The clue came from this message by the very smart Christian Wyglendowski. In my case I was getting 404s for most things, but I did have a bug that was causing a 500 error on requests to robots.txt. Apparently the Google bot shuns sites with that problem. I can understand that but it's interesting that Yahoo doesn't seem to do the same thing, since my ranking didn't drop much there. I fixed the bug and then submitted a reinclusion request to Google following the suggestions in this article (I guess SEO advice isn't a completely parasitic endeavor). The body of my message was as follows:

I had a bug causing 500 error on robots.txt request, and I think that's why I got dropped from your index. I've fixed that bug, and would like to request reinclusion to your index. Thanks.

We'll see if that does the trick.
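For anyone who wants to check their own site for the same problem, here is a minimal sketch using only the Python standard library. It simply reports the HTTP status code returned for robots.txt and flags the 5xx range. The function names are made up for this example, and the idea that a 5xx response is what triggers the drop is just my reading of the situation above, not documented crawler behavior.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin


def robots_txt_url(site_url):
    """Build the robots.txt URL at the root of the given site."""
    return urljoin(site_url, "/robots.txt")


def robots_txt_status(site_url, timeout=10):
    """Return the HTTP status code the server sends for robots.txt."""
    try:
        with urllib.request.urlopen(robots_txt_url(site_url), timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; the status code is what we want to see.
        return err.code


def is_server_error(status):
    """True for 5xx codes, the kind of response my bug was producing."""
    return 500 <= status <= 599
```

Running `is_server_error(robots_txt_status("http://example.org/"))` against your own site (the URL here is a placeholder) should come back `False` once everything is healthy.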

[Uche Ogbuji]

via Copia

Thunderbird crash recovery of composed messages

Dare laments Firefox's lack of text area content saving upon crashing. At first I found this strange because Firefox does save text area content in my experience. Then I remembered that I always install SessionSaver. I suspect that's where I might be getting my protection from. It did make me wonder whether XForms content is similarly protected. These days I like to use Chime's XForm document with the Firefox XForms extension to post to Copia, and I should test how it handles crashes.

But the main point of this entry is to make a related rant and lazyweb request about Thunderbird. I learned the hard way that unlike Evolution, Thunderbird does not auto-save messages you are composing. That means that my habit of starting drafts and then switching to another task is very dangerous. If I do not manually save the draft and Thunderbird crashes, I lose my work. This is stupid. Evolution would save all compose window content in files named ".evolution-<opaque-id>", and would offer to restore these windows upon restart. If I can't find an extension along the lines of SessionSaver for Thunderbird, I'll have to ditch it. Do any of my LazyWeb friends know of such an extension? Googling and other searching turned up blanks.

[Uche Ogbuji]

via Copia

Agile Web #3: "Scripting Flickr with Python and REST"

"Scripting Flickr with Python and REST"

In his latest Agile Web column, Uche Ogbuji shows us how to use Python to interact with Flickr as a lightweight web service.

This Agile Web installment is fairly straightforward. I look at the several Python libraries for accessing Flickr from programs. They range from low level, thin veneers over the official Flickr API to the one higher level, more Pythonic library. And of course there's the obligatory package I just can't get to work.
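To give a flavor of what those libraries wrap, here is a rough sketch of a raw REST call using only the standard library. It is not code from the article. The endpoint URL, the `flickr.photos.search` method name and the parameter names reflect Flickr's public API as I understand it and should be checked against the current Flickr documentation; the function names are invented for the example, and you need your own API key.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumed REST endpoint; verify against Flickr's API documentation.
FLICKR_REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def build_search_url(api_key, tags, per_page=5):
    """Build a GET URL for a tag search, asking for a plain JSON response."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "tags": ",".join(tags),
        "per_page": per_page,
        "format": "json",
        "nojsoncallback": 1,  # plain JSON rather than a JSONP wrapper
    }
    return FLICKR_REST_ENDPOINT + "?" + urlencode(params)


def search_photos(api_key, tags):
    """Send the request and decode the JSON body (needs a valid key)."""
    with urllib.request.urlopen(build_search_url(api_key, tags)) as response:
        return json.load(response)
```

The higher level libraries mostly save you from assembling that parameter dictionary and picking apart the response by hand.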

[Uche Ogbuji]

via Copia