Thinking XML #34: Search engine enhancement using the XML WordNet server system

Updated—Fixed link to "Serving up WordNet as XML"

"Thinking XML: Search engine enhancement using the XML WordNet server system"

Subtitle: Also, use XSLT to create an RDF/XML representation of the WordNet data
Synopsis: In previous installments of this column, Uche Ogbuji introduced the WordNet natural language database, and showed how to represent database nodes as XML and serve this XML through the Web. In this article, he shows how to convert this XML to an RDF representation, and how to use the WordNet XML server to enrich search engine technology.

This is the final part of a mini-series within the column. The previous articles are:

In this article I write my own flavor of RDF schema for WordNet, a transform for conversion from the XML format presented previously, and a little demo app that shows how you can use WordNet to enhance search with synonym capabilities (and this time it's a much faster approach).
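To give a flavor of synonym-enhanced search, here is a minimal Python sketch. It is not the demo app from the article: the `SYNONYMS` table is a hypothetical stand-in for what a WordNet XML server lookup might return, and the function names are made up for illustration.

```python
# Hypothetical synonym table; in a real setup these sets would come from
# WordNet lookups rather than being hard-coded.
SYNONYMS = {
    "car": {"auto", "automobile", "motorcar"},
    "fast": {"quick", "rapid", "speedy"},
}


def expand_query(query, synonyms=SYNONYMS):
    """Return one set of acceptable words per query term (the term plus its synonyms)."""
    expanded = []
    for term in query.lower().split():
        expanded.append({term} | synonyms.get(term, set()))
    return expanded


def matches(document, query, synonyms=SYNONYMS):
    """True if the document contains every query term or one of its synonyms."""
    words = set(document.lower().split())
    return all(words & alternatives for alternatives in expand_query(query, synonyms))


docs = ["A speedy automobile", "A slow train", "Fast car chase"]
hits = [d for d in docs if matches(d, "fast car")]
```

Here a search for "fast car" also finds the document that only mentions a "speedy automobile", which is the whole point of the synonym expansion.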

I hope to publicly host the WordNet server I've developed in this series once I get my home page's CherryPy setup updated for 2.2.

See other articles in the column. Comments here on Copia or on the column's official discussion forum. Next up in Thinking XML, RDF equivalents for the WordNet/XML.

[Uche Ogbuji]

via Copia

" Process Atom 1.0 with XSLT"

"Process Atom 1.0 with XSLT"

Learn XSLT techniques for processing Atom documents. In this tutorial, author Uche Ogbuji shows how with real-world use cases. (free registration required)

Atom 1.0 is [the] Internet Engineering Task Force (IETF) standard for Web feeds -- information updates on Web site contents. Since Atom is an XML format, XSLT is a powerful tool for processing it. In this tutorial, Uche Ogbuji looks at XSLT techniques for processing Atom documents, addressing real-life use cases.

This tutorial shows you how to:

  • Navigate the basic structure of Atom 1.0 documents using XPath expressions
  • Use these expressions to drive XSLT transformations of Atom source files
  • Deal with the complications of text and markup embedded in Atom files

You will also learn how to use XSLT templates to generate valid Atom files, and how to check the validity of the results.
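The tutorial itself uses XSLT, but the same XPath-style navigation of an Atom document can be sketched with Python's standard `xml.etree.ElementTree` module. This is only an illustration under assumptions: the sample feed and the `entry_titles` helper are invented here, and the sketch does not cover `type="xhtml"` titles, whose content is child markup rather than text.

```python
import xml.etree.ElementTree as ET

# The Atom 1.0 namespace from RFC 4287, bound to a prefix for path lookups.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Made-up sample feed: one plain-text title and one escaped-HTML title.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry>
    <title type="text">Plain title</title>
    <id>urn:example:1</id>
  </entry>
  <entry>
    <title type="html">Bold &lt;b&gt;title&lt;/b&gt;</title>
    <id>urn:example:2</id>
  </entry>
</feed>"""


def entry_titles(feed_xml):
    """Return (type, text) pairs for each entry title; type defaults to "text"."""
    root = ET.fromstring(feed_xml)
    titles = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.find("atom:title", ATOM_NS)
        titles.append((title.get("type", "text"), title.text))
    return titles


titles = entry_titles(FEED)
```

Note that the `html` title comes back as a string containing markup, so the consumer still has to decide whether to parse or strip it. That is the kind of complication with embedded text and markup that the tutorial addresses.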

A companion piece to my recent XML.com article "Handling Atom Text and Content Constructs", this is a task-driven tutorial, taking a more deliberate pace and focusing on XSLT.

developerWorks has had a lot to say about Atom lately, courtesy James Snell (who is also writing a lot of useful Atom extension drafts).

So how do you celebrate Atom's promotion to RFC 4287? Why, by cooking up even more reading material.

[Uche Ogbuji]

via Copia

4Suite XML 1.0b3

I posted the 4Suite XML 1.0b3 announcement today. This was supposed to be 1.0rc1 but then Jeremy went and added this little feature. Yeah, 4Suite now has full DTD validation, written in C. Just use the ValidatingReader. PyXML is no longer necessary for any 4Suite feature. I just need to figure out whether Jeremy ever sleeps. I hope to move quickly on a 1.0rc1. Perhaps even in January. We'll see.

I've updated my on-line manual

[Uche Ogbuji]

via Copia

Getting Some Mileage out of Semantic Works

Well, I recently had a need to write up an OWL ontology describing the components of a 4Suite repository configuration file (which is expressed as an RDF graph, hence the use of OWL to formalize the format). There has been some mention (with regard to the long-term roadmap of the 4Suite repository component) of the possibility of moving to a pure XML format.

Anyways, below is a diagram of the model produced by Semantic Works. I still think Protege and SWOOP provide much more bang for your buck (when you consider that they are free and Semantic Works isn't) and produce much more concise OWL/RDFS XML. But the ability to produce diagrams of this quality of a complex OWL ontology is definitely a plus.

Semantic Works Diagram of 4Suite Repository Configuration Ontology

[Chimezie Ogbuji]

via Copia

Whose fault is the modern pin-up image?

I happened across a very interesting Slashdot posting. To quote liberally:

Consider this quote:

Naomi Wolf is much more blunt. In her book The Beauty Myth, she argues that this very standard of beauty set forth by the media is the primary mechanism of women's oppression by men. She discusses the "suffering caused by trying to meet the demands of the thin ideal"

This would be a great idea, except that laying this all at the feet of men is more than a bit unfair to me. To be sure, the ideal of feminine beauty that is espoused by male oriented media seems extreme -- until you compare it to the images in female oriented media. The male favored image requires surgery, unconscionable quantities of gym time, fasting, and a soupcon of digital touch up. But it's nothing compared to the gaunt images that women pay to consume.

Of course, you can say that it's men who run the media companies that produce these images, and you'd be wrong on two counts. The "Cosmo Girl" was the creation of Helen Gurley Brown, after all. But Ms. Brown's sex is not at issue at all. The point is that women and men who run media companies end up doing much the same thing, because they're driven by the same economic forces. The Cosmo Girl wants to have it all. The reason she wants to have it all is because promoting the ideal of having it all pleases the advertisers; it involves not a little buying.

The reason that media female body image is so unrealistic is simple economics. If scarcity enhances value, then the unobtainable must be perceived as infinitely valuable. For the man, the companies inevitably take the general parameters indicating robust healthy child bearing capability and simply nip and tuck it to the edge of impossibility. You meet a woman who looks like that once in a blue moon, and she's definitely not going to be interested in you. Voila! the unobtainable.

For women, the companies produce an image that is starved (never mind this contradicts the male oriented images). A normal woman's homeostatic processes will torture her into submission long before she reaches this stage. Voila! once more the unobtainable.

It's not the oppression of women by men; at least if it is, nobody's ever invited me to the meetings where this is arranged. It's not as personal as that. The problem is the antithesis of that. It's completely impersonal. It's economic and thus about systems and performance metrics and quarterly goals, not anything as personally satisfying as domination, I'm afraid. And when the putatively immoral male sex is displaced in a position by the putatively superior female sex, there's bound to be very little difference in results. They're just cogs in the machine either way.

It's a fairly sterile point of view, but that works for me because my view of the matter is similarly sterile. The battle of the sexes in Western media is usually a lot frothier, tending towards extremes such as the idea that women might be wired for domination by men, or that men might be obsoleted by reproductive technology (by the same idiot-savant line of reasoning one would arrive at the converse absurdity that women should be wary of oven and incubator technology).

I've always found extraordinary the idea that men would put forth signals towards extremes such as Anorexia. I would have expected what's claimed in the above comment—that most men would go for exaggerated lineaments of child-bearing function. But I'm also not sure how far one can take the notion that women are the driving force behind skinny chic. Competitiveness among women alone cannot explain, say, the flapper look, or more recently those Calvin Klein models, or even—to zoom way in—how Fiona Apple became a sensation by starring in a supposedly sexy video looking like a twelve year old foster child. (I sure as hell don't imagine for a moment that it was her high-school-burlesque singing that put her on the map). There must have been a critical mass of men going about saying "that's, like, the acme of hot, dude".

And why do men not seem as keen on taking the ubiquitous washboard abs in the media as an excuse for destructive auto-sculpture? Surely there is more to the overall mechanism of destructive self-image than evil media. Surely what you see in the media is little more than reaction to deeper forces.

It all just goes to reinforce that Economics is an accumulation of small, rational tendencies that in the large seem to drive the most irrational trends.

[Uche Ogbuji]

via Copia

AJAX and the Back button

Sylvain and I have discussed recently his discomfort with Web browser state of the art in the age of AJAX (to use a grand term, even though I strongly believe that AJAX is nothing but an incremental gathering of conventions rather than anything new and special). He has gathered his thoughts in a blog posting "The chicken and egg problem". I posted a comment, but I thought I might copy the comment here as well.

[Let me summarize] in brief my reasons for thinking that the current system is not broken, and that we do not need to change anything fundamental about browsers.

First of all the basic semantic of "link history" in a Web browser has not changed since the Mosaic days for a very good reason: it is empirical to HTTP, REST and all that. At each point a browser is at a particular resource, and it moves from one resource to another according to actuation of simple REST verbs. Within each resource the browser can do all sorts of complex things, including showing animations (Shockwave, SVG, etc.), providing mini-applications to the user (Java applets, Flash, AJAX, etc.) and more, but the resource has not changed. The boundary of resource is defined by the service provider, and the browser simply reflects that in the history, URL bar and other features. I don't think the back and forward buttons should be overloaded for any operation within a resource. They should not be used as hot buttons in Flash apps or in AJAX apps. This violates the layering that is so important to the success of the Web.

If service providers want to provide navigation within a particular resource, they should do so within the application, and not at the REST level. I want my Front office app to have an "Undo" button (which makes much more sense than "Back"). [Why do I need chameleon browser chrome when I can just do <xforms:button id="undo"><xforms:caption>Undo</xforms:caption>...</xforms:button>?] When I click browser "Back" I want that to exit the application and go to the previous resource.

IMO people think they have trouble with the back button and AJAX because they do not appreciate protocol layering very well, and because the AJAX tools do not yet help in this understanding. I think a better understanding of this layering and better tools are what's needed, not a major redesign of the browser idiom.
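The layering argument can be modeled in a few lines of Python. This is a toy sketch, not browser code: the `Browser` and `FormApp` classes and the example URLs are hypothetical, and they exist only to show resource-level history and application-level undo as two separate stacks.

```python
class Browser:
    """Tracks resource-level history only, which is all Back should see."""

    def __init__(self):
        self.history = []

    def navigate(self, url):
        self.history.append(url)

    def back(self):
        # Leave the current resource and return to the previous one.
        if len(self.history) > 1:
            self.history.pop()
        return self.history[-1] if self.history else None


class FormApp:
    """An in-page application with its own Undo, invisible to the browser."""

    def __init__(self):
        self.value = ""
        self._undo = []

    def edit(self, new_value):
        self._undo.append(self.value)
        self.value = new_value

    def undo(self):
        if self._undo:
            self.value = self._undo.pop()
        return self.value


browser = Browser()
browser.navigate("http://example.org/home")
browser.navigate("http://example.org/app")

app = FormApp()
app.edit("draft 1")
app.edit("draft 2")
after_undo = app.undo()              # application-level step back
history_len = len(browser.history)   # untouched by edits and undo
after_back = browser.back()          # exits the app resource entirely
```

Edits and undo never touch the browser's history, and Back leaves the application for the previous resource, which is the separation of layers argued for above.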

[Uche Ogbuji]

via Copia

Agile Web #2: "Handling Atom Text and Content Constructs"

"Handling Atom Text and Content Constructs"

Uche Ogbuji's Agile Web column returns with a look at handling some of the trickier issues in the Atom Syndication Format, which has recently become RFC 4287, an internet standard.

Second article in my new column is out. In this one I focus on Atom text and content constructs. I spent more time on the Atom examples and less on the sample processing code, but I thought more of the former would be especially useful. I've been working with and writing about Atom a lot lately, and in fact I have an IBM developerWorks tutorial for Atom processing in XSLT in production. It should be live some time today.

Joe Gregorio has been working the other half of the Atom pie (old joke for folks who've been following Atom), and he has a very timely new article out: "Catching Up with the Atom Publishing Protocol".

And once again, if you'd like to discuss Atom (syntax or publishing protocol), please do join us on the #atom channel on irc.freenode.net.

[Uche Ogbuji]

via Copia

CVS log since tag?

My usual trick for creating a "What's changed" summary in my projects is to check CVS for commits since the previous release. So if the previous release was 24 October 2005, I run

cvs log -NSd ">2005/10/24"

It would be nice if I could do the same thing while specifying the last revision, rather than a date. I wish I could do:

cvs log -NSr<last-rev>::HEAD

but that seems to work only for numerical revisions rather than tags. Does anyone know of any neat hacks to achieve this? Note: if you prefer to advocate Subversion, that's OK, but at least be sure to specify the precise command to do this with SVN so that others can benefit from the example.
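This does not answer the tag question, but the date-based command can at least be wrapped so that the release date is not retyped by hand. A minimal Python sketch follows; the `cvs_log_since` helper is a made-up name, and it only builds the same `cvs log -NSd` invocation shown earlier.

```python
from datetime import date


def cvs_log_since(release_date):
    """Build the argument list for: cvs log -NSd ">YYYY/MM/DD"."""
    return ["cvs", "log", "-NSd", ">" + release_date.strftime("%Y/%m/%d")]


cmd = cvs_log_since(date(2005, 10, 24))
```

Passing the list to `subprocess.run(cmd)` inside a CVS working copy would run it without going through a shell, so the `>` needs no extra quoting.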

Note: this is coming up for me now because I'm wrapping up the packaging for the 4Suite 1.0b3 release. One huge new feature: full DTD support for all the parsers (written in C by the indefatigable Jeremy). One big fix: build support for 64-bit Intel architecture machines.

[Uche Ogbuji]

via Copia