Compositional Evaluation of W3C SPARQL Algebra via Reduce/Map

[by Chimezie Ogbuji]

Committed to svn

<CIA-16> chimezie * r1132 rdflib/sparql/ (Algebra.py bison/Processor.py bison/SPARQLEvaluate.py): Full implementation of the W3C SPARQL Algebra. This should provide coverage for the full SPARQL grammar (including all combinations of GRAPH). Includes unit testing and has been run against the old DAWG testsuite.

Tested against older DAWG testsuite. Implemented using functional programming idioms: fold (reduce) / unfold (map)

Does that suggest parallelizable execution?

reduce(lambda left, right: ReduceToAlgebra(left, right), { .. triple patterns .. }) => expression
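A minimal sketch of that fold step follows. It assumes hypothetical, simplified algebra node types (`BGP`, `Join`, `LeftJoin`, `OptionalGroup`) and a hypothetical `reduce_to_algebra` standing in for `ReduceToAlgebra`; it is not rdflib's actual code. Loosely following the SPARQL algebra translation, the fold starts from the empty pattern, adjacent triple patterns merge into one basic graph pattern, an OPTIONAL element becomes a LeftJoin, and anything else becomes a Join:

```python
from dataclasses import dataclass
from functools import reduce
from typing import Tuple

Triple = Tuple[str, str, str]


# Hypothetical, simplified algebra node types (not rdflib's classes).
@dataclass(frozen=True)
class BGP:
    triples: Tuple[Triple, ...]


@dataclass(frozen=True)
class Join:
    left: object
    right: object


@dataclass(frozen=True)
class LeftJoin:
    left: object
    right: object


@dataclass(frozen=True)
class OptionalGroup:
    """Marks a group graph pattern that appeared inside OPTIONAL { ... }."""
    pattern: object


def reduce_to_algebra(left, right):
    """Hypothetical stand-in for ReduceToAlgebra: fold one more group
    element (right) into the expression accumulated so far (left)."""
    if isinstance(right, OptionalGroup):
        return LeftJoin(left, right.pattern)
    if isinstance(left, BGP) and isinstance(right, BGP):
        # Adjacent triple patterns collapse into a single BGP.
        return BGP(left.triples + right.triples)
    return Join(left, right)


elements = [
    BGP((("?s", "rdf:type", "foaf:Person"),)),
    BGP((("?s", "foaf:name", "?name"),)),
    OptionalGroup(BGP((("?s", "foaf:mbox", "?mbox"),))),
]

# Start the fold from the empty basic graph pattern.
expression = reduce(reduce_to_algebra, elements, BGP(()))
```

Because LeftJoin is not commutative, this fold is order-dependent and inherently sequential; any opportunity for parallelism more plausibly lies in evaluating independent sub-expressions.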

expression -> sparql-p -> solution mappings
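To make the "expression -> solution mappings" step concrete, here is a toy evaluator over an in-memory set of triples. It is not sparql-p or rdflib, and every name in it is hypothetical. It evaluates basic graph patterns into solution mappings (dicts from variable to term) and implements compatible-mapping Join and LeftJoin in the style of the SPARQL algebra (no FILTER handling). The `map` over independent sub-patterns is the part that could, speculatively, be swapped for something like `concurrent.futures.ThreadPoolExecutor.map`:

```python
from functools import partial, reduce
from typing import Dict, Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Mapping = Dict[str, str]  # a solution mapping: variable -> RDF term


def match_triple(pattern: Triple, triple: Triple, mu: Mapping) -> Optional[Mapping]:
    """Extend mu so that pattern matches triple; return None on a clash."""
    binding = dict(mu)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # An unbound variable gets bound; a bound one must agree.
            if binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return binding


def eval_bgp(graph: Set[Triple], triples: Iterable[Triple]) -> List[Mapping]:
    """All solution mappings under which every triple pattern is in graph."""
    solutions: List[Mapping] = [{}]
    for pattern in triples:
        extended = []
        for mu in solutions:
            for triple in graph:
                result = match_triple(pattern, triple, mu)
                if result is not None:
                    extended.append(result)
        solutions = extended
    return solutions


def compatible(a: Mapping, b: Mapping) -> bool:
    """Mappings are compatible if they agree on every shared variable."""
    return all(b[var] == val for var, val in a.items() if var in b)


def join(left: List[Mapping], right: List[Mapping]) -> List[Mapping]:
    return [{**a, **b} for a in left for b in right if compatible(a, b)]


def left_join(left: List[Mapping], right: List[Mapping]) -> List[Mapping]:
    """Join, but keep each left mapping that has no compatible partner."""
    result: List[Mapping] = []
    for a in left:
        merged = [{**a, **b} for b in right if compatible(a, b)]
        result.extend(merged or [a])
    return result


graph = {
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:bob", "rdf:type", "foaf:Person"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
}

# map: evaluate independent sub-patterns (the step that could, in principle,
# be parallelized); reduce: fold their solution sequences together with join.
groups = [
    [("?s", "rdf:type", "foaf:Person")],
    [("?s", "foaf:name", "?name")],
]
required = reduce(join, map(partial(eval_bgp, graph), groups))
solutions = left_join(required, eval_bgp(graph, [("?s", "foaf:mbox", "?mbox")]))
```

Here `ex:alice` comes back with a name and a mailbox, while `ex:bob` survives the LeftJoin with only a name.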

GRAPH ?var / <.. URI ..> support as well.

The only things outstanding (besides the new modifiers and non-SELECT query forms), really, are:

  • a pluggable extension mechanism
  • support for an exploratory protocol
  • a way for Fuxi to implement entailment
  • other nice-to-haves

Looking forward to XTech 2007 and the Semantic Technology Conference '07.

Chimezie Ogbuji

via Copia

One Bad Apple Spoiling the Bunch

When you quantify how much leverage has been given to vendor politics over common sense, XHTML is the epitome of where the marriage of validity and well-formedness did much more harm than good. To this day, I do not understand why even well-informed people believe there is no value in XML without validation. For me, the sad story that is (or was?) XHTML says it all, even more loudly than the evidence of the same dowry being paid by the next generation of XML standards and WS-*.

[Uche Ogbuji]

via Copia

Where does the Semantic Web converge with the Computerized Patient Record?

I've been thinking a lot about the "Computer-based Patient Record" (CPR), an acronym as unlikely as GRDDL but, once again, a methodology expressed as an engineering specification. In both cases the methodology is a coherent architectural "style" that takes a mouthful of words to describe. Other examples of this:

  • Representational State Transfer
  • Rich Web Application Backplane
  • Problem-oriented Medical Record
  • Gleaning Resource Descriptions from Dialects of Languages

The term itself was coined (I think) by the Institute of Medicine [1]. If you are in healthcare and are motivated by the notion of using technology to make healthcare as effective and inexpensive as possible, you should do the Institute a favor and buy the book:

[1] Institute of Medicine, The Computer-Based Patient Record: An Essential Technology for Health Care, Revised Edition, 1998, ISBN 0309055326.

I've recently written some slides, now on the W3C ESW wiki, that all have something to do with this idea in one way or another.

The nice thing about working in a W3C Interest Group is that the work you do is for the general public's benefit, so it is a manifestation of the W3C notion of the Semantic Web, which primarily involves a human social process.

Sorta like a technological manifestation of our natural Darwinian instinct.

That's how I think of the Semantic Web, anyway: as a very old, living thread of advancements in Knowledge Representation that intersected with an anthropological assessment of some recent Web architecture engineering standards.

Technology is our greatest contribution, so it only makes sense that where we use it to better our health, it should not come at a cost to us. The slides reference and include a suggested OWL-sanctioned vocabulary for implementing the Problem-oriented Medical Record (a clinical methodology for problem solving).

I think the idea of a free (as in beer) vocabulary for people who need healthcare has an interesting intersection with the pragmatic parts of the Semantic Web vision (avoiding the double quotes). I have exercise-induced asthma (or was "diagnosed" as such when I was younger). I still ran track and field in high school and was okay after an initial period where my lungs had to work overtime. I wouldn't mind hosting RDF content about such a "finding" if it were to my personal benefit, that is, if a piece of software could do something useful for me with it in an automated, deterministic way.

"HL7 CDA" seems to be a freely available, well-organized vocabulary for describing messages dispatched between hospital systems. I recently wrote a set of XSLT templates that extract predicate logic statements about a CDA document using the POMR ontology and the other freely available "foundational ontologies" it coordinates. The CDA page on xml.coverpages.org has a nice, concise description of the technological merits of HL7 CDA:

The HL7 Clinical Document Architecture is an XML-based document markup standard that specifies the structure and semantics of clinical documents for the purpose of exchange. Known earlier as the Patient Record Architecture (PRA), CDA "provides an exchange model for clinical documents such as discharge summaries and progress notes, and brings the healthcare industry closer to the realization of an electronic medical record. By leveraging the use of XML, the HL7 Reference Information Model (RIM) and coded vocabularies, the CDA makes documents both machine-readable (so they are easily parsed and processed electronically) and human-readable so they can be easily retrieved and used by the people who need them. CDA documents can be displayed using XML-aware Web browsers or wireless applications such as cell phones..."

The HL7 CDA was designed to "give priority to delivery of patient care. It provides cost effective implementation across as wide a spectrum of systems as possible. It supports exchange of human-readable documents between users, including those with different levels of technical sophistication, and promotes longevity of all information encoded according to this architecture. CDA enables a wide range of post-exchange processing applications and is compatible with a wide range of document creation applications."

A CDA document is a defined and complete information object that can exist outside of a messaging context and/or can be a MIME-encoded payload within an HL7 message; thus, the CDA complements HL7 messaging specifications.

If I could put up a CDA document describing the aspects of my medical history that it would benefit me to make freely available (at my discretion), I would do so on the chance that some piece of software could do useful, automated things for me. Leveraging a vocabulary that essentially grounds an expressive variant of predicate logic in a transport protocol makes that outcome very likely. The effect is as multiplicative as the human population.

The CPR specification is also very well engineered and far ahead of its time (it was written about 15 years ago). The only technological checkmark left is a uniform vocabulary. Consensus stands in the way of uniformity, so some group of people needs to be thinking about how the "pragmatic" and anthropological notions of the Semantic Web can be realized with a vocabulary about our personally controlled, public clinical content. Don't you think?

I was able to register the /cpr top-level PURL domain, and the URL http://purl.org/cpr/1.0/problem-oriented-medical-record.owl# resolves to the OWL ontology, with commented-out imports of other very relevant OWL ontologies. Once I see a pragmatic demonstration of leaving owl:imports in a 'live' URL, I'll uncomment them. It would be a shame if any Semantic Web vocabulary terms came into conflict with a legal mandate that controlled the use of a vocabulary.

Chimezie Ogbuji

via Copia

First day as a Python/Mac developer

These are primarily just my scattershot notes on getting myself ready for Python and C development on the Mac. It really is a confusing picture as to how to get started with Python development on the Mac. You can get a bunch of bits and pieces from the official Mac page for Python, the Python/Mac FAQ and a few other places, but it's hard to put it all together to understand how the OS X-bundled Python, MacPython, Fink, MacPorts, framework or non-framework builds, etc. all fit together, and how to navigate the options. It didn't help that important wiki pages such as the FAQ had been vandalized, and I was not able to fix it for some reason.

It seems to me that the reason for all this confusion is that a person who just needs to run some cool Python script they downloaded would go about things in a very different way from someone like me, who needs to heavily maintain software that uses advanced Python/C facilities. It all comes down to the split personality that results from the OS X way of life being superimposed upon the UNIX way of life.

Picking a distribution

Also see:

The key section from the FAQ is the following, pasted from the diff of the vandalized page:

Q: Python overload! I've got Apple's Python, Jack's Python, Fink's Python...

A: Newcomers to Python-on-X are often confused by the several distributions of Python available. Each flavor has a history and a reason for existence, but if you're starting out, you probably want to look at the "official unofficial" builds of MacPython 2.4 on http://undefined.org/python and install additional packages like numarray or PIL from http://pythonmac.org/packages. These builds have a feature set that supersedes that of the beloved 'official MacPython builds' by Jack Jansen and solve many of the obstacles that are described by the FAQ entries on this page.

I followed this advice and went with MacPython, but I also set up MacPorts for some flexibility (see below).

Getting started with MacPython (including setuptools)

I grabbed and installed python-2.5-macosx.dmg from the page recommended in the FAQ.

I went with MacPython in the system directories, but with packages I build from source in my home directory. For a start, this meant adding the following to my ~/.profile, after the "# Setting PATH for MacPython 2.5" section added by the MacPython installer:

export PATH=$HOME/bin:/usr/local/bin:$PATH

export PYVERSION=2.5
export PYSITE=$HOME/Library/Python/$PYVERSION/site-packages
export PYTHONPATH=$PYSITE

And then the following in ~/.pydistutils.cfg:

[install]
install_lib = ~/Library/Python/$py_version_short/site-packages
install_scripts = ~/bin

I also had to do a one-time setup:

mkdir ~/bin
mkdir -p $PYSITE
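A quick way to confirm that the ~/.profile and ~/.pydistutils.cfg settings agree is a small Python check. This is a sketch written for modern Python 3 rather than the 2.5 used here; the function names are hypothetical, and it mimics, rather than calls, distutils' expansion of `~` and `$py_version_short`:

```python
import os
import sys


def expected_site_dir(version=None, home=None):
    """Build the path that PYSITE in ~/.profile and install_lib in
    ~/.pydistutils.cfg should both point at. This mimics, rather than
    calls, distutils' expansion of ~ and $py_version_short ("major.minor")."""
    major, minor = (version or sys.version_info)[:2]
    home = home or os.path.expanduser("~")
    return os.path.join(home, "Library", "Python", f"{major}.{minor}", "site-packages")


def check_setup():
    """Report whether the running interpreter will actually see that directory."""
    site_dir = expected_site_dir()
    return {
        "site_dir": site_dir,
        "exists": os.path.isdir(site_dir),
        "on_sys_path": site_dir in sys.path,
        "pythonpath_set": os.environ.get("PYTHONPATH") == site_dir,
    }


if __name__ == "__main__":
    for key, value in check_setup().items():
        print(f"{key}: {value}")
```

For example, with a hypothetical home directory of /Users/uche and version (2, 5), `expected_site_dir` yields /Users/uche/Library/Python/2.5/site-packages, matching the PYSITE export.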

I used setuptools for the first 4Suite and Amara install, following the OS X-specific instructions.

One wrinkle was that Firefox 2.0.0.1 refused to save ez_setup.py so that I could run it. I tried changing locations and all that, to no avail. Smells like a bug. I just used Safari to get it in the end. I noticed that OS X doesn't seem to come with wget. After this setup, a simple:

easy_install Amara

worked like a champ, and so I had 4Suite and Amara installed. I also got them set up from CVS easily enough, with the above basic config in place.

MacPorts

I also installed MacPorts, following the install instructions.

I was able to log in to the Apple Developer Connection very easily using my Apple Store ID. One problem is that the instructions say:

Click Customize, expand the Applications category and click the checkbox beside X11 SDK to add it to the default items.

But the Xcode 2.4.1 DMG I got had "X11 SDK" greyed out. I just went ahead anyway, and it turns out you must install X11 itself before Xcode will allow you to install the X11 SDK. That makes sense, but the instructions on the install page have this backward.

As for installing X11 itself, the page says:

Insert the OS X 10.4 installation DVD and run the package named Additional Software.

For the MacBook Pro, the installation DVD is labeled "Mac OS X Install Disc 1", and the package is actually named "Optional Installs". I clicked through until I got to the page where I could select X11.

You also need to use sudo for the ports update, which isn't clear in the instructions:

sudo port -vd selfupdate

And that's really about as far as I got. I installed MacPorts just to have it handy. I might first put it to use for wget, which I won't be able to live without for very long, and which really should come with OS X.

[Uche Ogbuji]

via Copia

The new MacBook Pro

I ended up changing my return flight from Chicago to Denver because of the chaos from last week's huge snowstorm. By the time I got back early yesterday morning, all seemed back to normal, and FedEx had attempted three deliveries of my new MacBook Pro. I went to pick it up yesterday, and when I was handed the package I peered suspiciously at the label as I hefted it, amazed at its small size and lightness. I was used to my Dells coming in near-cubic-meter boxes with respectable weight. The label seemed to be right, but I opened the package in the car anyway. Inside I found an even more svelte box, with the unmistakable goods. Consumer Reports won't be dishing out a Golden Cocoon award to Apple any time soon, and that's a very good thing. I took a few pictures too (see below) of the out-of-box experience, using my Dell Inspiron 8600 for comparison. The MacBook is much thinner and a bit lighter, and about the same in the other dimensions, despite having a 17" widescreen to the Dell's 15". I just hope I won't miss the Dell's WUXGA resolution too dearly.

My first moves were to install Firefox and Thunderbird. I did a lot of research while waiting for the new computer, and Tim's and Mark's public repudiation of some of the more proprietary aspects of Mac's bundled tools resonated strongly with me. The argument that Mozilla interfaces are non-Aqua and thus ugly is completely uninteresting to me. I don't subscribe to the school of thought that only Apple is capable of good interface design. More importantly, I've used Safari and Mail.app quite a bit, and I don't really like their UI. I personally find them rather patronizing. In the end, the only reason I made the switch to Mac is that I've come to believe that I can make My Mac serve me, rather than turning me into a servant of The Great Mac Cause. Being able to install cross-platform tools for my basic work was a bit like erecting my flag of independence, to be a bit florid. Anyway, I considered Camino, but its incompatibility with FF extensions, including the likes of ScrapBook and Web developer tools, was a show-stopper for me. I might still install Camino and even Flock. I'm all for browser polygamy.

The next thing I grabbed was Virtue Desktops (thanks, Graham). Sorry, but I can't work with all my windows crammed into one room. It seems Apple realizes the need for this as well, and is preparing the feature for Leopard. Unfortunately, Virtue (and, AFAICT, Apple's Spaces) is far more limited than the virtual desktop technology I'm used to. Both work on the principle that each app, rather than each window, is assigned to a "space". So my usual setup of having one set of Firefox windows with tabs for regular browsing, another for client-related browsing, and another for OSS work isn't supported. I can probably get around this for browsing by using a few different browser apps, but I think this will be a real problem in the case of iTerm. I usually have a terminal window or two in each of my "spaces". I also need to find some more keyboard shortcuts for Virtue: shift-tab...arrow keys...enter is a tad too much.

I grabbed iTerm right away because I need tabs. I did find WidgetTerm, a neat Dashboard version of iTerm (no tabs, though). Dashboard is slick. I can't wait till I have some time to go hunting for widgets, and maybe even hacking up some of my own. Hope I can do so in Python.

I chose Vienna as my Web feed reader. I'd have been OK paying for NetNewsWire, but not on all their dubious terms. I need to quickly figure out IRC and IM (Jabber, AIM and Yahoo), and I'm finding this a bit of a murky area. AdiumX gets some great notices, but some of my colleagues warned me off it because of some lingering show-stopper bugs. I'd also love to have IRC and IM in the same app. I'm guessing I'll end up trying a bunch of stuff to find what works for me. Oh well. I'm also presently trying to work out ssh-agent; I found this resource I plan to try. Then will come the hard part: my development setup. I'll be looking for an overview of Python and C dev tools on the Mac, preferably one that evaluates a broad variety of options. I think I'm going to try giving up Emacs again, so I'll be checking out good stand-alone text editors. I might even go as far as trying an IDE or two. I got great advice on dev setup in comments to "Time for Mac".

A couple of annoyances I'll have to research more are lack of right click on the touchpad and an occasional disappearing mouse cursor. We ordered Lori's Intel Mac with a wireless keyboard and its mouse had right click as well as a very neat scroll button. I hope I won't be forced to use an external mouse on my notebook: I hate holding down ctrl for context menu. And sometimes the mouse cursor seems to disappear for a second or two. I'm trying to narrow down what triggers this. It's not a huge deal, but sometimes an annoying obstacle.

All in all I'm getting a good vibe about my choice. If nothing else, the energy that comes from shaking up my routine is refreshing. Thanks to all who have given such useful advice, either directly to me, or in the many general, on-line resources.

[Uche Ogbuji]

via Copia

Here's to the snowstorm echo

We missed last week's epic snowstorm, reported breathlessly all over the world, as we put off our return flight from Wisconsin to avoid the back-up at DIA. My colleague Linda has some good pictures from last week. There was plenty of warning of a smaller echo of that storm overnight, and sure enough we woke up this morning to a good 18 inches of the best Colorado Cava snow. Not quite the Champagne powder: you have to get up into the real high country of the continental divide for that. I expect a good deal for the snowboarding trip this weekend (can't wait).

Anyway, we all piled out this morning to dig out, have some fun in the fresh snow, and enjoy that delicious, crisp air that a goodly Colorado snowstorm always leaves behind. L'Chaim!

Osi with the piste crawl:

I knew I should have tied the cover back on the grill yestereve after the wind took it off:

The neighbors' yard:

Osi off to break tracks in the back:

The back yard:

[Uche Ogbuji]

via Copia

Amara 1.2rc1

4Suite has been bumped to 1.0.2 with some important bug fixes. I also pushed Amara a step closer to 1.2 with a 1.2rc1 release. I'll make it 1.2 final some time this week, and then move on to some pretty big architectural changes for 2.0. All test reports are welcome, especially from Web server users. Jeremy might have figured out a workaround for the multiple-interpreter issue discussed in "multiple interpreters and extension modules". That should fix the remaining known problems with mod_python.

[Uche Ogbuji]

via Copia

“I really know how it feels to be/Stressed out/Stressed out”

Faith Evans with A Tribe Called Quest on the title lyric, but it was Kristen Harris, my manager at Sun, who forwarded me "Top 5 myths about workplace stress". It's a nice piece. It has a few flake-off bits, but it certainly does identify unfortunate attitudes towards workplace stress I've seen. One bit I decided I need to pay special attention to is:

So the solution to stress is not to work harder to catch up because in most workplaces this is impossible. The solution is to feel good about the work you finish and not to get stressed about the work you don’t finish. It’s not that you should stop caring, it’s just that you should remember that being stressed makes you less productive, which means you get less work done and become more stressed. That’s a vicious circle right there and we need to break it.

Seems obvious in print, but I do so often get caught up in my mountains of unfinished work, and sometimes it weighs on me so heavily that it slows everything down. I think I'll try to keep a scratch list of accomplishments, however minor, for each day, and try starting each day reviewing the list from the previous day. Perhaps this might put me at risk of further malaise if, for example, I fall behind on keeping my accomplishments scratch list, or if I start each day nit-picking my previous day's work. But it's worth a try.

[Uche Ogbuji]

via Copia

Why JSON vs XML is a yawn

Strange spate of discussion recently about XML vs. JSON. On M. David Peterson's Weblog he states what I think is the obvious: there is no serious battle between XML and JSON. They're entirely complementary. Mike Champion responds:

The same quite rational response could be given about the "war" between WS-* and REST, but that has caused quintillions of electrons to change state in vain for the last 5 years or so. The fact remains that some people with a strong attachment to a given technology howl when it is declared to be less than universal. I completely agree that the metaphor of "keep a healthy tool chest and use the right one for the job at hand" is the appropriate response to all these "wars", but such boring pragmatism doesn't get Diggs or Pagerank.

If I may be so bold as to assume that "pragmatism" includes in some aspect of its definition "that which works", I see a bit of a "one of these things is not like the other" game (sing along, Sesame Street kids) in Mike's comparison.

  • XML - works
  • JSON - works
  • REST - works
  • WS-Kaleidoscope - are you kidding me?

Some people claim that the last entry works, but I've never seen any evidence beyond the "it happened to my sister's boyfriend's roommate's cousin" variety. On the other hand, by the time you click through any ten Web sites you probably have hard evidence that the other three work, and in the case of REST you have that evidence by the time you get to your first site.

For my part, I'm a big XML cheerleader, but JSON is great because it gives people a place to go when XML isn't really right. There are many such places, as I've often said ("Should Python and XML Coexist?", "Alternatives to XML", etc.). Folks tried from the beginning to make XML right for data as well as documents, and even though I think the effort made XML more useful than its predecessors, I think it's clear folks never entirely succeeded. XML is much better suited to documents and text than records and data. The effort to make it more suitable for data leads to the unfortunate likes of WXS (good thing there's RELAX NG) and RDF/XML (good thing there's Turtle). Just think about it: XQuery for JSON. Wouldn't it be so much simpler and cleaner than our XQuery for XML? Heck, wouldn't it be pretty much...SQL?

That having been said, there is one area where I see some benefit to XQuery. Mixed-mode data/document storage is inevitable given XML's impressive penetration. XQuery could be a thin layer for extracting data feeds from these mixed-mode stores, which can then be processed using JSON. If the XQuery layer could be kept thin enough, and I think a good architect can ensure this, the result could be a very neat integration. If I had ab initio control over such a system, my preference would be schema annotations and super-simple RDF for data/document integration. After all, that's a space I've been working in for years now, and it is what I expect to focus on at Kadomo. But I don't expect to always be so lucky. Then again, DITA is close enough to that vision that I can be hopeful people are starting to get it, just as I'm grateful that the development of GRDDL means that people in the Semantic Web community are also starting to get it.
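The thin extraction layer described above can be sketched in Python. ElementTree's element lookups stand in for XQuery here purely for illustration (a swapped-in technique, not XQuery itself), and the document shape and element names (`report`, `record`, `name`, `price`) are hypothetical. The layer pulls only the data-like parts out of a mixed-mode document and hands them on as JSON:

```python
import json
import xml.etree.ElementTree as ET

# A hypothetical mixed-mode document: prose plus embedded data records.
DOC = """<report>
  <para>Quarterly notes, written for humans, with <em>mixed</em> content.</para>
  <record id="r1"><name>Widget</name><price>9.99</price></record>
  <para>More narrative text.</para>
  <record id="r2"><name>Gadget</name><price>24.50</price></record>
</report>"""


def extract_records(xml_text: str) -> list:
    """Thin extraction layer: pull only the <record> elements out of a
    mixed document and flatten each into a plain dict."""
    root = ET.fromstring(xml_text)
    return [
        {
            "id": rec.get("id"),
            "name": rec.findtext("name"),
            "price": float(rec.findtext("price")),
        }
        for rec in root.iter("record")
    ]


# The JSON feed is what downstream, data-oriented code would consume.
feed = json.dumps(extract_records(DOC), indent=2)
print(feed)
```

The narrative `<para>` content stays in the XML store, while only the record-shaped data crosses over into JSON; keeping the extraction function this small is one way to keep the layer "thin".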

On the running code front I've been working on practical ways of working nicely with XML and JSON in tandem. The topic has pervaded several aspects of my professional work all at once in the past few months, and I expect to have a lot of code examples and tools to discuss here on Copia soon.

[Uche Ogbuji]

via Copia

Woe US VISIT

At least this holiday I can get some small satisfaction from our wannabe Gestapo's having suffered a setback. When paranoid policy meets inept IT. I'm still tempted to an act of vandalism every time I see a stupid US VISIT sign at the airport, even more so now that in response to the unsurprising news that they are driving away innocent visitors (and thus business), they added a banner at the bottom of each sign saying "thank you thank you thank you". Nothing like heaping condescension upon insult.

[Uche Ogbuji]

via Copia