Funky fresh Vail day

Shout out to Jen, Kenny, Susan, Kim, Dawn, Alexandra, and Rogé. Thanks for the sweet day at Vail Sunday. The snow, weather, company, and all-round fun factor were exquisite, and it was surely the best day of what has been a really good season. Here are the videos I promised.

First, an apology. That stupid-ass camera a bien cassé mes pieds, that is, it was a real pain (or I suppose, as I learned Sunday, bien me merde)! Several of the video clips were partially hosed, and a couple of them completely so. As a general note, avoid the DXG-305V like the pox. Not only is it useless for taking pictures (the shot above is exhibit A; ignore the wrong time-stamp on the image, I never did set the time on the camera), but its movie mode is seriously buggy. I returned it to Target today, and earnestly pleaded with the bemused staff to take all the other such cameras off the shelf and use them for footholds on a rock climbing wall. Oh yeah, and all the videos are in Microsoft ASF format (my first thought was "you gotta be kidding me"). Your choices for playback might be a bit limited. Gi-Gi Alex and Kim-oui, the clips with all your fly turns appear to be completely lost, even using Windows Media Player. At least I have some footage from the others. Don't blame me. Blame the poxy appareil photo.

Anyway, I uploaded all the video clips, even the broken ones. If anyone has any luck with them, let me in on the secret.

[Uche Ogbuji]

via Copia

"Semantic Transparency" by any other name

In response to "Semantic hairball, y'all", Paul Downey expressed approval of my skewering of some of the technologies I see dominating the semantics space, but did say:

..."semantic transparency" in "XML Schema" sounds just a little too scary for my tastes....

This could mean that the names sound scary, or that his interpretation of the idea itself sounds scary. If it's the latter, I'll try to show soon that the idea is very simple and shouldn't be at all scary. If it's the former, the man has a point. "Semantic Transparency" is a very ungainly name. As far as I can tell, it was coined by Robin Cover, and I'm sure it was quite suitable at the time, but for sure right now it's a bit of a liability in the pursuit that most interests me.

The pursuit is of ways to build on the prodigious success of XML to make truly revolutionary changes in data architecture within and across organizations. Not revolutionary in terms of the technology to be used. In fact, as I said in "No one ever got fired for...", the trick is to finally give age-old and well proven Web architecture more than a peripheral role in enterprise architecture. The revolution would be in the effects on corporate culture that could come from the increased openness and collaboration being fostered in current Web trends.

XML ushered in a small revolution by at least codifying a syntactic basis for general purpose information exchange. A common alphabet for sharing. Much is made of the division between data and documents in XML (more accurate terms have been proposed, including records versus narrative, but I think people understand quite well what's meant by the data/documents divide, and those terms are just fine). The key to XML is that even though it's much more suited to documents, it adapts fairly well to data applications. Technologies born in the data world such as relational and object databases have never been nearly as suitable for document applications, despite shrill claims of relational fundamentalists. XML's syntax mini-revolution means that for once those trying to make enterprise information systems more transparent by consolidating departmental databases into massive stores (call them the data warehouse empire), and those trying to consolidate documents into titanic content management stores (call them the CMS empire) can use the same alphabet (note: the latter group is usually closely allied with those who want to make all that intellectual capital extremely easy to exchange under contract terms. Call them the EDI empire). The common alphabet might not be ideal for any one side at the table, but it's a place to start building interoperability, and along with that the next generation of applications.

All over the place in my consulting and speaking, I find that people have embraced XML just to run into the inevitable limitations of its syntactic interoperability, and scratch their heads wondering: OK, what's the next stop on this bus route? People who know how to make money have latched onto the suspense, largely as a way of re-emphasizing the relevance of their traditional products and services, rather than as a way to push for further evolution. A few more idealistic visionaries are pushing such further evolution, some rallying under the banner of the "Semantic Web". I think this term is, unfortunately, tainted. Too much of the 70s AI ambition has been cooked into recent iterations of Semantic Web technologies, and these technologies are completely glazing over the eyes of the folks who really matter: the non-Ph.Ds (for the most part) who generate the vast bodies of public and private documents and data that are to drive the burgeoning information economy.

Some people building on XML are looking for a sort of mindshare arbitrage between the sharp vendors and the polyester hippies, touting sloppy, bottom-up initiatives such as microformats and folksonomies. These are cheap, and don't make the head spin to contemplate, but it seems clear to anyone paying attention that they don't scale as a way to consolidate knowledge any more than the original Web does.

I believe all these forces will offer significant juice to next generation information systems, and that the debate really is just how the success will be apportioned. As such, we still need an umbrella term for what it means to build on a syntactic foundation by sharing context as well: to start sharing glossaries as well as alphabets. The fate (IMO) of the term "Semantic Web" is instructive. I often joke about the curse of the s-word. It's a joke I picked up from elsewhere (I forget where) to the effect that certain words starting with "s", and "semantic" in particular, are doomed to sound imposing yet impossibly vague. My first thought was: rather than "semantic transparency", how about just "transparency"? The problem is that it's a bit too much of a hijack of the generic. A data architect probably will get the right picture from the term, but we need to communicate to the wider world.

Other ideas that occur to me are:

  • "information transparency"
  • "shared context" or "context sharing"
  • "merged context"
  • "context framing"
  • "Web reference"

The last idea comes from my favorite metaphor for these XML++ technologies: that they are the reference section (plus card catalog) of the library (see "Managing XML libraries"). They are what makes it possible to find, cross-reference and understand all the information in the actual books themselves. I'm still fishing for good terms, and would welcome any suggestions.

[Uche Ogbuji]

via Copia

Binary Predicates in FOPL and at Large Volumes

I've wanted to come back to the issue of RDF scalability under a relational model for some time. Earlier, I mentioned a Description Logics (DL) representation technique that would dramatically reduce the amount of space needed for most RDF graphs. I only know of one other RDF store (besides rdflib) that does this. At large volumes, query response time is more susceptible to space efficiency than to pure seek time. At some point along the numerical scale, the amount of time it takes to resolve a query is more directly affected by the size of the knowledge base than by anything else. When you consider the URI lexical grammar, skolemization, seek times, BTrees, and hash tables, even interning (by which I mean the general reliance on uniqueness in crafting identifiers) has little effect on the high-volume metrics of FOPL.
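To make the interning point concrete, here's a minimal pure-Python sketch (class and variable names are my own invention, not any particular store's API) of mapping each term to an integer ID so that triples become fixed-width tuples and each long URI string is held only once:

```python
# Minimal sketch: intern RDF terms as integer IDs so triples are stored
# as compact integer tuples instead of repeated URI strings.
class TermPool:
    def __init__(self):
        self._ids = {}    # term string -> integer ID
        self._terms = []  # integer ID -> term string

    def intern(self, term):
        # Assign a new ID only the first time a term is seen.
        if term not in self._ids:
            self._ids[term] = len(self._terms)
            self._terms.append(term)
        return self._ids[term]

    def lookup(self, term_id):
        return self._terms[term_id]

pool = TermPool()
triples = set()
for s, p, o in [
    ("http://example.org/alice", "http://xmlns.com/foaf/0.1/knows",
     "http://example.org/bob"),
    ("http://example.org/alice", "http://xmlns.com/foaf/0.1/name", "Alice"),
]:
    triples.add((pool.intern(s), pool.intern(p), pool.intern(o)))

# The long subject URI is stored once, however many triples mention it.
assert pool.intern("http://example.org/alice") == 0
assert len(triples) == 2
```

The point of the post stands, though: this kind of identifier uniqueness helps constant factors, but does little about the dominant cost, which is the sheer number of statements.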

Perhaps something more could be said about the efficiency of DL. I've suggested the possibility of semantic compression (or 'forward consumption', if you think of it as analogous to forward chaining), where whatever can be implied is never added, or is removed by some intelligent process (perhaps periodically). For example, consider a knowledge base that only stored 'knows' relationships (foaf:knows, perhaps) between people. It would be very redundant to state that two individuals are 'People' (foaf:Person) if they know each other (a 66.6% space saving right there). Couldn't the formality of DL be used to enhance both expressiveness and efficiency, in the same way that invariant representations make our neocortex so much more efficient at logical prediction? If not DL, perhaps at least the formality of a local domain ontology and its rules? I was able to apply the same principle (though not in any formal way you could automate) to improve the speed of a content management knowledge base.
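The foaf:knows example above can be sketched directly. This is a toy illustration with a single hand-coded rule (anyone appearing in a foaf:knows statement is a foaf:Person); a real system would presumably drive the compression from DL or domain ontology rules rather than hard-coding them:

```python
# Sketch of 'semantic compression': triples that a rule can re-derive
# are never stored, and are restored by forward chaining on demand.
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def compress(triples):
    """Drop rdf:type foaf:Person statements implied by foaf:knows."""
    known = set()
    for s, p, o in triples:
        if p == FOAF + "knows":
            known.add(s)
            known.add(o)
    return {(s, p, o) for s, p, o in triples
            if not (p == RDF_TYPE and o == FOAF + "Person" and s in known)}

def expand(triples):
    """Forward-chain the rule back, restoring the implied statements."""
    out = set(triples)
    for s, p, o in triples:
        if p == FOAF + "knows":
            out.add((s, RDF_TYPE, FOAF + "Person"))
            out.add((o, RDF_TYPE, FOAF + "Person"))
    return out

kb = {
    ("ex:alice", FOAF + "knows", "ex:bob"),
    ("ex:alice", RDF_TYPE, FOAF + "Person"),
    ("ex:bob", RDF_TYPE, FOAF + "Person"),
}
stored = compress(kb)
assert len(stored) == 1      # both type statements were implied away
assert expand(stored) == kb  # and nothing is lost on query
```

Three statements stored as one: the 66.6% saving mentioned above, recovered losslessly at query time.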

[Uche Ogbuji]

via Copia

Optimizing XML to RDF mappings for Content Management Persistence

I recently refactored the 4Suite repository's persistence layer to make it more responsive to large sets of data. The 4Suite repository's persistence stack, which consists of a set of core APIs for the various first-class resources, is the heart and soul of a framework that leverages XML and RDF in tandem as a platform for content management. Essentially, the changes minimized the amount of redundant RDF statements mirrored into the system graph (an RDF graph where provenance statements about resources in the virtual filesystem are persisted) from the metadata XML documents associated with every resource in the repository.

The ability to mirror RDF content from XML documents in a controlled manner is core to the repository and the way it manages its virtual filesystem. This mapping is made possible by a mechanism called document definitions. Document definitions are mappings (persisted as documents in the 4Suite repository) of controlled XML vocabularies into corresponding RDF statements. Every resource has a small 'metadata' XML document associated with it that captures ACL data as well as system-level provenance typically associated with filesystems.

For example, the metadata document for the root container of the 4Suite instance running on my laptop is:

<?xml version="1.0" encoding="utf-8"?>
<ftss:MetaData xmlns:ftss="http://xmlns.4suite.org/reserved"
  type="" creation-date="2006-03-26T00:35:02Z">
  <ftss:Acl>
    <ftss:Access ident="owner" type="execute" allowed="1"/>
    <ftss:Access ident="world" type="execute" allowed="1"/>
    <ftss:Access ident="super-users" type="execute" allowed="1"/>
    <ftss:Access ident="owner" type="read" allowed="1"/>
    <ftss:Access ident="world" type="read" allowed="1"/>
    <ftss:Access ident="super-users" type="read" allowed="1"/>
    <ftss:Access ident="owner" type="write user model" allowed="1"/>
    <ftss:Access ident="super-users" type="write user model" allowed="1"/>
    <ftss:Access ident="owner" type="change permissions" allowed="1"/>
    <ftss:Access ident="super-users" type="change permissions" allowed="1"/>
    <ftss:Access ident="owner" type="write" allowed="1"/>
    <ftss:Access ident="super-users" type="write" allowed="1"/>
    <ftss:Access ident="owner" type="change owner" allowed="1"/>
    <ftss:Access ident="super-users" type="change owner" allowed="1"/>
    <ftss:Access ident="owner" type="delete" allowed="1"/>
    <ftss:Access ident="super-users" type="delete" allowed="1"/>
  </ftss:Acl>
</ftss:MetaData>

Each ftss:Access element under ftss:Acl represents an entry in the ACL associated with the resource the metadata document describes. All the ACL accesses enforced by the persistence layer are documented here.

Certain metadata are not reflected into RDF, either because they are queried more often than others and require prompt response, or because they are never queried separately from the resource they describe. In either case, querying a small XML document (associated with an already identified resource) is much more efficient than dispatching a query against an RDF graph in which statements about every resource in the repository are asserted.

ACLs are an example, and are persisted only as XML content. The persistence layer interprets and performs ACL operations against XML content via XPath/XUpdate evaluations.
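As an illustration (not the actual 4Suite code; the namespace URI, sample document, and function name here are assumptions for the sketch), such an ACL check amounts to an XPath-style predicate query against the small metadata document:

```python
# Hypothetical sketch: check an ACL entry by querying the resource's
# metadata document directly, rather than consulting the RDF graph.
import xml.etree.ElementTree as ET

# The ftss namespace URI here is an assumption for illustration.
NS = {"ftss": "http://xmlns.4suite.org/reserved"}

METADATA = """<ftss:Acl xmlns:ftss="http://xmlns.4suite.org/reserved">
  <ftss:Access ident="owner" type="read" allowed="1"/>
  <ftss:Access ident="world" type="read" allowed="1"/>
  <ftss:Access ident="owner" type="write" allowed="1"/>
</ftss:Acl>"""

def allowed(acl_doc, ident, access_type):
    """True if the ACL grants `ident` the given access type."""
    # ElementTree supports this limited XPath subset with predicates.
    path = f"ftss:Access[@ident='{ident}'][@type='{access_type}']"
    entry = acl_doc.find(path, NS)
    return entry is not None and entry.get("allowed") == "1"

acl = ET.fromstring(METADATA)
assert allowed(acl, "owner", "write")
assert not allowed(acl, "world", "write")
```

Since the metadata document is tiny and already associated with a known resource, this lookup never has to touch the repository-wide graph.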

Prior to the change, all of the other properties embedded in the metadata document (listed below) were being reflected into RDF redundantly and inefficiently:

  • @type
  • @creation-date
  • @document-definition
  • ftss:LastModifiedDate
  • ftss:Imt
  • ftss:Size
  • ftss:Owner
  • ftss:TimeToLive

Not too long ago, I hacked up an OWL ontology describing these system-level RDF statements (and wrote a piece on it).

Most of the inefficiency was due to the fact that a pre-parsed Domlette instance of the metadata document for each resource was already being cached by the persistence layer. However, the corresponding APIs for these properties (getLastModifiedDate, for example) were implemented as queries against the mirrored RDF content. Modifying these methods to evaluate pre-compiled XPaths against the cached DOM instances proved to be several orders of magnitude more efficient, especially against a repository with a large number of resources in the virtual filesystem.
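The shape of that refactoring might be sketched as follows; the class, method, and document details are illustrative, not the actual 4Suite API:

```python
# Illustrative sketch (not the real 4Suite code): metadata documents are
# parsed once and cached, and property accessors like getLastModifiedDate
# read from the cached tree instead of querying the mirrored RDF graph.
import xml.etree.ElementTree as ET

class MetadataCache:
    def __init__(self):
        self._docs = {}  # resource path -> parsed metadata document

    def get(self, path, raw_xml):
        # Parse at most once per resource; later calls hit the cache.
        if path not in self._docs:
            self._docs[path] = ET.fromstring(raw_xml)
        return self._docs[path]

cache = MetadataCache()
RAW = ('<MetaData creation-date="2006-03-26T00:35:02Z">'
       '<LastModifiedDate>2006-04-01T12:00:00Z</LastModifiedDate>'
       '</MetaData>')

def get_last_modified(path, raw_xml):
    # The equivalent of a pre-compiled XPath is just a child lookup here.
    return cache.get(path, raw_xml).findtext("LastModifiedDate")

assert get_last_modified("/", RAW) == "2006-04-01T12:00:00Z"
assert cache.get("/", RAW) is cache.get("/", RAW)  # same cached instance
```

The win comes from the access pattern: a constant-time lookup in an already-parsed document, versus a query over a graph whose size grows with the whole repository.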

Of all the above 'properties', only @type (which was being mirrored as rdf:type statements in RDF), @document-definition, and ftss:TimeToLive were queried independently from the resources they are associated with. For example, the repository periodically monitors the system RDF graph for ftss:TimeToLive statements whose values are less than the current date time (which indicates their TTL has expired). Expired resources cannot be determined by XPath evaluations against metadata XML documents, since XPath is scoped to a specific document by design. If the metadata documents were persisted in a native XML store, then the same query could be dispatched (as an XQuery) across all the metadata documents in order to identify those whose TTL had expired. But I digress...
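A rough sketch of that TTL monitor, with the system graph modeled as a plain set of triples and an illustrative predicate name:

```python
# Sketch of the TTL monitor: scan the system graph for ftss:TimeToLive
# values earlier than the current time. Predicate name and sample data
# are illustrative.
from datetime import datetime

TTL = "ftss:TimeToLive"
ISO = "%Y-%m-%dT%H:%M:%SZ"

system_graph = {
    ("/docs/a.xml", TTL, "2006-01-01T00:00:00Z"),
    ("/docs/b.xml", TTL, "2099-01-01T00:00:00Z"),
}

def expired(graph, now):
    # This cross-resource scan is exactly what XPath against any single
    # metadata document cannot express; it needs the consolidated graph.
    return sorted(s for s, p, o in graph
                  if p == TTL and datetime.strptime(o, ISO) < now)

assert expired(system_graph, datetime(2006, 6, 1)) == ["/docs/a.xml"]
```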

The @document-definition attribute associates the resource (an XML document in this case) with a user-defined mapping (expressed as an XSLT transform or a set of XPath to RDF statement templates) which couples its content with corresponding RDF statements. This presents an interesting scenario: if a document definition changes (document definitions are themselves first-class resources in the repository), then all the documents which refer to it must have their RDF statements re-mapped using the new document definition.

Note that such coupling only works in a controlled, closed system, and isn't possible where such mappings from XML to RDF are uncontrolled (à la GRDDL) and work in a distributed context.

At any rate, the @document-definition property was yet another example of system metadata that had to be mirrored into the system RDF graph since document definitions need to be identified independently from the resources that register them.

In the end, only the metadata properties that had to be queried in this fashion were mirrored into RDF. I found this refactoring very instructive in identifying some caveats to be aware of when modeling large scale data sets as XML and RDF interchangeably. This very small architectural modification yielded quite a significant performance boost for the 4Suite repository, which (as far as I can tell) is the only content-management system that leverages XML and RDF as interchangeable representation formats in such an integrated fashion.

[Uche Ogbuji]

via Copia


Mike Linksvayer had a nice comment on my recent talk at the Semantic Technology Conference.

I think Uche Ogbuji's Microformats: Partial Bridge from XML to the Semantic Web is the first talk on that subject I've heard from a non-cheerleader, and was a pretty good introduction to the upsides and downsides of microformats and how one can leverage microformats for officious Semantic Web purposes. My opinion is that the value in microformats hype is in encouraging people to take advantage of XHTML semantics in however conventional and non-rigorous a fashion they may. It is a pipe dream to think that most pages containing microformats will include the correct profile references to allow a spec-following crawler to extract much useful data via GRDDL. Add some convention-following heuristics and a crawler may get lots of interesting data from microformatted pages. The big search engines are great at tolerating ambiguity and non-conformance, as they must.

Yeah, I'm no cheerleader (or even follower) for Microformats. Certainly I've been skeptical of Microformats here on Copia (1, 2, 3). I think that the problem with Microformats is that value is tied very closely to hype. I think that as long as they're a hot technology they can be a useful technology. I do think, however, that they have very little intrinsic technological value. I guess one could say this about many technologies, but Microformats perhaps annoy me a bit more because given XML as a base, we could do so much better.

Mike is also right to be skeptical that GRDDL will succeed if, as it presently does, it relies on people putting profile information into Web documents that use Microformats.

My experience at the conference, some very trenchant questions from the audience, a very interesting talk by Ben Adida right after my own, and other matters have got me thinking a lot about Microformats, and what those of us whose information consolidation goals are more ambitious might be able to make of them. Watch this space. More to come.

[Uche Ogbuji]

via Copia

"XML in Firefox 1.5, Part 2: Basic XML processing"

Subtitle: Do a lot with XML in Firefox, but watch out for some basic limitations

Synopsis: This second article in the series, "XML in Firefox 1.5," focuses on basic XML processing. Firefox supports XML parsing, Cascading Style Sheets (CSS), and XSLT stylesheets. You also want to be aware of some limitations. In the first article of this series, "XML in Firefox 1.5, Part 1: Overview of XML features," Uche Ogbuji looked briefly at the different XML-related facilities in Firefox.

I also updated part 1 to reflect the Firefox 1.5 final release.

This article is written at an introductory level. The next articles in the series will be more technically in-depth, as I move from plain old generic XML to fancy stuff such as SVG and E4X.

[Uche Ogbuji]

via Copia

No one ever got fired for...

In my previous entry about enterprise architecture and complexity I forgot to touch on one thread that occurred to me.

My recent experiences, and Dare's quote, bring to mind the old adage: "No one ever got fired for buying IBM". Why is there no sign of a corresponding "No one ever got fired for designing like Google"? To be sure, IBM was on top a lot longer than Google before it became the subject of the proverb, but hey, the Web age is a faster age, right? Where's my accelerated fulfillment when it comes to enterprise applications architecture?

I get the impression that instead, among the C-level cloisters of many run-of-the-mill companies, the reality is more "no one ever got fired for ordering a titanic Oracle or ERP license and thereupon building an unmaintainable application superstructure". It seems a lot harder to explain to the board that you are introducing revolutionary efficiency in your organization's information systems by learning the lessons of the Web (the most successful distributed information system ever). That sounds dangerously generic to the eyes of analysts trained to receive all truths from Chicago-cluster consultants. It does not sound like a roll-up of synergies to cross the chasm and monetize emergence of elastic markets. Paying the toffs gigabucks and then bending over for the inevitable business process re-engineering is just how it's done, lads.

So no one gets fired for Google-like systems architecture. No. Outside the crescendoing Web 2.0 bubble, no one gets hired in the first place if there's the slightest sniff they'd contemplate such a thing. Shame. Web 2.0 is not a bubble (square-one-dot-com) because it's based on near-trivial technology. It's a bubble because there are very few opportunities for arbitrage in a marketplace whose point is to provide customers unprecedented transparency and choice. The very place where such an approach can more consistently provide value is within the enterprise whose information systems have so long been bantustans of baroque and isolated systems. The enterprise is where there is a real chance of information systems revolution from Google-like technology. And it's the one place where no one is looking to build and deploy technology the way Google does.

[Uche Ogbuji]

via Copia

Today's wot he said

Dare Obasanjo, earlier this week:

The funny thing about a lot of the people who claim to be 'Enterprise Architects' is that I've come to realize that they tend to seek complex solutions to relatively simple problems. How else do you explain the fact that web sites that serve millions of people a day and do billions of dollars in business a year like Amazon and Yahoo are using scripting languages like PHP and approaches based on REST to solve the problem of building distributed applications while you see these 'enterprise architect' telling us that you need complex WS-* technologies and expensive toolkits to build distributed applications for your business which has less issues to deal with than the Amazons and Yahoos of this world?

Gbooyakasha! I've recently had occasion to discuss my "enterprise" credentials with some mainstream-y CIO/CTO types. It always amazes me how many of that number gaze vacantly at simple architectural ideas, and find true comfort in endless, overlapping boxes with data arrows flying in all dizzying directions, so long as those boxes are labeled "Oracle", "SAP" and such. I certainly understand it's easy to confuse simple with simplistic, but unnecessarily complicated should not be so hard to spot, and it's all over the place in your friendly neighborhood enterprise.

I've considered myself an enterprise architect because I've worked in the architecture of solutions that require workflow across departments within medium-sized organizations. Lately, however, I've come to wonder whether unfortunate practice has tainted the title.

[Uche Ogbuji]

via Copia

A Perspective on Temporal Modeling in RDF

I just read the follow-up to a thread (Why we need explicit temporal labelling) on the formal modeling of time and time-related semantics in RDF, specifically. I wanted to put in my $0.02, since I spend a good deal of my time at work buried nose-deep in large volumes of bioinformatic cardiovascular data, most of which is largely temporal. To put it succinctly, I just don't see the value in merging temporal semantics (not a very lightweight model) into the fabric of your representation model.

We found (for our purposes) that by including our own specific temporal semantic vocabulary, we could ensure that we can answer questions such as:

How many patients complained about chest pains within 30 days of a specific surgical operation?

While at the same time avoiding the rigidity of temporal reasoning that formal models impose. Such formalisms (especially in distributed systems) are unnecessary when you consider that most often, data as it is fetched (at any point in time) is 'complete' regardless of how it has varied over time.
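A toy version of that kind of query, using a small ad hoc vocabulary rather than a formal temporal model (all property names and patient data here are invented for illustration):

```python
# Sketch: answer "who complained of chest pain within N days of an
# operation?" over plain (subject, property, value, date) facts, with
# an ad hoc vocabulary instead of a formal temporal ontology.
from datetime import datetime, timedelta

facts = [
    ("pat1", "complaint", "chest pain", datetime(2006, 1, 20)),
    ("pat2", "complaint", "chest pain", datetime(2006, 3, 15)),
    ("pat1", "operation", "CABG", datetime(2006, 1, 5)),
    ("pat2", "operation", "CABG", datetime(2006, 1, 5)),
]

def complained_within(facts, operation, days):
    # Date of the given operation, per patient.
    ops = {s: d for s, p, o, d in facts
           if p == "operation" and o == operation}
    window = timedelta(days=days)
    return sorted({s for s, p, o, d in facts
                   if p == "complaint" and o == "chest pain"
                   and s in ops and timedelta(0) <= d - ops[s] <= window})

assert complained_within(facts, "CABG", 30) == ["pat1"]
```

The temporal logic lives entirely in the query, not in the representation model, which is the point being argued above.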

Consider the RDF schema for OWL, whose identifier (the URL from which its content can be loaded) includes some temporal semantics (when it was published, and the suggestion that there are prior versions). Though the content might have changed over time, the entire document as it was at any point was 'consistent' in what it conveys. No additional temporal semantics are needed to capture the relations between versions, or to maintain some 'sanity' (if you will) over the fact that the data changed over time.

And if such formalism is needed, it's rather easy to piggyback off existing ontologies ("Time Ontology in OWL", for instance).

Furthermore, if you think about it, named contexts (graphs, scopes, etc.) already provide a more adequate solution to the issue of inconsistency of data (over time) from the same source. For instance, you can take advantage of syntactic RDF/XML and N3 sugar such as:

<> a owl:Ontology;
   dc:date "2002-07-13" .

or its RDF/XML equivalent:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <owl:Ontology rdf:about="">
    <dc:date>2002-07-13</dc:date>
  </owl:Ontology>
</rdf:RDF>

in order to capture enough provenance data to accommodate change.

Ironically, the ability to make provenance statements (one of which includes the date associated with this 'representation') about a named graph (identified by the URL from which it was loaded) is beyond the semantics of the RDF model. However, through its use you can be specific about the source of triples and, in addition, you can include the specifics of version, either within the identifier of the source or through provenance statements made about it.
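One way to picture this: statements scoped to named graphs (quads), with dated provenance statements about the graphs themselves. The quad representation and the conventions below are illustrative, not any particular store's model:

```python
# Sketch: named graphs carry temporal provenance. Each statement is
# scoped to the graph it was loaded from, and dc:date provenance is
# asserted about the graphs (here in an illustrative 'default' scope,
# marked with graph=None).
DC_DATE = "http://purl.org/dc/elements/1.1/date"

# (graph, subject, predicate, object)
quads = {
    ("http://example.org/g1", "ex:doc", "ex:status", "draft"),
    ("http://example.org/g2", "ex:doc", "ex:status", "final"),
    (None, "http://example.org/g1", DC_DATE, "2002-07-13"),
    (None, "http://example.org/g2", DC_DATE, "2005-11-02"),
}

def status_as_of(quads, date):
    """Latest ex:status whose source graph was dated on or before `date`."""
    dated = {s: o for g, s, p, o in quads if g is None and p == DC_DATE}
    candidates = [(dated[g], o) for g, s, p, o in quads
                  if p == "ex:status" and g in dated and dated[g] <= date]
    return max(candidates)[1] if candidates else None

assert status_as_of(quads, "2003-01-01") == "draft"
assert status_as_of(quads, "2006-01-01") == "final"
```

Versioning over time falls out of plain provenance statements about sources; nothing temporal had to be baked into the statement model itself.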

I think the problem is more a modeling issue (and a matter of having the foresight to determine how you accommodate the change of data over time) than a shortcoming of the framework.

Chimezie Ogbuji

via Copia

Chief Niwot's curse

Living in Boulder county, Colorado, I've often heard humorous references to "Chief Niwot's curse" by fellow transplants, and a few not-so-humorous references by native Coloradans. I've heard a lot of vague formulations of the curse, and got curious about it. It seems vague formulations are what you find on the Web as well. Apocrypha get no respect.

The neatest reference I find is in the Boulder Weekly's "A-to-Z Guide to Boulder: How to talk like a native"

Niwot's Curse—When white men first came to Boulder in 1858, they were looking for gold. What they found instead were Southern Arapaho warriors under Chief Niwot who wanted them to leave. When they refused, Chief Niwot supposedly uttered this curse: "People seeing the beauty of this valley will want to stay, and their staying will be the undoing of its beauty."

That page also has a very brief summary of Chief Niwot's story, which is very interesting, and worth pursuing in primary sources. Overall the page is a reasonable guide to "Boulder weird", which is tolerable to my taste in medium quantities (I love hanging out in Boulder proper, but I'm glad I live one US 36 exit and one big hill south. Not so glad, I admit, about living among the little boxes made of ticky-tacky).

For me, Boulder valley was just part of the magic in the curse. It was in 1995, when I took a road trip with friends from Fort Collins to San Francisco, driving through Boulder as tourists, then on through Golden (striking in a somewhat more industrial way) and onto I-70 through the jaw-dropping Rockies. While in Fort Collins, we also trekked up famous Highway 14 to the continental divide. I've been all over the U.S., and I must say three of the four most gorgeous passes I've ever seen are in Colorado. Yosemite has the fourth, and I think Independence Pass edges it out.

I don't know whether it was specifically Boulder valley and the Flatirons that did it for me--it was more likely Fort Collins and environs--but Niwot's curse took hold, and I found myself determined to move to Colorado to settle, succeeding in 1998 when we moved to Fort Collins.

Calabar, Cairo (Egypt), Birmingham (England), New York, Cleveland, Gainesville, Enugu, Owerri, Yola, Port Harcourt, East Brunswick, Cleveland, Milwaukee, Chicago, Dallas, Peoria, Fort Collins, and now Superior/Boulder. I'd never lived anywhere in my life a full three years before settling here in Superior (I'll have lived six years here in May). Lori does love Colorado, but I think she loves even more that my moving itch seems to have subsided. To most of those under its hold, the famous curse has the devious trick of looking like the most extravagant sort of blessing.

[Uche Ogbuji]

via Copia