Linked Data and Overselling the HTTP URI Scheme

So, I'm going to do something which may not be well-received: I'm going to push back (slightly) on the Linked Data movement, because, frankly, I think it is a bit draconian in the way it oversells the HTTP URI scheme (points 2 and 3):

2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information.

There is some interesting overlap as well between this overselling and a recent W3C TAG finding which takes a close look at motivations for 'inventing' URI schemes instead of re-using HTTP. The word 'inventing' seems to suggest that the URI specification discourages the use of URI schemes beyond the most popular one. Does this really only boil down to an argument of popularity?

So, here is an anecdotal story that is part fiction and part fact. A vocabulary author within an enterprise has (at the very beginning) a small domain in mind that she wants to build some consensus around by developing an RDF vocabulary. She doesn't have any authority with regard to web space within (or outside) the enterprise. Does she really have to stop developing her vocabulary until she has selected a base URI from which she can guarantee that something useful can be dereferenced from the URIs she mints for her terms? Is it really the case that her vocabulary has no 'semantic web' value until she does so? Why can't she use the tag scheme (for instance) to identify her terms first and worry later about the location of the vocabulary definition? After all, those who push the HTTP URI scheme as a panacea must be aware that URIs are about identification first and location second (and the latter characteristic is optional).
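
For instance (a minimal sketch; the tagging entity, date, and term names here are all hypothetical), she could mint tag-scheme URIs per RFC 4151 and describe her terms in ordinary RDF, deferring any decision about web hosting:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# tag: URIs (RFC 4151) give globally unique, stable names with no web location implied
<tag:alice@example.com,2007:vocab#Patient> a owl:Class ;
  rdfs:label "Patient" .

<tag:alice@example.com,2007:vocab#attendingPhysician> a owl:ObjectProperty ;
  rdfs:domain <tag:alice@example.com,2007:vocab#Patient> .

Nothing here needs to change when she later secures web space; at that point an explicit pointer (of the kind discussed below) can say where the vocabulary definition lives.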

Over the years, I've developed an instinct to immediately question arguments that suggest a monopoly on a particular approach. This seems to be the case here. Proponents of an HTTP URI scheme monopoly for follow-your-nose mechanics (or auto-discovery of useful RDF data) seem to suggest (quite strongly) that using anything besides the HTTP URI scheme is bad practice, without actually saying so. So, if this is not the case, my original question remains: is it just a URI scheme popularity contest? If the argument is to make it easy for clients to build web closure, then I've argued before that there are better ways to do this without stressing the protocol with brute-force and unintelligent term 'sniffing'.

It seems a much better approach to be unambiguous about the trail left for software agents: use an explicit term (within a collection of RDF statements) to point to where additional, useful information can be retrieved for said collection of RDF statements. There is already decent precedent in terms such as rdfs:seeAlso and rdfs:isDefinedBy. However, these terms are very poorly defined and woefully abused (the latter term especially).

Interestingly, I was introduced to this "meme" during a thread on the W3C HCLS IG mailing list about the value of the LSID URI scheme and whether it is redundant with respect to HTTP. I believe this disconnect was part of the motivation behind the recent TAG finding: URNs, Namespaces and Registries. Proponents of an HTTP URI scheme monopoly should educate themselves (as I did) on the real problems faced by those who found it necessary to 'invent' a URI scheme to meet needs they felt were not properly addressed by the mechanics of the HTTP protocol. They reserve that right, as the URI specification does not endorse any monopoly on schemes. See: LSID Pros & Cons
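
To make the LSID case concrete (a sketch; the authority and identifier below are hypothetical), a URN-schemed name identifies the record, while an explicit statement, not the URI scheme, tells an agent where a description can be retrieved:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# The LSID is the name; the seeAlso arc carries the location
<urn:lsid:example.org:proteins:P34355> rdfs:label "A protein record" ;
  rdfs:seeAlso <http://resolver.example.org/proteins/P34355.rdf> .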

Frankly, I think fixing what is broken with rdfs:isDefinedBy (and the pervasive use of rdfs:seeAlso - FOAF networks do this) is sufficient for solving the problem that the Linked Data theme is trying to address, but much less heavy-handedly. What we want is a way to say:

this collection of RDF statements is 'defined' (ontologically) by these other collections of RDF statements.

Or we want to say (via rdfs:seeAlso):

with respect to this current collection of RDF statements, you might want to look at this other collection
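
A minimal sketch of both idioms (all URIs here are hypothetical):

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# 'defined (ontologically) by this other collection'
<http://example.com/vocab#Patient> rdfs:isDefinedBy <http://example.com/vocab.rdf> .

# 'you might want to look at this other collection'
<http://example.com/vocab#Patient> rdfs:seeAlso <http://example.com/vocab-notes.rdf> .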

It is also worth noting the FOAF namespace URI issue which recently 'broke' Protege. It appears some OWL tools (Protege, at the time) were making the assumption that the FOAF OWL RDF graph would always be resolvable from the base namespace URI of the vocabulary: http://xmlns.com/foaf/0.1/ . At some point, recently, the namespace URI stopped serving up the OWL RDF/XML from that URI and instead served up the specification. Nowhere in the human-readable specification (which, during that period, was what was being served up from that URI) is there a declaration that the OWL RDF/XML is served up from that URI. The only explicit link is to: http://xmlns.com/foaf/spec/20070114.rdf

However, how did Protege come to assume that it could always get the FOAF OWL RDF/XML from the base URI? I'm not sure, but the short of it was that any vocabulary which referred to FOAF (at that point) could not be read by Protege (including my foundational ontology for Computerized Patient Records, which has since moved away from using FOAF for reasons that include this break in Protege).

The problem here is that Protege should not have been making that assumption; it should (instead) only have assumed that an OWL RDF/XML graph could be dereferenced from a URI if that URI is the object of an owl:imports statement, i.e.:

<http://example.com/ont> owl:imports <http://xmlns.com/foaf/spec/20070114.rdf> .

This is unambiguous, as owl:imports is very explicit about what the URI at the other end points to. If you set up semantic web clients to assume they will always get something useful from a URI used within an RDF statement, or that HTTP-schemed URIs in an RDF statement are always resolvable, then you set them up for failure, or at least a lot of unnecessary web crawling in random directions.

My $0.02

Chimezie Ogbuji

via Copia
7 responses
Chimezie,

Important commentary and insight for sure.

Please enrich the "Linked Data" article on Wikipedia with your thoughts (maybe open up a "===Criticisms===" or similar heading).

Again, this is an important contribution to a discourse whose time has come.

Resolvers for URI dereferencing that are protocol independent are a big deal when dealing with heterogeneous data sources across a globally distributed network such as the Internet.

For Virtuoso (as you may or may not know), this isn't a big deal, but when talking about a Web of Data Sources and providers as per the "Linked Data" and "Semantic Data Web" as a whole, this goes beyond what a few platforms may or may not be capable of doing.

Good post, extremely important contribution to the broader "Linked Data" discourse along the lines of: Does URI Dereferencing have to be Protocol specific (i.e. HTTP)?
Like Kingsley says, a good contribution. I don't personally think the use of HTTP URIs is being oversold. I'd look at it the other way: to date they have been underexploited in the Semantic Web community, i.e. a surfeit of Semantic, not enough Web.

A tweak on your final paragraph might illuminate this point:

If you set up web clients to assume they will always get something useful from the URI used within a link, or that HTTP-schemed URIs in a link (or document) are always resolvable, then you set them up for failure, or at least a lot of unnecessary web crawling in random directions.

btw, I believe the FOAF resolving thing has now been fixed.
Kingsley: I'll make a note to update the Wiki page.

Yes, the FOAF ns problem was fixed.

Danny: Good point about link/@rel | link/@href

I recall Murray Malone asking once about the possibility of a faithful rendition for XInclude directives, and the idea of either using rdfs:seeAlso or an alternative (but explicit) term like:

<http://bblfish.net/work/atom-owl/2006-06-06/#link> <.. other RDF Dataset or RDF abstract graph URI .. >

I think the chain of dereferenceURI should be:

<http://esw.w3.org/topic/DereferenceURI>(..uri..) := GRDDL(uri) + RDFa parse + Prospective Dereferencing + Recursive Dereferencing, RESTfully (caching, etc.)
A 'default' GRDDL transform should map XLink, XInclude, and xhtml:link[@href] elements (in the XPath Data Model) into rdfs:seeAlso:
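
For instance, the mapped output of such a transform might look like this (a sketch; the document and target URIs are hypothetical):

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# each xhtml:link/@href in the source document becomes a seeAlso arc
<http://example.org/doc.html> rdfs:seeAlso <http://example.org/doc-data.rdf> .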

Or they could be expressed (instead) using the link semantics defined in AtomOWL:

:link [ a :Link;
  :rel iana:alternate ;
  :to [ :src <http://example.org/> ]
];
Chime & Danny,

What about point 2 re. making your URIs HTTP URIs? Should it not really read "Use Resolvable URIs", which protects the notion that URNs and URLs are different types of URIs?

Being in the middle of two communities (Linked Data and the LSID-based HCLS communities) separated by URI resolution perspectives, I think we should fix point 2 :-)
Chime,

I've altered the Wikipedia Linked Data article re. point 2. Thus, URIs are presented in a protocol independent manner :-)

Kingsley
I am still of the opinion that linked data and HTTP URIs are a hack to address a glaring hole in RDF: the lack of an equivalent of XML's xsi:schemaLocation construct. It is this lacuna that is causing all the trouble.

It could be addressed *without* disturbing the purity of the semantic web by putting something in the XML schema that carries RDF. That is: don't do it at the "semantic" layer, don't do it with RDF properties. RDF can continue to concern itself only with URIs that are ontology *names*. Whatever it may be that carries RDF - the XML - can deal with the machine-level question of locating the vocabularies on a network.