Is RDF moving beyond the desperate hacker? And what of Microformats?

I've always taken a desperate hacker approach to RDF. I became a convert to the XML way of expressing documents right away, in 1997. As I started building systems that managed collections of XML documents I was missing a good, declarative means for binding such documents together. I came across RDF, and I was sold. I was never really a Semantic Web head. I used RDF more as a desperate hacker with problems in a fairly well-contained domain. At that time the Sem Web aspirations behind RDF didn't get in the way too badly, so all was well for me. My desperate hacker mindset is probably best summarized in this XML-DEV message from May 2001.

I see RDF as an excellent modeling tool for closed systems. In my practice, most of the real "knowledge" is in the XML documents at the nodes, but RDF can provide important indexing and relationship expression between these nodes.

I go on in that message to expand on where RDF fits into the architecture of apps for my needs. I also mention a bit of wariness about how RDF's extravagant ambition (i.e. Sem Web) could affect my simple, practical needs.

I quickly found out on www-rdf-logic that in the discussion there, the assumption appears to be that in the semantic Web the RDF statements would carry a heavy burden of the "knowledge" in the system. I've started to think that this idea is a straw man set up by folks who would like RDF to be a fully-blown knowledge-representation language, but if "strong RDF" is indeed a cog in the SW wheel, I fear I must excuse myself from contributing to that discussion because it places me immediately out of my depth.

I've spent a lot of time with RDF, and for a while it was a big part of our consulting practice, but recently applications architecture and schema design (RELAX NG mostly, thank goodness) have been the biggest part of the day job. Honestly, I started to lose touch with where RDF was going. I knew there were some common-sense fixes to bugs in the 1999 specs, but I also knew there were some worrying injections of Sem Web think into the model core. Recently I've had some opportunity to catch up. SPARQL just doesn't fit my head, so a few of us in the Versa 1.0 gang, including Mike Olson and Chimezie, have started work towards Versa 2.0. Mike and Chime have kept up with the state of RDF, and in several discussions, I expressed what I felt was a simple view of the RDF model and got in response what I thought were overblown claims about how the RDF model's semantics have been updated. In all cases when I checked the relevant parts of the latest RDF specs I found that Mike and Chime were right and it was rather the RDF model itself that was overblown.

I've developed an overall impression of dismay at the latest RDF model semantics specs. I've always had a problem with Topic Maps because I think that they complicate things in search of an unnecessary level of ontological purity. Well, it seems to me that RDF has done the same thing. I get the feeling that in trying to achieve the ontological purity needed for the Semantic Web, it's starting to leave the desperate hacker behind. I used to be confident I could instruct people on almost all of RDF's core model in an hour. I'm no longer so confident, and the reality is that any technology that takes longer than that to encompass is doomed to failure on the Web. If they think that Web punters will be willing to make sense of the baroque thicket of lemmas (yes, "lemmas", mi amici docte) that now lie at the heart of RDF, or to get their heads around such bizarre concepts as assigning identity to literal values, they are sorely mistaken. Now I hear the argument that one does not need to know hedge automata to use RELAX NG, and all that, but I don't think it applies in the case of RDF. In RDF, the model semantics are the primary reason for coming to the party. I don't see it as an optional formalization. Maybe I'm wrong about that and it's the need to write a query language for RDF (hardly typical for the Web punter) that is causing me to gurgle in the muck.

Assuming it were time for a desperate hacker such as me to move on (and I'm not necessarily saying that I am moving on), where would he go from here? I hear the chorus: microformats. But I see nothing but nasty pricklies down that road. IMO microformats are now where RDF was back in 1999 (actually more like 1998) in terms of practical use to the Web, but in making their specification nothing but a few notes scribbled in a Wiki, they are purely syntactic, and offer no semantic anchor. As such, I'm not sure why it makes sense to think of microformats as different from XML ca. 1997. What's the news there? They certainly don't solve my desperate hacker need for indexing and expressing relationships across XML documents. I don't need the level of grounding that RDF seems to so slavishly be aiming for these days, but I need more than scattered Wiki notes.

GRDDL is the RDF community's bid to fix microformats up with some grounding. Funny thing is that in GRDDL they are re-discovering what the desperate hackers at Fourthought devised almost four years ago in "document definitions" to map XML syntax to RDF statements using XPath and XSLT. The desperate hacker in me feels at the same time vindicated, and left in the weeds. Sure, GRDDL gets RDF some of what I've thought it's needed for ages, but it still does wed microformats to the present-day RDF model, which is just what I'm becoming uneasy about.
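
To make the idea of mapping XML syntax to RDF statements concrete, here is a minimal sketch in Python. It is not the actual Fourthought document definition syntax, nor GRDDL itself (which uses XSLT); it only illustrates the general shape of the technique with the limited XPath subset in the standard library. The sample document, the example.org URIs, and the RULES table are all assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are invented.
DOC = """<catalog>
  <book id="b1"><title>XML Basics</title><cites ref="b2"/></book>
  <book id="b2"><title>RDF Primer</title></book>
</catalog>"""

# Hypothetical base URIs for subjects and predicates.
BASE = "http://example.org/books/"
EX = "http://example.org/terms/"

# Each rule: (path selecting subject elements, predicate URI,
#             path from the subject to object elements,
#             function turning an object element into a value).
RULES = [
    ("./book", EX + "title", "./title", lambda el: el.text),
    ("./book", EX + "cites", "./cites", lambda el: BASE + el.get("ref")),
]


def extract_triples(xml_text, rules):
    """Apply declarative path rules to an XML document, yielding triples."""
    root = ET.fromstring(xml_text)
    triples = []
    for subject_path, predicate, object_path, to_object in rules:
        for subject_el in root.findall(subject_path):
            # Assumes every subject element carries an "id" attribute.
            subject = BASE + subject_el.get("id")
            for object_el in subject_el.findall(object_path):
                triples.append((subject, predicate, to_object(object_el)))
    return triples


triples = extract_triples(DOC, RULES)
for triple in triples:
    print(triple)
```

The point of the sketch is that the knowledge stays in the XML document, while the rules declare which parts of the syntax become statements usable for indexing and relationships across documents.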

I'm more wandering around than getting anywhere in this entry, I freely admit. Working the grounding layer for XML is still what I consider to be my work of primary career interest. Lately, this work has led me more in the direction of schema annotations, as you can see in some of my recent articles on IBM developerWorks. Architectural forms are the closest thing the SGML old-heads gave us to syntax-semantic grounding (grounded to HyTime, of course), and AF were a creature of the schema. Perhaps it's high time we went back to learn that old-head lesson and quit fiddling around with brittle post-schema transformations.

As for the modeling system to use as the basis for grounding XML syntax, I don't know. I stick to RDF for now, but I'll have to see if it's possible to use it interoperably while still ignoring the more esoteric flourishes it's picked up lately. The Versa discussions at first gave me the impression that these flourishes are inevitable, but more recent threads have been a bit more encouraging.

I certainly hope that it doesn't take another rewind to RDF circa 2000 to satisfy the desperate hacker.

[Uche Ogbuji]

via Copia
5 responses
Hmm... This is a tough one. Where to begin? I can't deny that I'm probably knee deep in the possibilities of RDF and would consider myself an 'RDF-head', but having worked with it for as long as I have, it would be foolish for me not to admit that the recent changes regarding literals and the entailments that make them resources are more than a bit confusing. The separation used to be clear to me, but it's not now, and it's less important when you are only dealing with surface semantics (just syntax primarily), but when you have to consider the ramifications for constructing a query language it makes my head hurt. And to be honest I've forced myself to continue to think of Literals as Literals and Resources as Resources.



Note, this issue is specific to the standard entailments on an RDF Graph (literal generalization rule: lg).



For the life of me, I can't imagine the motivation for it other than to ensure that an RDF graph can be interpreted universally w/out ambiguity.



In summary, I think most of the confusing developments in RDF lately have mostly to do with the fact that its use and value as a solution in open systems has been pushed harder than its use and value in closed systems. Way too hard. The cause of this is the unfortunate SW-effect (I choose not to utter that cliche phrase unless forced to under penalty of death). It's a lofty goal that overlooks the low-hanging fruit that is very much more useful.



In particular, I touched on this annoyingly backwards emphasis in a previous post (where I also talk about how GRDDL is incredibly reminiscent of Document Definitions).



I agree with the assertions about microformats as well. When I see microformats, I think: so what's new about the idea of mixing XML vocabularies? (The only difference seems to be that the framework for such mixing is XHTML instead of pure XML.) Norm Walsh sums it up well when he says:

"Microformats are becoming quite popular. Old timers like myself recognize that these are what we used to call “architectural forms” being reinvented. Exactly what constitutes a microformat is probably open to debate."



So, I don't blame people like Christopher Schmidt for getting fed up with RDF (http://crschmidt.net/blog/archives/85/exhaustio...).



However, I think it's just as easy to over-sell RDF as it is to not give it the credit where it's due for the kinds of problems it's suited for specifically.



Which is why whenever I'm asked about RDF I always respond with:



- What is the problem you are trying to solve, specifically?

- Does flexible syntax (XML) solve your problem, or do you really need the heavy lifting of Knowledge Representation?



I've found the sweet spot is where I have as much control over the generation of RDF content as possible (generating it from XML).  I think I've heard the phrase used before:  strict production, liberal consumption.



The problem, I think, is that most people wander across the line that separates expressive syntax from Knowledge Representation (in the pure AI sense) and wonder why they are getting beat upside their head. The difficult issues of interpretation, logical deduction, grounding, etc. should only be broached if your requirement is truly to be able to model complex, interrelated concepts, not just data (in the generic sense). And even if you do need to model complex concepts, the next question you need to ask is if your context can be limited to a closed, controlled system. If so, you are on the right path. If not, be prepared to venture on a journey through a maze that is as complicated as our tax code, mostly because of the nature of the topic (First Order Logic, Model Theoretics, Proofs, etc.), not so much the tools that are used.
Much to ponder here (things like "hey, he doesn't keep up with the specs?").



The mention of topic maps is interesting, but as a contrast to RDF (post 1998). The formal methods used by Pat Hayes have tightened the specs, and that's helped a great deal. There's now often an authoritative answer to many questions, and that answer is verifiable as a proof.



Topic maps is like RDF was initially. Everyone was sure what the spec meant, but they didn't always agree, and had no means to resolve the issue (topic maps is still like that from what I've seen).



That doesn't mean that one has to be a logician to use RDF. But it's nice knowing I can ask about something, and people like Graham or Jos will show why it is or isn't the case, and agree.



Of course I'd like to have a class for things which aren't literals (like resources used to be, and I still think that way). I don't think OWL-DL is that useful, but OWL Full has some good things. These are pretty minor gripes, though. SPARQL has reminded me how fun a dumb bunch of triples can be, and all without the aid of a DL reasoner.
Chime,



well put, top to bottom.



Damian,



The desperate hacker doesn't need logical proofs run on his graphs, and so the paraphernalia for such proofs shouldn't interfere with a bog simple understanding of the graphs.  That's what I mean when I bemoan the complications of Sem Web think.



More importantly, you're not going to sell RDF to anyone with a promise of proofs, except perhaps AI types, and look what a large population that is.



Generally, if I want to ask whether or not something is the case in my usage of RDF, I'll consult the actual resources, and not do any fancy AI on the graph itself.  If I needed such, I'd use OWL, and that's why I think a lot of the RDF MT should have been reserved for such a level.



Finally, my own problem with Topic Maps is different from yours: I don't like it because it's too conceptually complex. RDF is becoming that way as well, and regardless of the fact that a handful of hard-core logicians in the RDF community can miraculously agree on things, conceptual complexity is just what the desperate hacker is going to steer clear of.
"[Microformats] certainly don't solve my desperate hacker need for indexing and expressing relationships across XML documents."



Didn't you hear? Just use the rel attribute!



Just kidding.



A hearty "hear, hear" to every paragraph. I've wondered several times whether some of the Architectural Forms ideas couldn't be implemented in RELAX NG. Don't bring up HyTime though, which makes the RDF discussions look straightforward, down-to-earth, and immediately applicable to real business problems. (At SGML '97 my talk was titled "Architectural Forms without HyTime." See also the joke "How many HyTime consultants does it take to screw in a lightbulb?" at the end of http://www.flightlab.com/~joe/sgml/faq-not.txt.) 



Keep us posted on the evolution of your ideas about these things...



Bob