A Perspective on Temporal Modeling in RDF

I just read the follow-up to a thread (Why we need explicit temporal labelling) on the formal modeling of time and time-related semantics in RDF specifically. I wanted to put my $0.02 in, since I spend a good deal of my time at work buried nose-deep in large volumes of bioinformatic cardiovascular data, most of which is temporal. I guess, to put it succinctly, I just don't see the value in merging temporal semantics (not a very lightweight model) into the fabric of your representation model.

We found (for our purposes) that by including our own specific temporal semantic vocabulary, we could answer questions such as:

How many patients complained of chest pains within 30 days of a specific surgical operation?

While at the same time avoiding the rigidity of temporal reasoning that formal models impose. Such formalisms (especially in distributed systems) are unnecessary when you consider that most often, data as it is fetched (at any point in time) is 'complete' regardless of how it has varied over time.
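To make that concrete, here is a rough sketch of the kind of thing I mean, using rdflib and a made-up vocabulary (the namespace, terms, identifiers, and dates below are purely illustrative, not our actual vocabulary):

from datetime import datetime, timedelta
from rdflib.Graph import Graph
from rdflib import Namespace, URIRef, Literal

# Made-up clinical vocabulary, for illustration only
CLIN = Namespace('http://example.org/clinical#')
PT = URIRef('http://example.org/patient/1')
OP = URIRef('http://example.org/event/surgery-1')
COMPLAINT = URIRef('http://example.org/event/chest-pain-1')

g = Graph()
g.add((PT, CLIN['underwent'], OP))
g.add((OP, CLIN['date'], Literal('2005-06-01')))
g.add((PT, CLIN['complainedOf'], COMPLAINT))
g.add((COMPLAINT, CLIN['date'], Literal('2005-06-20')))

def asDate(literal):
    year, month, day = [int(part) for part in unicode(literal).split('-')]
    return datetime(year, month, day)

# Count patients whose complaint falls within 30 days of the operation
surgeryDate = asDate(g.value(OP, CLIN['date']))
patients = set()
for patient, complaint in g.subject_objects(CLIN['complainedOf']):
    onset = asDate(g.value(complaint, CLIN['date']))
    if timedelta(0) <= onset - surgeryDate <= timedelta(days=30):
        patients.add(patient)
print len(patients)

The point is that the temporal semantics live in the vocabulary and the query, not in the representation model itself.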

Consider the RDF schema for OWL, whose identifier (the URL from which its content can be loaded) includes some temporal semantics (when it was published, and the suggestion that there are prior versions). Though the content might have changed over time, the entire document as it was at any point was 'consistent' in what it conveys. No additional temporal semantics are needed to capture the relations between versions or to maintain some 'sanity' (if you will) over the fact that the data changed over time.

And if such formalism is needed, it's rather easy to piggyback off existing ontologies (the "Time Ontology in OWL", for instance).

Furthermore, if you think about it, named contexts (graphs, scopes, etc.) already provide a more adequate solution to the issue of inconsistency of data (over time) from the same source. For instance, you can take advantage of syntactic sugar in N3 and RDF/XML, such as:

<> a owl:Ontology;
   dc:date "2002-07-13".

or its RDF/XML equivalent:

<owl:Ontology 
  rdf:about="">
  <dc:date>2002-07-13</dc:date>
</owl:Ontology>

in order to capture enough provenance data to accommodate change.

Ironically, the ability to make provenance statements (one of which includes the date associated with this 'representation') about a named graph (identified by the URL from which it was loaded) is beyond the semantics of the RDF model. However, through its use you can be specific about the source of triples and, in addition, you can include the specifics of a version either within the identifier of the source or through provenance statements made about it.
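As a rough rdflib sketch of that arrangement (the provenance graph's name is made up; the date literal is the one from the example above), loading the OWL schema into a graph named by its URL and making provenance statements about that name might look like:

from rdflib.Graph import ConjunctiveGraph, Graph
from rdflib.store import Store
from rdflib import plugin, URIRef, Namespace, Literal

DC = Namespace('http://purl.org/dc/elements/1.1/')
OWL_URL = URIRef('http://www.w3.org/2002/07/owl')

store = plugin.get('IOMemory', Store)()

# The fetched document lives in a named graph identified by its source URL
owlGraph = Graph(store, identifier=OWL_URL)
owlGraph.load(OWL_URL)

# Provenance statements *about* that graph go into a separate named graph
provenance = Graph(store, identifier=URIRef('http://example.org/provenance'))
provenance.add((OWL_URL, DC['date'], Literal('2002-07-13')))

# The aggregate of both is still available for querying transparently
universe = ConjunctiveGraph(store)
print len(universe)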

I think the problem is more a modeling issue (and having the foresight to determine how you accommodate the change of data over time) than a shortcoming of the framework.

Chimezie Ogbuji

via Copia

Semantic hairball, y'all

I'm in San Jose and the Semantic Technology Conference 2006 has just wrapped up. A good time, as always, and very well attended (way up from even last year. This is an extraordinarily well organized conference). But I did want to throw up one impression I got from one of the first talks I went to.

The talk discussed an effort in "convergence" of MDA/UML, RDF/OWL, Web Services and Topic Maps. Apparently all the big committees are involved, from OMG, W3C, ISO, etc. Having been an enthusiastic early adopter of the first three technologies, I was violently struck by the casually side-stepped enormousness of this undertaking. In my view, all four projects had promising roots and were all eventually buried under the weight of their own complexity. And yet the convergence effort being touted seems little more sophisticated than balling all these behemoths together. I wonder what the purpose is. I can't imagine the result will be greater adoption for these technologies taken together. Many potential users already ignore them because of the barrier of impenetrable mumbo-jumbo. And I can't imagine there would be much cross-pollination within these technologies, because without brutal simplification and profiling, model mismatches would make it impractical for an application to efficiently cross the bridge from one semantic modeling technology to another.

I came to this conference to talk about how Microformats might present a slender opportunity for semantic folks to harness the volume of raw material being generated in the Web 2.0 craze. The trade-off is that the Web 2.0 craze produces a huge amount of crap metadata, and someone will have to clean up the mess in the resulting RDF models even if GRDDL is ever deployed widely enough to generate models worth the effort. And let's not even start on the inevitable meltdown of "folksonomies" (I predict formation of a black hole of fundamental crapitational force). I replaced my previous year's talk about how managers of controlled information systems could harness XML schemata for semantic transparency. I think next year I should go back to that. It's quite practical, as I've determined in my consulting experience. I'm not sure hitching information pipelines to Web 2.0 is the least bit practical.

I'm struck by the appearance of two extremes in popular fields of distributed information management (and all you Semantic Technology pooh-pooh-ers would be gob-smacked if you had any idea how deadly seriously Big Business is taking this stuff: it's popular in terms of dollars and cents, even if it's not the gleam in your favorite blogger's eye). On one hand we have the Daedalus committee fastening labyrinth to labyrinth. On the other hand we have the tower of Web 2.0 Babel. We need a mob in the middle to burn 80% of the AI-one-more-time-for-your-mind-magic off of RDF, 80% of the chicago-cluster-consultant-diesel off of MDA, 80% of the toolkit-vendor-flypaper off of Web services. Once the ashes clear, we need folks to build lightweight tools that actually would help with extracting value from distributed information systems without scaring off the non-Ph.D.s. I still think XML is the key, and that XML schema systems should have been addressing semantic transparency from the start, rather than getting tied up in static typing bondage and discipline.

I have no idea whether I can do anything about the cluster-fuck besides ranting, but I'll be squeezing neurons hard until XTech, which does have the eminent advantage of being an in-person meeting of the semantic, XML and Web 2.0 crowds.

Let's dance in Amsterdam, potnas.

See also:

[Uche Ogbuji]

via Copia

Closed World Assumptions, Conjunctive Querying, and Oracle 10g

I promised myself I would write at least one entry related to my experience at the 2006 Semantic Technology Conference here in San Jose, which has been an incredibly well attended and organized conference. I've found myself wanting to do less talking and more problem solving lately, but I came across an issue that has generated the appropriate amount of motivation.

For some time I've been (eagerly) monitoring Oracle's recent advances with their latest release (10g R2) which (amongst other things) introduced (in my estimation) what will turn out to be a major step in bridging the gap between the academic dream of the Semantic Web and the reality of the day-to-day problems that are relevant to technologies in that sphere of influence.

But first things first (as the saying goes). Basically, the Oracle 10g R2 RDF implementation supports the logical separation of RDF triples into named Models, as well as the ability to query across explicit sets of Models. However, the querying mechanism (implemented as an extension to SQL – SDO_RDF_MATCH) doesn't support the ability to query across the entire fact / knowledge base – i.e., the aggregation of all the named Models contained within.

I like to refer to this kind of a query as a Conjunctive Query. The term isn't mine, but it has stuck, and has made its way into the rdflib Store API. In fact, the rdflib API now has the concept of a Conjunctive Graph which behaves like a named graph with the exception that the query space is the set of all named graphs in the knowledge base.
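For illustration, here is roughly what that distinction looks like in rdflib (the graph names and triples are made up):

from rdflib.Graph import ConjunctiveGraph, Graph
from rdflib.store import Store
from rdflib import plugin, URIRef, Literal

store = plugin.get('IOMemory', Store)()
DC_TITLE = URIRef('http://purl.org/dc/elements/1.1/title')

# Two logically separate named graphs sharing one store
g1 = Graph(store, identifier=URIRef('http://example.org/graph/one'))
g2 = Graph(store, identifier=URIRef('http://example.org/graph/two'))
g1.add((URIRef('http://example.org/a'), DC_TITLE, Literal('from graph one')))
g2.add((URIRef('http://example.org/b'), DC_TITLE, Literal('from graph two')))

# A named graph only sees its own triples...
print len(g1)        # 1

# ...while the conjunctive graph's query space is the aggregation of
# every named graph in the store
universe = ConjunctiveGraph(store)
print len(universe)  # 2
print [o for o in universe.objects(None, DC_TITLE)]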

Now, it would be an easy nitpick to suggest that since the formal RDF Model doesn't provide any guidance on the separation of RDF triples into addressable graphs, implementors cannot be held at fault for deciding not to support such a separation. However, the large body of literature on Named Graphs as well as the support for querying within named sets of graphs in the more contemporary RDF querying languages does suggest that there is real value in separating raw triples this way and in being able to query across these logical separations transparently.

I think the value is twofold: Closed World Assumptions and query performance. Now, the notion of a boundary of known facts will probably raise a red flag amongst semantic web purists, and some may suggest that closed world assumptions cut against the grain of a vision of a massively distributed expert system. For the uninitiated, open world assumptions are those where the absence of an assertion in your fact base does not necessarily mean that the assertion (or statement) isn't true. That is, if the statement 'the sky is blue' is not in the knowledge base, you cannot automatically assume that the sky is not blue.

This limitation makes sense where the emphasis is on the distribution of data (a key component of the semantic web vision); however, it essentially abandons the value in applying the formal semantics of RDF (and knowledge representation in general) to closed systems – systems where the data is complete to a certain extent and where it makes sense to live in a query silo.

The most practical example I can think of is the one I work with daily: medical research data that is often subjected to statistical analysis for deducing trends. You can't make suggestions derived from statistical trends in your data if you don't have some minimal confidence that the set of data you are working with is 'complete' enough to answer the questions you set out to ask.

Closed world assumptions also open the door to other heuristic optimizations that are directly relevant to query processors.

Finally, where RDF databases are built on top of SQL stores, being able to partition your query space into an additional indexable constraint (I say additional, because there are other partitioning techniques that impact scalability and response) makes a world of difference in a framework that has already been rigorously tuned to take maximal advantage of such rectangular partitioning. To a SQL store implementing an RDF model, the name (or names) of a graph is a low-cardinality, indexable constraint (there will always be fewer graphs than total triples) that can be the difference of several orders of magnitude in the overall query response time.

Named contexts lend themselves quite well to two-part queries, where the first part identifies a set of named graphs (within a conjunctive graph or known universe) that match certain criteria and the second part queries only within those matching graphs. Once the query resolver has identified the named graphs, the second part of the query can be dispatched in a very targeted fashion. Any RDF knowledge base that takes advantage of the logical separation that named graphs provide will inevitably find itself being asked such questions.
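A rough rdflib sketch of the two-part pattern (the provenance criterion, graph names, and the helper function are invented for illustration; in a SQL-backed store the first part translates into exactly the kind of indexable constraint described above):

from rdflib.Graph import ConjunctiveGraph, Graph
from rdflib.store import Store
from rdflib import plugin, URIRef, Literal, Namespace

DC = Namespace('http://purl.org/dc/elements/1.1/')

store = plugin.get('IOMemory', Store)()
universe = ConjunctiveGraph(store)

# A graph holding provenance statements *about* the other named graphs
provenance = Graph(store, identifier=URIRef('http://example.org/provenance'))

def twoPartQuery(universe, provenance, date, queryTriple):
    # Part one: identify named graphs whose provenance matches the criterion
    matching = [ctx for ctx in universe.contexts()
                if (ctx.identifier, DC['date'], Literal(date)) in provenance]
    # Part two: dispatch the query only against the matching graphs
    for graph in matching:
        for triple in graph.triples(queryTriple):
            yield triple

# e.g. twoPartQuery(universe, provenance, '2002-07-13', (None, None, None))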

Now I've said all this not to berate the current incarnation of Oracle's RDF solution but to take the opportunity to underline the value in a perspective that is often shoved aside by the vigor of semantic web evangelism. To be fair, the inability to dispatch conjunctive queries is pretty much the only criticism of the Oracle 10g R2 RDF model. I've been aware of it for some time, but didn't want to speak to the point directly until it was 'public knowledge.'

The Oracle 10g R2 RDF implementation demonstrates amongst other things:

  • Namespace management
  • Interned identifiers
  • Reification
  • Collections / Containers
  • Forward-chained rule firing (with a default ruleset for RDFS entailment)
  • Off-the-chart volume capability (0.5 to 5 second response times on 80 million triples - impressive regardless of the circumstances of the benchmark)
  • Native query format (SPARQL-ish SDO_RDF_MATCH function)

You can check out the DBA manual for the infrastructure and the UniProt benchmarks for performance.

I've too long been frustrated by the inability of 'industry leaders' to put their money where their mouth is when it comes to adoption of 'unproved' technologies. Too often, the biggest impedance to progress from the academic realm to the 'enterprise' realm is politics. People who simply want to solve difficult problems with intelligent and appropriate technologies have their work cut out for them against the inevitable collisions with politics and technological camp warfare (you say microformats, I say architectural forms, you say SOA, I say REST). So for that reason, it makes me somewhat optimistic that a company that truly has everything to lose in doing so decided to make such a remarkable first step. Their recent purchase of Berkeley DB XML (the most widely supported open-source Native XML datastore) is yet another example of a bold step towards ubiquitous semi-structured persistence. But please, top it off with support for conjunctive queries.

[Uche Ogbuji]

via Copia

A RESTful Scutter Protocol for Redfoot Kernel

Redfoot recently had 'native' scuttering capabilities added to its kernel. The original motivation was as a testbed to determine some reasonable parameters for a 'scuttering protocol'. That document was prepared in haste, but for the most part, the load function in Redfoot has been extended to provide built-in scuttering capabilities, using that protocol as a guide.

Redfoot provides a framework for loading (and persisting) remotely hosted chunks of executable code (redfoot programs – the current ones are mostly written in Python / KID). The most common context in which scuttering is discussed is the interpretation of FOAF graphs (social networks). However, I found the idea of a network of executable code with the dependencies (on application 'data' and other third-party 'bits' of functionality / code) expressed via rdfs:seeAlso and rdfs:isDefinedBy very appealing as an alternative subtext to the whole 'Semantic Web' idea.

The main point of the scuttering protocol above is the use of a provenance graph (an RDF graph which contains statements about other RDF graphs) with a vocabulary to express (and persist) the HTTP headers of remote RDF graphs.

The cached HTTP headers are used to automate subsequent content-negotiated requests to the same resources, to effectively mirror an HTTP network of RDF graphs (where the links are expressed by rdfs:seeAlso, owl:imports, and rdfs:isDefinedBy) in a local store – applying RESTful best practices.

Below is an example of the statements added after a fetch of the URL http://metacognition.info/profile/webwho.xrdf.
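The actual terms are defined by the protocol document; purely as an illustration of the shape of such statements, something along these lines (the header vocabulary namespace and the header values below are made up):

from rdflib.Graph import Graph
from rdflib import URIRef, Namespace, Literal

# Made-up header vocabulary; the scutter protocol defines the real terms
HDR = Namespace('http://example.org/scutter-headers#')
source = URIRef('http://metacognition.info/profile/webwho.xrdf')

provenance = Graph()
provenance.add((source, HDR['lastModified'],
                Literal('Tue, 31 Jan 2006 01:41:03 GMT')))
provenance.add((source, HDR['etag'], Literal('"1ae5be-2b58-43deedaf"')))
provenance.add((source, HDR['contentType'], Literal('application/rdf+xml')))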

For every URL fetched, the 'scutter links' (rdfs:seeAlso, rdfs:isDefinedBy, and owl:imports) are traversed recursively (up to a system-wide maximum recursion depth). Links that do not resolve to an RDF graph are marked in the local cache as "non-rdf" to avoid redundant fetches to URLs known not to be RDF.
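A rough sketch of that traversal (the function and variable names are mine, and the content-negotiation / conditional-GET machinery driven by the cached headers is elided):

from rdflib.Graph import Graph
from rdflib import URIRef, Namespace

RDFS = Namespace('http://www.w3.org/2000/01/rdf-schema#')
OWL = Namespace('http://www.w3.org/2002/07/owl#')
SCUTTER_LINKS = [RDFS['seeAlso'], RDFS['isDefinedBy'], OWL['imports']]

MAX_DEPTH = 5
nonRDF = set()   # URLs already marked as not resolving to an RDF graph

def scutter(url, store, depth=0):
    if depth > MAX_DEPTH or url in nonRDF:
        return
    graph = Graph(store, identifier=URIRef(url))
    try:
        graph.load(url)          # a content-negotiated fetch would go here
    except Exception:
        nonRDF.add(url)          # never fetch a known non-RDF URL again
        return
    for link in SCUTTER_LINKS:
        for target in graph.objects(None, link):
            scutter(str(target), store, depth + 1)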

Chimezie Ogbuji

via Copia

A universal feed -> RDF mapping for Emeka

I found a nice mapping from Universal Feed Parser to RDF (SKOS, DC, AtomOWL) that Emeka will employ (a rough sketch in code follows the list below):

Each entry is an instance of (atomOwl:Entry,rss:item)

  • The URL of the feed -> an instance of atomOwl:Feed
  • Feed - atomOwl:entry -> entries
  • entry (link or id as URI) - rdfs:label,skos:prefLabel,dc:title -> entry.title
  • entry - dc:description,atomOwl:summary,rdfs:comment -> entry.summary
  • entry,feed - dc:creator, foaf:maker -> foaf:Person
  • entry.author_detail.name -> foaf:name
  • entry.author_detail.email -> foaf:mbox
  • entry.author_detail.href is the URL of the author
  • entries.tags -> skos:Collection
  • entries.tags.label -> skos:prefLabel
  • entries.tags.scheme + entries.tags.term (URI resolution) -> URI of skos:Concept
  • entry - dc:created,dc:date,atomOwl:published -> entry.published
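A rough sketch of how the mapping might be driven with the Universal Feed Parser and rdflib (the AtomOWL namespace URI is a placeholder, and the mapping is partial; the tags / SKOS portion is omitted):

import feedparser
from rdflib.Graph import Graph
from rdflib import URIRef, BNode, Literal, Namespace, RDF

DC = Namespace('http://purl.org/dc/elements/1.1/')
FOAF = Namespace('http://xmlns.com/foaf/0.1/')
RSS = Namespace('http://purl.org/rss/1.0/')
AOWL = Namespace('http://example.org/atom-owl#')   # placeholder URI

def feedToRDF(url):
    d = feedparser.parse(url)
    g = Graph()
    feed = URIRef(url)
    g.add((feed, RDF.type, AOWL['Feed']))
    for e in d.entries:
        entryId = e.get('link') or e.get('id')
        if not entryId:
            continue
        entry = URIRef(entryId)
        g.add((feed, AOWL['entry'], entry))
        g.add((entry, RDF.type, AOWL['Entry']))
        g.add((entry, RDF.type, RSS['item']))
        if e.get('title'):
            g.add((entry, DC['title'], Literal(e.title)))
        if e.get('summary'):
            g.add((entry, DC['description'], Literal(e.summary)))
        if e.get('published'):
            g.add((entry, DC['date'], Literal(e.published)))
        if e.get('author_detail'):
            person = BNode()
            g.add((entry, FOAF['maker'], person))
            g.add((person, RDF.type, FOAF['Person']))
            if e.author_detail.get('name'):
                g.add((person, FOAF['name'], Literal(e.author_detail.name)))
            if e.author_detail.get('email'):
                g.add((person, FOAF['mbox'],
                       URIRef('mailto:' + e.author_detail.email)))
    return g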

Chimezie Ogbuji

via Copia

Learn how to invent XML languages, then do so

There has been a lot of chatter about Tim Bray's piece "Don’t Invent XML Languages". Good. I'm all for anything that makes people think carefully about XML language design and problems of semantic transparency (communicated meaning of XML structure). I'm all for it even though I generally disagree with Tim's conclusions. Here are some quick thoughts on Tim's essay, and some of the responses I've seen.

Here’s a radical idea: don’t even think of making your own language until you’re sure that you can’t do the job using one of the Big Five: XHTML, DocBook, ODF, UBL, and Atom.—Bray

This is a pretty biased list, and happens to make sense for the circles in which he moves. Even though I happen to move in much the same circles, the first thing I'd say is that there could hardly ever be an authoritative "big 5" list of XML vocabs. There is too much debate and diversity, and that's too good a thing to sweep under the rug. MS Office XML or ODF? OAGIS or UBL? RSS 2.0 or Atom? Sure I happen to plump for the latter three, as Tim does, but things are not so clear cut for the average punter. (I didn't mention TEI or DocBook because it's much less of a head to head battle).

I made my own list in "A survey of XML standards: Part 3—The most important vocabularies" (IBM developerWorks, 2004). It goes:

  • XHTML
  • Docbook
  • XSL-FO
  • SVG
  • VoiceXML
  • MathML
  • SMIL
  • RDF
  • XML Topic Maps

And in that article I admit I'm "just scratching the surface". The list predates first full releases of Atom and ODF, or they would have been on it. I should also mention XBEL, which is, I think, not as widely trumpeted, but just about as important as those other entrants. BTW, see the full cross-reference of my survey of XML standards.

Designing XML Languages is hard. It’s boring, political, time-consuming, unglamorous, irritating work. It always takes longer than you think it will, and when you’re finished, there’s always this feeling that you could have done more or should have done less or got some detail essentially wrong.—Bray

This is true. It's easy to be flip and say "sure, that's true of programming, but we're not being advised to write no more programs". But then I think this difficulty is even more true of XML design than of programming, and it's worth reminding people that a useful XML vocabulary is not something you toss off in a spare hour. Simon St.Laurent has always been a sound analyst of the harm done by programmers who take shortcuts and abuse markup in order to suit their conventions. The lesson, however, should be to learn best practices of markup design rather than to become a helpless spectator.

If you’re going to design a new language, you’re committing to a major investment in software development. First, you’ll need a validator. Don’t kid yourself that writing a schema will do the trick; any nontrivial language will have a whole lot of constraints that you can’t check in a schema language, and any nontrivial language needs an automated validator if you’re going to get software to interoperate.

If people would just use decent schema technology, this point would be very much weakened. Schema designers rarely see beyond plain W3C XML Schema or RELAX NG. Too bad. RELAX NG plus Schematron (with XPath 1.0/XSLT 1.0 drivers) covers a huge number of constraints. Add in EXSLT 1.0 drivers for Schematron and you can cover probably 95+% of Atom's constraints (probably more, actually). Throw in user-defined extensions and you have a very powerful and mostly declarative validation engine. We should do a better job of rendering such goodness to XML developers, rather than scaring them away with duct-tape-validator bogeymen.

Yes, XHTML is semantically weak and doesn’t really grok hierarchy and has a bunch of other problems. That’s OK, because it has a general-purpose class attribute and ignores markup it doesn’t know about and you can bastardize it eight ways from center without anything breaking. The Kool Kids call this “Microformats”...

This understated bit is, I think, the heart of Tim's argument. The problem is that I still haven't been able to figure out why Microformats have any advantage in Semantic transparency over new vocabularies. Despite the fuzzy claims of μFormatters, a microformat requires just as much specification as a small, standalone format to be useful. It didn't take me long kicking around XOXO to solve a real-world problem before this became apparent to me.

Some interesting reactions to the piece

Dare Obasanjo. Dare indirectly brought up that Ian Hickson had argued against inventing XML vocabularies in 2003. I remember violently and negatively reacting to the idea that everyone should stick to XHTML and its elite companions. Certainly such limitations make sense for some, but the general case is more nuanced (thank goodness). Side note: another pioneer of the pessimistic side of this argument is Mark Pilgrim http://www.xml.com/pub/au/164. Needless to say I disagree with many of his points as well.

I've always considered it a gross hack to think that instead of having an HTML web page for my blog and an Atom/RSS feed, instead I should have a single HTML page with <div class="rss:item"> or <h3 class="atom:title"> embedded in it instead. However given that one of the inventors of XML (Tim Bray) is now advocating this approach, I wonder if I'm simply clinging to old ways and have become the kind of intellectual dinosaur I bemoan.—Obasanjo

Dare is, I think, about as stubborn and tart as I am, so I'm amazed to see him doubting his convictions in this way. Please don't, Dare. You're quite correct. Microformats are just a hair away from my pet reductio ad absurdum: <tag type="title"> rather than just <title>. I still haven't heard a decent argument for such periphrasis. And I don't see how the fact that tag is semantically anchored does anything special for the stepchild identifier title in the microformats scenario.

BTW, there is a priceless quote in comments to Dare:

OK, so they're saying: don't create new XML languages - instead, create new HTML languages. Because if you can't get people to [separate presentation from data], hijack the presentation!—"Steve"

Wot he said. With bells on.

Danny Ayers.

I think most XML languages have been created by one of three processes - translating from a legacy format; mapping directly from the domain entities to the syntax; creating an abstract model from the domain, then mapping from that to the XML. The latter two of these are really on a greyscale: a language designer probably has the abstract entities and relationships in mind when creating the format, whether or not they have been expressed formally.—Ayers

I've had my tiffs with RDF gurus lately, but this is the sort of point you can trust an RDF guru to nail, and Danny does so. XML languages are, like all languages, about expression. The farther the expression lies from the abstraction being expressed (the model), the more expensive the maintenance. Punting to an existing format that might have some vague ties to the problem space is a much worse economic bet than the effort of designing a sound and true format for that problem space.

To slightly repurpose another Danny quote towards XML,

...in most cases it’s probably best to initially make up afresh a new representation that matches the domain model as closely as possible(/appropriate). Only then start looking to replacing the new terms with established ones with matching semantics. But don’t see reusing things as more important than getting an (appropriately) accurate model.—Ayers

Ned Batchelder. He correctly identifies that Tim Bray's points tend to be most applicable to document-style XML. I've long since come to the conclusion (again with a lot of influence from Simon St.Laurent) that XML is too often the wrong solution for programmer-data-focused formats (including software configuration formats). Yeah, of course I've already elaborated in the Python context.

[Uche Ogbuji]

via Copia

Moving FuXi onto the Development Track

[by Chimezie Ogbuji]

I was recently prompted to consider updating FuXi to use the more recent CVS versions of both Pychinko and rdflib. In particular, I've been itching to get Pychinko working with the new rdflib API – which (as I've mentioned) has been updated significantly to support (amongst other things) Notation 3 persistence.

Currently, FuXi works with frozen versions of cwm, rdflib, and Pychinko.

I personally find it more effective to work with reasoning capabilities within the context of a query language than as a third-party software library. This was the original motivation for creating FuXi. Specifically, the process of adding inferred statements, dispatching a prospective query, and returning the knowledge base to its original state is a perfect compromise between classic backward / forward chaining.

It frees up both the query processor and persistence layer from the drudgery of logical inference – a daunting software requirement in its own right. Of course, the price paid in this case is the cumbersome software requirements.

It's well worth noting that such on-demand reasoning also provides a practical way to combat the scalability limitations of RDF persistence.

To these ends, I've updated FuXi to work with the current (CVS) versions of rdflib, 4Suite RDF, and pychinko. It's essentially a re-write and provides 3 major modules:

  • FuXi.py (the core component – a means to fire the pychinko interpreter with facts and rules from rdflib graphs)
  • AgentTools.py (provides utility functions for the parsing and scuttering of remote graphs)
  • VersaFuXiExtensions.py (defines Versa extension functions which provide scutter / reasoning capabilities)

Versa Functions:

reason(expr)

This function takes a Versa expression as a string and evaluates it after executing FuXi using any rules associated with the current graph (via a fuxi:ruleBase property). FuXi (and Pychinko, consequently) use the current graph (and any graphs associated by rdfs:isDefinedBy or rdfs:seeAlso) as the set of facts against which the rules are fired.
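For instance, associating a rule base with a graph might look something like this (the fuxi namespace URI and the rule file URL are placeholders; the bundled ontology defines the actual term):

from rdflib.Graph import Graph
from rdflib import URIRef, Namespace

# Placeholder namespace; the ontology bundled with FuXi defines the real URI
FUXI = Namespace('http://example.org/fuxi#')

g = Graph(identifier=URIRef('http://example.org/facts'))

# reason() consults this statement to find the rules to fire against
# the current graph's facts
g.add((g.identifier, FUXI['ruleBase'],
       URIRef('http://example.org/rules/rdfs-rules.n3')))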

class(instances)

This function returns the class(es) – rdfs:Class or owl:Class – of the given list of resources. If the current graph has already been extended to include inferred statements (via the reason function, perhaps), it simply returns the objects of all rdf:type statements made against the resources. Otherwise, it registers, compiles, and fires a set of OWL/RDFS rules (a reasonable subset of owl-rules.n3 and rdfs-rules.n3 bundled with Euler) against the current graph (and any associated graphs) before matching classes to return.

type(klasses)

This essentially overrides the default 4Suite RDF implementation of this 'built-in' Versa function which attempts to apply RDFS entailment rules in brute force fashion. It behaves just like class with the exception that it returns instances of the given classes instead (essentially it performs the reverse operation).

scutter(url,expr,steps=5)

This function attempts to apply some best practices in the interpretation of a network of remote RDF graphs. In particular it uses content negotiation and Scutter principles to parse linked RDF graphs (expressed in either RDF/XML or Notation 3). The main use case for this function (and the primary motivation for writing it) is identity-reasoning within a remotely hosted set of RDF graphs (FOAF smushing, for example).

The FuXi software bundle includes a short ontology documenting the two RDF terms: one is used to manage the automated association of a rule base with a graph and the other identifies a graph that has been expanded by inference.

I have yet to write documentation, so this piece essentially attempts to serve that purpose; however, included in the bundle are some unittest cases for each of the above functions. They work off a small set of initial facts.

Unfortunately, the majority of the aforementioned software requirement liability has to do with Pychinko's reliance on the SWAP code base. Initially, I began looking for a functional subset to bundle, but later decided it was against the spirit of the combined body of work. So, until a better solution surfaces, the SWAP code can be checked out from CVS like so (taken from ):

$ cvs -d:pserver:anonymous@dev.w3.org:/sources/public login
password? anonymous
$ cvs -d:pserver:anonymous@dev.w3.org:/sources/public get 2000/10/swap

The latest 4Suite CVS snapshot can be downloaded from ftp://ftp.4suite.org/pub/cvs-snapshots/4Suite-CVS.tar.gz,
Pychinko can be retrieved from the Mindswap svn repository, and rdflib can also be retrieved from its svn repository.

Chimezie Ogbuji

via Copia

Store-agnostic REGEX Matching and Thread-safe Transactional Support in rdflib

[by Chimezie Ogbuji]

rdflib now has (checked into svn trunk) support for REGEX matching of RDF terms and thread-safe transactional support. The transactional wrapper provides Atomicity and Isolation, but not Durability (a list of reversal RDF operations is stored on the live instance, so they won't survive a system failure). The store implementation is responsible for Consistency.

The REGEX wrapper provides a REGEXTerm which can be used in any of the RDF term 'slots' of a query. It replaces any REGEX term with a wildcard (None) and performs the REGEX match after the query invocation is dispatched to the store implementation it is wrapping.

Both are meant to work with a live instance of an RDF Store, but behave as a proxy for the store (providing REGEX and/or transactional support).

For example:

from rdflib.Graph import ConjunctiveGraph, Graph
from rdflib.store.REGEXMatching import REGEXTerm, REGEXMatching
from rdflib.store.AuditableStorage import AuditableStorage
from rdflib.store import Store
from rdflib import plugin, URIRef, Literal, BNode, RDF

store = plugin.get('IOMemory', Store)()
regexStorage = REGEXMatching(store)        # REGEX proxy around the live store
txRegex = AuditableStorage(regexStorage)   # transactional proxy on top of that
g = Graph(txRegex, identifier=URIRef('http://del.icio.us/rss/chimezie'))
g.load("http://del.icio.us/rss/chimezie")
print len(g), [t for t in g.triples((REGEXTerm('.*zie$'), None, None))]
g.rollback()                               # undo everything since the load
print len(g), [t for t in g]

Results in:

492 [(u'http://del.icio.us/chimezie', u'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', u'http://purl.org/rss/1.0/channel'), (u'http://del.icio.us/chimezie', u'http://purl.org/rss/1.0/link', u'http://del.icio.us/chimezie'), (u'http://del.icio.us/chimezie', u'http://purl.org/rss/1.0/items', u'QQxcRclE1'), (u'http://del.icio.us/chimezie', u'http://purl.org/rss/1.0/description', u''), (u'http://del.icio.us/chimezie', u'http://purl.org/rss/1.0/title', u'del.icio.us/chimezie')]
0 []

[Chimezie Ogbuji]

via Copia

Thinking XML #34: Search engine enhancement using the XML WordNet server system

Updated—Fixed link to "Serving up WordNet as XML"

"Thinking XML: Search engine enhancement using the XML WordNet server system"

Subtitle: Also, use XSLT to create an RDF/XML representation of the WordNet data
Synopsis: In previous installments of this column, Uche Ogbuji introduced the WordNet natural language database, and showed how to represent database nodes as XML and serve this XML through the Web. In this article, he shows how to convert this XML to an RDF representation, and how to use the WordNet XML server to enrich search engine technology.

This is the final part of a mini-series within the column. The previous articles are:

In this article I write my own flavor of RDF schema for WordNet, a transform for conversion from the XML format presented previously, and a little demo app that shows how you can use WordNet to enhance search with synonym capabilities (and this time it's a much faster approach).

I hope to publicly host the WordNet server I've developed in this series once I get my home page's CherryPy setup updated for 2.2.

See other articles in the column. Comments here on Copia or on the column's official discussion forum. Next up in Thinking XML, RDF equivalents for the WordNet/XML.

[Uche Ogbuji]

via Copia

Getting Some Mileage out of Semantic Works

Well, I recently had a need to write up an OWL ontology describing the components of a 4Suite repository configuration file (which is expressed as an RDF graph, hence the use of OWL to formalize the format). There has been some mention (with regard to the long-term roadmap of the 4Suite repository component) of the possibility of moving to a pure XML format.

Anyways, below is a diagram of the model produced by Semantic Works. I still think Protege and SWOOP provide much more bang for your buck (when you consider that they are free and Semantic Works isn't) and produce much more concise OWL/RDFS XML. But the ability to produce diagrams of this quality from a complex OWL ontology is definitely a plus.

Semantic Works Diagram of 4Suite Repository Configuration Ontology

[Chimezie Ogbuji]

via Copia