2006 conferences, part 3

Semantic Technology Conference 2006

Copia wasn't around for me to post about what a great time I had at STC05, which was in downtown San Francisco, but I did type up some notes in my developerWorks column . It doesn't hurt that my presentation was packed (pretty big room, too) with very attentive folks who provided excellent feedback throughout the rest of the conference. I'm wondering whether to propose the same talk again (updated, of course). I'm definitely looking forward to the next one. (March 6-9 in San Jose). STC05 was quite visibly a big success and if this one is as well organized, I expect it will become a fixture in my calendar year to year.

Three is probably my conference quotient for next year, but who knows? Things may work out for me to wander yet more. Extreme is the conference that I regret missing most year after year, but August is almost always an impossible time for me. We'll see about 2006.

[Uche Ogbuji]

via Copia

RDF IRC Agent - Emeka

I've recently been working on an IRC bot (based on Sean Palmer's phenny) called Emeka which is meant as a tool for demonstrating Versa and other related RDF facilities. Currently, it supports two functions:

  • .query <abritrary URI> " .. Versa Query .. "
  • .query-deep <arbitrary URI> steps " .. Versa Query .. "

The first, causes Emeka to parse the RDF document (associated with the given URI) as RDF/XML and then as N3 (if the first attempt fails). He then evaluates the submitted Versa Query against the Graph and responds with a result. The second function does the same with the exception that it recursively loads RDF graphs (following rdfs:seeAlso statements) N times, where N is the second argument. This is useful for extracting FOAF communities from a starting document (which was my original motivation for this).

By default Emeka has the following namespace prefixes bound:

daml,rdf,rdfs,owl,xsd,log (n3's log), dc,rss,foaf

Emeka is a work in progress and is currently idling in #swhack and #4suite (as we speak and #foaf,#swig eventually). Some ideas for other services available from this bot:

  • augmenting it's default namespace mapping
  • stored queries (for example: query for retrieving the latest rss:item in a feed)
  • Rule invokation (through FuXi's prospective-query function)
  • Interactive question and example demonstration of Versa function(s)
  • More sophisticated interaction with Del.icio.us RSS feeds (for web page cataloging)

Other suggestions are welcome

see #swhack logs for an example

Chimezie Ogbuji

via Copia

Identifying BNodes via RDF Query

Sorry couldn't help but commence round 3 (I believe it is) of Versa vs SPARQL. In all honesty, however, this is has to do more with RDF itself than it with either query language. It is primarily motivated by a very informative and insightful take (by Benjamin Nowack) on the problems regarding identifying BNodes uniquely in a query. His final conclusion (as I understood it) is that although the idea of identifying BNodes directly by URI seems counter-inituitive to the very nature of BNodes (anonymous resources) it is a practical necessity (one that I have had to use more often than not with Versa and caused him to have to venture outside the boundaries of the SPARQL specification for a solution). This is especially the case when you don't have much identifying metadata associated with the BNode in question (where if you did you could rely on inferencing - explicit or otherwise).

Well, ironically, the reason why this issue never occured to me is that in Versa, you refer to resources (for identification purposes) by URI regardless of whether they are blank nodes or not. I guess I would interpet this functionality as leaving it up to the author of the query to understand the exact nature of BNode URI's (that they are transient,possibly inconsistent, etc.)

Chimezie Ogbuji

via Copia

Some DCMI clarification on dc:type vocbularies

In an earlier entry I discussed the strange hotch-potch that's the set of recommended values for dc:type.

I did find a resource with some insight, straight from the Dublin Core (it seems). The question is:

we classify text documents with a descriptive term from a controlled vocabulary -- 'News', 'Press Releases', 'Situation Reports', etc. Is this a DCM element set 'Description', a 'Format', a 'Type' or, if not any of these, then what? If it is a 'Type', what is the appropriate sub-element? We could use Type -> Text, but that doesn't refer to a descriptor used for categorization of data, which is our purpose for using such terms in the first place.

This to me is clearly the province of dc:type, and the question is how one reconciles a value such as "Press Release" with the approach DCMI has taken with their recommended values. The DCMI reply (drafted by Pete Johnston):

The examples you cite for your "descriptor used for categorization of data" look to me like examples of values for dc:type. They describe (I think! ) "the nature or genre of the content of the resource". DCMI does provide a simple, high level "DCMI Type Vocabulary" (which, as you note, includes the term "Text" ), but you are not limited to using that Type Vocabulary and can use your own "local" vocabulary to provide values for dc:type. Having said that, you need to bear in mind what applications will be consuming your metadata and whether those applications are programmed to interpret your values as you expect. Especially if you are sharing your metadata with applications that may not have "built-in knowledge" of your "local" Type Vocabulary, it is considered good practice to include a term from the DCMI Type Vocabulary where possible. All DC metadata elements are repeatable (unless in the context of your application some additional constraints have been specified ), so you could include two occurrences of dc:type in each metadata description: dc:type = "Text" dc:type = "Press Release"

Pete goes on to explain the concept of DCMI element qualification (I do suggest you be familiar with this concept if you're using DCMI), but that doesn't seem to me to be at the heart of my issue. I'm just surprised that DCMI doesn't have such a sample vocabulary available.

Anyway, in my work at Sun, we're probably going to have to roll our own. One source of fodder I found are in LOM (see my article). The values are:

Exercise, Simulation, Questionnaire, Diagram, Figure, Graph, Index, Slide, Table, Narrative Text, Exam, Experiment, ProblemStatement, SelfAssesment

Too pedagogically oriented for our use, but some fodder nevertheless. Another source I found is PRISM (see my article). It's table #16 in the PRISM 1.2 spec. And here is how the PRISM spec introduces dc:type (from section 5.2.15):

Definition: The style of presentation of the resource's content, such as image vs. sidebar.
Comment: The `type' of a resource can be many different things. In PRISM descriptions, the dc:type element takes values that indicate the style of presentation of the content, such as "Map", "Table", or "Chart". This is in contrast to prism:category, which represents the genre, or stereotypical intellectual content type, of the resource. For example, the genre `electionResults' can be presented in a map, a table, or a chart. Recommended practice for PRISM implementations is to use a value from Table 16: Controlled Vocabulary of Presentation Styles, expressed as a URI reference. Implementations MUST also be able to handle text values, but interoperation with text values cannot be guaranteed. To describe the physical size or digital file format of the resource, use the dc:format element.
Example: The two examples below show how prism:type, prism:category, and dc:format all describe different aspects of a resource. For brevity, the examples below use relative URI references. Assume that they are within the scope of a base URI declaration: xml:base="http://prismstandard.org/vocabularies/1.2/"

<dc:type rdf:resource="resourcetype.xml#article"/>
<prism:category rdf:resource="category.xml#column"/>
<dc:format>text/html</dc:format>
<dc:type rdf:resource="resourcetype.xml#birdsEye"/>
<prism:category rdf:resource="category.xml#photo"/>
<dc:format>image/jpeg</dc:format>

[Uche Ogbuji]

via Copia

Pythonic SPARQL API over rdflib

I've recently been investigating the possiblity of adapting an existing SPARQL parser/query engine on top of 4RDF - mostly for the eventual purpose of implementing a sparql-eval Versa extension function - was pleased to see there has already been some similar work done:

Although this isn't exactly what I had in mind (the more robust option would be to write an adaptor for Redland's model API and execute SPARQL queries via rasqal ), it provides an interesting pythonic analog to querying RDF.

Chimezie Ogbuji

via Copia

Versa by Deconstruction

I was recently compelled to write an introductory companion to the Versa specification. The emphasis for this document (located here) is with readers with little to no experience with formal language specifications and/or with the RDF data model. It is inspired by it's predecessors (which make good follow-up material):

I initially started using Open Office Writer to compose an Open Office Document and export it to an HTML document. But I eventually decided to write it in MarkDown and use pymarkdown to render it to an HTML document stored on Copia.

The original MarkDown source is here.

-- Chimezie

[Uche Ogbuji]

via Copia

What's up with the dc:type value recommendations?

In my work at Sun we've been looking for better ways to rationalize content purpose metadata for management of aggregated XML records. I had occasion to look at the DCMI Type Vocabulary. DCMI Recommendation. This is an ancient document, and was not sure what to make of it. One thing for sure is that we can't use it, or anything like it. We'll have to come up with our own values. I do wonder about the rationale behind that list. It seems quite the hotch-potch:

Now the definition of dc:type is: "The nature or genre of the content of the resource". I can see how one could fit parts of the above list into this definition, but when I read this definition before seeing the list, I assumed I'd see things such as "poem", "short story", "essay", "news report", etc. From the business point of view, I'd be looking for "brochure", "white paper", "ad copy", "memo", etc. I tend to think this would be more generally useful (if much harder to standardize). Maybe ease of standardization was the rationale for the above? But even if so, it seems an odd mix. I've run out of time for now to ponder the matter further (gotta get back to that client work), but do I wonder whether there are recommendations for dc:type that more closely meet my expectations.

[Uche Ogbuji]

via Copia

FuXi v0.6 - Rearchitected / Repackaged

I've been experimenting with the use of FuXi as an alternative in situations where I had been manipulating application-specific RDF content using Versa within a host language (XSLT). In some cases I've been able to reduce a very complex set of XSLT logic to 1-2 queries on RDF models extended via a handful of very concise rules (expressed as N3). I'm hoping to build some usecases to demonstrate this later.

The result is that I've rearchitected FuXi to work as a blackbox directly with a 4RDF Model instance (it is now query agnostic, so it can be plugged in as an extension library to any other/future RDF querying language bound to a 4RDF model). Prior to this version, it was extracting formulae statements by Versa queries instead of directly through the Model interfaces.

Right now I primarily use it through a single Versa function prospective-query. Below is an excerpt from the README.txt describing it's parameters:

prospective-query

prospective-query( ruleGraph, targetGraph, expr, qScope=None)

Using FuXi, it takes all the facts from the current query context (which may or may not be scoped) , the rules from the <ruleGraph> scope and invokes/executes the Rete reasoner. It adds the inferred statements to the <targetGraph> scope. Then, it performs the query <expr> within the <qScope> (or the entire model if None), removing the inferred statements upon exit


FuXi is is now a proper python package (with a setup.py) and I've moved it (permanently - I hope) to: http://copia.ogbuji.net/files/FuXi

I was a little unclear on Pychinko's specific dependencies with rdflib and cwm in my previous post, but Yarden Katz cleared up the confusion in his comments (thanks).

The installation and use of FuXi should be significantly easier than before with the recent inclusion of the N3 deserializer/parser into 4Suite.

Chimezie Ogbuji

via Copia

FuXi - Versa / N3 / Rete Expert System

Pychinko is a python implementation of the classic Rete algorithm which provides the inferencing capabilities needed by an Expert System. Part of Pychinko works ontop of cwm / afon out of the box. However, it's Interpreter only relies on rdflib to formally represent the terms of an RDF statement.

FuXi only relies on Pychinko itself, the N3 deserializer for persisting N3 rules, and rdflib's Literal and UriRef to formally represent the corresponding terms of a Pychinko Fact. FuXi consists of 3 components (in addition to a 4RDF model for Versa queries):

I. FtRdfReteReasoner

Uses Pychinko and N3RuleExtractor to reason over a scoped 4RDF model.

II. N3RuleExtractor

Extracts Pychinko rules from a scoped model with statements deserialized from an N3 rule document

III. 4RDF N3 Deserializer

see: N3 Deserializer

The rule extractor reverses the reification of statements contained in formulae/contexts as performed by the N3 processor. It uses three Versa queries for this

Using the namespace mappings:

Extract ancendent statements of logical implications

distribute(
  all() |- log:implies -> *,
  '.',
  '. - n3r:statement -> *'
)

Extract implied / consequent statements of logical implications

distribute(
  all() - log:implies -> *,
  '.',
  '. - n3r:statement -> *'
)

Extract the terms of an N3 reified statement

distribute(
  <statement>,
  '. - n3r:subject -> *',
  '. - n3r:predicate -> *',
  '. - n3r:object -> *'
)

The FuXi class provides methods for performing scoped Versa queries on a model extended via inference or on just the inferred statements:

For example, take the following fact document deserialized into a model:

@prefix : <http://foo/bar#> .
:chimezie :is :snoring .

Now consider the following rule:

@prefix ex: <http://foo/bar#> .
{?x ex:is ex:snoring} => {?x a ex:SleepingPerson} .

Below is a snapshot of Fuxi perforing the Versa query “type(ex:SleepingPerson)” on a model extended by inference using the above rule:

Who was FuXi? Author of the predecessor to the King Wen Sequence

Chimezie Ogbuji

via Copia