Rewriting Source Content Descriptions as Versa Queries

I recently read Morten Frederiksen's blog entry about implementing Source Content Descriptions as SPARQL queries in Redland and was quite interested, especially by the observation that such queries could be generated automatically and that the set of queries you would want to ask is small and straightforward. Even more interesting was Morten's step-by-step walk-through of how such queries would be translated to SQL queries on a Redland triple store sitting on top of MySQL (my favorite RDBMS deployment for 4RDF as well).

However, I couldn't help but wonder how such a set of queries would be expressed in Versa (in my opinion, a language more aligned with the data model it queries than its SQL-RDQL counterparts). So below is my attempt to port the queries into Versa:

Classes used in the store

SPARQL
SELECT DISTINCT ?Class
WHERE { ?R rdf:type ?Class }
Versa
set(all() - rdf:type -> *)
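
To make the semantics concrete, here is a minimal Python sketch of what both queries compute, assuming the store is nothing more than a list of (subject, predicate, object) tuples. The sample data and the names `TRIPLES` and `classes_used` are made up for illustration; they are not part of Redland, 4RDF, or Versa.

```python
RDF_TYPE = "rdf:type"

# Hypothetical sample data standing in for the triple store.
TRIPLES = [
    ("ex:alice", RDF_TYPE, "foaf:Person"),
    ("ex:bob", RDF_TYPE, "foaf:Person"),
    ("ex:doc1", RDF_TYPE, "foaf:Document"),
    ("ex:alice", "foaf:name", "Alice"),
]


def classes_used(triples):
    """Return the distinct objects of rdf:type statements."""
    return {o for (_, p, o) in triples if p == RDF_TYPE}


print(sorted(classes_used(TRIPLES)))
```

Building a set plays the same role as DISTINCT in the SPARQL version and set() in the Versa one.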

Predicates that are used with instances of each class

SPARQL
SELECT DISTINCT ?Class ?Property
WHERE { ?R rdf:type ?Class .
        OPTIONAL { ?R ?Property ?Object .
                   FILTER (?Property != rdf:type) } }
Versa
difference(
  properties(set(all() |- rdf:type -> *)),
  set(rdf:type)
)
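
Note that the SPARQL version returns class/property pairs, whereas the Versa expression yields one flat set of predicates. The following Python sketch, which again assumes a plain list of triples and uses hypothetical names (`TRIPLES`, `predicates_by_class`), computes the pairs and shows how the flat set falls out of them. Unlike the OPTIONAL in the SPARQL query, it does not report classes whose instances carry no other predicates.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"

# Hypothetical sample data standing in for the triple store.
TRIPLES = [
    ("ex:alice", RDF_TYPE, "foaf:Person"),
    ("ex:bob", RDF_TYPE, "foaf:Person"),
    ("ex:doc1", RDF_TYPE, "foaf:Document"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:doc1", "dc:title", "Notes"),
]


def predicates_by_class(triples):
    """Return (class, predicate) pairs for predicates used on typed resources."""
    # First pass: record the classes of every typed resource.
    types = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)
    # Second pass: pair each non-type predicate with its subject's classes.
    pairs = set()
    for s, p, _ in triples:
        if p != RDF_TYPE:
            for cls in types.get(s, ()):
                pairs.add((cls, p))
    return pairs


pairs = predicates_by_class(TRIPLES)
flat = {p for (_, p) in pairs}  # roughly what the Versa expression returns
print(sorted(pairs))
print(sorted(flat))
```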

Do all instances of each class have a statement with each predicate?

It wasn't clear to me if the intent was to check if all classes have a statement with each predicate as specified by an ontology or to just count how many properties each class instance has. The latter interpretation is the one I went with (it's also simpler). This particular query will return a list of lists, each inner list consisting of two values: the URI of a distinct class instance and the number of distinct properties used in statements about it (excluding rdf:type).

Versa
distribute(
  set(all() |- rdf:type -> *),
  '.',
  'length(
    difference(
      properties(.),
      set(rdf:type)
    )
  )'
)
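
For comparison, the counting interpretation can be sketched in Python under the same simplifying assumption of a plain list of triples. The names `TRIPLES` and `property_counts` are hypothetical, and the result mirrors the list-of-lists shape described above rather than anything Versa itself returns.

```python
RDF_TYPE = "rdf:type"

# Hypothetical sample data standing in for the triple store.
TRIPLES = [
    ("ex:alice", RDF_TYPE, "foaf:Person"),
    ("ex:bob", RDF_TYPE, "foaf:Person"),
    ("ex:doc1", RDF_TYPE, "foaf:Document"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:knows", "ex:carol"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:doc1", "dc:title", "Notes"),
]


def property_counts(triples):
    """Return [resource, n] pairs: distinct non-type predicates per typed resource."""
    typed = {s for (s, p, _) in triples if p == RDF_TYPE}
    counts = []
    for resource in sorted(typed):
        # A set keeps repeated predicates (two foaf:knows) from counting twice.
        props = {p for (s, p, _) in triples if s == resource and p != RDF_TYPE}
        counts.append([resource, len(props)])
    return counts


for resource, n in property_counts(TRIPLES):
    print(resource, n)
```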

Is the type of object in a statement with each class/predicate combination always the same?

I wasn't clear on the intent of this query, either. I wasn't sure whether Morten meant to ask this of the combination with all predicates defined in an ontology or with all predicates actually used on class instances in the graph being queried.
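
For what it's worth, under the second reading (predicates actually used on class instances in the graph), the check could be sketched in Python as below. This is only my guess at the intent, not a port of Morten's query. The names `TRIPLES`, `UNTYPED`, `object_types`, and `uniform_combinations` are hypothetical, and objects without an rdf:type statement (including literals in this simplified model) are lumped together as untyped.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"
UNTYPED = "(untyped)"  # placeholder for objects with no rdf:type statement

# Hypothetical sample data standing in for the triple store.
TRIPLES = [
    ("ex:alice", RDF_TYPE, "foaf:Person"),
    ("ex:bob", RDF_TYPE, "foaf:Person"),
    ("ex:doc1", RDF_TYPE, "foaf:Document"),
    ("ex:alice", "foaf:made", "ex:doc1"),
    ("ex:bob", "foaf:made", "ex:doc1"),
    ("ex:alice", "ex:likes", "ex:bob"),
    ("ex:bob", "ex:likes", "ex:doc1"),
]


def object_types(triples):
    """Map each (class, predicate) pair to the set of object types seen with it."""
    types = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)
    seen = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        for cls in types.get(s, ()):
            seen[(cls, p)].update(types.get(o) or {UNTYPED})
    return seen


def uniform_combinations(triples):
    """Return the (class, predicate) pairs whose objects all share one type."""
    return {key for key, kinds in object_types(triples).items() if len(kinds) == 1}


print(sorted(uniform_combinations(TRIPLES)))
```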

But there you have it.

NOTE: I used the set function to guarantee that only distinct values are returned. It may be redundant in places where the functions and expressions involved already eliminate duplicates.

[Uche Ogbuji]

via Copia
2 responses
"It wasn't clear to me if the intent was to check if all classes have a statement with each predicate as specified by an ontology or to just count how many properties each class instance has"

I may have misunderstood you, but just to clarify what Morten was doing:

A source content description is an expression (using OWL) of what an RDF store contains -- this might be useful for query routing, query optimisation, or just knowing 'what can I ask?'. This is really about SPARQL the protocol, rather than the query language.

My original example was written by hand (I put the data in, so I knew what kind of thing was in there). But, of course, the OWL description is just a reflection of what's in the store, so we ought to be able to mine that description. And that's what Morten has done using SQL.

So the short answer is: counting is what we want, and you did the right thing :-)

It's cool that Versa could be used to do this mining, btw.

PS:
"Is the type of object in a statement with each class/predicate combination always the same?"

That's to work out if an 'owl:allValuesFrom' restriction applies. If things of type X always have Ys at the end of some property, then it's useful to put that in the description. Deeper queries are possible, for example.