BNode Drama for your Mama

You know you are geek when it's 5am in the morning and you are wrestling with existential quantification and their value in querying. This was triggered originally by the ongoing effort to extend an already expressive pattern-based RDF querying language to cover more usecases. The motivation being that such patterns should be expressive beyond just the level of triple-matching since the core RDF model consists of a level of granularity below statements (you have literals, resources, and bnodes, ..). I asked myself if there was a justifiable reason why Versa at it's core does not include BNodes?:

Blank nodes are treated as simply indicating the existence of a thing, without using, or saying anything about, the name of that thing. (This is not the same as assuming that the blank node indicates an 'unknown' URI reference; for example, it does not assume that there is any URI reference which refers to the thing. The discussion of Skolemization in appendix A is relevant to this point.)

I don't remember the original motivation for leaving BNodes out of the core query data types, but in retrospect I think it it was a good decision and not only because the SPARQL specification does something similar (in interpreting BNodes as an open-ended variable). But it's worth noting that the section on blank nodes appearing in a query as opposed to appearing to a query result (or existing in the underlying knowledge base) is quite short:

A blank node can appear in a query pattern. It behaves as a variable; a blank node in a query pattern may match any RDF term.

Anyways, at the time I noticed this lack of BNodes in query languages, I had a misconception about BNodes. I thought they represented individual things we want to make statements about but don't know their identification or don't want to have to worry about assigning identification about them (this is probably 90% of the way BNodes are used in reality). This confusion came from the practical way BNodes are almost always handled by RDF data stores (Skolemization):

Skolemization is a syntactic transformation routinely used in automatic inference systems in which existential variables are replaced by 'new' functions - function names not used elsewhere - applied to any enclosing universal variables. In RDF, Skolemization amounts to replacing every blank node in a graph by a 'new' name, i.e. a URI reference which is guaranteed to not occur anywhere else. In effect, it gives 'arbitrary' names to the anonymous entities whose existence was asserted by the use of blank nodes: the arbitrariness of the names ensures that nothing can be inferred that would not follow from the bare assertion of existence represented by the blank node.

This misconception was clarified when Bijan Parsia ({scope(PyChinko)} => {scope(FuXi)}) expressed that he had issue with my assertion(s) that there are some compromising redundancies with BNodes, Literals, and simple entailment with regards to building programmatic APIs for them.

Then the light bulb went off that the semantics of BNodes are (as he put it) much stronger than they are most often used. Most people who use BNodes don't mean to use it to state that there is a class of things which have the asserted set of statements made about them. Consider the difference between:

  1. Who are all the people Chime knows?
  2. There is someone Chime knows, but I just don't know his/her name right now
  3. Chime knows someone! (dudn't madder who)

The first scenario is the basic use case for variable resolution in an RDF query and is asking for the resolution of variable ?knownByChime in:

<> foaf:knows ?knownByChime.

Which can be [expressed] in Versa (currently) as:


Or eventually (hopefully) as:


And in SPARQL as:

  <> foaf:knows ?knownByChime

The second case is the most common way people use BNodes. You want to say Chime knows someone but don't know a permanent identifier for this person or care to at the time you make the assertion: foaf:knows _:knownByChime

But RDF-MT specifically states that BNodes are not meant to be interpreted in this way only. Their semantics are much stronger. In fact, as Bijaan pointed out to me, the proper use for BNodes is as scoped existentials within ontological assersions. For example owl:Restrictions which allow you to say things like: The named class KnowsChime consists of everybody who knows Chime:

@prefix mc <>.
  @prefix owl <>.
  :KnowsChime a owl:Class;
          a owl:Restriction;
          owl:onProperty foaf:knows;
          owl:hasValue mc:chime
        rdfs:label "KnowsChime";
        rdfs:comment "Everybody who knows Chime";

The fact that BNodes aren't meant to be used in the way they often are leads to some suggested modifications to allow BNodes to be used as 'temporary identifiers' in order to simplify query resolution. But as clarified in the same thread, BNodes in a query doesn't make much sense - which is the conclusion I'm coming around to: There is no use case for asserting an existential quantification while querying for information against a knowledge base. Using a variable (in the way SPARQL does) should be sufficient. In fact, all RDF querying usecases (and languages) seem to be reducable to variable resolution.

This last part is worth noting because it suggests that if you have a library that handles variable resolution (such as rdflib's most recent addition) you can map any query language to (Versa/SPARQL/RDFQueryLanguage_X) it by reducing it to a set of triple patterns with the variables you wish to resolve.

So my conclusions?:

  • Blank Nodes are a neccessary component in the model (and any persistence API) that unfortunately have much stronger semantics (existential quanitifcation) than their most common use (as temporary identifiers)
  • The distinction between the way BNodes are most often used (as a syntactic shorthand for a single resource for which there is no known identity - at the time) and the formal definition of BNodes is very important to note - especially to those who are very much wed to their BNodes as Shelly Powers has shown to be :).
  • Finally, BNodes emphatically do not make sense in the context of a query - since they become infinitely resolvable variables: which is not very useful. This confusion is further proof that (once again), for the sake of minimizing said confusion and misinterpretation of some very complicated axioms there is plenty value in parenthetically (if not logically) divorcing (pun intended) RDF model theoretics from the nuts and bolts of the underlying model

Chimezie Ogbuji

via Copia