Binary Predicates in FOPL and at Large Volumes

I've wanted to come back to the issue of RDF scalability in a relational model for some time. Earlier, I mentioned a Description Logics (DL) representation technique that would dramatically reduce the amount of space needed for most RDF graphs. I only know of one other RDF store (besides rdflib) that does this. At large volumes, query response time is more susceptible to space efficiency than to pure seek time. At some point along the numerical scale, the time it takes to resolve a query is affected more directly by the size of the knowledge base than by anything else. When you consider the URI lexical grammar, skolemization, seek times, BTrees, and hash tables, even interning (by that I mean the general reliance on uniqueness in crafting identifiers) has little effect on the high-volume metrics of FOPL.

Perhaps something more could be said about the efficiency of DL? I've suggested the possibility of semantic compression (or 'forward consumption' if you think of it as analogous to forward chaining), where what can be implied is never added, or is removed by some intelligent process (perhaps periodically). For example, consider a knowledge base that only stored 'knows' relationships (foaf:knows, perhaps) between people. It would be very redundant to state that two individuals are 'People' (foaf:Person) if they know each other (for a single pair, two of the three statements go away: a 66.6% space saving right there). Couldn't the formality of DL be used to enhance both expressiveness and efficiency, in the same way that invariant representations make our neocortex so much more efficient at logical prediction? If not DL, perhaps at least the formality of a local domain ontology and its rules? I was able to apply the same principle (though not in any formal way you could automate) to improve the speed of a content management knowledge base.
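
To make the foaf:knows example concrete, here is a minimal sketch of what such 'forward consumption' might look like. It is only an illustration under stated assumptions, not how rdflib or any other store does it: plain Python tuples stand in for triples, the function names (implied_types, compress, expand) are made up for this post, and the only ontology rule assumed is that the domain and range of foaf:knows are both foaf:Person.

```python
# Hypothetical sketch: plain tuples stand in for an RDF store.
RDF_TYPE = "rdf:type"
KNOWS = "foaf:knows"
PERSON = "foaf:Person"

# Assumed ontology rules: anything that knows, or is known, is a foaf:Person.
DOMAIN = {KNOWS: PERSON}
RANGE = {KNOWS: PERSON}


def implied_types(triples):
    """Return the rdf:type triples entailed by the domain/range rules."""
    implied = set()
    for s, p, o in triples:
        if p in DOMAIN:
            implied.add((s, RDF_TYPE, DOMAIN[p]))
        if p in RANGE:
            implied.add((o, RDF_TYPE, RANGE[p]))
    return implied


def compress(triples):
    """Drop every stored triple that the rules can re-derive from the rest."""
    triples = set(triples)
    return triples - implied_types(triples)


def expand(triples):
    """Re-materialize the implied triples, e.g. at query time."""
    triples = set(triples)
    return triples | implied_types(triples)


graph = {
    ("ex:alice", KNOWS, "ex:bob"),
    ("ex:alice", RDF_TYPE, PERSON),
    ("ex:bob", RDF_TYPE, PERSON),
}
compressed = compress(graph)
print(len(graph), "->", len(compressed))  # prints "3 -> 1"
```

Only the foaf:knows statement survives compression, and expand() recovers the original three. The trade-off is that the implied statements have to be re-derived (or rewritten into the query) whenever someone asks for all foaf:Person instances, so space is being bought with inference work.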

[Uche Ogbuji]

via Copia