Ontological Definitions for an Information Resource

I've somehow found myself wrapped up in this dialog about information resources, their representations, and their relation to RDF. Perhaps it's the budding philosopher in me that finds the problem interesting. There seems to be some controversy about what an appropriate definition for an information resource is. I'm a big fan of not reinventing wheels that have already been built, tested, and deployed.

The Architecture of the World Wide Web says:

The distinguishing characteristic of these resources is that all of their essential characteristics can be conveyed in a message. We identify this set as "information resources."

I know of at least four very well-organized upper ontologies that have readily available OWL representations: SUMO, Cyc, Basic Formal Ontology (BFO), and DOLCE. These are the cream of the crop in my opinion (and in the opinion of many others who are better informed about this sort of thing). So let us spend some time investigating where the poorly defined Web Architecture term fits in each of these ontologies. This exercise is mostly meant as a reference. Every well-organized upper ontology typically has a single, topmost term that covers everything; this is (for the most part) the equivalent of owl:Thing and rdfs:Resource.

Suggested Upper Merged Ontology (SUMO)

SUMO has a term called "FactualText" which seems appropriate. The definition states:

The class of Texts that purport to reveal facts about the world. Such texts are often known as information or as non-fiction. Note that something can be an instance of FactualText, even if it is wholly inaccurate. Whether something is a FactualText is determined by the beliefs of the agent creating the text.

The URI for FactualText (at least in the SUMO OWL export I downloaded) is:

http://reliant.teknowledge.com/DAML/SUMO.owl#FactualText

Climbing up the subsumption tree, the immediate parent of FactualText is:

  • Text: "A LinguisticExpression or set of LinguisticExpressions that perform a specific function related to Communication, e.g. express a discourse about a particular topic, and that are inscribed in a CorpuscularObject by Humans."

The term Text has multiple parents (LinguisticExpression and Artifact). Following the path upwards from the first parent we have:

  • LinguisticExpression: "This is the subclass of ContentBearingPhysical which are language-related. Note that this Class encompasses both Language and the elements of Languages, e.g. Words."
  • ContentBearingPhysical: "Any Object or Process that expresses content. This covers Objects that contain a Proposition, such as a book, as well as ManualSignLanguage, which may similarly contain a Proposition."
  • Physical: "An entity that has a location in space-time. Note that locations are themselves understood to have a location in space-time."
  • Entity: "The universal class of individuals. This is the root node of the ontology."

Following the path upwards from the second parent we have:

  • Artifact: "A CorpuscularObject that is the product of a Making."
  • CorpuscularObject: "A SelfConnectedObject whose parts have properties that are not shared by the whole."
  • SelfConnectedObject: "A SelfConnectedObject is any Object that does not consist of two or more disconnected parts."
  • Object: "Corresponds roughly to the class of ordinary objects. Examples include normal physical objects, geographical regions, and locations of Processes"

Object is a specialization of Physical, so from here we arrive at the common ancestor, Entity.
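Because Text has two parents, this part of SUMO is a graph rather than a tree, and the two routes only meet again at Physical. A minimal sketch in plain Python (no RDF library assumed) makes this concrete: it hand-transcribes the subclass edges quoted above into a dictionary and enumerates every upward path. The names SUMO_PARENTS, ancestors and paths_to_root are hypothetical and purely illustrative, and the dictionary covers only the classes discussed here, not the full SUMO export.

```python
from collections import deque

# Hypothetical, hand-transcribed subclass edges (child -> parents) covering
# only the SUMO classes quoted in this post, not the whole ontology.
SUMO_PARENTS = {
    "FactualText": ["Text"],
    "Text": ["LinguisticExpression", "Artifact"],
    "LinguisticExpression": ["ContentBearingPhysical"],
    "ContentBearingPhysical": ["Physical"],
    "Artifact": ["CorpuscularObject"],
    "CorpuscularObject": ["SelfConnectedObject"],
    "SelfConnectedObject": ["Object"],
    "Object": ["Physical"],
    "Physical": ["Entity"],
    "Entity": [],  # root node of the ontology
}


def ancestors(cls, parents):
    """Return every superclass of cls once, in breadth-first order."""
    seen, order = set(), []
    queue = deque(parents.get(cls, []))
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            order.append(current)
            queue.extend(parents.get(current, []))
    return order


def paths_to_root(cls, parents):
    """Enumerate every upward path from cls to a class with no parents."""
    supers = parents.get(cls, [])
    if not supers:
        return [[cls]]
    return [[cls] + path for s in supers for path in paths_to_root(s, parents)]


for path in paths_to_root("FactualText", SUMO_PARENTS):
    print(" -> ".join(path))
# FactualText -> Text -> LinguisticExpression -> ContentBearingPhysical -> Physical -> Entity
# FactualText -> Text -> Artifact -> CorpuscularObject -> SelfConnectedObject -> Object -> Physical -> Entity

print("Physical" in ancestors("FactualText", SUMO_PARENTS))  # True
```

The breadth-first `ancestors` walk visits Physical only once even though both routes lead to it, which is the behavior you want when asking a yes/no question such as "is FactualText physical?".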

Cyc

Cyc has a term called InformationBearingThing:

A collection of spatially-localized individuals, including various actions and events as well as physical objects. Each instance of information-bearing thing (or IBT) is an item that contains information (for an agent who knows how to interpret it). Examples: a copy of the novel Moby Dick; a signal buoy; a photograph; an elevator sign in Braille; a map ...

The Cyc URI for this term is:

http://sw.cyc.com/2006/07/27/cyc/InformationBearingThing
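The term URIs in this post come in two shapes. The SUMO URI ends in a fragment (#FactualText), so an HTTP client dereferencing it strips the fragment and requests only the SUMO.owl document (per RFC 3986, the fragment is interpreted client-side and never sent to the server); the Cyc URI has no fragment, so the request goes to the term's own URI. The sketch below splits a term URI into the document that would actually be requested and the term's local name, using only Python's standard urllib.parse. The helper name split_term is hypothetical, and treating the last path segment of a slash URI as the local name is an assumed convention, not something Cyc specifies.

```python
from urllib.parse import urldefrag, urlsplit

# The term URIs cited in this post.
TERM_URIS = [
    "http://reliant.teknowledge.com/DAML/SUMO.owl#FactualText",
    "http://sw.cyc.com/2006/07/27/cyc/InformationBearingThing",
    "http://obi.sourceforge.net/ontology/OBI.owl#OBI_342",
    "http://www.loa-cnr.it/ontologies/ExtendedDnS.owl#information-object",
]


def split_term(uri):
    """Hypothetical helper: return (requested_document, local_name).

    Hash URI: the fragment is the local name, and the part before '#'
    is what an HTTP client would request.
    Slash URI (assumed convention): the last path segment is the local
    name, and the whole URI is what gets requested.
    """
    document, fragment = urldefrag(uri)
    if fragment:
        return document, fragment
    return uri, urlsplit(uri).path.rstrip("/").rsplit("/", 1)[-1]


for uri in TERM_URIS:
    document, name = split_term(uri)
    print(name, "->", document)
# FactualText -> http://reliant.teknowledge.com/DAML/SUMO.owl
# InformationBearingThing -> http://sw.cyc.com/2006/07/27/cyc/InformationBearingThing
# OBI_342 -> http://obi.sourceforge.net/ontology/OBI.owl
# information-object -> http://www.loa-cnr.it/ontologies/ExtendedDnS.owl
```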

This term has three parents: Container-Underspecified, SpatialThing-Localized, and InformationStore. The last seems most relevant, so we'll traverse its ancestry first:

  • InformationStore: "A specialization of partially intangible individual. Each instance of store of information is a tangible or intangible, concrete or abstract repository of information. The information stored in an information store is stored there as a consequence of the actions of one or more agents."
  • PartiallyIntangibleIndividual: "A specialization of both individual and partially intangible thing. Each instance of partially intangible individual is an individual that has at least some intangible (i.e. immaterial) component. The instance might be partly tangible (e.g. a copy of a book) and thus be a composite tangible and intangible thing, or it might be fully intangible (e.g. a number or an agreement) and thus be an instance of intangible individual object."

From here the ancestry splits into two paths, so we'll stop there (we already have the essence of the definition).

Going back to InformationBearingThing, below is the ancestral path starting from Container-Underspecified:

  • Container-Underspecified: "The collection of objects, tangible or otherwise, which are typically conceptualized by human beings for purposes of common-sense reasoning as containers. Thus, container underspecified includes not only the set of all physical containers, like boxes and suitcases, but metaphoric containers as well"
  • Area: "The collection of regions/areas, tangible or otherwise, which are typically conceptualized by human beings for purposes of common-sense reasoning as spatial regions."
  • Location-Underspecified: similar definition to Area
  • Thing: "thing is the universal collection : the collection which, by definition, contains everything there is. Every thing in the Cyc ontology -- every individual (of any kind), every set, and every type of thing -- is an instance of (see Isa) thing"

Basic Formal Ontology (BFO)

BFO is (as the name suggests) very basic and is meant to be an axiomatic implementation of the philosophy of realism. As such, its closest term for an information resource is very broad: Continuant.

Definition: An entity that exists in full at any time in which it exists at all, persists through time while maintaining its identity and has no temporal parts.

However, I happen to be quite familiar with an extension of BFO called the Ontology for Biomedical Investigations (OBI), which has a more appropriate term (derived from Continuant): information_content_entity

The URI for this term is:

http://obi.sourceforge.net/ontology/OBI.owl#OBI_342

Traversing the (short) ancestral path, we have the following definitions:

  • OBI_295: "An information entity is a dependent_continuant which conveys meaning and can be documented and communicated."
  • OBI_321: "generically_dependent_continuant"
  • Continuant: "An entity that exists in full at any time in which it exists at all, persists through time while maintaining its identity and has no temporal parts."
  • Entity

The Descriptive Ontology for Linguistic and Cognitive Engineering (DOLCE)

DOLCE's closest term for an information resource is information-object:

Information objects are social objects. They are realized by some entity. They are ordered (expressed according to) by some system for information encoding. Consequently, they are dependent from an encoding as well as from a concrete realization. They can express a description (the ontological equivalent of a meaning/conceptualization), can be about any entity, and can be interpreted by an agent. From a communication perspective, an information object can play the role of "message". From a semiotic perspective, it plays the role of "expression".

The URI for this term is:

http://www.loa-cnr.it/ontologies/ExtendedDnS.owl#information-object

Traversing the ancestral path we have:

  • non-agentive-social-object: "A social object that is not agentive in the sense of adopting a plan or being acted by some physical agent. See 'agentive-social-object' for more detail."
  • social-object: "A catch-all class for entities from the social world. It includes agentive and non-agentive socially-constructed objects: descriptions, concepts, figures, collections, information objects. It could be equivalent to 'non-physical object', but we leave the possibility open of 'private' non-physical objects."
  • non-physical-object: "Formerly known as description. A unitary endurant with no mass (non-physical), generically constantly depending on some agent, on some communication act, and indirectly on some agent participating in that act. Both descriptions (in the now current sense) and concepts are non-physical objects."
  • non-physical-endurant: "An endurant with no mass, generically constantly depending on some agent. Non-physical endurants can have physical constituents (e.g. in the case of members of a collection)."
  • endurant: "The main characteristic of endurants is that all of them are independent essential wholes. This does not mean that the corresponding property (being an endurant) carries proper unity, since there is no common unity criterion for endurants. Endurants can 'genuinely' change in time, in the sense that the very same endurant as a whole can have incompatible properties at different times."
  • particular: "AKA 'entity'. Any individual in the DOLCE domain of discourse. The extensional coverage of DOLCE is as large as possible, since it ranges on 'possibilia', i.e. all possible individuals that can be postulated by means of DOLCE axioms. Possibilia include physical objects, substances, processes, qualities, conceptual regions, non-physical objects, collections and even arbitrary sums of objects."

Discussion

The definitions are (in true philosophical form) quite long-winded. However, the point I'm trying to make is:

  • A lot of pain has gone into defining these terms
  • Each of these ontologies is very richly axiomatized (for supporting inference)
  • Each of these ontologies is available in OWL/RDF

Furthermore, these ontologies were specifically designed to be domain-independent and thus to support inference across domains, so it makes sense to start here for a decent (axiomatized) definition. What is interesting is that, of these four, SUMO and BFO are the only ones that treat information resources (or their equivalent term) as strictly 'physical' things. Cyc's definition includes both tangible and intangible things, while DOLCE's definition is strictly intangible (non-physical-endurant).

Some food for thought

Chimezie Ogbuji

via Copia