A Perspective on Temporal Modeling in RDF

I just read the follow-up to a thread (Why we need explicit temporal labelling) on formally adding time and time-related semantics to RDF specifically. I wanted to put my $0.02 in, since I spend a good deal of my time at work buried nose-deep in large volumes of bioinformatic cardiovascular data, most of which is temporal. To put it succinctly, I just don't see the value in merging temporal semantics (not a very lightweight model) into the fabric of your representation model.

We found (for our purposes) that by including our own specific temporal semantic vocabulary, we could answer questions such as:

How many patients complained about chest pains within 30 days of a specific surgical operation?

while at the same time avoiding the rigidity of temporal reasoning that formal models impose. Such formalisms (especially in distributed systems) are unnecessary when you consider that, most often, data as it is fetched (at any point in time) is 'complete' regardless of how it has varied over time.
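
As a rough illustration of how little machinery such a question needs, here is a minimal Python sketch. It is not our actual vocabulary or data: the record layout, the event names (`surgery`, `chest_pain`) and the function `patients_with_pain_near_surgery` are all hypothetical, and "within 30 days" is assumed to mean either side of the operation.

```python
from datetime import date, timedelta

# Hypothetical records: (patient id, event type, date of the event).
events = [
    ("p1", "surgery", date(2005, 3, 1)),
    ("p1", "chest_pain", date(2005, 3, 20)),
    ("p2", "surgery", date(2005, 4, 10)),
    ("p2", "chest_pain", date(2005, 6, 15)),
    ("p3", "chest_pain", date(2005, 5, 2)),
]

def patients_with_pain_near_surgery(events, window_days=30):
    """Return ids of patients who reported chest pain within
    window_days (before or after) of one of their own surgeries."""
    window = timedelta(days=window_days)

    # Index surgery dates by patient.
    surgeries = {}
    for patient, kind, when in events:
        if kind == "surgery":
            surgeries.setdefault(patient, []).append(when)

    # Compare each pain report against that patient's surgery dates.
    matched = set()
    for patient, kind, when in events:
        if kind != "chest_pain":
            continue
        if any(abs(when - op) <= window for op in surgeries.get(patient, [])):
            matched.add(patient)
    return matched

print(len(patients_with_pain_near_surgery(events)))
```

The only temporal operation here is a comparison of two dates, which is the kind of lightweight requirement an application-level vocabulary can cover without any interval calculus.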

Consider the RDF schema for OWL, whose identifier (the URL from which its content can be loaded) includes some temporal semantics (when it was published, and the suggestion that there are prior versions). Though the content might have changed over time, the entire document as it was at any point was 'consistent' in what it conveyed. No additional temporal semantics are needed to capture the relations between versions or to maintain some 'sanity' (if you will) over the fact that the data changed over time.

And if such formalism is needed, it's rather easy to piggyback on existing ontologies ("Time Ontology in OWL", for instance).

Furthermore, if you think about it, named contexts (graphs, scopes, etc.) already provide a more adequate solution to the issue of inconsistency of data (over time) from the same source. For instance, you can take advantage of RDF/XML and N3 syntactic sugar such as:

<> a owl:Ontology;
   dc:date "2002-07-13" .

or its RDF/XML equivalent:

<owl:Ontology 
  rdf:about="">
  <dc:date>2002-07-13</dc:date>
</owl:Ontology>

in order to capture enough provenance data to accommodate change.

Ironically, the ability to make provenance statements (one of which includes the date associated with this 'representation') about a named graph (identified by the URL from which it was loaded) is beyond the semantics of the RDF model. However, through its use you can be specific about the source of triples and, in addition, you can include the specifics of the version either within the identifier of the source or through provenance statements made about it.
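
To make the idea concrete, here is a minimal Python sketch of named graphs with provenance kept alongside them. It uses plain dictionaries rather than any RDF library, and everything in it is assumed for illustration: the `example.org` URLs, the `ex:` terms, the second date, and the helper `graph_as_of`.

```python
from datetime import date

# Each named graph is keyed by the (hypothetical) URL it was loaded from.
# The version is carried in the identifier itself.
named_graphs = {
    "http://example.org/schema/2002-07-13": {
        ("ex:A", "rdfs:subClassOf", "ex:B"),
    },
    "http://example.org/schema/2004-02-10": {
        ("ex:A", "rdfs:subClassOf", "ex:B"),
        ("ex:B", "rdfs:subClassOf", "ex:C"),
    },
}

# Provenance statements are made *about* each graph, not inside it.
provenance = {
    "http://example.org/schema/2002-07-13": {"dc:date": date(2002, 7, 13)},
    "http://example.org/schema/2004-02-10": {"dc:date": date(2004, 2, 10)},
}

def graph_as_of(when):
    """Return (source URL, triples) for the most recent graph dated
    on or before `when`, or None if no such graph exists."""
    candidates = [
        (meta["dc:date"], url)
        for url, meta in provenance.items()
        if meta["dc:date"] <= when
    ]
    if not candidates:
        return None
    _, url = max(candidates)
    return url, named_graphs[url]

source, triples = graph_as_of(date(2003, 1, 1))
print(source, len(triples))
```

Each graph stays internally consistent, and picking the right one for a given point in time is a lookup over provenance rather than a feature of the triples themselves.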

I think the problem is more a modeling issue (and having the foresight to determine how you accommodate the change of data over time) than a shortcoming of the framework.

Chimezie Ogbuji

via Copia
5 responses
Hi,

I think you're conflating provenance with "valid time". These require different solutions.

Your cardiovascular example concerns valid time (i.e. the time at which the pain was suffered, rather than when it was recorded in a database, though this is a trickier example since it concerns the patients' reports of pain). Your solutions concern provenance.

As it happens, your particular example is a strawman, as it doesn't need any treatment of time beyond RDF and e.g. OWL Time. This is because the thing that is time-indexed is naturally modeled as an instance of an event, e.g. the event of suffering pain.

However, consider the cases where you would wish to time-index a relation involving at least one continuant (a 3D object, roughly speaking) and there is no natural class counterpart for the relation. For example, this-nodule part-of this-lung at-time t. I don't think you can come up with a solution to this using RDF that isn't deeply problematic (you either lose transitivity or are forced to represent objects as time-slices).

See my comments here:

http://www.semergence.com/archives/2006/03/17/02/56/22/

No, the difference is clear to me (between provenance time stamps and 'formal' temporal semantics). What I'm suggesting is that applications very rarely need the formal rigor of temporal semantics (where there is a need to calculate interval intersections, overlaps, etc.), and where they do, a well-modeled temporal ontology will suffice, which is what we did in our case.

The suggestion that temporal semantics should be part and parcel of the representation framework (the RDF model and its semantics) is unnecessary.

There are varying levels of temporal semantic requirements (from as simple as time stamping to as complex as interval calculations), and the solution adopted should fit the requirements of the application instead of attempting to retrofit the framework with the necessary hooks to accommodate the entire spectrum.

Can you provide an example of how a well-modeled temporal ontology will resolve my "this-nodule part-of this-lung at-time t" example? The issue of time-points vs time-intervals is orthogonal (I don't care if your solution treats t as a point or an interval). You simply cannot do this in RDF without introducing monstrous hacks such as time-slices, or by turning relations into instances.

I agree that applications can get away without a particular axiomatisation of time. However, as we both work in the same domain, I'm sure you'll agree that real-world biological and biomedical instance relations require some kind of time indexing.

cheers

chris

Interesting discussion, particularly the medical use case. I'm currently working on similar issues around patient healthcare record integration. Do you have any more details publicly available?

The datasets I'm currently focused on are mutant gene to mutant phenotype associations for model organisms like fruit fly and zebrafish. The database I'm building will eventually cover things like electronic health records and clinical trials (though I'm mostly focused on the biological aspects rather than billing etc.).

On many levels, the kind of bizarre mutational effects you get in laboratory flies couldn't be more distant from the kind of data that might be recorded in an electronic health record (unless you were a really unlucky patient!).

However, the ontological building blocks (relations such as part_of, anatomical ontologies, and the general pattern of qualities inhering in dependent entities) are the same. And on a biological level, orthologous genes can have similar phenotypic effects at various levels of granularity, even across evolutionary distances of hundreds of millions of years.

Anyway, this work is part of an effort by the National Center for Biomedical Ontology (http://www.ncbo.us). There is a preliminary page on this project at http://www.fruitfly.org/~cjm/obd

We're evaluating various RDF databases. I am a little worried about representing time. Ideally we'd have some kind of general-purpose deductive database on which we could layer RDF views and various fragments of OWL entailment. Most existing solutions don't really allow you to mix and match RDF and more general-purpose deductive db reasoning: it's triples or nothing.