Practical Temporal Reasoning with Notation 3

Recently, Dan Brickley expressed an interest in the extent to which bioinformatics research efforts are leveraging RDF for temporal reasoning (and for patient healthcare record integration in general). The thread on the value of modeling temporal relations explicitly versus relying on them being built into core RDF semantics left me feeling that a concrete example was in order.

We have a large (3500+ assertions) OWL Full ontology describing all the data we collect about Cardiothoracic procedures (the primary purpose of our database as currently constituted – in a relational model). There are several high-level classes we use to model concepts that, though core to our model, can be thought of as general enough for a common upper ontology for patient data.

One of the classes is ptrec:TemporalData (from here on out, I'll be using the ptrec prefix to describe vocabulary terms in our ontology) which is the ancestor of all classes that are expressed on an axis of time. We achieve a level of precision in modeling data on a temporal axis that enhances the kind of statistical analysis we perform on a daily basis.

In particular we use three variables:

  • ptrec:startDT (xs:dateTime)
  • ptrec:stopDT (xs:dateTime)
  • ptrec:instantDT (xs:dateTime)

The first two are used to describe an explicit (and 'proper') interval for an event in a patient record. This is often the case where the event in question only had a date associated with it. The third variable is used when the event is instantaneous and the associated date / time is known.

The biggest challenge isn't simply the importance of time in asking questions of our data but of temporal factors that are keyed off specific, moving points of reference. For example, consider a case study on the effects of administering a medication within X days of a specific procedure. The qualifying procedure is key to the observations we wish to make and behaves as a temporal anchor. Another case study interested in the effects of administering the same medication but with respect to a different procedure should be expected to rely on the same temporal logic, but keyed off a different point in time. However, by being explicit about how we place temporal data on a time axis (as instants or intervals) we can outline a logic for general temporal reasoning that can be used by either case study.

Linking into an OWL time ontology, we can set up some simple Notation 3 rules for inferring interval relationships to aid such questions:

#Inferring before and after temporal relationships (between instants and intervals alike)
{?a a ptrec:TemporalData;
    ptrec:instantDT ?timeA. 
 ?b a ptrec:TemporalData;
    ptrec:instantDT ?timeB. ?timeA str:greaterThan ?timeB} 

         => {?a time:intAfter ?b.?b time:intBefore ?a}

{?a a ptrec:TemporalData;
    ptrec:startDT ?startTimeA;
    ptrec:stopDT ?stopTimeA.  
 ?b a ptrec:TemporalData;
    ptrec:startDT ?startTimeB;
    ptrec:stopDT ?stopTimeB. ?startTimeA str:greaterThan ?stopTimeB} 

         => {?a time:intAfter ?b.?b time:intBefore ?a}

#Inferring during and contains temporal relationships (between proper intervals)
#Since there is no str:greaterThanOrEqual CWM function, the various permutations
#are spelled out explicitly
{?a a ptrec:TemporalData;
    ptrec:startDT ?startTimeA;
    ptrec:stopDT ?stopTimeA.  
 ?b a ptrec:TemporalData;
    ptrec:startDT ?startTimeB;
    ptrec:stopDT ?stopTimeB.
 ?startTimeA str:lessThan ?startTimeB. ?stopTimeA str:greaterThan ?stopTimeB} 

         => {?a time:intContains ?b.?b time:intDuring ?a}

{?a a ptrec:TemporalData;
    ptrec:startDT ?startTimeA;
    ptrec:stopDT ?stopTimeA.  
 ?b a ptrec:TemporalData;
    ptrec:startDT ?startTimeB;
    ptrec:stopDT ?stopTimeB.
 ?startTimeA str:equalIgnoringCase ?startTimeB. ?stopTimeA str:greaterThan ?stopTimeB} 

     => {?a time:intContains ?b.?b time:intDuring ?a}

{?a a ptrec:TemporalData;
    ptrec:startDT ?startTimeA;
    ptrec:stopDT ?stopTimeA.  
 ?b a ptrec:TemporalData;
    ptrec:startDT ?startTimeB;
    ptrec:stopDT ?stopTimeB.
 ?startTimeA str:lessThan ?startTimeB. ?stopTimeA str:equalIgnoringCase ?stopTimeB} 

     => {?a time:intContains ?b.?b time:intDuring ?a}

Notice the value in xs:dateTime values being ordered temporally and as Unicode strings simultaneously (provided they share the same lexical form and time zone). This allows us to rely on str:lessThan and str:greaterThan for determining interval intersection and overlap.
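The ordering property is easy to check outside of CWM. The following is a minimal Python sketch, not part of our ontology or rules: the timestamps are made up, and the `parse` helper is only an illustration. It shows that sorting by string and sorting by time agree when every value uses one lexical form and time zone, and that the correspondence breaks once offsets differ.

```python
from datetime import datetime

# Made-up sample values, all in the same lexical form and time zone (UTC).
stamps = [
    "2006-03-01T08:30:00Z",
    "2005-12-31T23:59:59Z",
    "2006-03-01T08:05:00Z",
]


def parse(value: str) -> datetime:
    """Parse the fixed lexical form used in the samples above."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


by_string = sorted(stamps)           # plain Unicode ordering
by_time = sorted(stamps, key=parse)  # chronological ordering
print(by_string == by_time)

# With differing offsets the two orderings can disagree.
a = "2006-03-01T09:00:00+02:00"  # 07:00 UTC
b = "2006-03-01T08:00:00+00:00"  # 08:00 UTC
string_says_later = a > b
time_says_later = datetime.fromisoformat(a) > datetime.fromisoformat(b)
print(string_says_later, time_says_later)
```

In practice this means normalizing every xs:dateTime to a single time zone before relying on string comparison.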

Terms such as 'preoperative' (which refer to events that occurred before a specific procedure / operation) and 'postoperative' (events that occurred after a specific procedure / operation), which are core to general medical research nomenclature, can be tied directly into this logic:

{?a a ptrec:TemporalData.  ?b a ptrec:Operation. ?a time:intBefore ?b}
   => {?a ptrec:preOperativeWRT ?b}

{?a a ptrec:TemporalData.  ?b a ptrec:Operation. ?a time:intAfter ?b}
   => {?a ptrec:postOperativeWRT ?b}

Here we introduce two terms (ptrec:preOperativeWRT and ptrec:postOperativeWRT) which relate temporal data to an operation in the same patient record. Using interval relationships as a foundation, you can link domain-specific temporal vocabulary into your temporal reasoning model and rely on a reasoner to set up a framework for temporal reasoning.
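For readers who don't have an N3 reasoner at hand, the same logic can be sketched procedurally. The Python below is only an illustration of the interval rules and the two operative terms. The `Interval` class, the function names and the sample dates are hypothetical, and it assumes dateTime strings in a single lexical form and time zone so that string comparison matches temporal order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interval:
    """A proper interval, mirroring ptrec:startDT and ptrec:stopDT."""
    start: str  # xs:dateTime strings, all in one lexical form and time zone
    stop: str


def relation(a: Interval, b: Interval) -> Optional[str]:
    """Return the relation of a to b, or None if no rule applies."""
    if a.start > b.stop:
        return "intAfter"
    if b.start > a.stop:
        return "intBefore"
    if (a.start, a.stop) == (b.start, b.stop):
        return None  # identical intervals: none of the rules fire
    if a.start <= b.start and a.stop >= b.stop:
        return "intContains"
    if b.start <= a.start and b.stop >= a.stop:
        return "intDuring"
    return None  # partial overlap is not covered by the rules


def operative_term(event: Interval, operation: Interval) -> Optional[str]:
    """Map the interval relation onto the domain-specific vocabulary."""
    return {
        "intBefore": "preOperativeWRT",
        "intAfter": "postOperativeWRT",
    }.get(relation(event, operation))


# Made-up patient record entries.
operation = Interval("2006-03-10T08:00:00", "2006-03-10T12:00:00")
medication = Interval("2006-03-08T09:00:00", "2006-03-08T09:30:00")
anesthesia = Interval("2006-03-10T08:00:00", "2006-03-10T11:00:00")

print(relation(medication, operation), operative_term(medication, operation))
print(relation(anesthesia, operation), operative_term(anesthesia, operation))
```

Unlike the rules, this sketch returns a single relation per pair, whereas a reasoner asserts both directions (for example intBefore and intAfter) at once.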

Imagine the value in using a backward-chaining prover (such as Euler) to logically demonstrate exactly why a specific medication (associated with the date when it was administered) is considered to be preoperative with respect to a qualifying procedure. This would complement the statistical analysis of a case study quite nicely with formal logical proof.

Now, it's worth noting that such a framework (as it currently stands) doesn't allow precision of interval relationships beyond simple intersection and overlap. For instance, in most cases you would be interested primarily in medication administered within a specific length of time. This doesn't really impact the above framework since it is no more than a functional requirement to be able to perform calendar math. Imagine if the built-in properties of CWM were expanded to include functions for performing date math, for instance a function that adds a duration to a dateTime.

With such a function we can expand our logical framework to include more explicit temporal relationships.
For example, if we only wanted medications administered within the 30 days prior to an operation to be considered 'preoperative':

#?preOpMin is the operation's start time minus 30 days
#(time:addDT is the hypothetical date math built-in, not an existing CWM function)
{?a a ptrec:TemporalData;
    ptrec:startDT ?startTimeA;
    ptrec:stopDT ?stopTimeA.  
 ?b a ptrec:Operation;
    ptrec:startDT ?opStartTime;
    ptrec:stopDT ?opStopTime.  
 ?a time:intBefore ?b.
 (?opStartTime "-P30D") time:addDT ?preOpMin. ?stopTimeA str:greaterThan ?preOpMin}
    => {?a ptrec:preOperativeWRT ?b}

It's worth noting that such an addition (to facilitate calendar math) would be quite useful as a general extension for RDF processors.
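To make the calendar math concrete, here is a minimal Python sketch of what such a date math function might compute. It is a hypothetical stand-in, not a CWM built-in: the name `add_dt`, the restriction to day-based durations such as "-P30D", and the sample dates are all assumptions made for illustration.

```python
import re
from datetime import datetime, timedelta

# Only day-based xs:duration values (e.g. "P30D", "-P30D") are handled here.
_DAY_DURATION = re.compile(r"^(-?)P(\d+)D$")
_FORMAT = "%Y-%m-%dT%H:%M:%S"


def add_dt(date_time: str, duration: str) -> str:
    """Add a day-based duration to a dateTime string and return a string."""
    match = _DAY_DURATION.match(duration)
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    sign = -1 if match.group(1) else 1
    delta = timedelta(days=sign * int(match.group(2)))
    return (datetime.strptime(date_time, _FORMAT) + delta).strftime(_FORMAT)


op_start = "2006-03-31T08:00:00"        # made-up operation start
med_stop = "2006-03-20T12:00:00"        # made-up medication stop time
pre_op_min = add_dt(op_start, "-P30D")  # start of the 30-day window

# The medication falls inside the window if it stopped after the window
# opened and before the operation began.
in_window = pre_op_min < med_stop < op_start
print(pre_op_min, in_window)
```

Returning a string in the same lexical form keeps the result usable with string comparison, which is what the rules rely on.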

For the most part, I think a majority of the requirements needed for temporal reasoning (in any domain) can be accommodated by explicit modeling, because FOPL (the foundation upon which RDF is built) was designed to be expressive enough to represent all human concepts.

Chimezie Ogbuji

via Copia

A Perspective on Temporal Modeling in RDF

I just read the follow-up to a thread (Why we need explicit temporal labelling) on the formal modeling of time and time-related semantics in RDF specifically. I wanted to put my $0.02 in, since I spend a good deal of my time at work buried nose-deep in large volumes of bioinformatic cardiovascular data, most of which is temporal. I guess, to put it succinctly, I just don't see the value in merging temporal semantics (not a very lightweight model) into the fabric of your representation model.

We found (for our purposes) that by including our own specific temporal semantic vocabulary, we could ensure that we can answer questions such as:

How many patients complained about chest pains within 30 days of a specific surgical operation?

We can do this while avoiding the rigidity of temporal reasoning that formal models impose. Such formalisms (especially in distributed systems) are unnecessary when you consider that, most often, data as it is fetched (at any point in time) is 'complete' regardless of how it has varied over time.
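As a rough illustration of how little machinery the chest pain question needs once events carry explicit timestamps, here is a small Python sketch. The records, the event labels and the function name are all made up, and "within 30 days" is read as 30 days on either side of the operation.

```python
from datetime import datetime, timedelta

# Made-up records: (patient id, event kind, timestamp).
records = [
    ("p1", "operation", datetime(2006, 3, 10)),
    ("p1", "chest pain", datetime(2006, 3, 25)),
    ("p2", "operation", datetime(2006, 1, 5)),
    ("p2", "chest pain", datetime(2006, 4, 1)),
    ("p3", "operation", datetime(2006, 2, 1)),
]


def patients_with_event_near_operation(records, kind, window):
    """Return ids of patients with a `kind` event within `window` of an operation."""
    operations = {}
    for patient, event_kind, when in records:
        if event_kind == "operation":
            operations.setdefault(patient, []).append(when)

    matched = set()
    for patient, event_kind, when in records:
        if event_kind != kind:
            continue
        # Compare against every operation in the same patient record.
        if any(abs(when - op) <= window for op in operations.get(patient, [])):
            matched.add(patient)
    return matched


matches = patients_with_event_near_operation(
    records, "chest pain", timedelta(days=30)
)
print(len(matches))
```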

Consider the RDF schema for OWL, whose identifier (the URL from which its content can be loaded) includes some temporal semantics (when it was published, and the suggestion that there are prior versions). Though the content might have changed over time, the entire document as it was at any point was 'consistent' in what it conveys. No additional temporal semantics is needed to capture the relations between versions or to maintain some 'sanity' (if you will) over the fact that the data changed over time.

And if such formalism is needed, it's rather easy to piggyback off existing ontologies ("Time Ontology in OWL", for instance).

Furthermore, if you think about it, named contexts (graphs, scopes, etc.) already provide a more adequate solution to the issue of inconsistency of data (over time) from the same source. For instance, you can take advantage of RDF/XML and N3 syntactic sugar such as:

<> a owl:Ontology;
   dc:date "2002-07-13".

or its RDF/XML equivalent:

<owl:Ontology 
  rdf:about="">
  <dc:date>2002-07-13</dc:date>
</owl:Ontology>

in order to capture enough provenance data to accommodate change.

Ironically, the ability to make provenance statements (one of which includes the date associated with this 'representation') about a named graph (identified by the URL from which it was loaded) is beyond the semantics of the RDF model. However, through its use you can be specific about the source of triples and, in addition, you can include the specifics of version either within the identifier of the source or through provenance statements made about it.

I think the problem is more a modeling issue (and having the foresight to determine how you accommodate the change of data over time) than a shortcoming of the framework.

Chimezie Ogbuji

via Copia