Numerical type with units - via Python Cookbook

I implemented perhaps eight years ago as an exercise and have used it occasionally ever since.

It allows doing math with dimensioned values in order to automate unit conversions (you can add m/s to mile/hour) and dimensional checking (you can't add m/s to mile/lightyear). It specifically does not convert 212F to 100C but rather will convert 9F to 5C (valid when converting temperature differences).

It is similar to unums ( but with a significant difference:

I used a different syntax Q(25,'m/s') as opposed to 100*m/s (I recall not wanting to have all the base SI units directly in the namespace). I'm not entirely sure which approach is really better.

I also had a specific need to have fractional exponents on units, allowing the following:

>>> km=Q(10,'N*m/W^(1/2)')
>>> km
Q(10.0, 'kg**0.5*m/s**0.5')

Looking back I see a few design decisions I might do differently today, but I'll share it anyway.

Some examples are in the source below the line with if __name__ == "__main__":

Note that I've put two files into the code block below, and, so please cut them apart if you want to try it.

Very impressive library. I recently incorporated the use of the Measurement Unit Ontology into the Computer-based Patient Record (CPR) ontology and (on the surface) it seems like a library like this can provide the unit conversion machinery for RDF instances that use such a framework.

Segmenting and Merging Domain-specific Ontology Modules for Clinical Informatics (Presentation)

A significant set of challenges to the use of large, source ontologies in the medical domain include: automated translation, customization of source ontologies, and performance issues associated with the use of logical reasoning systems to interpret the meaning of a domain captured in a formal knowledge representation. SNOMED-CT and FMA are two reference ontologies that cover much of the domain of clinical informatics and motivate a better means for re-use. In this paper, we present a method for segmenting and merging modules from these ontologies for a specific domain that preserve the meaning of the anatomy terms they have in common.

Our presentation for this FOIS 2010 paper is below.

Ontological Definitions for an Information Resource

I've somehow found myself wrapped-up in this dialog about information resources, their representations, and the relation to RDF. Perhaps it's the budding philosopher in me which finds the problem interesting. There seems to be some controversy about what is an appropriate definition for an information resource. I'm a big fan of not reinventing wheels if they have already been built, tested, and deployed.

The Architecture of the World-Wide Web says:

The distinguishing characteristic of these resources is that all of their essential characteristics can be conveyed in a message. We identify this set as "information resources."

I know of at least 4 very well-organized upper ontologies which have readily-available OWL representations: SUMO, Cyc, Basic Formal Ontology, and DOLCE. These are the cream of the crop in my opinion (and in the opinion of many others who are more informed about this type of thing). So, let us spend some time investigating where the poorly-defined Web Architecture term fits in these ontologies. This exercise is mostly meant for the purpose of reference. Every well-organized, upper ontology will typically have a singular, topmost term which covers everything. This would be (for the most part) the equivalent of owl:Thing and rdf:Resource

Suggested Upper Merged Ontology (SUMO)

Sumo has a term called "FactualText" which seems appropriate. The definition states:

The class of Texts that purport to reveal facts about the world. Such texts are often known as information or as non-fiction. Note that something can be an instance of FactualText, even if it is wholly inaccurate. Whether something is a FactualText is determined by the beliefs of the agent creating the text.

The SUMO term has the following URI for FactualText (at least in the OWL export I downloaded):

Climbing up the subsumption tree we have the following ancestral path:

  • Text: "A LinguisticExpression or set of LinguisticExpressions that perform a specific function related to Communication, e.g. express a discourse about a particular topic, and that are inscribed in a CorpuscularObject by Humans."

The term Text has multiple parents (LinguisticExpression and Artifact). Following the path upwards from the first parent we have:

  • LinguisticExpression: "This is the subclass of ContentBearingPhysical which are language-related. Note that this Class encompasses both Language and the the elements of Languages, e.g. Words."
  • ContentBearingPhysical: "Any Object or Process that expresses content. This covers Objects that contain a Proposition, such as a book, as well as ManualSignLanguage, which may similarly contain a Proposition."
  • Physical "An entity that has a location in space-time. Note that locations are themselves understood to have a location in space-time."
  • Entity "The universal class of individuals. This is the root node of the ontology."

Following the path upwards from the second parent we have:

  • Artifact: "A CorpuscularObject that is the product of a Making."
  • CorpuscularObject: "A SelfConnectedObject whose parts have properties that are not shared by the whole."
  • SelfConnectedObject: "A SelfConnectedObject is any Object that does not consist of two or more disconnected parts."
  • Object: "Corresponds roughly to the class of ordinary objects. Examples include normal physical objects, geographical regions, and locations of Processes"

Objects are a specialization of Physical, so from here we come to the common Entity ancestor


Cyc has a term called InformationBearingThing:

A collection of spatially-localized individuals, including various actions and events as well as physical objects. Each instance of information-bearing thing (or IBT ) is an item that contains information (for an agent who knows how to interpret it). Examples: a copy of the novel Moby Dick; a signal buoy; a photograph; an elevator sign in Braille; a map ...

The Cyc URI for this term is:

This term has 3 ancestors: Container-Underspecified, SpatialThing-Localized, and InformationStore. The latter seems most relevant, so we'll traverse its ancestry first:

  • InformationStore : "A specialization of partially intangible individual. Each instance of store of information is a tangible or intangible, concrete or abstract repository of information. The information stored in an information store is stored there as a consequence of the actions of one or more agents."
  • PartiallyIntangibleIndividual : "A specialization of both individual and partially intangible thing. Each instance of partially intangible individual is an individual that has at least some intangible (i.e. immaterial) component. The instance might be partly tangible (e.g. a copy of a book) and thus be a composite tangible and intangible thing, or it might be fully intangible (e.g. a number or an agreement) and thus be an instance of intangible individual object. "

From here, there are two ancestral paths, so we'll leave it at that (we already have the essense of the definition).

Going back to InformationBearingThing, below is the ancestral path starting from Container-Underspecified:

  • Container-Underspecified : "The collection of objects, tangible or otherwise, which are typically conceptualized by human beings for purposes of common-sense reasoning as containers. Thus, container underspecified includes not only the set of all physical containers, like boxes and suitcases, but metaphoric containers as well"
  • Area: "The collection of regions/areas, tangible or otherwise, which are typically conceptualized by human beings for purposes of common-sense reasoning as spatial regions."
  • Location-Underspecified: Similar definition as Area
  • Thing: "thing is the universal collection : the collection which, by definition, contains everything there is. Every thing in the Cyc ontology -- every individual (of any kind), every set, and every type of thing -- is an instance of (see Isa) thing"

Basic Formal Ontology (BFO)

BFO is (as the name suggests) very basic and meant to be an axiomatic implementation of the philosophy of realism. As such, the closest term for an information resource is very broad: Continuant

Definition: An entity that exists in full at any time in which it exists at all, persists through time while maintaining its identity and has no temporal parts.

However, I happen to be quite familiar with an extension of BFO called the Ontology of Biomedical Investigation (OBI) which has an appropriate term (derived from Continuant): information_content_entity

The URI for this term is:

Traversing the (short) ancestral path, we have the following definitions:

  • OBI_295 : "An information entity is a dependent_continuant which conveys meaning and can be documented and communicated."
  • OBI_321 : "generically_dependent_continuant"
  • Continuant : "An entity that exists in full at any time in which it exists at all, persists through time while maintaining its identity and has no temporal parts."
  • Entity

The Descriptive Ontology of Linguistics and Cognitive Engineering (DOLCE)

DOLCE's closest term for an information resource is information-object:

Information objects are social objects. They are realized by some entity. They are ordered (expressed according to) by some system for information encoding. Consequently, they are dependent from an encoding as well as from a concrete realization.They can express a description (the ontological equivalent of a meaning/conceptualization), can be about any entity, and can be interpreted by an agent.From a communication perspective, an information object can play the role of "message". From a semiotic perspective, it playes the role of "expression".

The URI for this term is:

Traversing the ancestral path we have:

  • non-agentive-social-object: "A social object that is not agentive in the sense of adopting a plan or being acted by some physical agent. See 'agentive-social-object' for more detail."
  • social-object: "A catch-all class for entities from the social world. It includes agentive and non-agentive socially-constructed objects: descriptions, concepts, figures, collections, information objects. It could be equivalent to 'non-physical object', but we leave the possibility open of 'private' non-physical objects."
  • non-physical-object : "Formerly known as description. A unitary endurant with no mass (non-physical), generically constantly depending on some agent, on some communication act, and indirectly on some agent participating in that act. Both descriptions (in the now current sense) and concepts are non-physical objects."
  • non-physical-endurant: "An endurant with no mass, generically constantly depending on some agent. Non-physical endurants can have physical constituents (e.g. in the case of members of a collection)."
  • endurant : "The main characteristic of endurants is that all of them are independent essential wholes. This does not mean that the corresponding property (being an endurant) carries proper unity, since there is no common unity criterion for endurants. Endurants can 'genuinely' change in time, in the sense that the very same endurant as a whole can have incompatible properties at different times."
  • particular: "AKA 'entity'.Any individual in the DOLCE domain of discourse. The extensional coverage of DOLCE is as large as possible, since it ranges on 'possibilia', i.e all possible individuals that can be postulated by means of DOLCE axioms. Possibilia include physical objects, substances, processes, qualities, conceptual regions, non-physical objects, collections and even arbitrary sums of objects."


The definitions are (in true philosophical form) quite long-winded. However, the point I'm trying to make is:

  • Alot of pain has gone into defining these terms
  • Each of these ontologies is very richly-axiomatized (for supporting inference)
  • Each of these ontologies is available in OWL/RDF

Furthermore, these ontologies were specifically designed to be domain-independent and thus support inference across domains. So, it makes sense to start here for a decent (axiomatized) definition. What is interesting is that SUMO and BFO are the only upper ontologies which treat information resources (or their equivalent term) as strictly 'physical' things. Cyc's definition includes both tangible and intangible things while DOLCE's definition is strictly intangible (non-physical-endurant)

Some food for thought

Chimezie Ogbuji

via Copia

Where does the Semantic Web converge with the Computerized Patient Record?

I've been thinking alot about the "Computer-based Patient Record: CPR", an acronym as unlikely as GRDDL but once again, a methodology expressed as an engineering specification. In both cases, the methodology is a mouthful, but a coherent architectural "style" and requires a mouthful of words to describe. Other examples of this:

  • Representation State Transfer
  • Rich Web Application Backplane
  • Problem-oriented Medical Record
  • Gleaning Resource Descriptions from Dialects of Languages

The term itself was coined (I think) by the Institute of Medicine [1]. If you are in healthcare and are motivated by the notion of using technology to make healthcare effective and inexpensive as possible, you should do the Institute a favor and buy the book:

National Institutute of Medicine, The Computer-Based Patient Record: An Essential Technology for Health Care - Revised Edition., 1998, ISBN: 0309055326.

I've written some recent slides that are on the W3C ESW 'wiki' which all have something to do with the idea in one way or another:

The nice thing about working in a W3C Interest Group is that the work you do is for the general publics benefit, so it is a manefestation of the W3C notion of the Semantic Web, which primarily involves a human social process.

Sorta like a technological manefestation of our natural darwinian instinct.

That's how I think of the Semantic Web, anyways: as a very old, living thread of advancements in Knowledge Representation which intersected with an anthropological assesment of some recent web architecture engineering standards.

Technology is our greatest contribution and so it sohould only make sense that wherer we use it to better our health it should not come as a cost to us. The slides reference and include a suggested OWL-sanctioned vocabulary for basically implementing the Problem-oriented Medical Record (a clinical methodology for problem solving).

I think the idea of a free (as in beer) vocabulary for people who need healthcare has an interesting intersection with the pragmatic parts of the Semantic Web (avoiding the double quotes) vision. I have exercised-induced asthma (or was "diagnosed" as such when I was younger). I still ran Track-and-Field in Highschool and was okay after an initial period where my lungs had to work overtime. I wouldn't mind hosting RDF content about such a "finding" if it was for my person benefit that a piece of software could do something useful for me in an automated, deterministic way.

"HL7 CDA" seems to be a freely avaiable, well-organized vocabulary for describing messages dispatched between hospital systems. And I recently wrote a set of XSLT templates which extract predicate logic statemnts about a CDA document using the POMR ontology and the other freely available "foundational ontologies" it coordinates. The CDA document on has a nice concise description of the technological merits of HL7 CDA:

The HL7 Clinical Document Architecture is an XML-based document markup standard that specifies the structure and semantics of clinical documents for the purpose of exchange. Known earlier as the Patient Record Architecture (PRA), CDA "provides an exchange model for clinical documents such as discharge summaries and progress notes, and brings the healthcare industry closer to the realization of an electronic medical record. By leveraging the use of XML, the HL7 Reference Information Model (RIM) and coded vocabularies, the CDA makes documents both machine-readable (so they are easily parsed and processed electronically) and human-readable so they can be easily retrieved and used by the people who need them. CDA documents can be displayed using XML-aware Web browsers or wireless applications such as cell phones..."

The HL7 CDA was designed to "give priority to delivery of patient care. It provides cost effective implementation across as wide a spectrum of systems as possible. It supports exchange of human-readable documents between users, including those with different levels of technical sophistication, and promotes longevity of all information encoded according to this architecture. CDA enables a wide range of post-exchange processing applications and is compatible with a wide range of document creation applications."

A CDA document is a defined and complete information object that can exist outside of a messaging context and/or can be a MIME-encoded payload within an HL7 message; thus, the CDA complements HL7 messaging specifications.

If I could put up a CDA document describing the aspects of my medical history that were in my benefit to be freely available (at my discretion), I would do so in the event some piece of software could do some automated things for my benefit. Leveraging a vocabulary which essentially grounds an expressive variant of predicate logic in a transport protocol makes the chances that this happens, very likely. The effect is as multiplicative as the human population.

The CPR specification is also very well engineered and much ahead of its time (it was written about 15 years ago). The only technological checkmark left is a uniform vocabulary. Consensus stands in the way of uniformity, so some group of people need to be thinking about how the "pragmatic" and anthropological notions of the Semantic Web can be realized with a vocabulary about our personally controlled, public clinical content. Don't you think?

I was able to register the /cpr top level PURL domain and the URL resolves to the OWL ontology with commented imports to other very relevant OWL ontologies. Once I see a pragmatic demonstration of leaving owl:imports in a 'live' URL, I'll remove them. It would be a shame if any Semantic Web vocabulary terms came in conflict with a legal mandate which controlled the use of a vocabulary.

Chimezie Ogbuji

via Copia

A Perspective on Temporal Modeling in RDF

I just read the follow-up to a thread (Why we need explicit temporal labelling) on the formal modeling of time and time related semantics to RDF, specifically. I wanted to put my $0.02 in since, I spend a good deal of my time at work buried nose-deep in large volumes of bioinformatic cardiovascular data, most of which is largely temporal. I guess, to put it succintly, I just don't see the value in merging temporal semantics (not a very light weight model) into the fabric of your representation model.

We found (for our purposes) that by including our own specific temporal semantic vocabulary, we could ensure that we can answer questions such as:

How many patients had complained about chest pains witin 30 days of a specific surgical operation.

While at the same time avoiding the rigidness of temporal reasoning that formal models impose. Such formalisms (especially in distributed systems) are unecessary when you consider that most often, data as it is fetched (at any point in time) is 'complete' regardless of how it has varied over time.

Consider the RDF schema for OWL, whose identifier (the identifier of the URL from where it's content can be loaded) includes some temporal semantics (when it was published, and the suggestion that there are prior versions). Though the content might have changed over time, the entire document as it was at any point was 'consistent' in what it conveys. No additional temporal semantics is needed to capture the relations between versions or to maintain some 'sanity' (if you will) over the fact that the data changed over time.

And if such formalism is needed, it's rather easy to piggy back off existing ontologies ("Time Ontology in OWL" for instance.)

Furthermore, If you think about it, named contexts (graphs, scopes, etc..) already provide a more adequate solution to the issue of inconsistency of data (over time) from the same source. For instance, you can take advantage of syntactic RDF/XML and N3 sugar such as:

<> a owl:Ontology;
   dc:date "2002-07-13";

or it's RDF/XML equivalent:


In order to capture enough provenance data to accomodate change.

Ironically, the ability to make provenance statements (one of which includes the date associated with this 'representation') about a named graph (identified by the URL from which it was loaded) is beyond the semantics of the RDF model. However, through it's use you can be specific about the source of triples and (in addition), you can include the specifics of version either within the identifier of the source of through provenance statements made about it.

I think the problem is more a modeling issue (and having the foresight to determine how you accomodate the change of data over time) than a shorcoming of the framework.

Chimezie Ogbuji

via Copia

Getting Some Mileage out of Semantic Works

Well, I recently had a need to write-up an OWL ontology describing the components of a 4Suite repository configuration file (which is expressed as an RDF graph, hence the use of OWL to formalize the format). There has been some mention (with regards to the long-term roadmap of the 4Suite repository component) of the possiblity of moving to a pure XML format.

Anyways, below is a diagram of the model produced by Semantic Works. I still think Protege and SWOOP provide much more bang for your buck (when you consider that they are free and Semantic Works isn't) and produce much more concise OWL/RDFS XML. But the ability to produce diagrams of this quality of a complex OWL ontology is definately a plus.

Semantic Works Diagram of 4Suite Repository Configuration Ontology Semantic Works Diagram of 4Suite Repository Configuration Ontology Semantic Works Diagram of 4Suite Repository Configuration Ontology

[Chimezie Ogbuji]

via Copia

What Are You Doing, Dave?

I just updated the 4Suite Repository Ontology (as an OWL instance). Specifically, I added appropriate documentation for most of the major components and added rdfs:subPropertyOf/rdfs:subClass/rdfs:seeAlso relationships with appropriate / related vocabularies (WordNet/Foaf/Dublin Core/Wikipedia). In addition, where appropriate, I've added links to 4suite literature (currently scattered between IBM Developer Works articles/tutorials and Uche's Akara sections).

There are some benefits:

  • This can serve as a framework for documenting the 4Suite repository (to augment the very sparse documentation that does exist)
  • Provide a formal model for the underlying RDF Graph that 'drives' the repository

This latter benefit might not be so obvious, but imagine being able to provide rules that cause implications identifying certain repository containers as RSS channels (and their child Xml documents / Rdf document as the corresponding RSS items) and associating Foaf metadata with repository users.

Some of the more powerful hooks to the System RDF graph (which the above ontology is a model of) - such as the starting/stopping of servers (currently triggered by the fchema:server.running property on fchema:server instances), purging of resources marked as temporary (by the fchema:time_to_live property), and triggering of an XSLT transform (by the fchema:run_on_strobe property) - can be further augmented by other associations in the graph, resulting in an almost 'sentient' content/application server. A little far-fetched?

[Uche Ogbuji]

via Copia