SNOMED-CT Management via Semantic Web Open Source Tools Committed to Google Code

[by Chimezie Ogbuji]

I just committed my working copy of the set of tools I use to manipulate and serialize SNOMED-CT (the Systematized Nomenclature of Medicine) and the Foundational Model of Anatomy (FMA) as OWL/RDF, for use in the clinical terminology research I’ve been doing lately. It is still in very rough form and probably not usable by anyone other than a Python / Semantic Web hacker such as myself. However, I’m hoping to get it to a shape where it can be used by others. I had hesitated to release it, mostly because of my concerns around the SNOMED-CT license, but I’ve been assured that as long as the hosting web site is based in the United States and (most importantly) the software is not released with the SNOMED distribution, it should be okay.

I have a (mostly empty) Wiki describing the command-line invocation. The library leverages InfixOWL and rdflib to manipulate the OWL/RDF. Basically, once you have loaded the delimited distribution into MySQL (the library also requires MySQLdb and an instance of MySQL to work with), you can run the command-line tool, giving it one or more lists of SNOMED-CT terms (by their identifiers), and it will return an OWL/RDF representation of an extract from SNOMED-CT around those terms.

So, below is an example of running the command line to extract a section around the term Diastolic Hypertension and piping the result to the FuXi command line in order to select a single class (sno:HypertensiveDisorderSystemicArterial) and render it using my preferred syntax for OWL: the Manchester OWL syntax:

$ python ManageSNOMED-CT.py -e 48146000 -n short -s localhost -u ..mysql username.. --password=..mysql password.. -d snomed-ct | FuXi --ns=sno=tag:info@ihtsdo.org,2007-07-31:SNOMED-CT# --output=man-owl --class=sno:HypertensiveDisorderSystemicArterial --stdin
Class: sno:HypertensiveDisorderSystemicArterial
    ## Primitive Type (Hypertensive disorder) ##
    SNOMED-CT Code: 38341003 (a primitive concept)
    SubClassOf:
              Clinical finding
              Disease
              ( sno:findingSite some Systemic arterial structure )

Which renders an expression that can be paraphrased as:

‘Hypertensive Disorder Systemic Arterial’ is a clinical finding and disease whose finding site is some structure of the systemic artery.

I can also take the Burn of skin example from the Wikipedia page on SNOMED and demonstrate the same thing, rendering it in its full (verbose) OWL/RDF/XML form:

<owl:Class rdf:about="tag:info@ihtsdo.org,2007-07-31:SNOMED-CT#BurnOfSkin">
  <owl:intersectionOf rdf:parseType="Collection">
    <owl:Restriction>
      <owl:onProperty>
        <owl:ObjectProperty rdf:about="tag:info@ihtsdo.org,2007-07-31:SNOMED-CT#findingSite"/>
      </owl:onProperty>
      <owl:someValuesFrom rdf:resource="tag:info@ihtsdo.org,2007-07-31:SNOMED-CT#SkinStructure"/>
    </owl:Restriction>
    <rdf:Description rdf:about="tag:info@ihtsdo.org,2007-07-31:SNOMED-CT#ClinicalFinding"/>
    <owl:Restriction>
      <owl:onProperty>
        <owl:ObjectProperty rdf:about="tag:info@ihtsdo.org,2007-07-31:SNOMED-CT#associatedMorphology"/>
      </owl:onProperty>
      <owl:someValuesFrom rdf:resource="tag:info@ihtsdo.org,2007-07-31:SNOMED-CT#BurnInjury"/>
    </owl:Restriction>
    <rdf:Description rdf:about="tag:info@ihtsdo.org,2007-07-31:SNOMED-CT#Disease"/>
  </owl:intersectionOf>
  <rdfs:label>Burn of skin</rdfs:label>
  <skos:scopeNote>disorder</skos:scopeNote>
  <skos:prefSymbol>284196006</skos:prefSymbol>
</owl:Class>

And then in its more palatable Manchester OWL form:

$ python ManageSNOMED-CT.py -e 284196006 -n short -s localhost -u ..username.. --password= -d snomed-ct | FuXi --ns=sno=tag:info@ihtsdo.org,2007-07-31:SNOMED-CT# --output=man-owl --class=sno:BurnOfSkin --stdin
Class: sno:BurnOfSkin
    ## A Defined Class (Burn of skin) ##
    SNOMED-CT Code: 284196006
    EquivalentTo:
      ( sno:ClinicalFinding and sno:Disease ) that
      ( sno:findingSite some Skin structure ) and (sno:associatedMorphology some Burn injury )

Which can be paraphrased as:

A clinical finding and disease whose finding site is some skin structure and whose associated morphology is a burn injury

The examples above use the ‘-n short’ option, which renders extracts in OWL using the short normal form, a procedure described in the SNOMED-CT manuals that produces a more canonical representation, eliminating redundancy in the process. The library currently only works with the 2007-07-31 distribution of SNOMED-CT, but I’m in the process of updating it to use the latest distribution. The latest distribution comes with its own OWL representation, and I’m still trying to wrap my head around some quirks in it involving role groups, and whether this library would need to change to work directly off this OWL representation instead of the primary relational distribution format. Enjoy.

FuXi: Becoming a Full-fledged Logical Reasoning System

[by Chimezie Ogbuji]

I've been doing a lot of "Google reading" lately.

Completing Logical Reasoning System Capabilities

With the completion (or near completion) of PML-generating capabilities for FuXi, it is becoming a fully functional logical reasoning system. In Artificial Intelligence: A Modern Approach, Stuart Russell and Peter Norvig identify the following main categories of automated reasoning systems:

  1. Theorem provers
  2. Production systems
  3. Frame systems and semantic networks
  4. Description Logic systems

OWL and RDF cover 3 and 4. The second category is functionally covered by the RETE-UL algorithm FuXi employs (a highly efficient modification of the original RETE algorithm). The currently developing RIF Basic Logic Dialect covers 2 through 4, and Proof Markup Language covers 1. Now FuXi can generate Proof Markup Language (PML) structures (and export visualization diagrams of them). I still need to do more testing, and I hope to be able to generate proofs for each of the OWL tests. Until then, below is a diagram of the proof tree generated from the "She's a Witch and I have Proof" test case:

# http://clarkparsia.com/weblog/2007/01/02/burn-the-witch/
# http://www.netfunny.com/rhf/jokes/90q4/burnher.html
@prefix : <http://www.w3.org/2000/10/swap/test/reason/witch#>.
@keywords is, of, a.
#[1]    BURNS(x) /\ WOMAN(x)            =>      WITCH(x)
{ ?x a BURNS. ?x a WOMAN } => { ?x a WITCH }.
#[2]    WOMAN(GIRL)
GIRL a WOMAN.
#[3]    \forall x, ISMADEOFWOOD(x)      =>      BURNS(x)
{ ?x a ISMADEOFWOOD. } => { ?x a BURNS. }.
#[4]    \forall x, FLOATS(x)            =>      ISMADEOFWOOD(x)
{ ?x a FLOATS } => { ?x a ISMADEOFWOOD }.
#[5]    FLOATS(DUCK)
DUCK a FLOATS.
#[6]    \forall x,y FLOATS(x) /\ SAMEWEIGHT(x,y) =>     FLOATS(y)
{ ?x a FLOATS. ?x SAMEWEIGHT ?y } => { ?y a FLOATS }.
# and, by experiment
# [7]   SAMEWEIGHT(DUCK,GIRL)

She's a witch and I have proof trace

There is another test case, "Dan's home region is Texas", on the python-dlp Wiki: DanHomeProof:

@prefix : <gmpbnode#>.
@keywords is, of, a.
dan home [ in Texas ].
{ ?WHO home ?WHERE.
  ?WHERE in ?REGION } => { ?WHO homeRegion ?REGION }.

Dan's home region is Texas proof

I decided to use PML structures since there are a slew of Stanford tools which understand / visualize it, and I can generate other formats from this common structure (including the CWM reason.n3 vocabulary). Personally, I prefer the proof visualization to the typically verbose step-based Q.E.D. proof.

Update: I found a nice write-up on the CWM-based reason ontology and translations to PML.

So, how does a forward-chaining production rule system generate proofs, which are really meant for backward-chaining algorithms? When the FuXi network is fed initial assertions, it is told what the 'goal' is. The goal is a single RDF statement which is being proved. When the forward chaining results in an inferred triple which matches the goal, it terminates the RETE algorithm. So, depending on the order of the rules and the order in which the initial facts are fed, it will (in the general case) be less efficient than a backward-chaining algorithm. However, I'm hoping the blinding speed of the fully hashed RETE-UL algorithm makes up the difference.
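To make that strategy concrete, below is a minimal, hypothetical sketch of goal-directed forward chaining over triples. It is not FuXi's actual API (a real RETE network indexes and shares match work instead of re-scanning the fact set); it just shows the goal check terminating the run, using the witch test case from above:

# A toy sketch, not FuXi: triples are 3-tuples of strings, variables
# start with '?', and the loop halts as soon as the goal is derived.

def unify(pattern, fact, binding):
    """Extend binding so that pattern matches fact, or return None."""
    binding = dict(binding)
    for p, f in zip(pattern, fact):
        if p.startswith('?'):
            if binding.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return binding

def match(antecedents, facts):
    """Return every variable binding satisfying all antecedent patterns."""
    bindings = [{}]
    for pattern in antecedents:
        bindings = [b2 for b in bindings for fact in facts
                    for b2 in (unify(pattern, fact, b),) if b2 is not None]
    return bindings

def forward_chain(facts, rules, goal):
    """Fire rules to a fixpoint, terminating early if the goal is inferred."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in rules:
            for binding in match(antecedents, facts):
                derived = tuple(binding.get(t, t) for t in consequent)
                if derived not in facts:
                    facts.add(derived)
                    changed = True
                    if derived == goal:  # the goal check stops inference
                        return True
    return goal in facts

# The witch test case from above, expressed as tuples:
facts = [('GIRL', 'a', 'WOMAN'), ('DUCK', 'a', 'FLOATS'),
         ('DUCK', 'SAMEWEIGHT', 'GIRL')]
rules = [([('?x', 'a', 'BURNS'), ('?x', 'a', 'WOMAN')], ('?x', 'a', 'WITCH')),
         ([('?x', 'a', 'ISMADEOFWOOD')], ('?x', 'a', 'BURNS')),
         ([('?x', 'a', 'FLOATS')], ('?x', 'a', 'ISMADEOFWOOD')),
         ([('?x', 'a', 'FLOATS'), ('?x', 'SAMEWEIGHT', '?y')],
          ('?y', 'a', 'FLOATS'))]
print(forward_chain(facts, rules, ('GIRL', 'a', 'WITCH')))  # True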

I've been spending quite a bit of time on FuXi mainly because I am interested in empirical evidence to support a school of thought which claims that Description Logic based inference (Tableaux-based inference) will never scale as well as the Logic Programming equivalent - at least for certain expressive fragments of Description Logic (I say expressive because, even given the things you cannot express in this subset of OWL-DL, there is much more in Horn Normal Form (and Datalog) that you cannot express even in the underlying DL for OWL 1.1). The genesis of this is a paper I read which lays out the theory, but there was no practice to support the claims at the time (at least that I knew of). If you are interested in the details, the paper is "Description Logic Programs: Combining Logic Programs with Description Logic", written by several people who are working in the Rule Interchange Format Working Group.

It is not light reading, but is complementary to some of Bijan's recent posts about DL-safe rules and SWRL.

A follow-up is a paper called "A Realistic Architecture for the Semantic Web", which builds on the DLP paper and argues that the current OWL (Description Logic based) Semantic Web inference stack is problematic and should instead be stacked on top of Logic Programming, since Logic Programming algorithms have a much richer and more pervasively deployed history (all modern relational databases, Prolog, etc.).

The arguments seem sound to me, so I've essentially been building up FuXi to implement that vision (especially since it employs - arguably - the most efficient production rule inference algorithm). The final piece was a fully functional implementation of the Description Horn Logic algorithm. Why is this important? The short of it is that the DLP paper outlines an algorithm which takes a (constrained) set of Description Logic expressions and converts them to 'pure' rules.

Normally, Logic Programming N3 implementations pass the OWL tests by using a generic ruleset which captures a subset of the OWL DL semantics. The most common one is owl-rules.n3. DLP flips the script by generating a ruleset specifically for the original DL, instead of feeding OWL expressions into the same generic network. This allows the RETE-UL algorithm to create an even more efficient network, since it is tailored to the specific bits of OWL actually used.
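As a rough illustration of what that translation looks like, here is a toy sketch (under my own simplified axiom encoding, not FuXi's actual DescriptionHornLogic implementation) that maps a couple of DL axiom shapes onto N3-style Horn rules:

def dlp_rules(axioms):
    """Map subClassOf axioms onto N3-style Horn rule strings.
    Named classes are plain strings; a (prop, 'some', filler) tuple
    stands in for an owl:someValuesFrom restriction."""
    rules = []
    for sub, _, sup in axioms:
        if isinstance(sub, str) and isinstance(sup, str):
            # A subClassOf B becomes a simple class-propagation rule
            rules.append('{ ?x a %s } => { ?x a %s }.' % (sub, sup))
        elif isinstance(sub, tuple) and isinstance(sup, str):
            # (p some C) subClassOf A: the restriction becomes the body
            prop, _, filler = sub
            rules.append('{ ?x %s ?y. ?y a %s } => { ?x a %s }.'
                         % (prop, filler, sup))
        else:
            # A subClassOf (p some C) would need an existential in the
            # rule head, so it falls outside the Horn (DLP) fragment
            raise ValueError('outside the DLP fragment: %r' % (sup,))
    return rules

for rule in dlp_rules([('Woman', 'subClassOf', 'Person'),
                       (('findingSite', 'some', 'SkinStructure'),
                        'subClassOf', 'SkinFinding')]):
    print(rule)

Note how the last case is rejected: an existential on the right-hand side of subClassOf would require an existential variable in the rule head, which is exactly the kind of expression the DLP fragment excludes.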

For instance, where I used to run through the OWL tests in about 4 seconds, I can now pass them in about 1 second. Before, I would set up a RETE network consisting of the generic ruleset once and run the tests through it (resetting it each time). Now, for each test, I create a custom network and evaluate the OWL test case against it. Even with this extra overhead, it is still 4 times faster! The custom network is trivial in most cases.

Ultimately, I would like to be able to use FuXi for generic "Semantic Web" agent machinery and perhaps even to try some of that programming-by-proof thing that Dijkstra was talking about.

Chimezie Ogbuji

via Copia

Why FuXi?

So, I updated the cheeseshop entry for FuXi (should that be a capital 'X'?). This is the freeware I forced myself to write in order to better express myself (I don't always do a good job of that in person) and to engage people generally. It is very fast (so I use it wherever I need to do any OWL/N3 inference). I hope to extend its serialize/parse capabilities to additionally support SWRL, the "new" Rule Interchange Format, and CycML (since this is trivial with 4Suite, and OpenCyc is, well, "open").

I host it on Google Code because I like their combined service: it's free, and it includes Subversion, a mailing list component, a Wiki, and other community services. In addition, I can synchronize my license(s) - in this case FuXi's license is bare-bones BSD (I wonder if I should switch to an Apache license?). I link my cheeseshop entry to the Google Code page, and this is the primary "entry point" for package management. Cheeseshop + easy_install + Python = very painless. I'm planning on setting up triclops (a WSGI-based SPARQL service) this way.

Update: I added a Google Group for FuXi: All discussion on FuXi

Doing this brought me back to the question of why I gave this piece of software a name (see: origin) which conventional wisdom might consider "odd". I named it after a very coherent philosophy written a very long time ago. Sometime in 2004, I started reading a lot of text from that canon and then did some experimentation with 1) capturing the trigrams in OWL and 2) generating SVG diagrams of them as an additional serialization. These were some of my older Copia entries.

The text is very mathematical; in fact, it is based (almost entirely) on the binary numerical system. My formal "study" was Computer Engineering, which emphasized microprocessor theory (all of which is based on the binary numerical system as well), so my interest was not just "spiritual" but also very practical, as I have come to a better appreciation of microprocessor theory many years after graduating from the University of Illinois at Urbana-Champaign.

My interest is also very historical. I believe that the theory these texts are based on represents some of the oldest human analysis of semiotics, binary numerics, psychology, and ontology. I have heard that the oldest ontology is purported to be Aristotle's, but I think this is very much mistaken if you consider the more mathematical aspects of "classic" semiotics. This was why I thought it would be interesting (at the time) to capture the trigrams in OWL (i.e., the formal theory) with annotations consisting of the better English translations of the original text (the Yijing) as well as SVG diagram exports.

This could serve as a good tool for older generations that study these texts via conventional methods (consider the nature of the more oral traditions). Igbo tradition (my tradition) is very much "oral". I had thought at the time that a tool which relied on inference to interpret this ancient theory (for students of it) would make for a good demonstration of "a" value proposition for Semantic Web technologies in a (very) unintended way. In many ways, the "philosophies" of open source/communities/standards echo a contemporary manifestation of this older way of life. It gives me some relief amidst a modern society obsessed with military expenditure (one of the oldest human archetypes).

However, at that point, my day job picked up. Even though I use FuXi every day to do inference for reasons other than the original intent, I decided to keep the original name as motivation to (someday) go back to that particular "project", at least as a way to exercise my self-expression (which, as I said earlier, I normally do a poor job of).

Chimezie Ogbuji

via Copia

Moving FuXi onto the Development Track

[by Chimezie Ogbuji]

I was recently prompted to consider updating FuXi to use the more recent CVS versions of both Pychinko and rdflib. In particular, I've been itching to get Pychinko working with the new rdflib API – which (as I've mentioned) has had its API updated significantly to support (amongst other things) Notation 3 persistence.

Currently, FuXi works with frozen versions of cwm, rdflib, and Pychinko.

I personally find it more effective to work with reasoning capabilities within the context of a query language than as a third-party software library. This was the original motivation for creating FuXi. Specifically, the process of adding inferred statements, dispatching a prospective query, and returning the knowledge base to its original state is a perfect compromise between classic backward / forward chaining.

It frees up both the query processor and persistence layer from the drudgery of logical inference – a daunting software requirement in its own right. Of course, the price paid in this case is the cumbersome software requirements.

It's well worth noting that such on-demand reasoning also provides a practical way to combat the scalability limitations of RDF persistence.
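To illustrate the pattern, here is a hedged sketch of that add / query / retract cycle. It uses today's rdflib API and a trivial one-rule stand-in for the reasoner (FuXi's actual inference is Pychinko-based, and the queries are Versa rather than SPARQL):

from rdflib import Graph, Namespace, RDF, RDFS

EX = Namespace('http://example.org/')

def subclass_closure(graph):
    """A trivial stand-in reasoner: one pass of rdfs:subClassOf entailment."""
    new = set()
    for klass, _, sup in graph.triples((None, RDFS.subClassOf, None)):
        for inst in graph.subjects(RDF.type, klass):
            new.add((inst, RDF.type, sup))
    return new

def prospective_query(graph, infer, sparql):
    """Query the graph temporarily extended with inferred statements."""
    inferred = infer(graph) - set(graph)   # only genuinely new triples
    for t in inferred:
        graph.add(t)                       # extend the knowledge base
    try:
        return list(graph.query(sparql))   # the query sees the inferences
    finally:
        for t in inferred:
            graph.remove(t)                # restore the original state

g = Graph()
g.add((EX.Lion, RDF.type, EX.Animal))
g.add((EX.Animal, RDFS.subClassOf, EX.LivingBeing))
rows = prospective_query(
    g, subclass_closure,
    'SELECT ?c WHERE { <http://example.org/Lion> a ?c }')
print(sorted(str(r[0]) for r in rows))  # Animal and LivingBeing
print(len(g))                           # 2: the graph is back to normal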

To these ends, I've updated FuXi to work with the current (CVS) versions of rdflib, 4Suite RDF, and pychinko. It's essentially a re-write and provides 3 major modules:

  • FuXi.py (the core component – a means to fire the pychinko interpreter with facts and rules from rdflib graphs)
  • AgentTools.py (provides utility functions for the parsing and scuttering of remote graphs)
  • VersaFuXiExtensions.py (defines Versa extension functions which provide scutter / reasoning capabilities)

Versa Functions:

reason(expr)

This function takes a Versa expression as a string and evaluates it after executing FuXi using any rules associated with the current graph (via a fuxi:ruleBase property). FuXi (and Pychinko, consequently) use the current graph (and any graphs associated by rdfs:isDefinedBy or rdfs:seeAlso) as the set of facts against which the rules are fired.

class(instances)

This function returns the class(es) – rdfs:Class or owl:Class – of the given list of resources. If the current graph has already been extended to include inferred statements (via the reason function, perhaps), it simply returns the objects of all rdf:type statements made against the resources. Otherwise, it registers, compiles, and fires a set of OWL/RDFS rules (a reasonable subset of owl-rules.n3 and rdfs-rules.n3 bundled with Euler) against the current graph (and any associated graphs) before matching classes to return.

type(klasses)

This essentially overrides the default 4Suite RDF implementation of this 'built-in' Versa function which attempts to apply RDFS entailment rules in brute force fashion. It behaves just like class with the exception that it returns instances of the given classes instead (essentially it performs the reverse operation).

scutter(url,expr,steps=5)

This function attempts to apply some best practices in the interpretation of a network of remote RDF graphs. In particular, it uses content negotiation and Scutter principles to parse linked RDF graphs (expressed in either RDF/XML or Notation 3). The main use case for this function (and the primary motivation for writing it) is identity reasoning within a remotely hosted set of RDF graphs (FOAF smushing, for example).
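As a rough sketch of the idea (this is not the AgentTools.py API, just an illustration using today's rdflib and urllib), a scutter content-negotiates for RDF and follows rdfs:seeAlso links breadth-first up to a fixed number of steps:

import urllib.request
from rdflib import Graph, RDFS

def scutter(start_url, steps=5):
    """Accumulate a graph by following rdfs:seeAlso links from start_url."""
    graph, seen, frontier = Graph(), set(), [start_url]
    for _ in range(steps):
        next_frontier = []
        for url in frontier:
            if url in seen:
                continue
            seen.add(url)
            # content-negotiate for an RDF serialization we can parse
            req = urllib.request.Request(
                url, headers={'Accept': 'application/rdf+xml, text/n3'})
            with urllib.request.urlopen(req) as response:
                content_type = response.headers.get('Content-Type', '')
                graph.parse(data=response.read(),
                            format='xml' if 'xml' in content_type else 'n3')
            # queue any newly discovered rdfs:seeAlso targets
            next_frontier.extend(
                str(o) for o in graph.objects(None, RDFS.seeAlso))
        frontier = next_frontier
    return graph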

The FuXi software bundle includes a short ontology documenting the two RDF terms: one is used to manage the automated association of a rule base with a graph and the other identifies a graph that has been expanded by inference.

I have yet to write documentation, so this piece essentially attempts to serve that purpose; however, included in the bundle are some unittest cases for each of the above functions. They work off a small set of initial facts.

Unfortunately, a majority of the aforementioned software requirement liability has to do with Pychinko's reliance on the SWAP code base. Initially, I began looking for a functional subset to bundle, but later decided it was against the spirit of the combined body of work. So, until a better solution surfaces, the SWAP code can be checked out from CVS like so:

$ cvs -d:pserver:anonymous@dev.w3.org:/sources/public login
password? anonymous
$ cvs -d:pserver:anonymous@dev.w3.org:/sources/public get 2000/10/swap

The latest 4Suite CVS snapshot can be downloaded from ftp://ftp.4suite.org/pub/cvs-snapshots/4Suite-CVS.tar.gz, Pychinko can be retrieved from the Mindswap svn repository, and rdflib can also be retrieved from its svn repository.

Chimezie Ogbuji

via Copia

FuXi: FOAFed and DOAPed

I just upgraded and repackaged FuXi (v0.7): added some extra prints in the Versa extension function, added a 'test' directory to the source tree with an appropriate example of how FuXi could be used to make a prospective query against OWL rules and a source graph, created a DOAP instance for FuXi and a FOAF instance for myself, created a permanent home for FuXi, and added FuXi to the SemanticWebDOAPBulletinBoard WiKi. This was primarily motivated by Libby's post on Semantic Web Applications and Demos; I thought it would be a perfect forum for FuXi. I probably need to move it into a CVS repository when I can find time.

Below is the output of running the test case:

loaded model with owl/rdfs minimal rules and initial test facts
executing Versa query:
prospective-query('urn:uuid:MinimalRules',
                               'urn:uuid:InitialFacts',
                               'distribute(list(test:Lion,test:Don_Giovanni),\'. - rdf:type -> *\')',
                               'urn:uuid:InitialFacts')
extracted 35 rules from urn:uuid:MinimalRules
extracted 16 facts from source model (scoped by urn:uuid:InitialFacts) into interpreter. Executing...
inferred 15 statements in 0.0526609420776 seconds
time to add inferred statements (to scope: urn:uuid:InitialFacts): 0.000159025192261
compiled prospective query, executing (over scope: urn:uuid:InitialFacts)
time to execute prospective query:  0.0024938583374
time to remove inferred statements:  0.0132219791412
[[[u'http://copia.ogbuji.net/files/FuXi/test/Animal',
   u'http://copia.ogbuji.net/files/FuXi/test/LivingBeing']],
 [[u'http://copia.ogbuji.net/files/FuXi/test/DaPonteOperaOfMozart']]]

This particular test extracts inferred statements from an initial graph using a functional subset of the original owl-rules.n3 and executes the Versa query (which essentially asks: what classes do the resources test:Lion and test:Don_Giovanni belong to?) to demonstrate OWL reasoning (class membership extension by owl:oneOf and owl:unionOf, specifically).

see previous posts on FuXi/N3/4Suite:

Chimezie Ogbuji

via Copia

YiJing SVG Plotter

About a year or more ago, I had an idea that a simple Python/SVG library could be written to aid the drawing of the very rudimentary components of the yijing in modular fashion, upon which the more complex diagrams could very easily be drawn (programmatically). Philosophically, it can be thought of as extending the concepts within the text into a program that represents the ideas in it. A little beatnik-ish? Well, using SVG, binary numerics, and an understanding of the more fundamental arrangements of the trigrams, I was able to write such a library: YiJingPlotter.py. It takes advantage of the translation of the trigrams to their binary values (see earlier post) in order to draw them in 2-dimensional coordinate space (leveraging SVG for this purpose). And in 218 lines of code I was able to write the library as well as 2 utility functions that produce (arguably) the two most fundamental / useful arrangements of the trigrams in SVG:

FuXi's circular arrangement

Shao Yung's square diagram

Once again, I would embed the SVG diagrams, but alas there is still (apparently) no browser-agnostic way to do this (someone inform me if there is).

The library (written in Python) relies on:

I tried to comment as heavily as possible for anyone interested in using the library to generate other diagrams. Comments from the second of the two utility functions are below:

Another demonstration of a classic arrangement drawn using the gua/trigram plotting functions. This is Shao Yong's Square, probably the most useful (in my opinion) arrangement for observing the relationships between the fully developed 64 gua. Within each row, the lower trigrams are all of the same kind (he referred to them as the 'palace' of earth, mountain, etc.), and within each column the upper trigrams are also of the same kind. So, essentially it is a 2-dimensional plot of the 64 gua where the X coordinate is the upper gua and the Y coordinate is the lower gua. This incredible numeric symmetry comes from simply drawing the gua in ascending binary order from 0 - 63, 8 per line! I've added the English names of the corresponding coordinates so a student can match up the lower/upper gua (by name) to find the gua formed.
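Here is a small sketch of the numeric layout that comment describes (an illustration, not YiJingPlotter.py itself): printing the 64 gua in ascending binary order, 8 per row, automatically gives each row a shared lower trigram and each column a shared upper trigram:

TRIGRAMS = {0b000: 'Earth', 0b001: 'Mountain', 0b010: 'Water', 0b011: 'Wind',
            0b100: 'Thunder', 0b101: 'Fire', 0b110: 'Lake', 0b111: 'Heaven'}

for n in range(64):                    # ascending binary order, 0 - 63
    lower, upper = n >> 3, n & 0b111   # high bits: lower gua (the row);
    name = TRIGRAMS[lower] + '/' + TRIGRAMS[upper]  # low bits: upper gua
    print('%-18s' % name, end='\n' if n % 8 == 7 else '')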

Note: I'm still unsure of the proper spelling of Shao Yung's name (Wikipedia has it as Shao Yung, however I've seen various references to Shao Yong)

Chimezie Ogbuji

via Copia

Wikipedia Links to Primary Gua

With regard to my last entry on the primary trigrams, Wikipedia links to the fully developed 8 primary hexagrams (out of 64) are below, with their binary values and names (I'm partial to Alfred Huang's translations of the symbol names, instead of the more common Richard Wilhelm translations):

Initiating - 111111

Responding - 000000

Keeping Still - 001001

Darkness - 010010

Proceeding Humbly - 011011

Taking Action - 100100

Brightness - 101101

Joyful - 110110

I imagine these would be the most appropriate URIs to represent each hexagram, if ever modeled in RDF.
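Incidentally, since each of these eight primary hexagrams is just a trigram doubled, their six-bit values can be generated mechanically (a throwaway illustration; the output is in ascending binary order rather than the naming order above):

# Each primary hexagram is a trigram stacked on itself, so its six-bit
# value is the three-bit trigram value repeated:
for t in range(8):
    print('{0:06b}'.format((t << 3) | t))  # 000000, 001001, ..., 111111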

-- Chimezie

[Uche Ogbuji]

via Copia

The Earliest Juncture of Semiotics and Mathematics

The Trigrams and My Interest

My interest in the trigrams of the very ancient Yijing is mostly scholastic. It's the coherent set of philosophies (or canon) derived from these trigrams, amounting to a mathematical interpretation of everything, that has had a more concrete effect on how I go about my life and how I deal with adversity.

The trigrams are many things, but their most interesting characteristics (from a secular point of view) are their direct analogy to the binary numerical system, as well as the fact that they (undisputedly) represent the earliest coherent example of humankind's study of semiotics:

the philosophical theory of the functions of signs and symbols

The Infinite Characteristics of the Trigrams

The first (and less emphasized) of these two characteristics of the trigrams was formally observed by the German mathematician Gottfried Wilhelm Leibniz (the original observation is probably as old as the purported author of the trigrams: FuXi). He is the creator of the modern binary system of counting, which is the primary framework upon which microprocessor design is based (an important historical irony).
He noticed that the concept of duality/balance evident in the trigrams' source, as well as the derived, related philosophies, is directly analogous to the binary system when you substitute 0 for dashed lines (yin - the concept of no motion) and 1 for unbroken lines (yang - the concept of motion / kinetic energy).

The trigrams are meant to be interpreted from the bottom up, so a continuation of this binary analog would have the reader tip the trigrams over to their right side and read them as binary numbers.
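In code, that reading is a two-line affair (a toy illustration, not part of YiJingPlotter.py): with the lines given bottom-to-top, the bottom line ends up as the most significant bit:

def trigram_value(lines):
    """lines: bottom-to-top sequence of 'yin' / 'yang' line names."""
    value = 0
    for line in lines:                           # bottom line first...
        value = (value << 1) | (line == 'yang')  # ...becomes the high bit
    return value

print('{0:03b}'.format(trigram_value(['yang', 'yin', 'yin'])))  # Thunder: 100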

The Binary Analog of the Primary Gua

Below is the original horizontal arrangement of the trigrams with their corresponding binary numbers (click on each to view the corresponding SVG diagram):

Earth - 000
Mountain - 001
Water - 010
Wind - 011
Thunder - 100
Fire - 101
Lake - 110
Heaven - 111

Extension to the 64 Hexagrams of the Yijing

Since the 8 primary gua are the building blocks upon which the 64 symbols of the Yijing are built (and, purportedly, everything else), this binary analogy can be extended to all 64 symbols. This is well known amongst scholars of the Yijing, and below is the most famous diagram of this extension, by Shao Yong (1011 AD - 1077 AD):

Shao Yong's Diagram

The numerical significance of the trigrams in sequence is well summarized here. This page also includes a very useful animated image of the entire sequence as a binary progression:

FuXi Sequence

The most complete resource on the subject (that I've read so far) is Alfred Huang's The Numerology of the I Ching (ISBN: 0-89281-811-5).

I was unable to embed the SVG diagrams within the page, which is a shame because the yijing trigrams are an excellent SVG use case. I hope to someday capture all 64 as SVG diagrams so that the various, more popular philosophical/visual arrangements can be rendered programmatically. Imagine Shao Yong's circular diagram as SVG (talk about an interesting combination of ancient numerology with modern vector graphics technology). It would prove quite a useful tool for avid students of the yijing symbols, as well as make for some very interesting patterns.

[Chimezie Ogbuji]

via Copia