The role of leadership in informatics and engineering academia in lowering the cost of quality care

by Chimezie Ogbuji

The response of the health care industry to the quality reporting requirements of the ACA, and the subsequent response to that response by HHS (slashing the number of measures that need to be reported), demonstrates just how far in the dark ages the use of information systems (and informatics) in US medicine remains, as a director of clinical research once put it to me many times.

Converting relational healthcare data into various target variables for the purpose of aggregate "reporting" is a solved problem from the perspective of database theory. Yet risk-averse healthcare providers shell out millions to hegemony-oriented software companies (whether those that sell shrink-wrapped products or those that sell services) to solve trivial informatics problems.

I think there is a great opportunity for AI in general, and logic-based knowledge representation specifically, to be resurrected from the graveyard (or winter) of pure research and play a prominent role in the engineering underlying what really needs to be done: lowering the cost of leveraging information to make the provision of care more efficient.

Perhaps even the idea of the Semantic Web (separate from the WWW-related technologies that enable it) can avoid the same fate and be a part of this. However, the stewards of the places where peer-reviewed scientific research on informatics (even web-based informatics) is conducted and published need to jettison the cancer of obsession with aesthetic and academic purity: novelty of methods described in written material, citation history of authors, thoroughness of literature review, and so on. This cancer is what seems to separate pure (computer) science research from informatics, and from the promulgation or accreditation of professional engineering (software or otherwise).

The development of standards, curricula, system methodology, and (ultimately) scientific literature needs to be more problem-focused (ergo, engineering). The things that will make a difference will not be those that are truly novel, but those that combine novel engineering solutions with mundane ones.

FuXi: Becoming a Full-fledged Logical Reasoning System

[by Chimezie Ogbuji]

I've been doing a lot of "Google reading" lately.

Completing Logical Reasoning System Capabilities

With the completion (or near completion) of PML-generating capabilities for FuXi, it is becoming a fully functional logical reasoning system. In "Artificial Intelligence: A Modern Approach", Stuart Russell and Peter Norvig identify the following main categories of automated reasoning systems:

  1. Theorem provers
  2. Production systems
  3. Frame systems and semantic networks
  4. Description Logic systems

OWL and RDF cover categories 3 and 4. The second category is functionally covered by the RETE-UL algorithm FuXi employs (a highly efficient modification of the original RETE algorithm). The RIF Basic Logic Dialect, currently in development, covers 2 through 4, and Proof Markup Language covers 1. Now, FuXi can generate Proof Markup Language (PML) structures (and export visualization diagrams of them). I still need to do more testing, and hope to be able to generate proofs for each of the OWL tests. Until then, below is a diagram of the proof tree generated from the "She's a Witch and I have Proof" test case:

# http://clarkparsia.com/weblog/2007/01/02/burn-the-witch/
# http://www.netfunny.com/rhf/jokes/90q4/burnher.html
@prefix : <http://www.w3.org/2000/10/swap/test/reason/witch#>.
@keywords is, of, a.
#[1]    BURNS(x) /\ WOMAN(x)            =>      WITCH(x)
{ ?x a BURNS. ?x a WOMAN } => { ?x a WITCH }.
#[2]    WOMAN(GIRL)
GIRL a WOMAN.
#[3]    \forall x, ISMADEOFWOOD(x)      =>      BURNS(x)
{ ?x a ISMADEOFWOOD. } => { ?x a BURNS. }.
#[4]    \forall x, FLOATS(x)            =>      ISMADEOFWOOD(x)
{ ?x a FLOATS } => { ?x a ISMADEOFWOOD }.
#[5]    FLOATS(DUCK)
DUCK a FLOATS.
#[6]    \forall x,y FLOATS(x) /\ SAMEWEIGHT(x,y) =>     FLOATS(y)
{ ?x a FLOATS. ?x SAMEWEIGHT ?y } => { ?y a FLOATS }.
# and, by experiment
# [7]   SAMEWEIGHT(DUCK,GIRL)

[Image: "She's a Witch and I have Proof" trace]
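To make the derivation concrete, here is a naive Python forward-chainer (deliberately not RETE, and not FuXi's API; the tuple encoding of facts is my own illustration) that replays rules [1], [3], [4], and [6] over facts [2], [5], and [7] until it reaches a fixed point:

```python
# Facts from the N3 above, encoded as tuples.
FACTS = {
    ("WOMAN", "GIRL"),               # [2]
    ("FLOATS", "DUCK"),              # [5]
    ("SAMEWEIGHT", "DUCK", "GIRL"),  # [7], by experiment
}

def closure(facts):
    """Apply rules [1], [3], [4], [6] repeatedly until nothing new is inferred."""
    facts = set(facts)
    while True:
        floats = {f[1] for f in facts if f[0] == "FLOATS"}
        wood = {f[1] for f in facts if f[0] == "ISMADEOFWOOD"}
        burns = {f[1] for f in facts if f[0] == "BURNS"}
        women = {f[1] for f in facts if f[0] == "WOMAN"}
        new = set()
        new |= {("ISMADEOFWOOD", x) for x in floats}   # [4] FLOATS(x) => ISMADEOFWOOD(x)
        new |= {("BURNS", x) for x in wood}            # [3] ISMADEOFWOOD(x) => BURNS(x)
        new |= {("WITCH", x) for x in burns & women}   # [1] BURNS(x) /\ WOMAN(x) => WITCH(x)
        for (p, *args) in facts:                       # [6] FLOATS(x) /\ SAMEWEIGHT(x,y) => FLOATS(y)
            if p == "SAMEWEIGHT" and args[0] in floats:
                new.add(("FLOATS", args[1]))
        if new <= facts:
            return facts
        facts |= new
```

Running `closure(FACTS)` derives `("WITCH", "GIRL")` in four rounds: the duck floats and weighs the same as the girl, so the girl floats, is made of wood, burns, and is therefore a witch.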

There is another test case, "Dan's home region is Texas", on the python-dlp Wiki: DanHomeProof:

@prefix : <gmpbnode#>.
@keywords is, of, a.
dan home [ in Texas ].
{ ?WHO home ?WHERE.
  ?WHERE in ?REGION } => { ?WHO homeRegion ?REGION }.

[Image: "Dan's home region is Texas" proof]

I decided to use PML structures since there is a slew of Stanford tools that understand and visualize them, and I can generate other formats from this common structure (including the CWM reason.n3 vocabulary). Personally, I prefer the proof visualization to the typically verbose step-based Q.E.D. proof.

Update: I found a nice write-up on the CWM-based reason ontology and translations to PML.

So, how does a forward-chaining production rule system generate proofs that are really meant for backward-chaining algorithms? When the FuXi network is fed its initial assertions, it is also told what the 'goal' is. The goal is the single RDF statement being proved. When forward chaining yields an inferred triple that matches the goal, the RETE algorithm is terminated. So, depending on the order of the rules and the order in which the initial facts are fed, it will (in the general case) be less efficient than a backward-chaining algorithm. However, I'm hoping the blinding speed of the fully hashed RETE-UL algorithm makes up the difference.
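The goal-termination idea can be sketched in a few lines (a simplification of my own, not FuXi's actual interface): treat each rule as a function from the current fact set to newly inferred facts, and halt the match/act loop as soon as the goal appears.

```python
def prove(facts, rules, goal):
    """Forward-chain over `rules`, stopping early once `goal` is inferred."""
    facts = set(facts)
    if goal in facts:
        return True
    changed = True
    while changed:
        changed = False
        for rule in rules:
            for inferred in rule(facts) - facts:
                facts.add(inferred)
                changed = True
                if inferred == goal:   # goal matched: terminate forward chaining
                    return True
    return False   # fixed point reached without deriving the goal

# Example rule: { ?x a FLOATS } => { ?x a ISMADEOFWOOD }
floats_means_wood = lambda fs: {("ISMADEOFWOOD", x)
                                for (p, x) in fs if p == "FLOATS"}
```

With `prove({("FLOATS", "DUCK")}, [floats_means_wood], ("ISMADEOFWOOD", "DUCK"))` the loop stops after the first inference rather than computing the full closure.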

I've been spending quite a bit of time on FuXi mainly because I am interested in empirical evidence for a school of thought which claims that Description Logic-based (Tableaux-based) inference will never scale as well as the Logic Programming equivalent, at least for certain expressive fragments of Description Logic. (I say expressive because, even given the things you cannot express in this subset of OWL-DL, there is much more in Horn Normal Form (and Datalog) that you cannot express even in the underlying DL for OWL 1.1.) The genesis of this is a paper I read which lays out the theory, though at the time there was no practice to support the claims (at least none that I knew of). If you are interested in the details, the paper is "Description Logic Programs: Combining Logic Programs with Description Logic", written by several people now working in the Rule Interchange Format Working Group.

It is not light reading, but is complementary to some of Bijan's recent posts about DL-safe rules and SWRL.

A follow-up paper, "A Realistic Architecture for the Semantic Web", builds on the DLP paper and argues that the current OWL (Description Logic-based) Semantic Web inference stack is problematic and should instead be stacked on top of Logic Programming, since Logic Programming algorithms have a much richer and more pervasively deployed history (all modern relational databases, Prolog, etc.).

The arguments seem sound to me, so I've essentially been building up FuXi to implement that vision (especially since it employs, arguably, the most efficient production rule inference algorithm). The final piece was a fully functional implementation of the Description Horn Logic algorithm. Why is this important? The short of it is that the DLP paper outlines an algorithm which takes a (constrained) set of Description Logic expressions and converts them to 'pure' rules.

Normally, Logic Programming N3 implementations pass the OWL tests by using a generic ruleset which captures a subset of the OWL DL semantics. The most common one is owl-rules.n3. DLP flips the script by generating a rule-set specifically for the original DL, instead of feeding OWL expressions into the same network. This allows the RETE-UL algorithm to create an even more efficient network since it will be tailored to the specific bits of OWL.
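The flavor of this per-ontology rule generation can be sketched as follows. This is my own illustration of the idea, not the DLP paper's full algorithm or FuXi's implementation: a handful of simple axiom shapes, each mapped to the N3-style Horn rule it warrants, so the network only contains rules the ontology actually needs.

```python
def dlp_rules(axioms):
    """Translate a few simple DL axiom shapes into N3-style rules.
    Each axiom is a tuple tagged with its kind."""
    rules = []
    for ax in axioms:
        kind = ax[0]
        if kind == "subClassOf":            # C1 subClassOf C2
            _, c1, c2 = ax
            rules.append("{ ?x a %s } => { ?x a %s }." % (c1, c2))
        elif kind == "subPropertyOf":       # P1 subPropertyOf P2
            _, p1, p2 = ax
            rules.append("{ ?x %s ?y } => { ?x %s ?y }." % (p1, p2))
        elif kind == "transitiveProperty":  # P is transitive
            _, p = ax
            rules.append("{ ?x %s ?y. ?y %s ?z } => { ?x %s ?z }." % (p, p, p))
        elif kind == "inverseOf":           # P inverseOf Q
            _, p, q = ax
            rules.append("{ ?x %s ?y } => { ?y %s ?x }." % (p, q))
    return rules
```

An ontology that never uses, say, transitive properties then contributes no transitivity rule at all, which is exactly why the resulting RETE-UL network can be smaller than one built from a generic OWL ruleset.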

For instance, where I used to run through the OWL tests in about 4 seconds, I can now pass them in about 1 second. Before, I would set up a RETE network consisting of the generic ruleset once and run the tests through it (resetting it each time). Now, for each test, I create a custom network and evaluate the OWL test case against it. Even with this extra overhead, it is still 4 times faster! The custom network is trivial in most cases.

Ultimately, I would like to be able to use FuXi for generic "Semantic Web" agent machinery, and perhaps even to try some of that programming-by-proof thing Dijkstra was talking about.

Chimezie Ogbuji

via Copia

Rewriting Source Content Descriptions as Versa Queries

I recently read Morten Frederiksen's blog entry about implementing Source Content Descriptions as SPARQL queries in Redland and was quite interested, especially in the consideration that such queries could be generated automatically and that the set of queries you would want to ask is small and straightforward. Even more interesting was Morten's step-by-step walk-through of how such queries would be translated to SQL queries on a Redland triple store sitting on top of MySQL (my favorite RDBMS deployment for 4RDF as well).

However, I couldn't help but wonder how such a set of queries would be expressed in Versa (in my opinion, a language more aligned with the data model it queries than its SQL/RDQL counterparts). So below is my attempt to port the queries to Versa:

Classes used in the store

SPARQL
SELECT DISTINCT ?Class
WHERE { ?R rdf:type ?Class }
Versa
set(all() - rdf:type -> *)

Predicates that are used with instances of each class

SPARQL
SELECT DISTINCT ?Class ?Property
  WHERE { ?R rdf:type ?Class .
        OPTIONAL { ?R ?Property ?Object .
                   FILTER (?Property != rdf:type) } }
Versa
difference(
  properties(set(all() - rdf:type -> *)),
  set(rdf:type)
)
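For readers without a triple store handy, both queries so far can be mimicked over a plain list of triples in Python (my own illustration; the sample triples are invented):

```python
# A toy graph: (subject, predicate, object) tuples.
triples = [
    (":bob", "rdf:type", ":Person"),
    (":bob", ":name", "Bob"),
    (":acme", "rdf:type", ":Company"),
]

# Classes used in the store -- cf. set(all() - rdf:type -> *)
classes = {o for (s, p, o) in triples if p == "rdf:type"}

# Predicates used with class instances, excluding rdf:type
typed = {s for (s, p, o) in triples if p == "rdf:type"}
preds = {p for (s, p, o) in triples if s in typed and p != "rdf:type"}
```

Using sets here plays the same role as Versa's set function: duplicates are collapsed for free.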

Do all instances of each class have a statement with each predicate?

It wasn't clear to me whether the intent was to check that all classes have a statement with each predicate as specified by an ontology, or simply to count how many properties each class instance has. The latter interpretation is the one I went with (it's also simpler). This particular query will return a list of lists, each inner list consisting of two values: the URI of a distinct class instance and the number of distinct properties described in statements about it (except rdf:type).

Versa
distribute(
  set(all() |- rdf:type -> *),
  '.',
  'length(
    difference(
      properties(.),
      set(rdf:type)
    )
  )'
)
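The same per-instance count reads naturally in plain Python as well (again my own illustration over an invented toy graph):

```python
# For each typed instance, count its distinct properties other than rdf:type.
triples = [
    (":bob", "rdf:type", ":Person"),
    (":bob", ":name", "Bob"),
    (":bob", ":worksFor", ":acme"),
    (":acme", "rdf:type", ":Company"),
]

typed = {s for (s, p, o) in triples if p == "rdf:type"}
prop_counts = {
    inst: len({p for (s, p, o) in triples
               if s == inst and p != "rdf:type"})
    for inst in typed
}
# e.g. prop_counts[":bob"] == 2 and prop_counts[":acme"] == 0
```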

Is the type of object in a statement with each class/predicate combination always the same?

I wasn't clear on the intent of this query, either. I wasn't sure whether he meant to ask it of the combination with all predicates defined in an ontology, or of all predicates on class instances in the graph being queried.

But there you have it.

NOTE: The set function was used to guarantee that only distinct values are returned; it may be redundant where the surrounding functions and expressions already account for duplication.

[Uche Ogbuji]

via Copia