The role of leadership in informatics and engineering academia in lowering the cost of quality care

by Chimezie Ogbuji

The healthcare industry's response to the quality reporting requirements of the ACA, and the subsequent response to that response by President Obama's HHS of slashing the number of measures that need to be reported, demonstrates how much the use of information systems (and informatics) in US medicine remains in the dark ages (as a director of clinical research put it to me many times).

The informatics task of converting relational healthcare data into various target variables for aggregate "reporting" is a solved problem from the perspective of database theory. Yet risk-averse healthcare providers shell out millions to hegemony-oriented software companies (whether they sell shrink-wrapped products or services) to solve trivial informatics problems.
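To make the point concrete, here is a minimal sketch of computing an aggregate quality measure with a single declarative query. The schema, data, and measure are all hypothetical, invented for illustration only:

```python
import sqlite3

# Hypothetical toy schema: table and column names are illustrative only,
# not taken from any real registry or reporting standard.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE patient (id INTEGER PRIMARY KEY, discharged INTEGER);
CREATE TABLE medication_order (patient_id INTEGER, drug TEXT);
""")
con.executemany("INSERT INTO patient VALUES (?, ?)",
                [(1, 1), (2, 1), (3, 1), (4, 0)])
con.executemany("INSERT INTO medication_order VALUES (?, ?)",
                [(1, "aspirin"), (3, "aspirin"), (3, "statin")])

# A toy "measure": how many discharged patients have an aspirin order?
# One declarative aggregate query -- textbook relational database theory.
numerator, denominator = con.execute("""
SELECT COUNT(DISTINCT m.patient_id), COUNT(DISTINCT p.id)
FROM patient p
LEFT JOIN medication_order m
       ON m.patient_id = p.id AND m.drug = 'aspirin'
WHERE p.discharged = 1
""").fetchone()
print(numerator, denominator)  # 2 3
```

The target variables for reporting fall out of ordinary queries over the relational data; nothing here requires bespoke, expensive software.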

I think there is a great opportunity for AI in general, and logic-based knowledge representation specifically, to be resurrected from the graveyard (or winter) of pure research to play a prominent role in the engineering of what really needs to be done to lower the cost of leveraging information to make the provision of care more efficient.

Perhaps even the idea of the Semantic Web (separate from the WWW-related technologies that enable it) can avoid suffering the same fate and be a part of this. However, the stewards of the places where peer-reviewed scientific research is done and literature is produced on the topics of informatics (even web-based informatics) need to jettison the cancer of obsession with aesthetic and academic purity: novelty of the methods described, citation history of the authors, thoroughness of the literature review, and so on. This cancer is what seems to separate pure (computer) science research from informatics and from the promulgation or accreditation of professional engineering (software or otherwise).

The development of standards, curricula, system methodology, and (ultimately) scientific literature needs to be more problem-focused (ergo, engineering).  The things that will make a difference will not be those that are truly novel but those that combine novel engineering solutions with mundane ones.

IEEE IC Special Issue is Out

by Chimezie Ogbuji

Ogbuji, Chimezie; Gomadam, Karthik; Petrie, Charles
Case Western Reserve University 

This paper appears in: Internet Computing, IEEE
Issue Date: July-Aug. 2011
Volume: 15 Issue:4
On page(s): 10 - 13
ISSN: 1089-7801
Digital Object Identifier: 10.1109/MIC.2011.99 
Date of Current Version: 2011-06-30
Sponsored by: IEEE Computer Society 


Contemporary Web-based architectures can help address the technological and architectural challenges inherent to modern personal health record (PHR) systems. Current research in the area of healthcare informatics has focused on incorporating Web-based technology for PHR systems' primary functions. This special issue presents work in this area of research.


I received my complimentary copy of the IEEE IC special issue on Personal Health Records that I was guest editor for. It turned out well in the end.

My Thoughts after 7 Years of Being a Web-based Patient Registry Architect

Monday, January 17th, 2011 is my last day at the Cleveland Clinic, where I was Lead Systems Analyst for 7 years working on a very exciting project whose goal was to replace a relational Cardiovascular Information Registry that supported research in cardiovascular medicine and surgery, and to consolidate data management for research purposes in the surgery department.  Eventually, our Clinical Investigation unit became part of the Heart and Vascular Institute.

The long-term goal was to create a framework for context-free data management systems in which expert-provided, domain-specific knowledge is used to control all aspects of data entry, storage, display, retrieval, communication, and formatting for external systems.  By ‘context-free’, I mean that the framework can be used for any domain (even outside of medicine) and nothing about the domain is assumed or hardcoded.  The use of metadata was envisioned as key to facilitating this capability and to this end the use of RDF was effective as a web and logic-based knowledge representation.

At the time, I was unemployed, soon after the burst of the post-9/11 economic and innovation bubble, when there was great risk aversion toward the emerging technologies of the day: XML, RDF, XSLT, Python, REST, etc.  I was lucky that in the city where I was born, a couple of miles (literally) from where I was born and where my mother worked, there was a great job opportunity under a mandate from our director for a de novo, innovative architecture.  I went for the interview and was fortunate to get the job.

Four years later (in the fall of 2007), it was deployed for production use by nurses, coders, and researchers, built on top of the predecessor of Akara, an XML & RDF content repository:

The diagram above documents the application server and services stack used by the various components that interface with the repository.  The web services paradigm was never used; most of the service-oriented architecture was plain HTTP with XML (i.e., POX/HTTP).

Mozilla’s Firefox with the XForms extension was used for about 5 years for the complete collection of longitudinal patient record content for a little over 200,000 patients who had operations with cardiac(-thoracic) surgical components in a registry.  The architectural methodology was such that the use and deployment of infrastructure (entire data collection screens, XML and RDF schemas, data transformations, etc.) was significantly automated.  W3C document management and semantic web representation standards (HTTP, RDF, XML, N3, OWL, and SPARQL) were used to ensure interoperability, extensibility, and automation, in particular as infrastructure for both (certified) quality measure reporting and a clinical research repository.

I wanted to take the time to coalesce my experience in working on this project and share some of the key desiderata, architectural constraints (that perhaps comprise an architectural style), and opportunities in using these emerging technologies to address the primary engineering challenges of clinical research repositories and patient registries.  The following functionalities stood out as very implementable for such an approach: 

  • patient record abstraction (data entry via knowledge-generated input screens)
  • workflow management
  • data quality management
  • identification of patient cohorts for study
  • data export for statistical analysis

Patient Record Abstraction

XForms brings a rich history of browser-based data entry to bear as a comprehensive, declarative syntax that is part of a new architectural paradigm and works well with the architectural style of the World Wide Web: REST.  It abstracts widgets, controls, data bindings, logic, remote data management, integration into a host language, and other related rich internet application requirements.

I had to check the Wayback Machine, but found the abstract of the 2006 presentation: “The Essence of Declarative, XML-based Web Applications: XForms and XSLT”.  There I discussed best practices, common patterns, and pitfalls in using XSLT as a host language for generating web-based user interfaces expressed in XForms.  The XForms-generating infrastructure was quite robust: everything from screen placement, behavior, range checking, drop-down lists, and data field dependencies was described in an XML document written using a microformat designed for templating user interfaces for patient registry data collection.
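As a sketch of that generation step, a declarative template document can be compiled mechanically into XForms controls. The field-definition microformat below is invented for illustration and is not the registry's actual vocabulary:

```python
import xml.etree.ElementTree as ET

# Hypothetical field-definition microformat; element and attribute names
# are invented for this example, not the registry's real vocabulary.
TEMPLATE = """
<form name="demographics">
  <field ref="patient/mrn"    label="Medical record number"/>
  <field ref="patient/weight" label="Weight (kg)"/>
</form>
"""

XF = "http://www.w3.org/2002/xforms"

def generate_xforms(template_xml):
    """Compile the declarative template into a group of XForms input controls."""
    form = ET.fromstring(template_xml)
    group = ET.Element(f"{{{XF}}}group")
    for field in form.findall("field"):
        inp = ET.SubElement(group, f"{{{XF}}}input", ref=field.get("ref"))
        label = ET.SubElement(inp, f"{{{XF}}}label")
        label.text = field.get("label")
    return ET.tostring(group, encoding="unicode")

print(generate_xforms(TEMPLATE))
```

A real version of this transform (ours was XSLT, not Python) also emits bindings, constraints, and dependency rules from the same template, which is what made automated redeployment of entire data collection screens possible.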

All in all (and I hope John writes about this at some point, since he was the primary architect of the most advanced manifestation of this approach), the use of a microformat for documenting and generating an XForms framework for controlled, validated, form-based data collection proved more than adequate: it allowed the (often unpredictable) requirements of national registry reporting and our clinical studies to dictate automatically deployed changes to a secure (access-controlled) patient record web portal and repository.

Regarding quality management, in December 2007 John and I were supposed to present at XML 2007 about how

XForms provides a direct interface to the power of XML validation tools for immediate and meaningful feedback to the user about any problems in the data. Further, the use of validation components enables and encourages the reuse of these components at other points of entry into the system, or at other system boundaries.

I also found the details of the session (from the online conference schedule) on the Wayback Machine from the XML 2007 conference site.  The presentation was titled: “Analysis of an architecture for data validation in end-to-end XML processing systems”.

Client-side XForms constraint mechanisms, along with server-side Schematron validation, were the basis for quality management at the point of data entry (historically the most robust place to address errors in medical record content).  Altogether, the Mozilla browser platform presents quite an opportunity to offload sophisticated content management capabilities from the server to the client, and XML processing and pipelining played a major role in this regard.

Workflow Management

See: A Role for Semantic Web Technologies in Patient Record Data Collection, where I described my chapter in the Linked Enterprise Data book regarding the use of semantic web technologies to facilitate patient record data collection workflow.  In the Implementation section, I also describe how declarative AJAX frameworks such as Simile Exhibit were integrated for faceted browsing and visualization of patient record content.  RDF worked well as the state machine of a workflow engine.  The digital artifacts involved in the workflow application, and the messages sent back and forth between browser and server, are XML and JSON documents.  The content of the documents is mirrored into an RDF dataset describing the state of the workflow task, which can be queried when listing the workflow tasks associated with the currently logged-in user (for instance).
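The pattern can be sketched in pure Python; a toy in-memory triple set stands in for the RDF dataset, and the task structure and predicate names are hypothetical:

```python
import json

# Minimal sketch of the pattern: a JSON task document is mirrored into a
# set of (subject, predicate, object) triples, which is then queried to
# list tasks for the logged-in user. Predicate names are invented here.
def mirror(task_doc):
    d = json.loads(task_doc)
    s = d["id"]
    return {(s, "rdf:type", "reg:DataEntryTask"),
            (s, "reg:assignedTo", d["assignee"]),
            (s, "reg:state", d["state"])}

dataset = set()
for doc in ('{"id": "task/1", "assignee": "nurse7", "state": "open"}',
            '{"id": "task/2", "assignee": "coder3", "state": "open"}',
            '{"id": "task/3", "assignee": "nurse7", "state": "done"}'):
    dataset |= mirror(doc)

# The query a worklist screen would run, written as a triple-pattern join.
def tasks_for(user):
    return sorted(s for (s, p, o) in dataset
                  if p == "reg:assignedTo" and o == user
                  and (s, "reg:state", "open") in dataset)

print(tasks_for("nurse7"))  # ['task/1']
```

In the real system the same join is a SPARQL query over the mirrored dataset, while the XML/JSON documents remain the artifacts actually exchanged with the browser.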

This is an archetype of an emerging pattern where XML is used as the document and messaging syntax and a semantics-preserving RDF rendering of the XML content (mirrored persistently in an RDF dataset) is used as the knowledge representation for inference and querying.  More on this archetype later.  

Identification of Patient Cohorts for Study

Eric Prud’hommeaux has been doing a lot of excellent infrastructure work (see: SWObjects) in the Semantic Web for Healthcare and Life Sciences Interest Group around the federated use of SPARQL for querying structured, discrete EHR data in a meaningful way and in a variety of contexts: translational research, clinical observations interoperability, etc.  His positive experience has mirrored ours in the use of SPARQL as the protocol for querying a patient outcome registry and clinical research database.

However, we had the advantage of already having the data natively available as a collection of RDF graphs (each of which describes a longitudinal patient record) from the registry.  A weekly ETL-like process recreates an RDF dataset from the XML collection and serves as the snapshot data warehouse for the operational registry, which relies primarily on XML payloads for its documentation, data collection, and inter-system messaging needs.  Other efforts have been more focused on making extant relational EHR data available via SPARQL.

This was the backdrop to the research we did with Case Western Reserve University Ph.D. students in the summer and fall of 2008 on the efficient use of relational algebra to evaluate the SPARQL language in its entirety.
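The essence of that relational evaluation can be sketched as follows: each triple pattern in a SPARQL basic graph pattern becomes an alias of a triples table, and shared variables become join conditions. This toy uses a single unpartitioned table and invented data; a real engine must also handle OPTIONAL, UNION, filters, and the rest of the language:

```python
import sqlite3

# Toy sketch of SPARQL-over-relational evaluation. Data and predicate
# names are invented for illustration.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
con.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("pt/1", "rdf:type", "reg:Patient"),
    ("pt/1", "hasMedicalRecordNumber", "MRN-001"),
    ("pt/2", "rdf:type", "reg:Patient"),
])

# Compiled form of the basic graph pattern:
#   { ?PATIENT rdf:type reg:Patient .
#     ?PATIENT hasMedicalRecordNumber ?MRN }
# The shared variable ?PATIENT becomes the join condition t0.s = t1.s.
sql = """
SELECT t1.o AS MRN
FROM triples t0 JOIN triples t1 ON t0.s = t1.s
WHERE t0.p = 'rdf:type' AND t0.o = 'reg:Patient'
  AND t1.p = 'hasMedicalRecordNumber'
"""
print([row[0] for row in con.execute(sql)])  # ['MRN-001']
```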

GRDDL Mechanisms and Dual Representation

Probably the most substantive desideratum (or opportunity) in patient record systems and registries of the future is as hard to articulate as it is to (frankly) appreciate and understand.  However, I recently rediscovered the GRDDL use case, which does a decent job of describing how GRDDL mechanisms can be used to address the dual syntactic and semantic interoperability challenges in patient registry and record systems.

XML is the ultimate structured document and messaging format, so it is no surprise that it is the preferred serialization syntax of the de facto messaging and clinical documentation standards in healthcare systems.  There are lighter-weight alternatives (JSON), but it is still the case that XML is the most pervasive standard in this regard.  However, XML is challenged in its ability to specify the meaning of its content in a context-free way (i.e., semantic interoperability).  Viewing RDF content as a rendering of this meaning (a function of the structured content) and viewing an XML vocabulary as a special-purpose syntax for RDF content is a very useful way to support (all in the same framework):

  • structured data collection
  • standardized system-to-system messaging
  • automated deductive analysis and transformation
  • structural constraint validation.

In short, XML facilitates the separation of presentation from content, and semantics-preserving RDF renderings of XML facilitate the separation of syntax from semantics.

The diagram above demonstrates how this separation acts as the modern version of the traditional boundary between transactional systems and their data warehouses in relational database systems.  In this updated paradigm, however, XML processing is the framework for the former, RDF processing is the framework for the latter, and XSLT (or any similar transform algorithm) is the ETL framework.
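A toy illustration of the semantics-preserving rendering half of this ETL; the XML vocabulary and the RDF predicate names below are invented for the example:

```python
import xml.etree.ElementTree as ET

# GRDDL-style sketch: a special-purpose XML vocabulary (invented here for
# illustration) is "rendered" into RDF triples by a deterministic
# transform, separating the document syntax from its semantics. The real
# mechanism is an XSLT transform named by the source document.
RECORD = """
<patient id="pt/42">
  <surgery code="CABG" date="2007-03-15"/>
</patient>
"""

def render_triples(xml_text):
    root = ET.fromstring(xml_text)
    s = root.get("id")
    triples = [(s, "rdf:type", "reg:Patient")]
    for surgery in root.findall("surgery"):
        triples.append((s, "reg:hadSurgery", surgery.get("code")))
        triples.append((s, "reg:surgeryDate", surgery.get("date")))
    return triples

for t in render_triples(RECORD):
    print(t)
```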

In the end, I think it is safe to say that RDF and XSLT are very useful for facilitating semantic and syntactic automation in information systems.  I wish I had presented or written more on this aspect, but I did find some older documents on the topic along with the abstract of the presentation I gave with William Stelhorn during the first Semantic Technologies Conference in 2005:


The diagram above illustrates how a domain model (described in an RDF vocabulary) can be used by an automated component to generate semantic and syntactic schemas (OWL and RELAX NG, respectively) as well as GRDDL transforms (as XSLT) that faithfully render the structured content in a machine-understandable knowledge representation (RDF).  In this way, much of the data management tooling and infrastructure can be maintained by tweaking declarative documentation (not programming code) and automatically regenerating the infrastructure compiled specifically for a particular domain.

The Challenges

I have mostly mentioned where things worked well.  There were certainly significant challenges to our project, but most of them were not technical in nature.  This seems to be a recurring theme in medical informatics.

The main technical challenge has to do with support in RDF databases (triplestores) for an appreciable volume of write as well as read operations.  This shortcoming will be particularly emphasized as the emerging SPARQL 1.1 specifications for write operations become W3C Recommendations and are inevitably adopted.  Most contemporary relational schemas for RDF storage are highly normalized star schemas in which there isn't a single fact table; rather (in the most advanced cases), the space is partitioned into separate tables divided along the general, distinct kinds of statements you can have in an RDF graph: statements where the predicate is rdf:type, statements where the object is an RDF literal, and statements where the object is a URI.  So it is more like a conjoined star schema, for lack of a better term.

This minimizes self-joins, since the data space is partitioned among multiple tables, and a query such as this:

SELECT ?MRN { ?PATIENT hasMedicalRecordNumber ?MRN }  

would only require (assuming we knew a priori that the range of the hasMedicalRecordNumber property is RDF literals) the single table for RDF statements where the object is a literal.  If an additional triple pattern is added that matches statements where the object is a URI (or blank node), the result is a join between the two tables rather than a self-join on a single massive table of all the RDF statements.

In such an architecture, there are separate lookup tables that map internal identifiers (integers, perhaps) to their full lexical forms as URIs or RDF literals.  This enforces the reuse of terms so that space is not wasted when two RDF statements refer to the same URI: internally, they refer to the same row in the term lookup table.  This is often called string interning.

However, this kind of normalization, which is heavily optimized for use cases that primarily involve read access (i.e., data warehouse or OLAP scenarios), does not bode well when updates need to be made to part of an RDF graph in a dataset, or to an entire graph.  If an RDF statement is removed and it is the only one that references a particular URI, that URI needs to be garbage collected from the term lookup table.  It is such optimizations for OLAP behavior that almost require high-volume updates to RDF content to happen as massive ETL jobs in which the entire RDF collection of patient record content is replaced, rather than one patient record graph at a time.
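A toy sqlite3 sketch of this layout and its OLTP downside: two of the partitioned statement tables, an interned term table, and the garbage collection that a single delete forces. All names are illustrative, and (unlike a real store) literals here are stored inline rather than interned:

```python
import sqlite3

# Toy sketch of the partitioned layout described above. Table and column
# names are invented for illustration.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE term (id INTEGER PRIMARY KEY, lexical TEXT UNIQUE);
CREATE TABLE literal_stmt  (s INTEGER, p INTEGER, o TEXT);
CREATE TABLE resource_stmt (s INTEGER, p INTEGER, o INTEGER);
""")

def intern(lexical):
    """String interning: every term maps to one row in the lookup table."""
    con.execute("INSERT OR IGNORE INTO term (lexical) VALUES (?)", (lexical,))
    return con.execute("SELECT id FROM term WHERE lexical = ?",
                       (lexical,)).fetchone()[0]

# { ?PATIENT hasMedicalRecordNumber ?MRN } only touches literal_stmt.
con.execute("INSERT INTO literal_stmt VALUES (?, ?, ?)",
            (intern("pt/1"), intern("hasMedicalRecordNumber"), "MRN-001"))

rows = con.execute("""
SELECT t.lexical, l.o FROM literal_stmt l
JOIN term t ON t.id = l.s
JOIN term p ON p.id = l.p
WHERE p.lexical = 'hasMedicalRecordNumber'
""").fetchall()
print(rows)  # [('pt/1', 'MRN-001')]

# The OLTP downside: deleting a statement orphans its terms, which must
# then be garbage-collected from the lookup table.
con.execute("DELETE FROM literal_stmt")
con.execute("""
DELETE FROM term WHERE id NOT IN
  (SELECT s FROM literal_stmt UNION SELECT p FROM literal_stmt
   UNION SELECT s FROM resource_stmt UNION SELECT p FROM resource_stmt
   UNION SELECT o FROM resource_stmt)
""")
print(con.execute("SELECT COUNT(*) FROM term").fetchone()[0])  # 0
```

Every fine-grained write pays this extra bookkeeping, which is why the economical path is the bulk ETL replacement described above.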

This fundamental challenge is the same reason why (some time back) the content repository underlying SemanticDB switched from using RDF to manage the state of the repository (file modification dates, internet media types associated with artifacts, etc.) to using cached, in-memory, pre-parsed XML.  It is also the same reason why we didn't use this capability to allow modifications to patient record documents (via XForms data collection screens) to be immediately reflected into the underlying RDF dataset.  In both cases, writing to the RDF dataset became the prominent bottleneck as the RDF database was not able to support Online Transactional Processing (OLTP).

There was an additional, semi-technical challenge that had (and still has) to do with the lack of a small, broad, uniform, upper ontology of clinical medicine.  It certainly needs to be much smaller than SNOMED-CT while still covering the same material, hence my description of it as an upper ontology.  There are at least two projects I know of regarding this:

The latter is my project and was motivated (from the beginning) by this exact problem.  The other problems are massive but not technical.  The two primary ones (in my opinion) are the lack of understanding of how to conceive and develop a business strategy around open source, open communities, open standards, and open data (one that relies more on services, on the competitive advantage of using emerging technologies where appropriate, and on leveraging the catalyzing power of web 2.0 / 3.0 technology and social media), and the need for better communication of the merits of semantic web technologies in addressing the engineering and infrastructure challenges of healthcare information systems.  Most of the people who could benefit from better communication in this regard are significantly risk-averse to begin with (another problem in its own right).

An example of this is that, in my opinion, the only new innovation that semantic web technologies bring to the table is the symbiotic combination of the architecture of the World Wide Web with knowledge representation.  The use of logic and ontologies to address semantic interoperability challenges predates the Semantic Web (and even predates Description Logic).  By being precise in describing this difference, you can also be precise in describing the value, with very little risk of overstating it.

Any problem domain where the meaning of data plays a major role will benefit from traditional deductive databases and expert systems (such as Prolog or business rule systems, respectively) as easily as it would from semantic web technologies.  However, a problem domain where linking data, identifying concepts and digital artifacts in a universal and reusable way, and leveraging web-based infrastructure is a major factor will benefit from semantic web technologies in a way that it wouldn't from the traditional alternatives.  This simplification of the value proposition (for consumption by risk-averse laypeople) also helps sharpen the distinctions between the markets that a business strategy can target, as well as the engineering problems these emerging technologies should (and should not) attempt to address.  A sharper, more straightforward message is needed to break the political and generational barriers that retard the use of potentially transformational technologies in a field that contributes significantly to the economic instability of this country.

Many of these technological opportunities transfer directly over for use in Patient Controlled Health Record (PCHR) systems.  I also think much of the risk aversion I found after leaving Fourthought (for instance), and generally in large institutions, contributes to why the very evident opportunities in leveraging rich web application (“web 2.0”) and semantic web (“web 3.0”) infrastructure have not had as much penetration in healthcare information technology as one would expect.

My new job is as a Senior Research Associate with the Center for Clinical Investigation in the Case Western School of Medicine.  I will be managing Clinical and Translational Science Collaboration (CTSC) clinical, biomedical, and administrative informatics projects, as well as designing and developing the informatics infrastructure that supports them.  Coupled with being a part-time Ph.D. student, I will still essentially be doing clinical research informatics (i.e., the use of informatics to facilitate biomedical and health research); however, the focus will be on translational research: translating findings in basic research more quickly and efficiently into medical practice and meaningful health outcomes, whether physical, mental, or social.  So I imagine the domain will be closer to the biology end of the spectrum, and there will be more of an explicit emphasis on collaboration.

I have thoroughly enjoyed my time working on such a challenging project, in such a world-renowned institution, and under such a visionary mandate.  I was privileged to be able to represent the Cleveland Clinic in the W3C through the various working groups developing standards relevant to the way in which we were leveraging semantic web technologies:

  • Semantic Web for Healthcare and Life Sciences Interest Group
  • Data Access Working Group
  • GRDDL Working Group 

If not for the exposure at CCF to the great challenges in medical informatics, and the equally great opportunities to address them, I would probably never have considered going back to seek a Ph.D. in this field.  Although I will be working at a different institution, it will still essentially be in the University Circle area, only about a 20-minute walk from where I was at the Cleveland Clinic.  I'm very proud of what we were able to do, and I'm looking forward to the future.

A Role for Semantic Web Technologies in Patient Record Data Collection

I found out today that not only is the Linking Enterprise Data book now available, but it is also freely available online as well as through other avenues (Springer and pre-order on Amazon):

Linking Enterprise Data is the application of Semantic Web architecture principles to real-world information management issues faced by commercial, not-for-profit and government enterprises. This book aims to provide practical approaches to addressing common information management issues by the application of Semantic Web and Linked Data research to production environments.


I wrote a chapter (“A Role for Semantic Web Technologies in Patient Record Data Collection”) discussing the debate around SOAP-based web services and Representational State Transfer (REST), focused on a specific, deployed use case that emphasizes the role of the Semantic Web, a simple Web application architecture that leverages declarative XML processing, and the needs of a workflow system for patient record data collection.  It touches just a bit on the use of XForms to manage patient record content as special-purpose XML dialects for RDF graphs, which I mentioned in my last post, but is mostly focused on how to use RDF to manage workflow state to orchestrate the collection of patient data.

Business Process Management Systems (BPMS) are a component of the stack of Web standards that comprise Service Oriented Architecture (SOA). Such systems are representative of the architectural framework of modern information systems built in an enterprise intranet and are in contrast to systems built for deployment on the larger World Wide Web. The REST architectural style is an emerging style for building loosely coupled systems based purely on the native HTTP protocol. It is a coordinated set of architectural constraints with a goal to minimize latency, maximize the independence and scalability of distributed components, and facilitate the use of intermediary processors. Within the development community for distributed, Web-based systems, there has been a debate regarding the merits of both approaches. In some cases, there are legitimate concerns about the differences in both architectural styles. In other cases, the contention seems to be based on concerns that are marginal at best.

In this chapter, we will attempt to contribute to this debate by focusing on a specific, deployed use case that emphasizes the role of the Semantic Web, a simple Web application architecture that leverages the use of declarative XML processing, and the needs of a workflow system. The use case involves orchestrating a work process associated with the data entry of structured patient record content into a research registry at the Cleveland Clinic’s Clinical Investigation department in the Heart and Vascular Institute.

IEEE Internet Computing Special Issue: Web Technology and Architecture for Personal Health Records

IEEE Internet Computing is soliciting original articles describing the development of, relevant trends in, and challenges of incorporating contemporary Web-based technology for the primary functions of Personal Health Records (PHRs).  Of particular interest are PHR systems that capture healthcare data entered by patients themselves: Personally Controlled Health Records (PCHRs).  If you are interested, please email any of the guest editors: me, Karthik Gomadam, or Charles Petrie.

Please email the guest editors a brief description of the article you plan to submit by 15 October 2010.  Final submissions are due on the first of November 2010.

The main functional categories of interest are information collection, sharing, exchange, and management.

Appropriate topics of interest include

  • Web-based, structured data collection in PHR systems
  • implementations of access-control policies and healthcare data sharing
  • distributed, identity-based authentication methods
  • digital signature and encryption techniques
  • Web portal architecture’s general components and capabilities as the basis for a PHR system
  • architectural paradigms regarding connectivity to other healthcare information producers and consumers
  • data models for PHR systems
  • distributed data subscription and publishing protocols
  • successful Web-based applications for chronic disease and medication management
  • health applications for PHR systems on mobile devices
  • privacy and security issues
  • HIPAA and its implications for adopting cloud computing for PHR applications
  • semantics for PHR interoperability and applications

All submissions must be original manuscripts of fewer than 5,000 words, focused on Internet technologies and implementations. All manuscripts are subject to peer review on both technical merit and relevance to IC’s international readership — primarily system and software design engineers. We do not accept white papers, and we discourage strictly theoretical or mathematical papers.

To submit a manuscript, please log on to Manuscript Central to create or access an account, which you can use to log on to IC‘s Author Center and upload your submission.

SNOMED-CT Management via Semantic Web Open Source Tools Committed to Google Code

[by Chimezie Ogbuji]

I just committed my working copy of the set of tools I use to manipulate and serialize SNOMED-CT (the Systematized Nomenclature of Medicine) and the Foundational Model of Anatomy (FMA) as OWL/RDF for use in the clinical terminology research I’ve been doing lately. It is still in a very rough form and probably not usable by anyone other than a Python / Semantic Web hacker such as myself. However, I’m hoping to get it to a shape where it can be used by others. I had hesitated to release it, mostly because of my concerns around the SNOMED-CT license, but I’ve been assured that as long as the hosting web site is based in the United States and (most importantly) the software is not released with the SNOMED distribution, it should be okay.

I have a (mostly empty) wiki describing the command-line invocation. It leverages InfixOWL and rdflib to manipulate the OWL/RDF. Basically, once you have loaded the delimited distribution into MySQL (the library also requires MySQLdb and an instance of MySQL to work with), you can run the command line, giving it one or more lists of SNOMED-CT terms (by their identifiers), and it will return an OWL/RDF representation of an extract from SNOMED-CT around those terms.

So, below is an example of running the command line to extract a section around the term Diastolic Hypertension and piping the result to the FuXi command line in order to select a single class (sno:HypertensiveDisorderSystemicArterial) and render it using my preferred syntax for OWL (the Manchester OWL syntax):

$python -e 48146000 -n short -s localhost -u ..mysql username.. --password=..mysql password.. -d snomed-ct | FuXi,2007-07-31:SNOMED-CT# --output=man-owl --class=sno:HypertensiveDisorderSystemicArterial --stdin
Class: sno:HypertensiveDisorderSystemicArterial
    ## Primitive Type (Hypertensive disorder) ##
    SNOMED-CT Code: 38341003 (a primitive concept)
              Clinical finding
              ( sno:findingSite some Systemic arterial structure )

Which renders an expression that can be paraphrased as

‘Hypertensive Disorder Systemic Arterial’ is a clinical finding and disease whose finding site is some structure of the systemic artery.

I can also take the Burn of skin example from the Wikipedia page on SNOMED and demonstrate the same thing, rendering it in its full (verbose) OWL/RDF/XML form:

<owl:Class rdf:about=",2007-07-31:SNOMED-CT#BurnOfSkin">
  <owl:intersectionOf rdf:parseType="Collection">
    <owl:Restriction>
      <owl:onProperty>
        <owl:ObjectProperty rdf:about=",2007-07-31:SNOMED-CT#findingSite"/>
      </owl:onProperty>
      <owl:someValuesFrom rdf:resource=",2007-07-31:SNOMED-CT#SkinStructure"/>
    </owl:Restriction>
    <rdf:Description rdf:about=",2007-07-31:SNOMED-CT#ClinicalFinding"/>
    <owl:Restriction>
      <owl:onProperty>
        <owl:ObjectProperty rdf:about=",2007-07-31:SNOMED-CT#associatedMorphology"/>
      </owl:onProperty>
      <owl:someValuesFrom rdf:resource=",2007-07-31:SNOMED-CT#BurnInjury"/>
    </owl:Restriction>
    <rdf:Description rdf:about=",2007-07-31:SNOMED-CT#Disease"/>
  </owl:intersectionOf>
  <rdfs:label>Burn of skin</rdfs:label>
</owl:Class>

And then in its more palatable Manchester OWL form:

$ python -e 284196006 -n short -s localhost -u ..username.. --password= -d snomed-ct | FuXi,2007-07-31:SNOMED-CT# --output=man-owl --class=sno:BurnOfSkin --stdin
Class: sno:BurnOfSkin
    ## A Defined Class (Burn of skin) ##
    SNOMED-CT Code: 284196006
      ( sno:ClinicalFinding and sno:Disease ) that
      ( sno:findingSite some Skin structure ) and (sno:associatedMorphology some Burn injury )

Which can be paraphrased as:

A clinical finding and disease whose finding site is some skin structure and whose associated morphology is a burn injury.

The examples above use the ‘-n short’ option, which renders extracts in OWL via the short normal form, a procedure described in the SNOMED-CT manuals that produces a more canonical representation and eliminates redundancy in the process. The library currently only works with the 2007-07-31 distribution of SNOMED-CT, but I’m in the process of updating it to use the latest distribution. The latest distribution comes with its own OWL representation, and I’m still trying to wrap my head around some quirks in it involving role groups, and whether this library would need to change to work directly off that OWL representation instead of the primary relational distribution format. Enjoy!

Health Care Technology @ O'Reilly's Open Source Convention

Andy Oram has recently written about O'Reilly's Open Source Convention, which includes a track on health care IT. As he discusses in that article, the potential value proposition of open source software, open data initiatives, and royalty-free standards in making a difference in how electronic medical records are stored and shared is significant. I have seen it firsthand, having worked on a patient registry that is mostly composed of open source components.

As a result of ARRA (the stimulus bill), there is a significant incentive for healthcare professionals to demonstrate meaningful use of EHRs. The meaningful use criteria comprise three requirements:

  1. Use of certified EHR technology in a meaningful manner
  2. Utilize certified EHR technology connected for health information exchange
  3. Use of certified EHR technology to submit information on clinical quality measures

These are very broad requirements, but how they can be achieved, and the role of open data, open source, and royalty-free standards in helping achieve them, can be seen by looking at some of the challenges [1] that currently limit the meaningful use of Health Information Technology (HIT):

  1. Clinical information systems from disparate hospitals do not communicate with each other automatically because implementation of existing standards is lacking
  2. Data standards for medical specialties need further development to accurately communicate intricacies of care
  3. Database architectures are often designed to support single clinical applications and are not easily modified
  4. HIT increases the cost of doing business: cost of technology, training, maintenance, system optimization, and skilled personnel
  5. Healthcare organizations have little recourse if a vendor fails to deliver once the vendor's system becomes embedded into the organization (vendor lock-in)
  6. Decisions on technology acquisitions and implementations are often made by people who lack clinical informatics expertise

Promulgation of royalty-free standards addresses both the lack of standards and the cost of using them. Involvement of multiple member organizations in developing such standards builds some amount of serendipity into the systems that use them, given the rigor that typically goes into creating these standards.

Open source software similarly addresses the cost of technology and, in addition, tends to expand the pool of skilled personnel available by virtue of the communities built around it. These communities are often a significant resource to tap in maintaining and optimizing such systems. For example, the informatics team I work with at the Cleveland Clinic's Heart and Vascular Institute (on SemanticDB) currently uses MySQL as the backend for our RDF triple store, and none of the developers who maintain and optimize this aspect of our software ever needed to travel to a site to learn MySQL; most of what we needed to know was widely available on various internet sites.
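As a toy illustration of what using a relational database as the backend for an RDF triple store means in practice (this is not SemanticDB's actual schema), statements can be stored as plain (subject, predicate, object) rows and queried with ordinary SQL, here with the standard library's sqlite3 standing in for MySQL:

```python
# A minimal relational triple store: RDF statements as rows.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
triples = [
    ("sno:BurnOfSkin", "rdfs:subClassOf", "sno:ClinicalFinding"),
    ("sno:BurnOfSkin", "sno:findingSite", "sno:SkinStructure"),
    ("sno:BurnOfSkin", "sno:associatedMorphology", "sno:BurnInjury"),
]
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)

# All statements about sno:BurnOfSkin, via ordinary SQL:
rows = conn.execute(
    "SELECT predicate, object FROM triples WHERE subject = ?",
    ("sno:BurnOfSkin",),
).fetchall()
for pred, obj in rows:
    print(pred, obj)
```

A production store adds indexes, term interning, and query translation on top of this basic shape, but the shape itself is why generic database skills transfer so directly.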

Many of these benefits are turned on their head when healthcare organizations find themselves in the proverbial position of vendor lock-in. Vendors of HIT, like most other capitalist entities, seek to perpetuate their grip on a market via a steady migration onto a platform built entirely on their goods. An information technology market based on royalty-free standards and open source is a counterweight to this, insofar as vendor lock-in is much harder to achieve when platforms are built to standards developed in concert with various industry players, thus diffusing the competition.

This potential bodes well for a future HIT landscape that looks quite different from the one we have today, and the impetus of the new incentives put in place by this administration might accelerate the change. For those interested in this topic, you might also want to take a look at Andy's most recent summary of the OSCon health care technology track.

[1] Kadry, B., Sanderson, I.C., and Macario, A., "Challenges that limit meaningful use of health information technology," Current Opinion in Anaesthesiology, vol. 23, 2010.

Indivo X - A Promising Framework for Personal Healthcare Records

Indivo X Alpha 1 released

A few days ago, we released the source code for the first public alpha of Indivo X, our latest vision for personally controlled health records. This is a release focused on the Indivo X API, targeted first at developers. Jump right into the installation instructions. (We don't recommend you use this version in a production environment just yet.)

Fred Trotter wrote up his first impressions, and ZDNet picked up the story on open source and health reform. We look forward to feedback from the community, and we're already hard at work on Alpha 2, which we expect to deliver in early spring.

I recently came across the publicly available codebase for Indivo. I've read quite a bit about Indivo during my research on Personal Healthcare Records (PHRs), and it seems the most promising for several reasons. First, it is being released to the public (at least a version of it is). One of the reasons I have been really driven to learn more about the PHR market is my belief that the combination of the social web, the emerging interest of patients in more direct access to their healthcare data, and the significant stunting of adoption of contemporary web technologies in medical record systems will be a major catalyst for a new generation of health applications. There is a lot of data that supports this trend. Second, the seminal paper by the authors of Indivo captures their vision of how PHRs will change the healthcare data landscape; it is a good read for anyone interested in this phenomenon. Their vision appeals to me on a visceral level: there is something compelling about the idea of a healthcare data revolution sparked by patients themselves and their willingness to adopt value-adding technologies that their caretakers are perhaps too risk-averse to consider.

Looking closely at the code base, I discovered that it is built from Python, Django, and Postgres. The SemanticDB patient registry is currently based on 4Suite, Python, a significant amount of XML processing, and MySQL. I'm keen on building a simple hello-world PHR for managing my blood pressure readings and medications as a first iteration, to see how far I can go with my current toolset: Akara (for the web infrastructure), Amara (for the XML processing), rdflib (for the RDF processing), FuXi (for any logical entailment and query rewriting), and CPR for the medical record ontology.
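For a sense of what that first iteration might look like, here is a hypothetical sketch (not part of any of the libraries above): a blood-pressure reading modeled as a plain record, with a naive classification rule standing in for the kind of entailment FuXi would later perform over an ontology. The thresholds are illustrative, not clinical guidance.

```python
# A 'hello world' PHR record: one blood-pressure reading.
from dataclasses import dataclass

@dataclass
class BloodPressureReading:
    systolic: int   # mmHg
    diastolic: int  # mmHg

def classify(r: BloodPressureReading) -> str:
    # Illustrative thresholds only; a real system would derive these
    # classifications from an ontology via logical entailment.
    if r.systolic >= 140 or r.diastolic >= 90:
        return "hypertensive-range"
    if r.systolic < 120 and r.diastolic < 80:
        return "normal-range"
    return "elevated"

print(classify(BloodPressureReading(150, 95)))  # hypertensive-range
print(classify(BloodPressureReading(118, 76)))  # normal-range
```

The appeal of the ontology-driven approach is precisely that rules like these live in the knowledge model, not in application code.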

Indivo also includes a Python implementation of OAuth. I've been doing a lot of research on how OAuth can be adopted as a cryptographically safe mechanism for delegating (subscribed) access to PHR content (similar to Facebook content subscriptions).
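For a flavor of the cryptographic piece, here is a minimal sketch of the HMAC-SHA1 signing step at the core of OAuth 1.0 (the protocol family Indivo's implementation builds on). The base string below is a truncated illustration, not a complete OAuth signature base string.

```python
# OAuth 1.0-style HMAC-SHA1 request signing.
import base64
import hashlib
import hmac
import urllib.parse

def sign(base_string: str, consumer_secret: str, token_secret: str = "") -> str:
    # Per OAuth 1.0 (RFC 5849, sec. 3.4.2), the signing key is the
    # percent-encoded consumer secret and token secret joined by '&'.
    key = "&".join(
        urllib.parse.quote(s, safe="") for s in (consumer_secret, token_secret)
    )
    digest = hmac.new(key.encode(), base_string.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()

# The same request signed with the same secrets always yields the same
# signature, which is what lets the server verify the delegation grant.
print(sign("GET&https%3A%2F%2Fphr.example%2Frecord", "consumer-secret"))
```

The key property for delegated PHR access is that the patient's credentials are never handed to the subscribing application; only signed, scoped requests are.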

I will have a lot more to say on this general topic. Stay tuned!


There Should be an Altruistic Intersection of Innovative Technology and Healthcare

Note: This is a semi-rant on the current state of healthcare and innovative technology and why we all should be motivated to do something more about it. The opinions expressed here are mine and mine alone (Chimezie Ogbuji).

We recently wrote-up a case study for the W3C Semantic Web Education and Outreach Interest Group:

"A Semantic Web Content Repository for Clinical Research"

A major difference between the user experience with SemanticDB and the previous interface to the relational technology-based Cardiovascular Information Registry (CVIR) that has accelerated adoption of Semantic Web technologies is the use of local terminology familiar within the domain rather than terms that are a consequence of the physical organization of the data. In addition, the model of the domain (expressed in OWL) is more amenable to extensions typically associated with targeted studies that introduce additional variables.

It is an overview of the work we have been doing on clinical research, driven by the value of having well-curated, population-level patient data. It is a very appropriate use case for the Semantic Web in two respects: the problems addressed by the specific technologies used are directly relevant for clinical research, and it exercises certain sociological aspects of the Semantic Web (altruism through innovative technology; open communities, standards, and software; etc.). This latter point isn't emphasized often enough, though I've been thinking quite a bit about it lately as I've been developing a compact ontology for medical records. This started as a side project associated with the activities of the W3C Semantic Web Healthcare and Life Sciences Interest Group that I am involved in, but has since become a personal project to investigate a personal philosophy that has recently come in contact with the nature of my current work through a tragedy in my family.

One of the things I would like to do is learn a bit about the ailments in my family through active engagement with the science behind them. I'm a software hobbyist with aspirations of contributing to the pragmatic application of knowledge representation to common human problems. I have access to all the technologies and tools that can make a personal medical record repository a reality for me. I have access to a massive, freely available, well-organized ontology of clinical phenomena (GALEN). Common sense suggests that there is no one more motivated than I am to learn whether such an exercise is fruitful. I could sit around waiting for modern medicine to catch up with the reality of today's innovative technologies, but why should I wait? Why should we wait is the question I really want to ask, but at the very least I can do something about my immediate situation (we have royalty-free standards, open source software, and open communities to thank for that).

If I have tools that can draw sound conclusions from well-curated information about my complete medical history (and the medical history of my loved ones), document the full set of justifications for those conclusions, reduce forms-based management of this information to a trivial task, and store it all in a content repository, is it not in my interest to take advantage of these tools for the benefit of my health and the health of my loved ones?

To a certain extent, applying innovative technology at the point of care or for research purposes is a win-win. A no-brainer, really. At least it should be. It is in the best interest of both healthcare providers and recipients of healthcare services to leverage innovative technologies.

I work for a not-for-profit organization with a mission statement to the people of Cleveland. I'm one of those types who take such things seriously (oaths, mission statements, etc.). I was born (and essentially raised) in the greater Cleveland area. I have (young) children and family here. In addition, there is a strong history of hypertension and diabetes in my genetic lineage. I've lost loved ones at the point of care. The combination of these things makes the work I do much more relevant for me, and as such I take it very seriously.

The ridiculous cost of healthcare, its effectiveness, and the curation of expressive patient data for the benefit of scientific research should be thought of first as problems that modern science has a duty to solve, rather than simply as business opportunities. A certain minimal amount of altruism is required. Anything less would be a disservice to the silent oaths that nurses take when they dedicate their professional lives to the healthcare of the populace with a vigor that only few can demonstrate. My mom was (and is) an incredible nurse, so I should know a little something about this.

At any rate, I think collectively we sit at a point of transition and reformation in the way healthcare information is housed, managed, and leveraged. I believe the shape of things to come will depend largely on how well we understand that altruism deserves a very prominent seat in all this. Everyone is affected by unnecessarily expensive healthcare, even the providers themselves.

Chimezie Ogbuji

via Copia

Where does the Semantic Web converge with the Computerized Patient Record?

I've been thinking a lot about the Computer-based Patient Record (CPR), an acronym as unlikely as GRDDL but, once again, a methodology expressed as an engineering specification. In both cases, the methodology is a coherent architectural "style" that requires a mouthful of words to describe. Other examples of this:

  • Representational State Transfer
  • Rich Web Application Backplane
  • Problem-oriented Medical Record
  • Gleaning Resource Descriptions from Dialects of Languages

The term itself was coined (I think) by the Institute of Medicine [1]. If you are in healthcare and are motivated by the notion of using technology to make healthcare as effective and inexpensive as possible, you should do the Institute a favor and buy the book:

Institute of Medicine, The Computer-Based Patient Record: An Essential Technology for Health Care, Revised Edition, 1998, ISBN: 0309055326.

I've written some recent slides, posted on the W3C ESW wiki, that all have something to do with the idea in one way or another:

The nice thing about working in a W3C Interest Group is that the work you do is for the general public's benefit, so it is a manifestation of the W3C notion of the Semantic Web, which primarily involves a human social process.

Sorta like a technological manifestation of our natural Darwinian instinct.

That's how I think of the Semantic Web, anyway: as a very old, living thread of advancements in Knowledge Representation which intersected with an anthropological assessment of some recent web architecture engineering standards.

Technology is our greatest contribution, and so it should only make sense that where we use it to better our health, it should not come at a cost to us. The slides reference and include a suggested OWL-sanctioned vocabulary for basically implementing the Problem-oriented Medical Record (a clinical methodology for problem solving).

I think the idea of a free (as in beer) vocabulary for people who need healthcare has an interesting intersection with the pragmatic parts of the Semantic Web (avoiding the double quotes) vision. I have exercise-induced asthma (or was "diagnosed" as such when I was younger). I still ran track and field in high school and was okay after an initial period where my lungs had to work overtime. I wouldn't mind hosting RDF content about such a "finding" if it was to my personal benefit that a piece of software could do something useful for me in an automated, deterministic way.

"HL7 CDA" seems to be a freely avaiable, well-organized vocabulary for describing messages dispatched between hospital systems. And I recently wrote a set of XSLT templates which extract predicate logic statemnts about a CDA document using the POMR ontology and the other freely available "foundational ontologies" it coordinates. The CDA document on has a nice concise description of the technological merits of HL7 CDA:

The HL7 Clinical Document Architecture is an XML-based document markup standard that specifies the structure and semantics of clinical documents for the purpose of exchange. Known earlier as the Patient Record Architecture (PRA), CDA "provides an exchange model for clinical documents such as discharge summaries and progress notes, and brings the healthcare industry closer to the realization of an electronic medical record. By leveraging the use of XML, the HL7 Reference Information Model (RIM) and coded vocabularies, the CDA makes documents both machine-readable (so they are easily parsed and processed electronically) and human-readable so they can be easily retrieved and used by the people who need them. CDA documents can be displayed using XML-aware Web browsers or wireless applications such as cell phones..."

The HL7 CDA was designed to "give priority to delivery of patient care. It provides cost effective implementation across as wide a spectrum of systems as possible. It supports exchange of human-readable documents between users, including those with different levels of technical sophistication, and promotes longevity of all information encoded according to this architecture. CDA enables a wide range of post-exchange processing applications and is compatible with a wide range of document creation applications."

A CDA document is a defined and complete information object that can exist outside of a messaging context and/or can be a MIME-encoded payload within an HL7 message; thus, the CDA complements HL7 messaging specifications.
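The extraction I describe above used XSLT over real CDA instances; the skeletal standard-library sketch below (the XML is an invented fragment, not a valid CDA document, and the predicate name is hypothetical) shows the general shape of turning clinical-document markup into simple subject/predicate/object statements:

```python
# Extract simple statements from clinical-document-style XML.
import xml.etree.ElementTree as ET

doc = """<document>
  <observation code="38341003" display="Hypertensive disorder"/>
  <observation code="284196006" display="Burn of skin"/>
</document>"""

root = ET.fromstring(doc)
# Each observation becomes one predicate-logic-style statement.
statements = [
    ("patient", "hasFinding", obs.get("display"))
    for obs in root.findall("observation")
]
for s in statements:
    print(s)
```

The real templates map CDA's RIM-derived structures onto ontology terms rather than a flat triple like this, but the pipeline from machine-readable markup to logical statements is the same.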

If I could put up a CDA document describing the aspects of my medical history that it was to my benefit to make freely available (at my discretion), I would do so in the event some piece of software could do some automated things for my benefit. Leveraging a vocabulary which essentially grounds an expressive variant of predicate logic in a transport protocol makes the chances that this happens very likely. The effect is as multiplicative as the human population.

The CPR specification is also very well engineered and much ahead of its time (it was written about 15 years ago). The only technological checkmark left is a uniform vocabulary. Consensus stands in the way of uniformity, so some group of people needs to be thinking about how the "pragmatic" and anthropological notions of the Semantic Web can be realized with a vocabulary for our personally controlled, public clinical content. Don't you think?

I was able to register the /cpr top-level PURL domain, and the URL resolves to the OWL ontology with commented imports of other very relevant OWL ontologies. Once I see a pragmatic demonstration of leaving owl:imports in a 'live' URL, I'll remove the comments. It would be a shame if any Semantic Web vocabulary terms came into conflict with a legal mandate which controlled the use of a vocabulary.

Chimezie Ogbuji

via Copia