SemanticDB: A CMS Methodology for the Enterprise Semantic Web

This is a heads up that this weekend I will be writing a full-length article giving an architectural overview of SemanticDB, a Content Management System methodology (and implementation) that we have been developing at the Cleveland Clinic Foundation over the last 4 years (since I joined). I hope it will trigger more public dialog, which unfortunately has not been happening, about how we have been leveraging W3C standards for the Semantic Web and for document management (XML-related technologies) to build a robust platform that facilitates all aspects of clinical research.

I believe most of our success with SemanticDB can be attributed to the strengths of the standards being leveraged, the open source tools that serve as primary infrastructure (specifically: Python, 4Suite, RDFLib, FuXi, Paste, and FormsPlayer), and the enthusiasm of the communities behind these open standards and open source tools. As such, it is only fair that these communities become aware of examples that demonstrate how this arena is slowly transitioning from the realm of pure research to practical problem solving. In addition, I hope the article will contribute to the growing body of literature demonstrating concrete problems being solved with these technologies in a way that simply would not have been possible via legacy means.

Below is the abstract from a technological whitepaper that has been in clandestine distribution as we have sought to ramp up our efforts:

SemanticDB represents a methodology for building and maintaining a highly flexible content management system built around a centralized vocabulary.
This vocabulary incorporates a formal semantics (via a combination of an OWL ontology and an N3 ruleset) for hierarchical document composition and an abstract framework for modelling terms in a domain in a way that facilitates semi-automated use of native XML and RDF representations.

It relies on a very recent set of technologies (Semantic Web technologies) to automate certain aspects of data entry, structure, storage, display, and retrieval with minimal intervention by traditional database administrators and computer programmers. A SemanticDB instance is built around a domain model expressed using a Data Node Model. Various XML/RDF management components are generated from this domain model: the Data Mason takes a domain model and generates XML schemas, formal ontologies, XSLT transforms, stored queries, XML templates, document definitions, etc.
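To make the generation step concrete, here is a minimal sketch of deriving artifacts from a single domain model. It is not the Data Mason itself: the dictionary format, the term names (`PatientRecord`, `Procedure`, and so on), and the functions `generate_schema` and `list_term_paths` are all hypothetical, since the abstract does not describe the Data Node Model's actual format. It only illustrates the idea that one model can drive several outputs, here an XML Schema fragment and a list of term paths.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XSD_NS)

# Hypothetical domain model: a tree of terms; leaves carry an XSD datatype.
DOMAIN_MODEL = {
    "name": "PatientRecord",
    "children": [
        {"name": "BirthDate", "type": "xs:date"},
        {"name": "Procedure", "children": [
            {"name": "ProcedureDate", "type": "xs:date"},
            {"name": "SurgeonName", "type": "xs:string"},
        ]},
    ],
}


def build_element(node):
    """Turn one model node into an xs:element, recursing into child terms."""
    element = ET.Element(f"{{{XSD_NS}}}element", {"name": node["name"]})
    children = node.get("children")
    if children:
        complex_type = ET.SubElement(element, f"{{{XSD_NS}}}complexType")
        sequence = ET.SubElement(complex_type, f"{{{XSD_NS}}}sequence")
        for child in children:
            sequence.append(build_element(child))
    else:
        # Leaf terms default to xs:string when the model gives no datatype.
        element.set("type", node.get("type", "xs:string"))
    return element


def generate_schema(model):
    """Derived artifact 1: an XML Schema document for the model."""
    schema = ET.Element(f"{{{XSD_NS}}}schema")
    schema.append(build_element(model))
    return schema


def list_term_paths(node, prefix=""):
    """Derived artifact 2: a slash-separated path for every term in the model."""
    path = f"{prefix}/{node['name']}"
    paths = [path]
    for child in node.get("children", []):
        paths.extend(list_term_paths(child, path))
    return paths


schema_xml = ET.tostring(generate_schema(DOMAIN_MODEL), encoding="unicode")
term_paths = list_term_paths(DOMAIN_MODEL)
```

Adding a term to `DOMAIN_MODEL` changes both outputs on the next run, which is the property the abstract attributes to generating components from the domain model rather than maintaining them by hand.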

The ScreenCompiler takes an abstract representation of a data entry form (with references to terms in a domain model) and generates a user interface in a particular host language (currently XForms).
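As an illustration of that compilation step, the sketch below turns an abstract form description into XForms controls. It is an assumption-laden stand-in, not the ScreenCompiler: the `ABSTRACT_FORM` structure, the term paths, and the `compile_to_xforms` function are invented for this example. The XForms namespace and the `group`, `input`, and `label` elements with a `ref` binding attribute are standard XForms.

```python
import xml.etree.ElementTree as ET

XF_NS = "http://www.w3.org/2002/xforms"
ET.register_namespace("xf", XF_NS)

# Hypothetical abstract form: each field points at a term in the domain model.
ABSTRACT_FORM = {
    "title": "Procedure details",
    "fields": [
        {"term": "/PatientRecord/Procedure/ProcedureDate",
         "label": "Date of procedure"},
        {"term": "/PatientRecord/Procedure/SurgeonName",
         "label": "Surgeon"},
    ],
}


def compile_to_xforms(form):
    """Emit one xf:group containing an xf:input per abstract field."""
    group = ET.Element(f"{{{XF_NS}}}group")
    ET.SubElement(group, f"{{{XF_NS}}}label").text = form["title"]
    for field in form["fields"]:
        # The term reference becomes the control's binding expression.
        control = ET.SubElement(
            group, f"{{{XF_NS}}}input", {"ref": field["term"]}
        )
        ET.SubElement(control, f"{{{XF_NS}}}label").text = field["label"]
    return group


xforms_xml = ET.tostring(compile_to_xforms(ABSTRACT_FORM), encoding="unicode")
```

Because the abstract form only names terms and labels, a different back end could in principle target another host language from the same description; only `compile_to_xforms` would need a counterpart.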

Below is a core diagram of the methodology which facilitates semi-automated data management of XML & RDF content:

The Data Mason and automated XML/RDF Data Management

Watch this space...

Chimezie Ogbuji

via Copia
2 responses
Data is published on the World Wide Web primarily for human interpretation, not for machines. Semantic Web technologies provide a standardized way for machines to interpret that data.

The SemanticDB project is based on Semantic Web technology standards. It provides an extensible, expressive, automated, accessible, and scalable way to collect and store patient clinical data.

In medical research, knowledge advances significantly over time, and this can change what needs to be queried from the data. The relational data model does not provide the flexibility to accommodate these changes. The SemanticDB project addresses this by storing the data as RDF triples, which form the basis of SemanticDB's comprehensive centralized vocabulary. Complex queries are then run against the RDF graphs using the SPARQL query language.
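The flexibility argument can be sketched in a few lines. The example below is a standard-library stand-in, not RDF or SPARQL proper: it stores facts as plain (subject, predicate, object) tuples and joins triple patterns the way a SPARQL basic graph pattern does. All identifiers (`patient:1`, `ex:hadProcedure`, `ex:usedDevice`, and so on) are made up for illustration; a real deployment would use an RDF library such as RDFLib and actual SPARQL queries.

```python
# Each fact is a (subject, predicate, object) triple; all names are hypothetical.
triples = {
    ("patient:1", "ex:hadProcedure", "proc:10"),
    ("proc:10", "ex:procedureType", "CABG"),
    ("patient:2", "ex:hadProcedure", "proc:11"),
    ("proc:11", "ex:procedureType", "Valve repair"),
}


def match(pattern, store):
    """Yield variable bindings for one pattern; terms starting with '?' are variables."""
    for triple in store:
        binding = {}
        for want, have in zip(pattern, triple):
            if want.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if binding.setdefault(want, have) != have:
                    break
            elif want != have:
                break
        else:
            yield binding


def query(patterns, store):
    """Join several triple patterns, carrying bindings from one to the next."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for solution in solutions:
            # Substitute variables that earlier patterns already bound.
            bound = tuple(solution.get(term, term) for term in pattern)
            for binding in match(bound, store):
                next_solutions.append({**solution, **binding})
        solutions = next_solutions
    return solutions


# A question the data was designed for.
cabg_patients = sorted(
    s["?patient"]
    for s in query(
        [("?patient", "ex:hadProcedure", "?proc"),
         ("?proc", "ex:procedureType", "CABG")],
        triples,
    )
)

# New knowledge arrives: record a new kind of fact with no schema migration.
triples.add(("proc:10", "ex:usedDevice", "device:7"))

# A question nobody anticipated when the first triples were stored.
device_patients = query(
    [("?patient", "ex:hadProcedure", "?proc"),
     ("?proc", "ex:usedDevice", "?device")],
    triples,
)
```

The new `ex:usedDevice` predicate is queryable as soon as it is added, whereas a relational design would typically need a new column or table first.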