Service Modeling Language

I'd long ago put up a very thick lens for looking at any news from the SOA space. With analysts and hungry vendors flinging the buzzword around in a mindless frenzy, it came to the point where one out of twenty bits of information using the term were pure drivel. I do believe there is some substance to SOA, but it's definitely veiled in a thick cloud of the vapors. This week Service Modeling Language caught my eye through said thick lens, and I think it may be one of the more interesting SOA initiatives to emerge.

One problem is that the SML blurbs and the SML spec seem to have little substantive connection. It's touted as follows:

The Service Modeling Language (SML) provides a rich set of constructs for creating models of complex IT services and systems. These models typically include information about configuration, deployment, monitoring, policy, health, capacity planning, target operating range, service level agreements, and so on.

The second sentence reads at first glance as if it's some form of ontology of systems management, basically an actual model, rather than a modeling language. No big deal. I see modeling languages and actual models conflated all the time. Then I catch the "typically", and read the spec, and it becomes evident that SML has much more to it than "models of complex IT services and systems". It's really a general-purpose modeling language. It builds on a subset of WXS and a subset of ISO Schematron, adding a handful of useful data modeling constructs such as sml:unique and sml:acyclic (the latter is subtle, but experienced architects know how important identifying cyclic dependencies is to risk assessment).
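To make the point about cycles concrete, here is a minimal sketch of the kind of check a constraint like sml:acyclic implies: treat each reference of the constrained type as a directed edge between model documents, then look for a cycle with a depth-first search. The dependency map and component names are hypothetical, and this is not how the SML spec says validators must work; it only shows what "acyclic" means for a reference graph.

```python
from typing import Dict, List, Optional


def find_cycle(refs: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle in the reference graph as a path, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on current path, finished
    color = {node: WHITE for node in refs}
    for targets in refs.values():         # include nodes that only appear as targets
        for target in targets:
            color.setdefault(target, WHITE)
    stack: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        stack.append(node)
        for nxt in refs.get(node, []):
            if color[nxt] == GREY:        # back edge: nxt is already on the path
                return stack[stack.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK
        return None

    for node in list(color):
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```

For example, find_cycle({"a": ["b"], "b": ["c"], "c": ["a"]}) returns ["a", "b", "c", "a"], which is exactly the sort of dependency loop an architect wants flagged before it becomes a deployment risk.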

I'm still not sure I see the entire "story" as it pertains to services/SOA and automation. I guess if I use my imagination I could maybe divine that an architect publishes a model of the IT needs for a service, and some management tool such as Tivoli or Unicenter generates reports on a system to flag issues or to assess compatibility with the service's infrastructure needs? (I'm not sure whether this would be a task undertaken during proposal assessment, systems development, maintenance, or all of the above.) I imagine all the talk of automation in SML involves how such reports would help reduce manual assessment in architecture and integration. But that can't be! Surely SML folks have learned the lessons of UDDI. Some assessment tasks simply cannot be automated.

SML shows the fingerprints of some very sharp folks, so I assume I'm missing something. I think a document articulating some nice, simple use cases would be much more useful than the buzzword-laden blurbs for SML. Also, I think the SML spec should be split up. At present a lot of its bulk is taken up defining WXS and ISO Schematron subsets. That seems a useful profile to have, but it should be separated from the actual specification of SML modeling primitives.

[Uche Ogbuji]

via Copia

Tagging meets hierarchies: XBELicious

The indefatigable John L. Clark recently announced another very useful effort, the start of a system for managing your del.icio.us bookmarks as XBEL files. Of course, not everyone is as keen on XBEL as I am, but even if you aren't, there is a reason for more general interest in the project: it uses a very sensible set of heuristics for mapping tagged metadata to hierarchical metadata. del.icio.us is all Web 2.0-ish and thus uses tagging for organization. XBEL is all XML-ish and thus uses hierarchy for the same purpose. I've long wanted to document simple common-sense rules for mapping one scheme to the other, and John's approach is very similar to sketches I had in mind. Read section 5 ("Templates") of the XBELicious Installation and User's Guide for an overview. Here is a key snippet:

For example, if your XBEL template has a hierarchy of folders like "Computers → linux → news" and you have a bookmark tagged with all three of these tags, then it will be placed under the "news" folder because it has tags corresponding to each level in this hierarchy. Note, however, that this bookmark will not be placed in either of the two higher directories, because it fits best in the news category. A bookmark tagged with "Computers" and "news" would only be placed under "Computers" because it doesn't have the "linux" tag, and a bookmark tagged with "linux" and "news" would not be stored in any of these three folders.
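The rule in that excerpt is simple enough to sketch. Assuming the template is a nested dict of folder names and that tags match folder names exactly (both assumptions of mine, not XBELicious's actual data model or code), a bookmark descends from the top level only while it carries the tag for each folder, and is filed in the deepest folder it reaches:

```python
from typing import Dict, List, Set, Tuple


def place(tree: Dict[str, dict], tags: Set[str],
          path: Tuple[str, ...] = ()) -> List[Tuple[str, ...]]:
    """Return the folder paths where a bookmark with these tags is filed.

    A folder is entered only if the bookmark carries its tag; the bookmark
    lands in the deepest folder reached, never in that folder's ancestors.
    """
    placements: List[Tuple[str, ...]] = []
    for folder, children in tree.items():
        if folder in tags:
            here = path + (folder,)
            deeper = place(children, tags, here)
            placements.extend(deeper if deeper else [here])
    return placements


# Hypothetical template mirroring the excerpt: Computers -> linux -> news
template = {"Computers": {"linux": {"news": {}}}}
```

With the excerpt's three examples, place returns [("Computers", "linux", "news")] for all three tags, [("Computers",)] for "Computers" plus "news", and an empty list for "linux" plus "news". Returning a list also covers bookmarks whose tags match more than one branch of a larger template.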

XBELicious is a work in progress, but worthy work for a variety of reasons. I hope I have some time to lend a hand soon.

[Uche Ogbuji]

via Copia

"Create vector graphics in the browser with SVG"

Subtitle: Add two-dimensional vector graphics to your Web pages with the flexible, XML graphics format of Scalable Vector Graphics (SVG) 1.1.
Synopsis: Learn step-by-step how to incorporate Scalable Vector Graphics (SVG) into Web pages using real browser examples. SVG 1.1, an XML language for describing two-dimensional vector graphics, provides a practical and flexible graphics format in XML, despite the language's verbosity. Several browsers recently completed or announced built-in SVG support.

I was early to SVG, exploring it in this 2001 article, but in recent years I haven't had as much time as I'd have liked to work with this fun technology. I was able to put it to use in projects last year, and I think the timing is good, considering the recent inroads SVG has been making in browser and mobile spaces. I've been lucky to have far fewer problems than Eric has. Most of what I've tried just works, and does so in Firefox, Opera 9 and MSIE with the Adobe SVG Viewer.
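For anyone who hasn't looked at SVG markup before, here is a minimal sketch that emits a standalone SVG 1.1 document using Python's ElementTree. The shapes, sizes and colors are arbitrary choices for illustration; the point is that an SVG image is just an XML document in the SVG namespace.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def simple_svg(width=200, height=60):
    """Return a standalone SVG 1.1 document containing two basic shapes."""
    svg = ET.Element("svg", {
        "xmlns": SVG_NS,          # declares the SVG namespace as the default
        "version": "1.1",
        "width": str(width),
        "height": str(height),
    })
    # A filled rectangle and an outlined circle, in user-space units
    ET.SubElement(svg, "rect", {"x": "10", "y": "10", "width": "80",
                                "height": "40", "fill": "steelblue"})
    ET.SubElement(svg, "circle", {"cx": "140", "cy": "30", "r": "20",
                                  "fill": "orange", "stroke": "black"})
    return ET.tostring(svg, encoding="unicode")


if __name__ == "__main__":
    print(simple_svg())
```

Saved to a file with an .svg extension, the output should open directly in an SVG-capable browser.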

[Uche Ogbuji]

via Copia

XML metrics

Rick Jelliffe has been working on XML metrics for a while. As I reported in "Thinking XML: XMLOpen and more XML Hacks", discussing Rick's presentation at XMLOpen 2004:

Jelliffe's talk was actually about his experiences trying to come up with metrics of XML schema complexity. The idea was to get an index number to help estimate the difficulty of implementing processing tasks (such as creating an XSLT transform) for a vocabulary and the typical uses for the vocabulary. Jelliffe's formula was a count of element types, attributes, and various special cases of these measured either from a DTD or from one or more instance documents. While there was some discussion of the exact details of such measurements -- for example, the extent to which structured fields and controlled vocabularies within content complicated processing -- the general idea turned out to be one that others had considered and even implemented. I mentioned that at Fourthought, the consultancy where I practice, we have created a lightweight measure to estimate how hard it would be to develop an XML schema (in RELAX NG) given the outlines of a vocabulary needed by the client. It will be interesting to see whether the industry begins to come up with general measurements of XML language complexity, and even to standardize such measurements, perhaps along lines that are traceable to ISO standards for software quality.
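The raw ingredients of such a metric are easy to gather from instance documents. Here is a minimal sketch that counts distinct element types and distinct (element, attribute) pairs in one document. It is not Jelliffe's formula, which also weighs special cases; the function name and the choice to count attributes per element type are my own assumptions.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def vocabulary_profile(xml_text):
    """Count element types and (element, attribute) pairs in one instance."""
    root = ET.fromstring(xml_text)
    elements = Counter()
    attributes = Counter()
    for el in root.iter():                 # document order, root included
        elements[el.tag] += 1
        for name in el.attrib:
            attributes[(el.tag, name)] += 1
    return {
        "element_types": len(elements),
        "attribute_types": len(attributes),
        "elements": elements,              # occurrences per element type
        "attributes": attributes,          # occurrences per (element, attribute)
    }
```

Anything beyond raw counts, such as accounting for structured fields or controlled vocabularies within content, is where the real design work in a complexity index lies.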

Recently Rick published a series of Weblog postings on XML.com on the topic.

A commenter brought up GMX/V:

LISA OSCAR's latest standard GMX/V (Global Information management Metrics eXchange - Volume) has been approved and is going through its final public comment phase. GMX/V tackles the issue of word and character counts and how to exchange localization volume information via an XML vocabulary. GMX/V finally provides a verifiable, industry standard for word and character counts. GMX/V mandates XLIFF as the canonical form for word and character counts.

The main idea is to provide level-of-effort (LOE), and thus cost, estimates for localization (l10n) efforts.

[Uche Ogbuji]

via Copia

"Tip: Rescue terrible HTML with TagSoup"

Well, since I've so emphatically broken my Weblogging pause for The Cup, I'd better post some professional items.

"Tip: Rescue terrible HTML with TagSoup"

Subtitle: Turn poorly formed HTML into valid XHTML
Synopsis: XHTML is a friendly enough format for parsing and screen-scraping, but the Web still has a lot of messy HTML out there. In this tip Uche Ogbuji demonstrates the use of TagSoup to turn just about any HTML into neat XHTML.

TagSoup is very handy. Even though it's a Java project, I put it to use from Python code fairly often. It also recently reached version 1.0.
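Since TagSoup is a Java program, one simple way to use it from Python is to run it as a subprocess. This is a minimal sketch, assuming a java executable on the PATH, a TagSoup jar at the hypothetical path tagsoup-1.0.jar, and that the command-line tool, when given no file arguments, reads HTML from standard input and writes the cleaned-up markup to standard output. Check the TagSoup documentation for your version before relying on that behavior.

```python
import subprocess
from typing import List


def tagsoup_command(jar_path: str) -> List[str]:
    """Build the java command line; no input files are passed (stdin is used)."""
    return ["java", "-jar", jar_path]


def html_to_xhtml(html: str, jar_path: str = "tagsoup-1.0.jar") -> str:
    """Pipe HTML through TagSoup and return whatever it writes to stdout."""
    completed = subprocess.run(
        tagsoup_command(jar_path),
        input=html,
        capture_output=True,
        text=True,    # str in, str out, using the locale's default encoding
        check=True,   # raise CalledProcessError if java or TagSoup fails
    )
    return completed.stdout
```

Shelling out costs a JVM start per call, so for large batches it is usually better to hand TagSoup a whole set of files at once.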

[Uche Ogbuji]

via Copia

Cleveland Clinic Job Posting for Data Warehouse Specialist

Cleveland Clinic Foundation has recently posted a job position for a mid-level database developer with experience in semi-structured data binding, transformation, modeling, and querying. I happen to know that experience with any of the following is a big plus:

  • Python
  • XML data binding
  • XML / RDF querying (SPARQL, XPath, XQuery, etc.)
  • XML / RDF modelling
  • Programming database connections
  • *NIX System administration
  • Web application frameworks

You can follow the above link to post an application.

Chimezie Ogbuji

via Copia

"Thinking XML: Good advice for creating XML"

An earlier article (published in January) that I forgot to announce:

"Thinking XML: Good advice for creating XML"

Subtitle: Principles of XML design from the community at large
Synopsis: The use of XML has become widespread, but much of it is not well formed. When it is well formed, it's often of poor design, which makes processing and maintenance very difficult. And much of the infrastructure for serving XML can compound these problems. In response, there has been some public discussion of XML best practices, such as Henri Sivonen's document, "HOWTO Avoid Being Called a Bozo When Producing XML." Uche Ogbuji frequently discusses XML best practices on IBM developerWorks, and in this column, he gives you his opinion about the main points discussed in such articles. [Also discusses "Monastic XML," by Simon St. Laurent.]

[Uche Ogbuji]

via Copia

"Thinking XML: Review of RFC 3470: Guidelines for the use of XML"

Thinking XML author Uche Ogbuji continues with the theme of XML best practices. In the previous installment "Good advice for creating XML," you looked at XML design recommendations from experts. In this article, you'll find recommendations from the Internet Engineering Task Force (IETF), an organization whose technical papers drive most Internet protocols. The IETF's XML recommendations are gathered together in RFC 3470: "Guidelines for the Use of Extensible Markup Language (XML) within IETF Protocols."

[Uche Ogbuji]

via Copia

"Tip: Remove sensitive content from your XML samples with XSLT"

Do you need to share samples of your XML code, but can't disclose the data? For example, you might need to post a sample of your XML code with a question to get some advice with a problem. In this tip, Uche Ogbuji shows how to use XSLT to remove sensitive content and retain the basic XML structure.

I limited this article to erasing sensitive content, which can be done with XSLT 1.0 alone, rather than obfuscating it. With EXSLT (or even XSLT 2.0) you can do some degree of obfuscation, possibly preserving elements of character data that are important to the problem under discussion. Honestly, though, I prefer to solve this problem with even more flexible tools. As a bonus, here is a bit of 4Suite/SAX code that uses a SAX filter to obfuscate character data by adding a random shift to the ordinal of each character in the Unicode alphanumeric class (regex \w, which also covers the underscore). This way, if exotic characters were part of the problem you're demonstrating, they'd be left alone. It's easy to use the code as a template; usually all you have to change is the obfuscate function or the obfuscate_filter class to fine-tune the workings.

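import re
import random
from xml.sax import make_parser, saxutils
from Ft.Xml import CreateInputSource, Sax

# Maximum random shift applied to each character's code point
RANDOM_AMP = 15
# Matches a single Unicode "word" character (letters, digits, underscore)
ALPHANUM_PAT = re.compile(r'\w', re.UNICODE)

def obfuscate(old):
    # Replace each word character with one whose code point is shifted
    # by a random amount in [-RANDOM_AMP, RANDOM_AMP]
    def mutate(match):
        return unichr(ord(match.group()) + random.randint(-RANDOM_AMP, RANDOM_AMP))
    return ALPHANUM_PAT.sub(mutate, old)

class obfuscate_filter(saxutils.XMLFilterBase):
    # SAX filter: markup events pass through untouched; only character
    # data is obfuscated before being forwarded to the content handler
    def characters(self, content):
        saxutils.XMLFilterBase.characters(self, obfuscate(content))

if __name__ == "__main__":
    XML = "http://cvs.4suite.org/viewcvs/*checkout*/Amara/demo/labels1.xml"
    # Request the 4Suite SAX driver rather than the default expat driver
    parser = make_parser(['Ft.Xml.Sax'])
    filtered_parser = obfuscate_filter(parser)
    # SaxPrinter prints the filtered event stream back out as XML
    handler = Sax.SaxPrinter()
    filtered_parser.setContentHandler(handler)
    filtered_parser.parse(CreateInputSource(XML))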

This code uses recent fixes and capabilities I checked into 4Suite CVS last week. I think all the needed details to understand the code are in the SAX section of the updated 4Suite docs, which John Clark has already posted.
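The 4Suite code above targets Python 2 (note unichr). For readers without a 4Suite install, here is a hypothetical adaptation of the same SAX-filter approach using only the Python 3 standard library: the default expat-based reader from make_parser() stands in for the 4Suite driver, and xml.sax.saxutils.XMLGenerator stands in for SaxPrinter. The obfuscate_xml function and its optional rng parameter are my own additions; passing a seeded random.Random makes the scrambling repeatable. As in the original, attribute values are left untouched.

```python
import io
import random
import re
from xml.sax import make_parser
from xml.sax.saxutils import XMLFilterBase, XMLGenerator

RANDOM_AMP = 15                   # maximum shift applied to a code point
ALPHANUM_PAT = re.compile(r"\w")  # str patterns match Unicode word chars in Python 3


def obfuscate(text, rng=random):
    """Shift every word character's code point by a random amount."""
    def mutate(match):
        shift = rng.randint(-RANDOM_AMP, RANDOM_AMP)
        return chr(ord(match.group()) + shift)
    return ALPHANUM_PAT.sub(mutate, text)


class ObfuscateFilter(XMLFilterBase):
    """Pass markup through unchanged; scramble only character data."""

    def __init__(self, parent, rng=random):
        super().__init__(parent)
        self.rng = rng

    def characters(self, content):
        super().characters(obfuscate(content, self.rng))


def obfuscate_xml(xml_text, rng=random):
    """Return xml_text re-serialized with its character data obfuscated."""
    out = io.StringIO()
    filtered = ObfuscateFilter(make_parser(), rng)
    # XMLGenerator escapes any '<' or '&' that a random shift happens to produce
    filtered.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    filtered.parse(io.BytesIO(xml_text.encode("utf-8")))
    return out.getvalue()


if __name__ == "__main__":
    print(obfuscate_xml("<label><name>Thomas Eliot</name></label>"))
```

Because whitespace is not a word character, text lengths and word boundaries survive, which is often enough to keep a sample faithful to the structural problem being asked about.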

[Uche Ogbuji]

via Copia

"Semantic Transparency" by any other name

Paul Downey responded to "Semantic hairball, y'all" with approval of my skewering of some of the technologies I see dominating the semantics space, but did say:

..."semantic transparency" in "XML Schema" sounds just a little too scary for my tastes....

This could mean that the names sound scary, or that his interpretation of the idea itself sounds scary. If it's the latter, I'll try to show soon that the idea is very simple and shouldn't be at all scary. If it's the former, the man has a point. "Semantic Transparency" is a very ungainly name. As far as I can tell, it was coined by Robin Cover, and I'm sure it was quite suitable at the time, but right now it's certainly a bit of a liability in the pursuit that most interests me.

The pursuit is of ways to build on the prodigious success of XML to make truly revolutionary changes in data architecture within and across organizations. Not revolutionary in terms of the technology to be used. In fact, as I said in "No one ever got fired for...", the trick is to finally give age-old and well proven Web architecture more than a peripheral role in enterprise architecture. The revolution would be in the effects on corporate culture that could come from the increased openness and collaboration being fostered in current Web trends.

XML ushered in a small revolution by at least codifying a syntactic basis for general purpose information exchange. A common alphabet for sharing. Much is made of the division between data and documents in XML (more accurate terms have been proposed, including records versus narrative, but I think people understand quite well what's meant by the data/documents divide, and those terms are just fine). The key to XML is that even though it's much more suited to documents, it adapts fairly well to data applications. Technologies born in the data world such as relational and object databases have never been nearly as suitable for document applications, despite shrill claims of relational fundamentalists. XML's syntax mini-revolution means that for once those trying to make enterprise information systems more transparent by consolidating departmental databases into massive stores (call them the data warehouse empire), and those trying to consolidate documents into titanic content management stores (call them the CMS empire) can use the same alphabet (note: the latter group is usually closely allied with those who want to make all that intellectual capital extremely easy to exchange under contract terms. Call them the EDI empire). The common alphabet might not be ideal for any one side at the table, but it's a place to start building interoperability, and along with that the next generation of applications.

All over the place in my consulting and speaking, I find that people have embraced XML only to run into the inevitable limitations of its syntactic interoperability, and are left scratching their heads, wondering: OK, what's the next stop on this bus route? People who know how to make money have latched onto the suspense, largely as a way of re-emphasizing the relevance of their traditional products and services, rather than as a way to push for further evolution. A few more idealistic visionaries are pushing such further evolution, some rallying under the banner of the "Semantic Web". I think this term is, unfortunately, tainted. Too much of the 70s AI ambition has been cooked into recent iterations of Semantic Web technologies, and these technologies are completely glazing over the eyes of the folks who really matter: the non-Ph.D.s (for the most part) who generate the vast bodies of public and private documents and data that are to drive the burgeoning information economy.

Some people building on XML are looking for a sort of mindshare arbitrage between the sharp vendors and the polyester hippies, touting sloppy, bottom-up initiatives such as microformats and folksonomies. These are cheap, and don't make the head spin to contemplate, but it seems clear to anyone paying attention that they don't scale as a way to consolidate knowledge any more than the original Web does.

I believe all these forces will offer significant juice to next-generation information systems, and that the debate really is just how the success will be apportioned. As such, we still need an umbrella term for what it means to build on a syntactic foundation by sharing context as well. To start sharing glossaries as well as alphabets. The fate (IMO) of the term "Semantic Web" is instructive. I often joke about the curse of the s-word. It's a joke I picked up from elsewhere (I forget where) to the effect that certain words starting with "s", and "semantic" in particular, are doomed to sound imposing yet impossibly vague. My first thought was: rather than "semantic transparency", how about just "transparency"? The problem is that it's a bit too much of a hijack of a generic term. A data architect will probably get the right picture from the term, but we need to communicate to the wider world.

Other ideas that occur to me are:

  • "information transparency"
  • "shared context" or "context sharing"
  • "merged context"
  • "context framing"
  • "Web reference"

The last idea comes from my favorite metaphor for these XML++ technologies: that they are the reference section (plus card catalog) of the library (see "Managing XML libraries"). They are what makes it possible to find, cross-reference and understand all the information in the actual books themselves. I'm still fishing for good terms, and would welcome any suggestions.

[Uche Ogbuji]

via Copia