Rick Jelliffe has been working on XML metrics for a while. As I reported in "Thinking XML: XMLOpen and more XML Hacks", discussing Rick's presentation at XMLOpen 2004:
Jelliffe's talk was actually about his experiences trying to come up with metrics of XML schema complexity. The idea was to get an index number to help estimate the difficulty of implementing processing tasks (such as creating an XSLT transform) for a vocabulary and the typical uses for the vocabulary. Jelliffe's formula was a count of element types, attributes, and various special cases of these measured either from a DTD or from one or more instance documents. While there was some discussion of the exact details of such measurements -- for example, the extent to which structured fields and controlled vocabularies within content complicated processing -- the general idea turned out to be one that others had considered and even implemented. I mentioned that at Fourthought, the consultancy where I practice, we have created a lightweight measure to estimate how hard it would be to develop an XML schema (in RELAX NG) given the outlines of a vocabulary needed by the client. It will be interesting to see whether the industry begins to come up with general measurements of XML language complexity, and even to standardize such measurements, perhaps along lines that are traceable to ISO standards for software quality.
Recently Rick published a series of Weblog postings on the topic at XML.com:
- "Metrics for XML Projects #1: Element and Attribute Count"
- "Metrics for XML Projects #2: Production Count"—“How complex is this schema (or DTD)?”
- "Metrics for XML Projects #3: XML Mapping Completeness Ratio"—“How complete a mapping can be made from a document in one schema to a document in another schema?”
- "Metrics for XML Projects #4: XML Mapping Additions Ratio"—“How many fields are in the intended schema that are not in the original schema?”
- "Metrics for XML Projects #5: Structured Document Complexity Metric"—“How complex is this document set or schema?”
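To make the first of these metrics concrete, here is a minimal sketch of an element and attribute count taken from a single instance document. It is not Rick's actual formula: the function name `element_attribute_count`, the choice to count an attribute once per owning element type, and the sample document are all my assumptions, and the "special cases" mentioned in the talk are left out entirely.

```python
import xml.etree.ElementTree as ET


def element_attribute_count(xml_text):
    """Return (distinct element types, distinct element/attribute pairs).

    Assumption: an attribute is identified by its owning element type,
    so `sku` on <item> and `sku` on <part> would count as two.
    """
    root = ET.fromstring(xml_text)
    element_types = set()
    attribute_types = set()
    # iter() walks the root element and all of its descendants
    for elem in root.iter():
        element_types.add(elem.tag)
        for name in elem.attrib:
            attribute_types.add((elem.tag, name))
    return len(element_types), len(attribute_types)


# Hypothetical sample document, used only for illustration
SAMPLE = """<order id="1">
  <item sku="a1" qty="2">Widget</item>
  <item sku="b2">Gadget</item>
  <note>Rush</note>
</order>"""

elements, attributes = element_attribute_count(SAMPLE)
print(elements, attributes)
```

A count taken from instance documents only sees what those documents happen to use, so running it over a representative set and taking the union of the sets would likely get closer to what a DTD-based count reports.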
A commenter brought up GMX/V:
LISA OSCAR's latest standard GMX/V (Global Information management Metrics eXchange - Volume) has been approved and is going through its final public comment phase. GMX/V tackles the issue of word and character counts and how to exchange localization volume information via an XML vocabulary. GMX/V finally provides a verifiable, industry standard for word and character counts. GMX/V mandates XLIFF as the canonical form for word and character counts.
The main idea is to provide level-of-effort (LOE) estimates, and thus cost estimates, for localization (l10n) work.