IDREF=NPC?

Interesting (as always) musings from Rick Jelliffe:

There has been good work on the theoretical classification of schema languages over the last two or three years.

My impression is that as soon as your schema language supports IDREF it is stuffed, from an NP POV: Schematron, XSD, RELAX NG, DTDs, the lot!

Theoretical classification is important for know what the characteristics of things are, and what pathological problems implementations may have to deal with. But people who reject one language or another on theoretical grounds alone, without considering their pragmatic value, need to have an alternative otherwise they are troll-like.

The emphasis is mine, marking the bit that caught my attention. So my guess is that in his theory IDREF==NPC (NP complete). Interesting. I haven't done any formal analysis, but when I ponder it, I can imagine ID reference checking as similar to problems I know to be P. I can't really think off head of a similar, known NPC problem, but that doesn't mean much. It can be fairly subtle factors of a problem that establish its NP profile.

It is worth noting, as Eric van der Vlist once had to remind me, that ID/IDREF integrity support is not mandated in RELAX NG validators. Another example of the far-sightedness of Mr. Clark, Murata-san and co? From "The Design of RELAX NG":

The RELAX NG TC spent a considerable amount of time considering what support RELAX NG should provide for enforcing identity (uniqueness and cross-reference) constraints. In the end, the conclusion was that identity constraints were better separated out into a separate specification. Accordingly, RELAX NG itself provides no support for identity constraints. RELAX NG DTD Compatibility 12 provides support for traditional XML ID/IDREF attributes. There were a number of reasons for preferring separation. One reason is the relative difference in maturity. RELAX NG is based on finite tree automata; this is an area of computer science that has been studied for many years and is accordingly mature and well understood. The use of grammars for specifying document structures is based on more than 15 years of practical experience. By contrast, the area of identity constraints (beyond simple ID/IDREF constraints) is much less mature and is still the subject of active research. Another reason is that it is often desirable to perform grammar processing separately from identity constraint processing. For example, it may be known that a particular document is valid with respect to a grammar but not known that it satisfies identity constraints. The type system of the language that was used to generate a document may well be able to guarantee that it is valid with respect to the grammar; it is unlikely that it will be able to guarantee that it satisfies the identity constraints. A document assembled from a number of components may guaranteed to be valid with respect to a grammar because of the validity of the components, but this will often not be the case with identity constraints. Even when a document is known to satisfy the identity constraints as well as be valid with respect to the grammar, it may be necessary to perform identity constraint processing in order to allow application programs to follow references. Another reason is that no single identity constraint language is suitable for all applications. Different applications have identity constraints of vastly different complexity. Some applications have complex constraints that span multiple documents 22. Other applications need only a modest increment on the XML ID/IDREF mechanism. A solution that is sufficient for those applications with complex requirements is likely to be overkill for those applications with simpler requirements.

Well reasoned, like almost everything in RELAX NG (and Schematron).

[Uche Ogbuji]

via Copia