Copia

Wikipedia Links to Primary Gua

With regard to my last entry on the primary trigrams, Wikipedia links to the fully developed 8 primary hexagrams (out of 64) are below, with their binary values and names (I'm partial to Alfred Huang's translations of the symbol names over the more common Richard Wilhelm translations):

Initiating - 111111

Responding - 000000

Keeping Still - 001001

Darkness - 010010

Proceeding Humbly - 011011

Taking Action - 100100

Brightness - 101101

Joyful - 110110

I imagine these would be the most appropriate URIs to represent each hexagram if ever modeled in RDF.
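Each of the eight values listed above is simply a three-bit trigram value written twice, since a primary hexagram is one trigram stacked on itself. A minimal sketch of that relationship in Python (the table and function names are made up for illustration; the three-bit values are read off the list above):

```python
# Three-bit trigram values behind each primary hexagram name, as read off
# the list above. The names of this table and these functions are hypothetical.
DOUBLED_TRIGRAMS = {
    "Initiating": "111",
    "Responding": "000",
    "Keeping Still": "001",
    "Darkness": "010",
    "Proceeding Humbly": "011",
    "Taking Action": "100",
    "Brightness": "101",
    "Joyful": "110",
}


def doubled_hexagram(trigram_bits):
    """Return the six-bit string formed by stacking a trigram on itself."""
    if len(trigram_bits) != 3 or set(trigram_bits) - {"0", "1"}:
        raise ValueError("expected three binary digits, got %r" % (trigram_bits,))
    return trigram_bits * 2


def primary_hexagrams():
    """Map each primary hexagram name to its six-bit value."""
    return {name: doubled_hexagram(bits) for name, bits in DOUBLED_TRIGRAMS.items()}
```

Calling int(value, 2) on any of the six-bit strings gives the position of that hexagram in a plain binary count from 0 to 63.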

-- Chimezie

[Uche Ogbuji]

via Copia

Some DCMI clarification on dc:type vocabularies

In an earlier entry I discussed the strange hotch-potch that's the set of recommended values for dc:type.

I did find a resource with some insight, straight from the Dublin Core (it seems). The question is:

we classify text documents with a descriptive term from a controlled vocabulary -- 'News', 'Press Releases', 'Situation Reports', etc. Is this a DCM element set 'Description', a 'Format', a 'Type' or, if not any of these, then what? If it is a 'Type', what is the appropriate sub-element? We could use Type -> Text, but that doesn't refer to a descriptor used for categorization of data, which is our purpose for using such terms in the first place.

This to me is clearly the province of dc:type, and the question is how one reconciles a value such as "Press Release" with the approach DCMI has taken with their recommended values. The DCMI reply (drafted by Pete Johnston):

The examples you cite for your "descriptor used for categorization of data" look to me like examples of values for dc:type. They describe (I think!) "the nature or genre of the content of the resource". DCMI does provide a simple, high level "DCMI Type Vocabulary" (which, as you note, includes the term "Text"), but you are not limited to using that Type Vocabulary and can use your own "local" vocabulary to provide values for dc:type. Having said that, you need to bear in mind what applications will be consuming your metadata and whether those applications are programmed to interpret your values as you expect. Especially if you are sharing your metadata with applications that may not have "built-in knowledge" of your "local" Type Vocabulary, it is considered good practice to include a term from the DCMI Type Vocabulary where possible. All DC metadata elements are repeatable (unless in the context of your application some additional constraints have been specified), so you could include two occurrences of dc:type in each metadata description:
dc:type = "Text"
dc:type = "Press Release"
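To make the "two occurrences of dc:type" advice concrete, here is a minimal sketch using Python's standard ElementTree module. The record wrapper element, the describe function and the sample title are made up for illustration; only the Dublin Core namespace URI is the real thing.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def describe(title, types):
    """Build a metadata record with one dc:type element per value."""
    record = ET.Element("record")  # hypothetical wrapper element
    ET.SubElement(record, "{%s}title" % DC_NS).text = title
    for value in types:
        # dc:type is repeatable, so each value gets its own element
        ET.SubElement(record, "{%s}type" % DC_NS).text = value
    return record


# One broad term from the DCMI Type Vocabulary, one local term
record = describe("Quarterly results announced", ["Text", "Press Release"])
xml_text = ET.tostring(record, encoding="unicode")
```

A consumer that only knows the DCMI Type Vocabulary can still pick out "Text", while a consumer that knows the local vocabulary gets the finer-grained "Press Release".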

Pete goes on to explain the concept of DCMI element qualification (I do suggest you be familiar with this concept if you're using DCMI), but that doesn't seem to me to be at the heart of my issue. I'm just surprised that DCMI doesn't have such a sample vocabulary available.

Anyway, in my work at Sun, we're probably going to have to roll our own. One source of fodder I found is in LOM (see my article). The values are:

Exercise, Simulation, Questionnaire, Diagram, Figure, Graph, Index, Slide, Table, Narrative Text, Exam, Experiment, ProblemStatement, SelfAssesment

Too pedagogically oriented for our use, but some fodder nevertheless. Another source I found is PRISM (see my article). It's table #16 in the PRISM 1.2 spec. And here is how the PRISM spec introduces dc:type (from section 5.2.15):

Definition: The style of presentation of the resource's content, such as image vs. sidebar.
Comment: The `type' of a resource can be many different things. In PRISM descriptions, the dc:type element takes values that indicate the style of presentation of the content, such as "Map", "Table", or "Chart". This is in contrast to prism:category, which represents the genre, or stereotypical intellectual content type, of the resource. For example, the genre `electionResults' can be presented in a map, a table, or a chart. Recommended practice for PRISM implementations is to use a value from Table 16: Controlled Vocabulary of Presentation Styles, expressed as a URI reference. Implementations MUST also be able to handle text values, but interoperation with text values cannot be guaranteed. To describe the physical size or digital file format of the resource, use the dc:format element.
Example: The two examples below show how prism:type, prism:category, and dc:format all describe different aspects of a resource. For brevity, the examples below use relative URI references. Assume that they are within the scope of a base URI declaration: xml:base="http://prismstandard.org/vocabularies/1.2/"

<dc:type rdf:resource="resourcetype.xml#article"/>
<prism:category rdf:resource="category.xml#column"/>
<dc:format>text/html</dc:format>

<dc:type rdf:resource="resourcetype.xml#birdsEye"/>
<prism:category rdf:resource="category.xml#photo"/>
<dc:format>image/jpeg</dc:format>
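Since the spec's examples lean on relative URI references under an xml:base, it may help to see what they resolve to. A small sketch using the standard library's urljoin (the resolve helper is my own naming, not anything from PRISM):

```python
from urllib.parse import urljoin

# Base URI quoted from the PRISM 1.2 example above
PRISM_BASE = "http://prismstandard.org/vocabularies/1.2/"


def resolve(reference, base=PRISM_BASE):
    """Resolve a relative URI reference against the xml:base in scope."""
    return urljoin(base, reference)


# The relative references used in the two examples
references = [
    "resourcetype.xml#article",
    "category.xml#column",
    "resourcetype.xml#birdsEye",
    "category.xml#photo",
]
resolved = {ref: resolve(ref) for ref in references}
```

So the dc:type value in the first example is really http://prismstandard.org/vocabularies/1.2/resourcetype.xml#article, which is the form an RDF consumer would compare against.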

[Uche Ogbuji]


Running multiple python versions in home directory install

I've had to do this several times in order to install 4Suite to a locally installed Python directory, either on a server where I didn't have superuser access or where I was managing packages with a utility that wasn't so good with Python packages (yum, sometimes apt). I had to dig into the old akara IRC logs for these tidbits.

uche_: Here are the contents of ~jkloth/bin/python2.4 on Jeremy's computer:

#!/bin/bash
export PYTHONPATH=$HOME/lib/python2.4
exec /usr/bin/python2.4 "$@"

uche_: This assumes $HOME/bin is on your path, and before wherever your python is installed

chimezie@WuChi 4Suite $ $HOME/bin/python2.4 setup.py config --home=$HOME
chimezie@WuChi 4Suite $ $HOME/bin/python2.4 setup.py install
chimezie@WuChi 4Suite $ which 4ss_manager
/home/chimezie/bin/4ss_manager

chimezie: In order to run 4Suite scripts in this isolated environment, you need to execute them with $HOME/bin/python2.4 explicitly

chimezie@WuChi devel $ $HOME/bin/python2.4 /home/chimezie/bin/4ss_manager start -n  -u <user> -p <password>
.. snip log ..
Apr 10 18:31:05 Controller: [notice] Controller configured -- resuming normal operations
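A quick way to confirm that the wrapper is doing its job is to ask the interpreter whether every PYTHONPATH entry actually ended up on sys.path. A minimal sketch (the function name is made up for illustration, and it sticks to syntax that old interpreters such as 2.4 should accept):

```python
import os
import sys


def missing_pythonpath_entries(environ=None, search_path=None):
    """Return the PYTHONPATH entries that are not present on sys.path."""
    if environ is None:
        environ = os.environ
    if search_path is None:
        search_path = sys.path
    # PYTHONPATH is a list of directories joined by the platform separator
    wanted = [e for e in environ.get("PYTHONPATH", "").split(os.pathsep) if e]
    present = [os.path.normpath(p) for p in search_path if p]
    return [e for e in wanted if os.path.normpath(e) not in present]
```

Run under the wrapper, missing_pythonpath_entries() should come back as an empty list; anything it does return is a directory the wrapper exported but the interpreter is not searching.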

[Uche Ogbuji]


The Earliest Juncture of Semiotics and Mathematics

The Trigrams and My Interest

My interest in the trigrams of the very ancient Yijing is mostly scholastic. It's the coherent set of philosophies (or canon) derived from these trigrams, and what amounts to a mathematical interpretation of everything, that has had a more concrete effect on how I go about my life and how I deal with adversity.

The trigrams are many things, but their most interesting characteristics (from a secular point of view) are their direct analogy to the binary numerical system as well as the fact that they (undisputedly) represent the earliest coherent example of humankind's study of semiotics:

the philosophical theory of the functions of sign and symbols

The infinite Characteristics of the Trigrams

The first (and less emphasized) of these two characteristics of the trigrams was formally observed by the German mathematician Gottfried Wilhelm Leibniz (the original observation is probably as old as the purported author of the trigrams, FuXi). He is the creator of the modern binary system of counting, which is the primary framework upon which microprocessor design is based (an important historical irony).
He noticed that the concept of duality/balance evident in the trigrams' source (the Yijing), as well as in the derived, related philosophies, is directly analogous to the binary system when you substitute 0 for dashed lines (yin - the concept of no motion) and 1 for unbroken lines (yang - the concept of motion / kinetic energy).

The trigrams are meant to be interpreted from the bottom up, so a continuation of this binary analog would have the reader tip the trigrams over to their right side and read them as binary numbers.

The Binary Analog of the Primary Gua

Below is the original horizontal arrangement of the trigrams with their corresponding binary numbers (click on each to view the corresponding SVG diagram):

Earth - 000

Mountain - 001

Water - 010

Wind - 011

Thunder - 100

Fire - 101

Lake - 110

Heaven - 111
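The "tip it over and read it as binary" rule is easy to express in code. In this sketch a trigram is given as three lines listed from the bottom up, with True for an unbroken (yang) line and False for a broken (yin) line; the function and table names are my own, chosen for illustration.

```python
# Names keyed by the three-bit value, bottom line first
TRIGRAM_NAMES = {
    "000": "Earth",
    "001": "Mountain",
    "010": "Water",
    "011": "Wind",
    "100": "Thunder",
    "101": "Fire",
    "110": "Lake",
    "111": "Heaven",
}


def trigram_bits(lines_bottom_up):
    """Turn three lines (bottom first, True = unbroken) into a bit string.

    Tipping the trigram onto its right side puts the bottom line on the
    left, so the bottom line becomes the most significant bit.
    """
    if len(lines_bottom_up) != 3:
        raise ValueError("a trigram has exactly three lines")
    return "".join("1" if line else "0" for line in lines_bottom_up)


def trigram_value(lines_bottom_up):
    """Return (bit string, integer value, name) for a trigram."""
    bits = trigram_bits(lines_bottom_up)
    return bits, int(bits, 2), TRIGRAM_NAMES[bits]
```

For example, Mountain is two broken lines under one unbroken line, so trigram_value([False, False, True]) gives ("001", 1, "Mountain"), and Lake, two unbroken lines under one broken line, comes out as 110.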

Extension to the 64 Hexagrams of the Yijing

Since the 8 primary gua are the building blocks upon which the 64 symbols of the Yijing are built (and, purportedly, everything), this binary analogy can be extended to all 64 symbols. This is well known amongst scholars of the Yijing, and below is the most famous diagram of this extension, by Shao Yong (1011 AD - 1077 AD):

Shao Yong's Diagram

The numerical significance of the trigrams in sequence is well summarized here. This page also includes a very useful animated image of the entire sequence as a binary progression:

FuXi Sequence

The most complete resource on the subject (that I've read so far) is Alfred Huang's The Numerology of the I Ching (ISBN: 0-89281-811-5).

I was unable to embed the SVG diagrams within the page, which is a shame because the Yijing trigrams are an excellent SVG use case. I hope to someday capture all 64 as SVG diagrams so the various, more popular philosophical/visual arrangements can be rendered programmatically. Imagine Shao Yong's circular diagram as SVG (talk about an interesting combination of ancient numerology with modern vector graphic technology). It would prove quite a useful tool for avid students of the Yijing symbols as well as make for some very interesting patterns.
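As a rough idea of what rendering the symbols programmatically could look like, here is a minimal sketch that turns a bit string into an SVG drawing, one bar per line. The function name, dimensions and layout are all assumptions of mine, not the diagrams linked above.

```python
SVG_NS = "http://www.w3.org/2000/svg"


def gua_svg(bits, width=120, line_height=12, gap=8):
    """Return SVG markup for a trigram or hexagram.

    bits is read bottom line first; "1" is an unbroken line and "0" a
    broken one, following the binary analogy described earlier.
    """
    if not bits or set(bits) - {"0", "1"}:
        raise ValueError("expected a string of binary digits, got %r" % (bits,))
    height = len(bits) * (line_height + gap) - gap
    half = int(width * 0.4)  # length of each half of a broken line
    shapes = []
    for index, bit in enumerate(bits):
        # The first bit is the bottom line, so it gets the largest y offset
        y = height - line_height - index * (line_height + gap)
        if bit == "1":
            spans = [(0, width)]
        else:
            spans = [(0, half), (width - half, half)]
        for x, length in spans:
            shapes.append(
                '<rect x="%d" y="%d" width="%d" height="%d"/>'
                % (x, y, length, line_height)
            )
    return '<svg xmlns="%s" width="%d" height="%d">%s</svg>' % (
        SVG_NS, width, height, "".join(shapes)
    )


# Keeping Still: Mountain over Mountain
keeping_still = gua_svg("001001")
```

Looping over range(64) and formatting each number as six binary digits would produce all 64 symbols; arranging them on Shao Yong's circle is then a matter of wrapping each drawing in a rotated group.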

[Chimezie Ogbuji]


Sane template-like output for Amara

In an earlier entry I showed some Amara equivalents for XSLT 2 and XQuery examples. I think the main disadvantage of Amara in these cases was the somewhat clumsy XML output generation syntax. This is not an easy problem to fix. XSLT and XQuery basically work XML syntax directly into the language, to make output specification very seamless. This makes sense as long as they stick to the task of being very determinate black boxes taking one body of XML data and working it into another. But often you turn to a language like Python for XML processing because you want to blow the lid off the determinate black boxes a bit: you want to take up all the power of general-purpose computing.

With this power comes the need to streamline and modularize, and the usual first principle for such streamlining is the principle of separating the model from presentation. This is a much easier principle to state than to observe in real-life processing scenarios. We love template languages for XML and HTML generation because they are so convenient in solving real problems in the here and now. We look askance at them, however, because we know that they come with a tendency to mix model and presentation, and that we might regret the solution once it comes time to maintain it when (as is inevitable) model processing requirements change or presentation requirements change.

Well, that was a longer preamble than I'd originally had in mind, but it all boils down to my basic problem: how do I make Amara's output mechanism more readable without falling into the many pitfalls of template systems?

Here is one of the XSLT 2 examples:

<xsl:for-each select="doc('auction.xml')/site/people/person">
  <xsl:variable name="p" select="."/>
  <xsl:variable name="a" as="element(item)*">
    <xsl:for-each select="doc('auction.xml')/site/closed_auctions/closed_auction">
      <xsl:variable name="t" select="."/>
      <xsl:variable name="n" 
           select="doc('auction.xml')/site/regions/europe/item
                               [$t/itemref/@item = @id]"/>
      <xsl:if test="$p/@id = $t/buyer/@person">
        <item><xsl:copy-of select="$n/name"/></item>
      </xsl:if>
    </xsl:for-each>
  </xsl:variable>
  <person name="{$p/name}">
    <xsl:copy-of select="$a"/>
  </person>
</xsl:for-each>

In Amara 1.0b3 it goes something like:

from amara import binderytools

def closed_auction_items_by_name():
    doc = binderytools.bind_file('auction.xml')
    newdoc = binderytools.create_document()
    #Iterate over each person
    for person in doc.xml_xpath(u'/site/people/person'):
        #Prepare the wrapper element for each person
        person_elem = newdoc.xml_element(
            u'person',
            attributes={u'name': unicode(person.name)}
        )
        newdoc.xml_append(person_elem)
        #Join to compute all the items this person bought in Europe
        items = [ unicode(item.name)
          for closed in doc.xml_xpath(u'/site/closed_auctions/closed_auction')
          for item in doc.xml_xpath(u'/site/regions/europe/item')
          if (item.id == closed.itemref.item
              and person.id == closed.buyer.person)
        ]
        #XML chunk with results of join
        for item in items:
            person_elem.xml_append(
                newdoc.xml_element(u'item', content=item)
            )
    #All done.  Print out the resulting document
    print newdoc.xml()

The following snippet is a good example of the clumsiness I mean:

person_elem = newdoc.xml_element(
    u'person',
    attributes={u'name': unicode(person.name)}
)
newdoc.xml_append(person_elem)

Suppose I could turn all this into:

newdoc.xml_append_template("<person name='{person.name}'/>")

This would certainly be a huge win for readability. The curly brackets are borrowed from XSLT attribute value templates (AVTs), except that their contents are a Python expression rather than an XPath. The person element created is empty for now, but it becomes just part of the data binding and you can access it using the expected newdoc.person or newdoc.person.name.

One important note: this is very different from `"<person name='%s'/>" % (person.name)`. What I have in mind is a structured template that must be well-formed XML (although it can have multiple root elements). The replacement occurs within the perfectly well-formed XML structure of the template. As with XSLT AVTs you can represent a literal curly bracket as {{ or }}.
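To show the difference from plain string formatting, here is a rough sketch of the idea built on the standard library's ElementTree rather than on Amara. Nothing here is Amara API: expand_template and its helper are hypothetical names, and the code is written for current Python rather than the Python 2 of the examples above. The template is parsed first and the expressions are substituted into the parsed structure, so a value can never break well-formedness.

```python
import re
import xml.etree.ElementTree as ET
from types import SimpleNamespace

# Matches the escapes {{ and }}, or a {python expression}
_FIELD = re.compile(r"\{\{|\}\}|\{([^{}]+)\}")


def _fill(text, namespace):
    """Substitute {expr} fields in one attribute value or text node."""
    if text is None:
        return None

    def replace(match):
        token = match.group(0)
        if token == "{{":
            return "{"
        if token == "}}":
            return "}"
        # eval runs arbitrary code: only use templates you trust
        return str(eval(match.group(1), {}, namespace))

    return _FIELD.sub(replace, text)


def expand_template(template, namespace):
    """Parse a well-formed template, fill it in, return its root elements."""
    # A synthetic wrapper lets the template have several root elements
    wrapper = ET.fromstring("<wrapper>%s</wrapper>" % template)
    for element in wrapper.iter():
        element.text = _fill(element.text, namespace)
        element.tail = _fill(element.tail, namespace)
        for name, value in list(element.attrib.items()):
            element.set(name, _fill(value, namespace))
    return list(wrapper)


# Hypothetical stand-in for a bound person element
person = SimpleNamespace(name="Ada & Co")
person_elem = expand_template("<person name='{person.name}'/>", {"person": person})[0]
serialized = ET.tostring(person_elem, encoding="unicode")
```

Because the ampersand is inserted into an attribute of an already parsed element, it is escaped on serialization. The % formatting version would have produced markup that is not well-formed.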

The other output generation part in the example:

for item in items:
    person_elem.xml_append(
        newdoc.xml_element(u'item', content=item)
    )

Would become

for item in items:
    newdoc.person.xml_append_template("<item>{item}</item>")

This time we have the template substitution going on in the content rather than an attribute. Again I would want to restrict this entire idea to a very clean and layered template with proper XML semantics. There would be no tricks such as "<{element_name}>spam</{element_name}>". If you wanted that sort of thing you could use the existing API such as xml_element(element_name), or even use Python string operations directly.

The complete example using such a templating system would be:

from amara import binderytools

def closed_auction_items_by_name():
    doc = binderytools.bind_file('auction.xml')
    newdoc = binderytools.create_document()
    #Iterate over each person
    for person in doc.xml_xpath(u'/site/people/person'):
        #Prepare the wrapper element for each person
        newdoc.xml_append_template("<person name='{person.name}'/>")
        #Join to compute all the items this person bought in Europe
        items = [ unicode(item.name)
          for closed in doc.xml_xpath(u'/site/closed_auctions/closed_auction')
          for item in doc.xml_xpath(u'/site/regions/europe/item')
          if (item.id == closed.itemref.item
              and person.id == closed.buyer.person)
        ]
        #XML chunk with results of join
        for item in items:
            newdoc.person.xml_append_template("<item>{item}</item>")
    #All done.  Print out the resulting document
    print newdoc.xml()

I think that this example is indeed more readable than the XSLT version.

One tempting thing about this idea is that all the building blocks are there. 4Suite already gives me the ability to parse and process this template very easily, and I could implement this logic without much trouble. But I also think that it deserves some serious thought (and, I hope, feedback from users). There's no hurry: I don't plan to add this capability in the Amara 1.0 cycle. I need to get Amara 1.0 out, and I'm expecting that 1.0b3 is the last stop before a release candidate.

So, things to ponder.

Security. Any time you support such arbitrary-code-in-template features the tainted string worry comes up: what happens if one is not careful with the expression that is used within a template? I think that this issue is not really Amara's responsibility. The developer using Amara should no more pass in untrusted Python expressions to a template than they would to an exec statement. They should be aware that Amara templates will execute arbitrary Python expressions, if they're passed in, and they should apply the usual precautions against tainting.

String or Unicode? Should the templates be specified as strings or Unicode? They are themselves well-formed XML, which makes me think they should be strings (XML is really defined in terms of encoded serialization, and the Unicode backbone is just an abstraction imposed on the actual encoded byte stream). But is this confusing to users? I've always preached that XML APIs should use Unicode, and my products reflect that, and for a user that doesn't understand the nuances, this could seem like a confusing exception. Then again, we already have this exception for 4Suite and Amara APIs that parse XML from strings. My leaning would be to have the template expressed as a string, but to have the results of expressions within templates coerced to Unicode. This is the right thing to do, and that's the strongest argument.

Separation of model and presentation. The age-old question with such templates is whether they cause tangles that complicate maintenance. I think one can often make an empirical check for such problems by imagining what happens in a scenario where the data model operations need to change, and another scenario where the presentation needs to change.

As an example of a model change, imagine that the source for the item info was moved from an XML document to a database. I wouldn't need to change any of the templates as long as I could get the same values to pass in, and I think it's reasonable to assume I could do this. Basically, since my templates simply refer to host variables whose computation is nicely decoupled from the template code, the system passes the first test.

As an example of a presentation change, imagine that I now want to generate XHTML directly, rather than this <person><item>... business. I think the system passes this test as well. The templates themselves would have to change, but this change would be isolated from the computation of the host variables used by the templates. Some people might argue that I'm grading these tests too leniently, and that it's already problematic that the computation and presentation occur so close together, in the same function in the same code file. I'm open to being convinced this is the case, but I'd want to hear of practical maintenance scenarios where this would be a definite problem.

So what do you think?

[Uche Ogbuji]


Versa Diagrams

I updated the Versa by Deconstruction document with diagrams for the forward traversal operator and the distribute function (probably the two most difficult / fundamental components of Versa). They are both below:

Distribute - as Cartesian Product

[Uche Ogbuji]


Don't buy batteries in Europe, if you can help it

Just a simple practical tip, that's all. The two times I can remember buying batteries in Europe, once in Loughborough, England, and the other time in Amsterdam, I've just about left the store in tears. Accounting for the exchange rate, in both cases it came to about $8.00 for 4 name-brand AAs. In the US a typical full price would be about $4.00 for 8. If you're traveling from the U.S. to Europe, take along some spares. And if you're visiting the U.S. from Europe, consider taking back as many as your luggage allowance and scowling customs officers will allow. F'real, though.

Disclaimer: I realize that generalizing two high street stores to an entire continent is so, like, Yankee gauche, but dammit, it's easier to sort the world out that way... Hmm. I wonder how many people use that excuse to form their impressions of the continent of Africa.

[Uche Ogbuji]


First post from Fedora Core 4

This is my first post from Fedora Core 4. I grabbed the DVD ISO via BitTorrent (I also seeded to a ratio of 2:1, i.e. I served up twice as much DVD ISO goodness as I downloaded. Please do the same if you can). I then used the standard upgrade option to get my main desktop on point. It went so smoothly that I threw caution to the wind and did the same for my laptop (see my post on the upgrade from FC2 to FC3). It's looking good on the laptop as well. With the new kernel, 2.6.11-1.1369_FC4, my Centrino (Intel PRO/Wireless 2100) wireless worked right out of the box (unlike with FC3) and so did my Broadcom B44 Ethernet driver (unlike with FC3). Now that's what software evolution is all about. Speaking of Evolution, this time it was a smaller jump, from 2.0.4 to 2.2.2, and it went schmoovely (unlike with FC3).

I'll keep reporting on my experiences as I go along, but at the moment, FC4 gets both my thumbs up. Besides giving me far fewer upgrade headaches than FC3, it looks a lot better and feels a lot snappier (the interminable boot process from FC3 seems to take half the time now).

[Uche Ogbuji]


Quotīdiē

They dug a trench, and threw him in a grave
Shallow as youth; and poured the wine out
soaking the tunic and the dry air.
They covered him lightly, and left him there.

When music comes upon the airs of Spring,
Faith fevers the blood; counter to harmony,
The mind makes its rugged testaments.
Melancholy moves, preservative and predatory.

The light is a container of treachery,
The light is the preserver of the Parthenon.
The light is lost from that young eye.
Hearing music, I speak, lest he should die.

—Richard Eberhart—"A Young Greek, Killed in the Wars"—Poetry magazine (Volume 85, February 1955)

Carl Sandburg is often put forth as the Midwestern American poet par excellence, which annoys me to no end, not just because I dislike Sandburg's work very much, but also because a lot about Sandburg's institutionalization says much about why a great poet such as Richard Eberhart never gained the recognition he deserved. Eberhart wasn't content to write the folksy, middle-American sentiment that people properly expected from farm country. He insisted on breaking his bounds by writing about the big human themes with extraordinary craft and sensibility. Richard Eberhart died on Sunday, 12 June 2005 at the age of 101.

Eberhart was born 5 April 1904, in Austin, Minnesota on a modest estate called Burr Oaks (later on the name of one of his books). He was educated in the US (University of Minnesota, Dartmouth, Harvard) and England (St. John's College). He served as private tutor to the son of King Prajadhipok of Siam (Thailand) in the early 30s, and otherwise had a fairly adventurous youth, with stints as a sailor and gunnery instructor.

As for his poetry, I'll quote from one of my favorite critics. John Wain's assessment is:

His varied and energetic life comes through in his poetry, which is rugged, inquisitive and forceful; clumsy in patches, supremely felicitous in others.

—Anthology of Modern Poetry (Hutchinson, 1963)

I think I'm lucky to have been spared Eberhart's clumsy patches: I've read him almost exclusively in anthology and journal. I have found that he is one of the most affecting writers on the horrors of war. His best known poem is this one.

You would think the fury of aerial bombardment
Would rouse God to relent; the infinite spaces
Are still silent. He looks on shock-pried faces.
History, even, does not know what is meant.

You would feel that after so many centuries
God would give man to repent; yet he can kill
As Cain could, but with multitudinous will,
No farther advanced than in his ancient furies

Was man made stupid to see his own stupidity?
Is God by definition indifferent, beyond us all?
Is the eternal truth man's fighting soul
Wherein the Beast ravens in its own avidity?

Of Van Wettering I speak, and Averill,
Names on a list, whose faces I do not recall
But they are gone to early death, who late in school
Distinguished the belt feed lever from the belt holding pawl.

—Richard Eberhart—"The Fury of Aerial Bombardment"

Eberhart was very generous in his criticism (that which I've read). Too generous at times, in my opinion, given his approval of the mess made by Ginsberg and the Beat poets. The AP article in which I read about Eberhart's death included a typical Eberhart quote:

Poems in a way are spells against death. They are milestones, to see where you were then from where you are now. To perpetuate your feelings, to establish them. If you have in any way touched the central heart of mankind's feelings, you'll survive.

And survive he did. 101 years is quite the achievement to be celebrated. Soon (July 29) we'll be celebrating the centenary of another great poet, Stanley Kunitz. Kunitz's "Benediction" is one of my favorite poems to recite to my wife and sons. It's wonderful to see these poets, who have brought such feeling to the exploration of life's great themes, live such complete lives.

I meant to link to "Benediction" but I can't find a respectable transcription of it online. It deserves its own entry, so some other day I'll type it in for Quotīdiē. But I do want to mention that I found "A Young Greek, Killed in the Wars", "The Fury of Aerial Bombardment" and "Benediction" all in my favorite small poetry book, John Wain's Anthology of Modern Poetry (Hutchinson, 1963), ISBN 0090671317. It's out of print and not easy to find, even used (here are the listings on Amazon UK Marketplace). I bought it in 1988 at the University of Nigeria and it has been one of my most treasured books all this time. It's a superb collection, and if you can lay your hands on a copy, I suggest you do so.

[Uche Ogbuji]


We need more solid guidelines for i18n in OSS projects

Every time I'm about to tackle i18n in some project, I find myself filled with a bit of trepidation. I actually know more than the average developer on the topic, but it's a hard topic, and there is quite the mystery around it.

One of the reasons for this mystery is that there are really few good resources to guide FOSS developers through the process of internationalizing their apps. Gettext is the most common means for i18n and l10n in FOSS apps. The master resource for this is the GNU gettext manual, but that's a lot to bite off. Poking around on Google, I found a useful gettext overview for PHP folks, the Ruby on Rails folks have a nice chapter on it, and there are a few other general intros (1, 2, 3). Luis Miguel Morillas pointed out some for GNOME, KDE and wxPython (1, 2).

Python has the usual set of gettext-based facilities, which is great, but they are woefully documented. In the library reference's section on gettext you get a purported overview and a bunch of scattered notes on the API, but nothing that really coherently leads the developer through the concepts and process of i18n, as the PHP and Ruby folks seem to have. It doesn't even seem as if buying a book helps much. The books I most often recommend as Python intros don't seem to touch the subject, and the reference books seem to go not much deeper than the Python docs. Even David Mertz's useful Text Processing in Python doesn't cover i18n (this surprised me).
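For what it's worth, the core of the Python side is small. Here is a minimal sketch using the standard gettext module; the domain name copia_demo and the locale directory are made up for illustration, and the code is written for current Python.

```python
import gettext


def load_translator(language, domain="copia_demo", localedir="locale"):
    """Return a gettext-style function for one language.

    The domain and directory names are hypothetical. With fallback=True a
    missing catalog yields a NullTranslations object, which simply returns
    each message unchanged instead of raising an error.
    """
    catalog = gettext.translation(
        domain, localedir=localedir, languages=[language], fallback=True
    )
    return catalog.gettext


# Conventionally the translation function is bound to the name _
_ = load_translator("es")
greeting = _("Hello, world")
```

The usual workflow around this is: mark every user-visible string with _(), extract the marked strings into a .po template with a tool such as xgettext, have translators fill in a .po file per language, then compile each with msgfmt into locale/<language>/LC_MESSAGES/<domain>.mo, which is where gettext.translation looks. Until a catalog exists, the sketch above just hands back the original English.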

My recent foray into i18n actually straddled Python and XML worlds. For XMLers, there are a few handy resources:

XLIFF is of particular interest, but I decided not to use it for i18n in 4Suite/XSLT because I wanted to base my work on the Python facilities, which are well tested and based on the de facto standard gettext.

Anyway, am I missing something? Are there all sorts of great resources out there that would slide Python developers right into the i18n groove?

[Uche Ogbuji]
