Python/XML column #33 published

"Unicode Secrets"

In his latest Python-XML column, Uche Ogbuji delves broadly and deeply into the world of Unicode, especially with regard to processing XML in Python.

In this one I started out talking about a quick spot check for Unicode compliance in XML tools, then went on to present some tips on Python's Unicode API. The intent was not to be comprehensive. I cherry-picked the particular Unicode facilities I tend to use the most. As one person mentioned in the comments, there are even more means at your disposal than I cover. I'll get to some of them in part 2, in the next column installment.

[Uche Ogbuji]

via Copia
2 responses
ORA's brain-dead comment facility wouldn't let me say this. It is very important for Western European and American Windows users to set the local conversion mode to CP-1252, not ISO 8859-1. Using 8859-1 means that the Windows characters at 0x80-0x9F (curly quotes, s-hacek, ellipsis, etc.) get converted to U+0080 to U+009F, which are valid code points but useless here, since they are C1 control characters. Using CP-1252 converts them to the correct Unicode characters.
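The difference between the two decodings can be checked in a few lines of Python. This is only a minimal sketch, assuming Python 3 (where `bytes.decode` returns `str`); the sample byte string is made up for illustration and stands in for text saved by a Windows editor.

```python
import unicodedata

# Hypothetical sample: curly quotes around "Hi", an ellipsis, and an
# s-hacek, encoded the way a CP-1252 Windows application would write them.
raw = b"\x93Hi\x94 \x85 \x9a"

as_latin1 = raw.decode("iso-8859-1")  # maps each byte to the same code point
as_cp1252 = raw.decode("cp1252")      # maps 0x80-0x9F to the Windows characters

# Compare the two results character by character.
for wrong, right in zip(as_latin1, as_cp1252):
    if wrong != right:
        # e.g. U+0093 (Cc) vs U+201C LEFT DOUBLE QUOTATION MARK
        print("U+%04X (%s) vs U+%04X %s" % (
            ord(wrong), unicodedata.category(wrong),
            ord(right), unicodedata.name(right)))
```

The `Cc` category in the left-hand column marks control characters, which is why the ISO 8859-1 result is valid Unicode but not the text the user typed. One caveat: Python's `cp1252` codec leaves a handful of bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined and raises `UnicodeDecodeError` on them, so input that is not really CP-1252 may need an `errors=` argument.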
I posted this on the ORA forum for you, John.  Thanks.