"Tip: Remove sensitive content from your XML samples with XSLT"

Do you need to share samples of your XML, but can't disclose the data? For example, you might need to post a sample of your XML along with a question to get advice on a problem. In this tip, Uche Ogbuji shows how to use XSLT to remove sensitive content and retain the basic XML structure.

I limited this article to erasing sensitive content rather than obfuscating it, because erasing can be done with XSLT 1.0 alone. With EXSLT (or even XSLT 2.0) you can do some degree of obfuscation, which lets you preserve aspects of the character data that matter to the problem under discussion. Honestly, though, I prefer to solve this problem with even more flexible tools. As a bonus, here is a bit of 4Suite/SAX code that uses a SAX filter to obfuscate character data by adding a random shift to the ordinal of each character in the Unicode alphanumeric class. Anything outside that class is left alone, so if exotic characters such as punctuation or symbols are part of the problem you're demonstrating, they survive intact. The code is easy to use as a template: usually all you have to change is the obfuscate function or the obfuscate_filter class to fine-tune the behavior.

import re
import random
from xml.sax import make_parser, saxutils
from Ft.Xml import CreateInputSource, Sax

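# Maximum shift applied to each character's ordinal (value assumed here; tune to taste)
RANDOM_AMP = 3
ALPHANUM_PAT = re.compile(r'\w', re.UNICODE)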

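def obfuscate(old):
    # Shift each alphanumeric character by a random offset; everything else is untouched
    def mutate(c):
        # c is a regex match object for one character (unichr is the Python 2 built-in)
        return unichr(ord(c.group())+random.randint(-RANDOM_AMP,RANDOM_AMP))
    return ALPHANUM_PAT.subn(mutate, old)[0]

class obfuscate_filter(saxutils.XMLFilterBase):
    # SAX filter that obfuscates text nodes and passes all other events through unchanged
    def characters(self, content):
        saxutils.XMLFilterBase.characters(self, obfuscate(content))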

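if __name__ == "__main__":
    XML = "http://cvs.4suite.org/viewcvs/*checkout*/Amara/demo/labels1.xml"
    parser = make_parser(['Ft.Xml.Sax'])
    filtered_parser = obfuscate_filter(parser)
    handler = Sax.SaxPrinter()
    # Route the filtered events to the printer, then run the parse.
    # Assumes CreateInputSource accepts a URI string.
    filtered_parser.setContentHandler(handler)
    filtered_parser.parse(CreateInputSource(XML))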

This code uses recent fixes and capabilities I checked into 4Suite CVS last week. I think everything you need to understand the code is in the SAX section of the updated 4Suite docs, which John Clark has already posted.
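
If you don't have 4Suite handy, the same filter idea can be sketched with nothing but the standard library's xml.sax. This is a rough Python 3 adaptation, not the 4Suite code above: the names ObfuscateFilter and obfuscate_xml, the RANDOM_AMP value of 3, and the sample document are all assumptions made for illustration, and XMLGenerator stands in for Sax.SaxPrinter.

```python
import io
import random
import re
from xml.sax import make_parser
from xml.sax.saxutils import XMLFilterBase, XMLGenerator
from xml.sax.xmlreader import InputSource

RANDOM_AMP = 3  # assumed maximum shift per character
ALPHANUM_PAT = re.compile(r"\w")  # Unicode-aware by default for str in Python 3


def obfuscate(text):
    """Shift each alphanumeric character by a random offset."""
    def mutate(match):
        ch = match.group()
        shifted = ord(ch) + random.randint(-RANDOM_AMP, RANDOM_AMP)
        # Keep the original character if the shift lands on a surrogate
        if 0xD800 <= shifted <= 0xDFFF:
            return ch
        return chr(shifted)
    return ALPHANUM_PAT.sub(mutate, text)


class ObfuscateFilter(XMLFilterBase):
    """Obfuscate text nodes; pass every other SAX event through unchanged."""
    def characters(self, content):
        super().characters(obfuscate(content))


def obfuscate_xml(xml_text):
    """Parse xml_text and return it re-serialized with obfuscated text."""
    filtered = ObfuscateFilter(make_parser())
    out = io.StringIO()
    filtered.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    source = InputSource()
    source.setByteStream(io.BytesIO(xml_text.encode("utf-8")))
    filtered.parse(source)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical sample document, loosely modeled on an address label
    sample = "<labels><label id='a1'><name>Thomas Eliot</name></label></labels>"
    print(obfuscate_xml(sample))
```

Element names and attributes pass through untouched, so the structure of the sample survives while the text does not. Because XMLGenerator escapes character data, a shift that happens to land on a markup character such as < still yields well-formed output.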

[Uche Ogbuji]

via Copia