Dare's XLINQ examples in Amara

Dare's examples for XLINQ are interesting. They are certainly more streamlined than the usual C# and Java fare I see, but still a bit clunky compared to what I'm used to in Python. To be fair, a lot of that is down to the C# language itself, so I'd be interested in seeing what XLINQ looks like from Python.NET or Boo.

The following is my translation from Dare's fragments into corresponding Amara fragments (compatible with the Amara 1.2 branch).

'1. Creating an XML document'

import amara
#Done in 2 chunks just to show the range of options
#Another way would be to start with amara.create_document
skel = '<!--XLinq Contacts XML Example--><?MyApp 123-44-4444?><contacts/>'
doc = amara.parse(skel)
doc.contacts.xml_append_fragment("""<contact>
  <name>Patrick Hines</name>
  <phone>206-555-0144</phone>
  <address>
    <street1>123 Main St</street1>
    <city>Mercer Island</city>
    <state>WA</state>
  </address>
</contact>""")

'2. Creating an XML element in the "http://example.com" namespace'

doc.xml_create_element(u'contacts', u'http://example.com')

'3. Loading an XML element from a file'

doc = amara.parse('contact.xml')

'4. Writing out an array of Person objects as an XML file'

persons = {}
persons[u'Patrick Hines'] = [u'206-555-0144', u'425-555-0145']
persons[u'Gretchen Rivas'] = [u'206-555-0163']
doc = amara.create_document(u'contacts')
for name in persons:
    contact = doc.xml_create_element(u'contact')
    contact.xml_append(doc.xml_create_element(u'name', content=name))
    for phone in persons[name]:
        contact.xml_append(doc.xml_create_element(u'phone', content=phone))
    doc.contacts.xml_append(contact)
print doc.xml()
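
Amara aside, it's instructive to see how much more ceremony the Python stdlib itself needs for the same task. Here is a rough minidom equivalent of the snippet above (minidom, not Amara; writing the result to a file is left out):

```python
import xml.dom.minidom as minidom

persons = {u'Patrick Hines': [u'206-555-0144', u'425-555-0145'],
           u'Gretchen Rivas': [u'206-555-0163']}

#Create the <contacts> document, then build each <contact> node by hand
doc = minidom.getDOMImplementation().createDocument(None, u'contacts', None)
for name, phones in persons.items():
    contact = doc.createElement(u'contact')
    name_elem = doc.createElement(u'name')
    name_elem.appendChild(doc.createTextNode(name))
    contact.appendChild(name_elem)
    for phone in phones:
        phone_elem = doc.createElement(u'phone')
        phone_elem.appendChild(doc.createTextNode(phone))
        contact.appendChild(phone_elem)
    doc.documentElement.appendChild(contact)

xml_text = doc.documentElement.toxml()
```

Every text node and element is a separate method call, which is exactly the sort of clunkiness the Amara API (and XLINQ) is trying to get away from.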

'5. Print out all the element nodes that are children of the <contact> element'

for c in contact.xml_child_elements():
    print c.xml()

'6. Print all the <phone> elements that are children of the <contact> element'

for c in contact.xml_xpath(u'phone'):
    print c.xml()

'7. Adding a <phone> element as a child of the <contact> element'

contact.xml_append(contact.xml_create_element(u'phone', content=u'206-555-0168'))

'8. Adding a <phone> element as a sibling of another <phone> element'

mobile = contacts.xml_create_element(u'phone', content=u'206-555-0168')
first = contacts.phone
contacts.xml_insert_after(first, mobile)

'9. Adding an <address> element as a child of the <contact> element'

contacts.xml_append_fragment("""  <address>
    <street1>123 Main St</street1>
    <city>Mercer Island</city>
    <state>WA</state>
  </address>""")

'10. Deleting all <phone> elements under a <contact> element'

#Copy the element list first, since we're removing as we iterate
for p in list(contact.phone): contact.xml_remove_child(p)

'11. Delete all children of the <address> element which is a child of the <contact> element'

for child in list(contact.address.xml_children):
    contact.address.xml_remove_child(child)

'12. Replacing the content of the <phone> element under a <contact> element'

#Assignment replaces any existing content in one step
contact.phone = u'425-555-0155'

'13. Alternate technique for replacing the content of the <phone> element under a <contact> element'

contact.phone = u'425-555-0155'

'14. Creating a contact element with attributes multiple phone number types'

#I'm sure it's clear by now how easy this would be with xml_append_fragment
#So here is the more analogous API approach
contact = contacts.xml_create_element(u'contact')
contact.xml_append(contact.xml_create_element(u'name', content=u'Patrick Hines'))
contact.xml_append(
    contact.xml_create_element(u'phone',
                               attributes={u'type': u'home'},
                               content=u'206-555-0144'))
contact.xml_append(
    contact.xml_create_element(u'phone',
                               attributes={u'type': u'work'},
                               content=u'425-555-0145'))

'15. Printing the value of the <phone> element whose type attribute has the value "home"'

print u'Home phone is:', unicode(contact.xml_xpath(u'phone[@type="home"]')[0])

'16. Deleting the type attribute of the first <phone> element under the <contact> element'

del contact.phone.type

'17. Transforming our original <contacts> element to a new <contacts> element containing a list of <contact> elements whose children are <name> and <phoneNumbers>'

new_contacts = doc.xml_create_element(u'contacts')
for c in doc.contacts.contact:
    new_contact = doc.xml_create_element(u'contact')
    new_contact.xml_append(doc.xml_create_element(u'name', content=unicode(c.name)))
    numbers = doc.xml_create_element(u'phoneNumbers')
    for p in c.phone:
        numbers.xml_append(p)
    new_contact.xml_append(numbers)
    new_contacts.xml_append(new_contact)

'18. Retrieving the names of all the contacts from Washington, sorted alphabetically '

wash_contacts = contacts.xml_xpath(u'contact[address/state="WA"]')
names = [ unicode(c.name) for c in wash_contacts ]
names.sort()

[Uche Ogbuji]

via Copia

Triclops - Resurrected

Like a phoenix from the flames, I've resurrected an old RDF querying interface called Triclops. It used to be coupled with the 4Suite repository's defunct Dashboard (which I've also tried to resurrect as an XForms interface to a live 4Suite repository - there is more to come on that front, thanks to FormFaces), but I've broken it out into its own standalone application. It's driven by this stylesheet, which makes use of two XSLT extensions (http://metacognition.info/extensions/remote-query.dd#query and http://metacognition.info/extensions/remote-query.dd#graph), both of which are defined here.

I've updated the tabular triple result page to include ways to navigate graphs by clicking on subjects (which takes you to a subsequent triple result page for that particular subject), predicates (which takes you to another triple result page with statements which use that predicate), and objects (which allows you to jump from graph to graph along rdfs:seeAlso / rdfs:isDefinedBy relationships). Note that it assumes the objects of rdfs:seeAlso and rdfs:isDefinedBy are live URIs that return an RDF graph (the most common use for these is for FOAF networks and relationships between ontologies).

I've also included buttons for common, 'canned' queries that can be executed on any graph, such as:

  • All classes: type(list(rdfs:Class,owl:Class))
  • rss:items: type(rss:item)
  • Dated Resources: all()|-dc:date->*
  • Today's Items: all()|-dc:date->contains(.,".. today's date ..")
  • Annotated Resources: all()|-list(rdfs:label,rdfs:comment,dc:title,dc:description,rss:title,rss:description)->*
  • Ontologies: type(owl:Ontology)
  • People: type(foaf:Person)
  • Everything: all()

In addition, I've added some documentation.

P.S.: The nodes in the RDF diagrams generated by Triclops (as an alternative to raw triples) are live links. The JPEG diagrams are associated with image maps (generated by graphviz) that allow you to click on the nodes, and the SVG diagrams are rendered with links as well (depending on the maturity of your SVG viewer, this might or might not be the preferred diagram format). The SVG diagrams have a lot of potential for things such as styling and the other possibilities that are standard to SVG.

In all, I think it provides a very user-friendly way to quickly traverse, whittle, and circumnavigate the "Semantic Web" - whoops, there is that phrase again :)

4Suite Repository and 4Suite RDF have become sort of the bastard children of recent 4Suite development, and I've been focusing my efforts lately on moving the latter along. The former (the Repository) only lacks documentation, IMHO, as Metacognition is run entirely as a 4Suite Repository instance.

Chimezie Ogbuji

via Copia

Solution: simple XML output "templates" for Amara

A few months ago in "Sane template-like output for Amara" I discussed ideas for making the Amara output API a little bit more competitive with full-blown templating systems such as XSLT, without adopting all the madness of template frameworks.

I just checked in the simplest patch that does the trick. Here is an example from the previous article:

Amara 1.0 code:

person_elem = newdoc.xml_element(
        u'person',
        attributes={u'name': unicode(person.name)}
        )
newdoc.xml_append(person_elem)

Proposed Amara 1.2 code:

newdoc.xml_append_template("<person name='{person.name}'/>")

What I actually checked into CVS today for Amara 1.2:

newdoc.xml_append_fragment("<person name='%s'/>"%person.name)

That has the advantage of leaning as much as possible on an existing Python concept (formatted strings). As the method name indicates, this is conceptually no longer a template, but rather a fragment of XML in text form. The magic for Amara is in allowing one to dynamically create XML objects from such fragments. I think this is a nearly unique capability for Python XML output APIs, shared only with 4Suite's MarkupWriter (I have no doubt you'll let me know if I'm wrong).

Also, I think the approach I settled on is best in light of the three "things to ponder" from the older article.

  • Security. Again I'm leaning on a well-known facility of Python, and not introducing any new holes. The original proposal would have opened up possible issues with tainted strings in the template expressions.
  • String or Unicode? I went with strings for the fragments. It's up to the developer to make sure that however he constructs the XML fragment, the result is a plain string and not a Unicode object.
  • separation of model and presentation. There is a very clear separation between Python operations to build a string XML fragment (these are usually the data model objects), and any transforms applied to the resulting XML binding objects (this is usually the separate presentation side). Sure a determined developer can write spaghetti, but I think that with xml_append_fragment it's possible and natural to have a clean separation. With most template systems, this is very hard to achieve.
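
One practical footnote on the security and string points: % formatting happily interpolates markup-significant characters, so any untrusted value should be escaped on its way into the fragment. A small sketch using only the stdlib's xml.sax.saxutils, independent of Amara (person_fragment is a hypothetical helper, not part of any API):

```python
from xml.sax.saxutils import quoteattr
import xml.dom.minidom as minidom

def person_fragment(name):
    #quoteattr supplies the surrounding quotes and escapes &, <, and
    #any embedded quote characters in the attribute value
    return '<person name=%s/>' % quoteattr(name)

frag = person_fragment('Jones & "Sons"')
#The fragment stays well-formed despite the markup characters in the input
name_back = minidom.parseString(frag).documentElement.getAttribute('name')
```

The round trip through a parser recovers the original string exactly, which is the property you want before handing such fragments to something like xml_append_fragment.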

One other thing to mention is that the dynamic incorporation of the new fragment into the XML binding makes this a potential building block for pipelined processing architecture.

def process_link(body, href, content):
    body.xml_append_fragment('<a href="%s">%s</a>'%(href, content))
    #Send the "a" element object that was just appended to
    #the next pipeline stage
    check_unique(body.xml_children[-1])

def check_unique(a_node):
    if not a_node.href in g_link_dict:
        #index the href to the link text (a element text content)
        g_link_dict[a_node.href] = unicode(a_node)

[Uche Ogbuji]

via Copia

Firing SAX events from a DOM tree in 4Suite

One nice thing about the briskly-moving 4Suite documentation project is that it is shining a clear light on places where we need to make the APIs more versatile. Adding convenience parse functions was one earlier result.

Saxlette has the ability to walk a Domlette tree, firing off events to a handler as if from a source document parse. This ability used to be too well hidden, though, and I made an API addition to make it more readily available. This is the new Ft.Xml.Domlette.SaxWalker. The following example should show how easy it is to use:

from Ft.Xml.Domlette import SaxWalker
from Ft.Xml import Parse

XML = "<a><b/><c><d/></c></a>"

class element_counter:
    def startDocument(self):
        self.ecount = 0

    def startElementNS(self, name, qname, attribs):
        self.ecount += 1

#First get a Domlette document node
doc = Parse(XML)
#Then SAX "parse" it
parser = SaxWalker(doc)
handler = element_counter()
parser.setContentHandler(handler)
#You can set any properties or features, or do whatever
#you would to a regular SAX2 parser instance here
parser.parse() #called without any argument
print "Elements counted:", handler.ecount

Again Saxlette and Domlette are fully implemented in C, so you get great performance from the SaxWalker.
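
For readers without 4Suite at hand, the same tree-walking idea can be sketched with nothing but the stdlib. This is an illustration of the technique, not 4Suite's implementation (attribute handling is simplified to a plain dict):

```python
import xml.dom.minidom as minidom

def sax_walk(node, handler):
    #Fire a start event, recurse into the children, then fire the end event
    if node.nodeType == node.ELEMENT_NODE:
        handler.startElementNS((node.namespaceURI, node.localName),
                               node.nodeName, dict(node.attributes.items()))
    for child in node.childNodes:
        sax_walk(child, handler)
    if node.nodeType == node.ELEMENT_NODE:
        handler.endElementNS((node.namespaceURI, node.localName), node.nodeName)

class ElementCounter:
    def __init__(self):
        self.ecount = 0
    def startElementNS(self, name, qname, attribs):
        self.ecount += 1
    def endElementNS(self, name, qname):
        pass

doc = minidom.parseString("<a><b/><c><d/></c></a>")
counter = ElementCounter()
sax_walk(doc.documentElement, counter)
```

The pure-Python recursion is of course much slower than Saxlette's C implementation; the point is just the shape of the event stream.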

[Uche Ogbuji]

via Copia

Python/XML column #37 (and out): Processing Atom 1.0

"Processing Atom 1.0"

In his final Python-XML column, Uche Ogbuji shows us three ways to process Atom 1.0 feeds in Python. [Sep. 14, 2005]

I show how to parse Atom 1.0 using minidom (for those who want no additional dependencies), Amara Bindery (for those who want an easier API) and Universal Feed Parser (with a quick hack to bring the support in UFP 3.3 up to Atom 1.0). I also show how to use DateUtil and Python 2.3's datetime to process Atom dates.
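
For the common cases, the Atom date handling can be approximated without DateUtil. A stdlib-only sketch (parse_atom_date is a hypothetical helper, not from the article's code, and it ignores fractional seconds, which RFC 3339 allows):

```python
from datetime import datetime, timedelta

def parse_atom_date(text):
    #Parse an RFC 3339 / Atom timestamp such as "2005-09-14T12:00:00Z"
    #or "2005-09-14T07:00:00-05:00" into a naive UTC datetime
    base = datetime.strptime(text[:19], '%Y-%m-%dT%H:%M:%S')
    rest = text[19:]
    if rest in ('', 'Z'):
        return base
    #Numeric offset like -05:00: normalize to UTC by removing the offset
    sign = -1 if rest[0] == '-' else 1
    hours, minutes = int(rest[1:3]), int(rest[4:6])
    return base - sign * timedelta(hours=hours, minutes=minutes)

d = parse_atom_date('2005-09-14T07:00:00-05:00')
# 07:00 at UTC-5 is 12:00 UTC
```

DateUtil earns its keep on the messier cases (fractional seconds, unusual offsets), which is why the article reaches for it.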

As the teaser says, we've come to the end of the column in its present form, but it's more of a transition than a termination. From the article:

And with this month's exploration, the Python-XML column has come to an end. After discussions with my editor, I'll replace this column with one with a broader focus. It will cover the intersection of Agile Languages and Web 2.0 technologies. The primary language focus will still be Python, but there will sometimes be coverage of other languages such as Ruby and ECMAScript. I think many of the topics will continue to be of interest to readers of the present column. I look forward to continuing my relationship with the XML.com audience.

It is too bad that I didn't get to some of the articles I had in the queue, including coverage of lxml, pygenx, XSLT processing from Python, the role of PEP 342 in XML processing, and more. I can still squeeze some of these topics into the new column, I think, as long as I keep the emphasis on the Web. I'll also try to keep up my coverage of news in the Python/XML community here on Copia.

Speaking of such news, I forgot to mention in the column that I'd found an interesting resource from John Shipman.

[F]or my relatively modest needs, I've written a more Pythonic module that uses minidom. Complete documentation, including the code of the module in 'literate programming' style, is at:


The relevant sections start with section 7, "xmlcreate.py".

[Uche Ogbuji]

via Copia

Live Markdown Compilation via 4XSLT / 4Suite Repository

Related to Uche's recent entry about PyBlosxom + CherryPy, I recently wrote a 4XSLT extension that compiles a 4Suite Repository RawFile (which holds a Markdown document) into an HTML 4.01 document on the fly. I'm using it to host a collaborative Markdown-based wiki.

The general idea is to allow the Markdown document to reside in the repository and be editable by anyone (or specific users). The raw content of that document can be viewed with a different URL: http://metacognition.info/markdown-documents/RDFInterfaces.txt . That is the actual location of the file; the previous URL is actually a REST service set up with the 4Suite Server instance running on Metacognition, which listens for requests with a leading /markdown and redirects the request to a stylesheet that compiles the content of the file and returns an HTML document.

The relevant section of the server.xml document is below:

         xslt-transform='/extensions/RenderMarkdown.xslt'   />

This makes use of a feature in the 4Suite Repository Server architecture that allows you to register URL patterns to XSLT transformations. In this case, all incoming requests for paths with a leading /markdown are interpreted as a request to execute the stylesheet /extensions/RenderMarkdown.xslt with a top-level path parameter which is the full path to the markdown document (/markdown-documents/RDFInterfaces.txt in this case). For more on these capabilities, see: The architecture of 4Suite Web applications.

The rendering stylesheet is below:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:exsl="http://exslt.org/common"
        xmlns:fcore="http://xmlns.4suite.org/ext"
        xmlns:md="http://metacognition.info/extensions/markdown#"
        extension-element-prefixes="exsl md fcore"
        exclude-result-prefixes="fcore exsl md xsl">
    <!--the md namespace must match the NS constant in the extension module below-->
    <xsl:output method="html"
          doctype-public="-//W3C//DTD HTML 4.01 Transitional//EN"
          doctype-system="http://www.w3.org/TR/html4/loose.dtd"/>
    <xsl:param name="path"/>
    <xsl:param name="title"/>
    <xsl:param name="css"/>
    <xsl:template match="/">
        <html>
            <head>
                <title><xsl:value-of select="$title"/></title>
                <link href="{$css}" type="text/css" rel="stylesheet"/>
            </head>
            <body>
                <xsl:copy-of select="md:renderMarkdown(fcore:get-content($path))"/>
            </body>
        </html>
    </xsl:template>
</xsl:stylesheet>

This stylesheet makes use of a md:renderMarkdown extension function defined in the Python module below:

from pymarkdown import Markdown

from Ft.Xml.Xslt import XsltElement,ContentInfo,AttributeInfo
from Ft.Xml.XPath import Conversions
from Ft.Xml import Domlette

#Must match the md namespace declared in the stylesheet
NS = 'http://metacognition.info/extensions/markdown#'

def RenderMarkdown(context, markDownString):
    markDownString = Conversions.StringValue(markDownString)
    #Compile the Markdown source and wrap it so it parses as a single element
    rt = '<div class="markdown">%s</div>' % Markdown(markDownString)
    dom = Domlette.NonvalidatingReader.parseString(str(rt),"urn:uuid:Blah")
    return [dom.documentElement]

ExtFunctions = {
    (NS, 'renderMarkdown'): RenderMarkdown,
}

Notice that the stylesheet allows for the title and css to be specified as parameters to the original URL.

The markdown compilation mechanism is none other than the pymarkdown.py used by Copia.

For now, the Markdown documents can only be edited remotely by editors that know how to submit content over HTTP via PUT, and that can handle HTTP authentication challenges if met with a 401 for a resource in the repository that isn't publicly available (in this day and age it's a shame there are only a few such editors - the one I use primarily is the Oxygen XML Editor).

I hope to later add a simple HTML-based form for live modification of the markdown documents which should complete the very simple framework for a markdown-based, 4Suite-enabled mini-Wiki.

Chimezie Ogbuji

via Copia

Itinerant Binds - Better Software Documentation

It was brought to my attention that my recent entry about Sparta/Versa/rdflib possibilities was a little vague/unclear. This tends to happen when I get caught up in an interest. Anyway, I renamed the module to Itinerant Binds (I liked the term) and created a page on Metacognition for the recent rdflib/4Suite RDF work I've been doing, with some more details on how the components work. I added an example that better demonstrates isolating RDF resources through targeted Versa queries and using the bound Python result objects to modify / extend the underlying graph.

Chimezie Ogbuji

via Copia

Convenience APIs for 4Suite Domlette parsing

I added some functions to make a baby step into Domlette parsing. I call these the brain-dead APIs (though we use more decorous terms officially). You can now get yourself a crisp DOM with as little effort as:

>>> from Ft.Xml import Parse
>>> doc = Parse("<hello>world</hello>")

And thence on to the usual fun stuff:

>>> print doc.xpath(u"string(hello)")

Parse also knows how to handle streams (file-like objects):

>>> doc = Parse(open("hello.xml"))

Do a help(Parse) to get a warning about using that function with XML that is not self-contained, and an example of how to parse such XML properly in 4Suite.

There is also ParsePath, which handles files paths and URLs:

>>> from Ft.Xml import ParsePath
>>> doc = ParsePath("hello.xml")
>>> doc = ParsePath("http://copia.ogbuji.net/blog/index.atom")

And what do you know? That last URL does not parse. My feed is ill-formed. ☠☠☠☠☠ (I love cursing in Unicode) Sigh. Gotta fix the PyBlosxom Atom generation. Maybe we need a touch of 4Suite (Amara, actually) there, as well.

[Uche Ogbuji]

via Copia

RDF-API: Reconciling the redundancy in pythonic RDF store implementations

I just wrapped up the second of two rdflib-related libraries I wrote with the aim of bridging the gap between rdflib and 4Suite RDF. The latter (BoundVersaResult.py) is a little more interesting than the former in that it uses Sparta to allow the distinct components of a Versa query result to each be bound to appropriate python objects. 4Suite RDF's Versa implementation already provides such a binding:

  • String -> Python unicode
  • Number -> Python float
  • Boolean -> Python boolean
  • List -> Python list
  • Set -> Python Sets
  • Resource/BlankNodes -> Python unicode

The bindings for all the datatypes except Resource/BlankNodes are straightforward. This library extends the datatype binding with the ability to bind Sparta Things to Versa Resources and BlankNodes. Since Sparta only works with rdflib Graphs, the FtRdfBackend.py module is used to wrap an rdflib.Graph around a 4Suite Model.

Sparta takes an RDF graph and a defining ontology, which dictates the cardinality of properties bound to resource objects (Things), and allows the RDF graph to be traversed (and extended) via pythonic idiom. The combination of being able to isolate resources by Versa query (or SPARQL queries eventually, as soon as the ongoing rdflib effort in that regard is completed) and bind them to Python objects whose properties reflect the properties of the underlying RDF resources is very cool, IMHO. The ability to provide an implementation-agnostic way to modify an RDF graph, using a host language as expressive as Python, is the icing on the cake. For example, check out the following code snippet demonstrating the use of this library:

#Setup FtRDF Model
db = Memory.GetDb('', '')
model = Model.Model(db)

#Parse my del.icio.us rss feed
szr = Dom.Serializer()
dom = Domlette.NonvalidatingReader.parseString(domStr,'http://del.icio.us/rss/chimezie')

#Setup rdflib.Graph with FtRDF Model as Backend, using FtRdf driver
for item in generator.query("type(rss:item)"):        
    [pprint(link) for link in item.rss_link]
    print generator.query("distribute(@'%s','.-rss:title->*','.-dc:subject->*')"%item._id)[0]

Note that (within the loop over the rss:items in the graph), the rss:link property returns an iterator over the possible values (since there is no defining ontology that could have specified that the rss:link property has a cardinality of 1, or is an inverse functional property - which would have caused Sparta to bind the rss_link property to a single object instead of an iterator).
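
The cardinality behavior described above is easy to picture with a toy sketch (purely illustrative - this is not Sparta's code): properties with no declared cardinality bind to an iterator over all values, while ones the "ontology" declares functional bind to a single value:

```python
class Thing:
    """Toy stand-in for a Sparta-style bound resource."""
    def __init__(self, props, functional=()):
        self._props = props              #property name -> list of values
        self._functional = set(functional)

    def __getattr__(self, name):
        values = self._props.get(name, [])
        if name in self._functional:
            #Declared cardinality of 1: bind directly to the value
            return values[0] if values else None
        #Unknown cardinality: bind to an iterator over all values
        return iter(values)

item = Thing({'rss_link': ['http://example.org/a', 'http://example.org/b'],
              'rss_title': ['An entry']},
             functional=('rss_title',))
links = list(item.rss_link)    #iterator: possibly many values
title = item.rss_title         #single value, per the declaration
```

Sparta makes this decision from the OWL/RDFS ontology (cardinality restrictions, inverse functional properties) rather than an explicit list, but the binding consequence is the same.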

The result of running this code:

[[u'More on additional semantic information from Enrico Franconi on 2004-07-12 (public-rdf-    dawg@w3.org from 
July to September 2004)'], [u'academic architecture archive community dawg email logic query rdf reference 
[[u'Representing Specified Values in OWL: "value partitions" and "value sets"'], [u'academic datatypes ontology owl 
rdf semantic standard w3c']]
[[u'boolean operators and type errors from Jeen Broekstra on 2005-09-07 (public-rdf-dawg@w3.org from July to 
September 2005)'], [u'academic architecture archive community dawg email logic rdf reference semantic w3c']]
[[u'RDF Diff, Patch, Update, and Sync -- Design Issues'], [u'academic paper rdf semantic standards tbl w3c']]
[[u'RDF Data Access Use Cases and Requirements'], [u'academic architecture framework query rdf reference semantic 
specification standard w3c']]
[[u'Relational Databases and the Semantic Web (in Design Issues)'], [u'academic architecture framework rdb rdf 
reference semantic tbl w3c']]
[[u'Defining N-ary Relations on the Semantic Web: Use With Individuals'], [u'academic logic ontology owl predicate 
rdf reference relationships semantic standard w3c']]

Chimezie Ogbuji

via Copia

Wrapping rdflib's Graph around a 4RDF Model

Well, for some time I had pondered what it would take to provide SPARQL support in 4Suite RDF. I fell upon sparql-p earlier, and noticed it was essentially a SPARQL query processor without a parser to drive it. It works over a deprecated rdflib interface: TripleStore. The newly suggested interface is Graph, which is as solid a suggestion for a generic RDF API as any. So, I wrote a 4Suite RDF model backend for rdflib that allows the wrapping of Graph around a live 4Suite RDF model. Finally, I used this backend to execute a sparql-p query over http://del.icio.us/rss/chimezie:

SELECT ?title
WHERE {
  ?item rdf:type rss:item;
        dc:subject ?subj;
        rss:title ?title.
  FILTER (REGEX(?subj, ".*rdf"))
}

The corresponding python code:

#Setup FtRDF Model
db = Memory.GetDb('rules', 'test')
model = Model.Model(db)

#Parse my del.icio.us rss feed
szr = Dom.Serializer()
dom = Domlette.NonvalidatingReader.parseString(domStr,'http://del.icio.us/rss/chimezie')

#Setup rdflib.Graph with FtRDF Model as Backend, using FtRdf driver
tStore = myTripleStore(FtRdf(model))

#Setup sparql-p query processor engine
select = ("?title",)

#Setup terms
copia = URIRef('http://del.icio.us/chimezie')
rssTitle = URIRef('http://purl.org/rss/1.0/title')
versaWiki = URIRef('http://en.wikipedia.org/wiki/Versa')

#Filter on objects of statements (dc:subject values) - keep only those containing the string 'rdf'
def rdfSubFilter(subj,pred,obj):
    #str.find returns -1 on a miss, so this is true iff 'rdf' occurs in obj
    return bool(obj.find('rdf')+1)

#Execute query
where = GraphPattern([("?item",rdf_type,URIRef('http://purl.org/rss/1.0/item')),
                      ("?item",URIRef('http://purl.org/dc/elements/1.1/subject'),"?subj",rdfSubFilter),
                      ("?item",rssTitle,"?title")])
result = tStore.query(select,where)

The result (which will change daily as my links shift thru my del.icio.us channel queue):

[chimezie@Zion RDF-API]$ python FtRdfBackend.py
 u'Representing Specified Values in OWL: "value partitions" and "value sets"',
 u'RDF Template Language 1.0',
 u'SIOC Vocabulary Specification',
 u'SPARQL in RDFLib',
 u'MeetingRecords - ESW Wiki',
 u'Enumerated datatypes (OWL)',
 u'Defining N-ary Relations on the Semantic Web: Use With Individuals']

Chimezie Ogbuji

via Copia