Thursday, December 08, 2011

NodeTree 0.2 Released

I just shipped NodeTree 0.2.

This version will not parse an XML stream. All it contains are some basic types representing XML nodes such as Comment, Document, and Element. As promised, text is also handled as a node but uses standard Python strings (UTF-8 strings/bytes and unicode). These should all be fairly intuitive to use.

The magic is the XML data is being managed in C using a libxml2 DOM tree but accessed through a Pythonic object-oriented API. For example, in DOM each node may have exactly one parent - in NodeTree a node may be added to any number of parents with a separate DOM node and context for each.

I started this project because the existing XML packages for Python proved too difficult to use with XMPP. Fritzy's SleekXMPP uses lxml but had to jump through several hoops to get stream parsing to work, looking over his work I certainly didn't want to repeat it with Concordance-XMPP.

Beyond this the leading XML API for Python, ElementTree, includes several unfortunate design decisions that make it frustrating to use in the best cases and unusable in others. A full list of why can be left for another time, but the difference to NodeTree can be described in their names - ElementTree is a tree of XML Element nodes with other kinds of nodes either silently dropped, mangled, or made available in bizarre ways (eg, .text and .tail). In contrast, NodeTree provides XML data as a tree of nodes starting with the Document node and includes comment and text nodes in its tree. I plan to provide 100% XML 1.0 support in a future release while maintaining a clean, simple, and intuitive API.

Storing XML data in libxml2 DOM format gives us a few advantages over other XML libraries. First, we'll have XPath, XInclude, and XSLT available without having to convert the data between formats. Second, Python objects only need to be created for nodes Python wants a reference to so when we get to parsing data this will happen much faster and with less memory.

At version 0.2 NodeTree is still in its infancy but some of its API can be demonstrated. Here's a short example:

Python 3.2.2 (default, Oct  3 2011, 00:20:58) 
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import nodetree
>>> doc = nodetree.Document()
>>> doc.append(nodetree.Comment(' Start '))
>>> doc.append(nodetree.Element('data'))
>>> doc.append(nodetree.Comment(' Fini '))
>>> doc[1].attributes['thing'] = 'normal'
>>> doc[1].append(nodetree.Element('record'))
>>> doc[1][0].append('First') 
>>> doc[1].append(nodetree.Element('record'))
>>> doc[1][1].append('Second')
>>> doc
<?xml version="1.0"?>
<!-- Start -->
<data thing="normal">
  <record>First</record>
  <record>Second</record>
</data>
<!-- Fini -->

NodeTree 0.2 is tested to work with Python 2.6, 2.7, 3.1, 3.2, and 3.3-pre. The next release is intended to support basic file and stream parsing.

No comments: