Test XML Output Using Python doctest

Python doctest combines unit testing with documentation.  With doctest, you put your unit tests in the Python docstring, and documentation generators like Sphinx render your tests as usage examples.  It’s easy to compare a method’s output to an XML string using doctest.  It’s not so easy when you want the XML string pretty-printed for the sake of  clear documentation.  Here is a simple solution.

The source code on which this article is based provides a unified example of the concepts presented.

Background: doctest Wants the XML on a Single Line

Consider this sample that creates an XML document:

import datetime
import getpass
from xml.dom.minidom import Document

class PythonXmlDoctest(object):
    '''Example of testing XML output using Python doctest.'''

    def __init__(self, contentText):
    '''Create an ultra-super-mega great document.'''
    self._doc = Document()
    super = self._doc.createElement("super")
    self._doc.appendChild(super)

    ultra = self._doc.createElement("ultra")
    ultra.setAttribute('date', str(datetime.date.today()))
    ultra.setAttribute('user', getpass.getuser())
    super.appendChild(ultra)

    mega = self._doc.createElement("mega")
    ultra.appendChild(mega)
    content = self._doc.createTextNode(contentText)
    mega.appendChild(content)

    @property
    def doc(self):
    '''The minidom document.'''
    return self._doc

if __name__ == "__main__":
    import doctest
    doctest.testmod()

If you test against the result of toxml(), the expected result must appear all on one line, which is satisfactory for only the shortest XML element:

    def __init__(self, contentText):
        ''' Create an ultra-super-mega great document.

        >>> d = PythonXmlDoctest('Somebody stop me!')
        >>> d.doc.toxml()   #doctest: +ELLIPSIS
        '<?xml version="1.0" ?><super><ultra date="..." user="..."><mega>Somebody stop me!</mega></ultra></super>'
        '''
        self._doc = Document()
        # And so on...

Note how I used the doctest ELLIPSIS option. It allows me to specify ... for the date and user attribute values so that the test will succeed no matter who you are and when you run the test.

Using toprettyxml() is even worse because the expected result contains \n and \t, and it’s still on one line:

    def __init__(self, contentText):
        ''' Create an ultra-super-mega great document.

        >>> d = PythonXmlDoctest('Somebody stop me!')
        >>> d.doc.toprettyxml()   #doctest: +ELLIPSIS
        '<?xml version="1.0" ?>\n<super>\n\t<ultra date="2011-11-05" user="johnm">\n\t\t<mega>\n\t\t\tSomebody stop me!\n\t\t</mega>\n\t</ultra>\n</super>\n'
        '''
        self._doc = Document()
        # And so on...

Solution: Pretty Print with Spaces and Normalize White Space

I got the best result using toprettyxml(indent=' ', newl=' ') together with the doctest NORMALIZE_WHITESPACE option. The key to understanding the NORMALIZE_WHITESPACE option is that it causes one or more (but not zero) white space characters to be considered equivalent:

    def __init__(self, contentText):
        ''' Create an ultra-super-mega great document.

        >>> d = PythonXmlDoctest('Somebody stop me!')
        >>> d.doc.toprettyxml(indent=' ', newl=' ')   #doctest: +ELLIPSIS +NORMALIZE_WHITESPACE
        '<?xml version="1.0" ?>
        <super>
          <ultra date="..." user="...">
            <mega>
              Somebody stop me!
            </mega>
          </ultra>
        </super> '
        '''
        self._doc = Document()
        # And so on...

The expected result in the above docstring is easily readable, and the test passes.  Moreover, when rendered as documentation using Sphinx, the XML expected result appears as a nicely formatted usage example.

If Your XML Package Has No Pretty Print

If you are using another Python XML package like ElementTree that cannot pretty print, just read the XML into minidom and output it again pretty printed:

from xml.etree import ElementTree
from xml.dom import minidom

def prettify(elem):
    """Return a pretty-printed XML string for the Element."""
    rough_string = ElementTree.tostring(elem, 'utf-8')
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent=' ', newl=' ')

This article was originally published by John McGehee, Voom, Inc. under the CC BY 3.0 license.  Changes have been made.