Test XML Output Using Python doctest
Python doctest combines unit testing with documentation. With doctest, you put your unit tests in the Python docstring, and documentation generators like Sphinx render your tests as usage examples. It’s easy to compare a method’s output to an XML string using doctest. It’s not so easy when you want the XML string pretty-printed for the sake of clear documentation. Here is a simple solution.
The source code on which this article is based provides a unified example of the concepts presented.
Background: doctest Wants the XML on a Single Line
Consider this sample that creates an XML document:
import datetime import getpass from xml.dom.minidom import Document class PythonXmlDoctest(object): '''Example of testing XML output using Python doctest.''' def __init__(self, contentText): '''Create an ultra-super-mega great document.''' self._doc = Document() super = self._doc.createElement("super") self._doc.appendChild(super) ultra = self._doc.createElement("ultra") ultra.setAttribute('date', str(datetime.date.today())) ultra.setAttribute('user', getpass.getuser()) super.appendChild(ultra) mega = self._doc.createElement("mega") ultra.appendChild(mega) content = self._doc.createTextNode(contentText) mega.appendChild(content) @property def doc(self): '''The minidom document.''' return self._doc if __name__ == "__main__": import doctest doctest.testmod()
If you test against the result of toxml()
, the expected result must appear all on one line, which is satisfactory for only the shortest XML element:
def __init__(self, contentText): ''' Create an ultra-super-mega great document. >>> d = PythonXmlDoctest('Somebody stop me!') >>> d.doc.toxml() #doctest: +ELLIPSIS '<?xml version="1.0" ?><super><ultra date="..." user="..."><mega>Somebody stop me!</mega></ultra></super>' ''' self._doc = Document() # And so on...
Note how I used the doctest ELLIPSIS
option. It allows me to specify ...
for the date
and user
attribute values so that the test will succeed no matter who you are and when you run the test.
Using toprettyxml()
is even worse because the expected result contains \n
and \t
, and it’s still on one line:
def __init__(self, contentText): ''' Create an ultra-super-mega great document. >>> d = PythonXmlDoctest('Somebody stop me!') >>> d.doc.toprettyxml() #doctest: +ELLIPSIS '<?xml version="1.0" ?>\n<super>\n\t<ultra date="2011-11-05" user="johnm">\n\t\t<mega>\n\t\t\tSomebody stop me!\n\t\t</mega>\n\t</ultra>\n</super>\n' ''' self._doc = Document() # And so on...
Solution: Pretty Print with Spaces and Normalize White Space
I got the best result using toprettyxml(indent=' ', newl=' ')
together with the doctest NORMALIZE_WHITESPACE
option. The key to understanding the NORMALIZE_WHITESPACE
option is that it causes one or more (but not zero) white space characters to be considered equivalent:
def __init__(self, contentText): ''' Create an ultra-super-mega great document. >>> d = PythonXmlDoctest('Somebody stop me!') >>> d.doc.toprettyxml(indent=' ', newl=' ') #doctest: +ELLIPSIS +NORMALIZE_WHITESPACE '<?xml version="1.0" ?> <super> <ultra date="..." user="..."> <mega> Somebody stop me! </mega> </ultra> </super> ' ''' self._doc = Document() # And so on...
The expected result in the above docstring is easily readable, and the test passes. Moreover, when rendered as documentation using Sphinx, the XML expected result appears as a nicely formatted usage example.
If Your XML Package Has No Pretty Print
If you are using another Python XML package like ElementTree that cannot pretty print, just read the XML into minidom and output it again pretty printed:
from xml.etree import ElementTree from xml.dom import minidom def prettify(elem): """Return a pretty-printed XML string for the Element.""" rough_string = ElementTree.tostring(elem, 'utf-8') reparsed = minidom.parseString(rough_string) return reparsed.toprettyxml(indent=' ', newl=' ')
This article was originally published by John McGehee, Voom, Inc. under the CC BY 3.0 license. Changes have been made.
Apr 10, 2016 @ 17:43:15
I have an additional tip for doctest users. Ordinarily, in doctests you must use a double backslash even in raw strings in your test:
def test_backslash1():
”’Two backslashes needed in the docstring tests:
>>> r’\\windows\\filename.txt’ == test_backslash1()
True
”’
return r’\windows\filename.txt’
This is because the docstring is a regular string, in which a literal backslash must be escaped. However, if you make the docstring itself a raw sting (note how the docstring begins with r”’) you can use a single backslash:
def test_backslash2():
r”’Two backslashes not required because this docstring is itself a raw string:
>>> r’\windows\filename.txt’ == test_backslash2()
True
”’
return r’\windows\filename.txt’