Any module with functions should have a set of tests that can check the correctness of the implementations. Well-established procedures and corresponding tools exist for automating the execution of such tests. These tools allow large test sets to be run with a one-line command, making it easy to check whether the software still works (as far as the tests tell!). Here we shall illustrate two important software testing techniques: doctest and unit testing. The first one is Python specific, while unit testing is the dominating test technique in the software industry today.
A doc string, the first string after the function header, is used to document the purpose of functions and their arguments (see the section Implementing the numerical algorithm in a function). Very often it is instructive to include an example in the doc string on how to use the function. Interactive examples in the Python shell are most illustrative, as we can see the output resulting from the statements and expressions. For example, in the solver function we can include an example on calling this function and printing the computed u and t arrays:
def solver(I, a, T, dt, theta):
    """
    Solve u'=-a*u, u(0)=I, for t in (0,T] with steps of dt.

    >>> u, t = solver(I=0.8, a=1.2, T=1.5, dt=0.5, theta=0.5)
    >>> for n in range(len(t)):
    ...     print 't=%.1f, u=%.14f' % (t[n], u[n])
    t=0.0, u=0.80000000000000
    t=0.5, u=0.43076923076923
    t=1.0, u=0.23195266272189
    t=1.5, u=0.12489758761948
    """
    ...
When such interactive demonstrations are inserted in doc strings, Python's doctest module can be used to automate running all commands in interactive sessions and compare new output with the output appearing in the doc string. All we have to do in the current example is to run the module file decay.py with

Terminal> python -m doctest decay.py

This command imports the doctest module, which runs all doctests found in the file and reports discrepancies between expected and computed output.
Alternatively, the test block in a module may run all doctests by

if __name__ == '__main__':
    import doctest
    doctest.testmod()
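With this block in place, executing the module file as a plain program also runs the doctests:

Terminal> python decay.py

Note that doctest.testmod() is silent when all tests pass; pass verbose=True to have every test reported.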
Doctests can also be embedded in nose/pytest unit tests as explained in the next section.
Doctests prevent command-line arguments
No additional command-line arguments can be given when running doctests. If your program relies on command-line input, make sure the doctests can be run without such input on the command line. However, you can simulate command-line input by filling sys.argv with values, e.g.,

import sys; sys.argv = [sys.argv[0]] + '--I 1.0 --a 5'.split()
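A doctest can even perform this trick itself, filling sys.argv before calling the function that reads the command line. The sketch below is not taken from decay.py; the function read_input and its arguments are made up for illustration:

import sys, argparse

def read_input():
    """
    Read I and a from the command line.

    >>> sys.argv = [sys.argv[0]] + '--I 1.0 --a 5'.split()
    >>> read_input()
    (1.0, 5.0)
    """
    parser = argparse.ArgumentParser()
    parser.add_argument('--I', type=float, default=1.0)
    parser.add_argument('--a', type=float, default=1.0)
    args = parser.parse_args()  # parses sys.argv[1:]
    return args.I, args.a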
The execution command above will report any problem if a test fails. As an illustration, let us alter the u value at t=1.5 in the output of the doctest by replacing the last digit 8 by 7. This edit triggers a report:
Terminal> python -m doctest decay.py
**********************************************************************
File "decay.py", line ...
Failed example:
    for n in range(len(t)):
        print 't=%.1f, u=%.14f' % (t[n], u[n])
Expected:
    t=0.0, u=0.80000000000000
    t=0.5, u=0.43076923076923
    t=1.0, u=0.23195266272189
    t=1.5, u=0.12489758761948
Got:
    t=0.0, u=0.80000000000000
    t=0.5, u=0.43076923076923
    t=1.0, u=0.23195266272189
    t=1.5, u=0.12489758761947
Pay attention to the number of digits in doctest results
Note that in the output of t and u we write u with 14 digits. Writing all 16 digits is not a good idea: if the tests are run on different hardware, round-off errors might be different, and the doctest module detects that the numbers are not precisely the same and reports failures. In the present application, where \(0 < u(t) \leq 0.8\), we expect round-off errors to be of size \(10^{-16}\), so comparing 15 digits would probably be reliable, but we compare 14 to be on the safe side. On the other hand, comparing a small number of digits may hide software errors.
Doctests are highly encouraged as they do two things: 1) demonstrate how a function is used and 2) test that the function works.
The unit testing technique consists of identifying smaller units of code and writing one or more tests for each unit. One unit can typically be a function. Each test should, ideally, not depend on the outcome of other tests. The recommended practice is actually to design and write the unit tests first and then implement the functions!
In scientific computing it is not always obvious how to best perform unit testing. The units are naturally larger than in non-scientific software. Very often the solution procedure of a mathematical problem identifies a unit, such as our solver function.
Python offers two very easy-to-use software frameworks for implementing unit tests: nose and pytest. These work (almost) in the same way, but our recommendation is to go for pytest.
For a test to qualify as a test function in nose or pytest, three rules must be followed:

- The function name must start with test_.
- Function arguments are not allowed.
- An AssertionError exception must be raised if the test fails.
A specific example might be illustrative before proceeding. We have the following function that we want to test:
def double(n):
    return 2*n
The corresponding test function could, in principle, have been written as
def test_double():
    """Test that double(n) works for one specific n."""
    n = 4
    expected = 2*n
    computed = double(n)
    if expected != computed:
        raise AssertionError
The last two lines, however, are never written like this in test functions. Instead, Python's assert statement is used: assert success, msg, where success is a boolean variable, which is False if the test fails, and msg is an optional message string that is printed when the test fails. A better version of the test function is therefore
def test_double():
    """Test that double(n) works for one specific n."""
    n = 4
    expected = 2*n
    computed = double(n)
    msg = 'expected %g, computed %g' % (expected, computed)
    success = expected == computed
    assert success, msg
Because of the finite precision arithmetic on a computer, which gives rise to round-off errors, the == operator is not suitable for checking whether two real numbers are equal. Obviously, this principle also applies to tests in test functions. We must therefore replace a == b by a comparison based on a tolerance tol: abs(a-b) < tol. The next example illustrates the problem and its solution.
Here is a slightly different function that we want to test:
def third(x):
    return x/3.
We write a test function where the expected result is computed as \(\frac{1}{3}x\) rather than \(x/3\):
import numpy as np

def test_third():
    """Check that third(x) works for many x values."""
    for x in np.linspace(0, 1, 21):
        expected = (1/3.0)*x
        computed = third(x)
        success = expected == computed
        assert success
This test_third function executes silently, i.e., without failure, until x becomes 0.15. Then round-off errors make the == comparison False. In fact, seven of the x values above face this problem. The solution is to compare expected and computed with a small tolerance:
def test_third():
    """Check that third(x) works for many x values."""
    tol = 1E-15
    for x in np.linspace(0, 1, 21):
        expected = (1/3.)*x
        computed = third(x)
        success = abs(expected - computed) < tol
        assert success
Always compare real numbers with a tolerance
Real numbers should never be compared with the == operator, but always with the absolute value of the difference and a tolerance. So, replace a == b, if a and/or b is float, by

tol = 1E-14
abs(a - b) < tol

The suitable size of tol depends on the size of a and b (see Problem 5: Experiment with tolerances in comparisons).
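One way to make the comparison robust to the size of a and b is to combine an absolute and a relative tolerance. The helper below is our own sketch, not part of any test framework; the names close, rtol, and atol are invented here:

def close(a, b, rtol=1E-12, atol=1E-14):
    """Return True if a and b are equal to within tolerances."""
    # atol governs comparisons of numbers near zero, while rtol
    # scales the acceptable difference with the size of a and b
    return abs(a - b) < atol + rtol*max(abs(a), abs(b))

With this pattern, two numbers of size \(10^{10}\) are accepted as equal if they differ by less than about \(10^{-2}\), while numbers near zero must differ by less than \(10^{-14}\).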
Test frameworks often contain more tailored assert functions that can be called instead of using the assert statement. For example, comparing two objects within a tolerance, as in the present case, can be done by the assert_almost_equal function from the nose framework:
import nose.tools as nt

def test_third():
    x = 0.15
    expected = (1/3.)*x
    computed = third(x)
    nt.assert_almost_equal(
        expected, computed, delta=1E-15,
        msg='diff=%.17E' % (expected - computed))
Whether to use the plain assert statement with a comparison based on a tolerance or to use the ready-made function assert_almost_equal depends on the programmer's preference. The examples used in the documentation of the pytest framework stick to the plain assert statement.
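As an aside, newer versions of pytest provide a similar convenience through pytest.approx, which wraps the expected value and performs the tolerance-based comparison behind a plain ==. A sketch of test_third in this style:

from pytest import approx

def third(x):
    return x/3.

def test_third():
    x = 0.15
    assert third(x) == approx((1/3.)*x, abs=1E-15)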
Test functions can reside in a module together with the functions they are supposed to verify, or the test functions can be collected in separate files having names starting with test_. Actually, nose and pytest can recursively run all test functions in all test*.py files in the current directory, as well as in all subdirectories! The decay.py module file features test functions in the module itself, but we could equally well have made a subdirectory tests and put the test functions in tests/test_decay.py.
To run all test functions in the file decay.py, do

Terminal> nosetests -s -v decay.py
Terminal> py.test -s -v decay.py
The -s option ensures that output from the test functions is printed in the terminal window, while -v prints the outcome of each individual test function.
Alternatively, if the test functions are located in some separate test*.py files, we can just write

Terminal> py.test -s -v

to recursively run all test functions in the current directory tree. The corresponding

Terminal> nosetests -s -v

command does the same, but requires subdirectory names to start with test or end with _test or _tests (which is a good habit anyway).
An example of a tests directory with a test*.py file is found in src/softeng1/tests.
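The resulting layout is then something like the following sketch (only decay.py and tests/test_decay.py are implied by the text; other files may of course be present):

softeng1/
    decay.py
    tests/
        test_decay.py

Running py.test from the softeng1 directory then finds and runs the tests in the subdirectory automatically.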
Doctests can also be executed from nose/pytest unit tests. Here is an example of a file test_decay_doctest.py, in which the test block runs all the doctests in the imported module decay, and a local test function does the same:
import sys, os
sys.path.insert(0, os.pardir)
import decay
import doctest

def test_decay_module_with_doctest():
    """Doctest embedded in a nose/pytest unit test."""
    # Test all functions with doctests in module decay
    failure_count, test_count = doctest.testmod(m=decay)
    assert failure_count == 0

if __name__ == '__main__':
    # Run all doctests in the imported module decay
    failure_count, test_count = doctest.testmod(m=decay)
Running this file as a program from the command line triggers the doctest.testmod call in the test block, while applying py.test or nosetests to the file triggers an import of the file and execution of the test function test_decay_module_with_doctest.
With pip available, it is trivial to install nose and/or pytest: sudo pip install nose and sudo pip install pytest.
Finding good test problems for verifying the implementation of numerical methods is a topic on its own. The challenge is that we very seldom know what the numerical errors are. For the present model problem (3.1)-(3.2) solved by (3.3) one can, fortunately, derive a formula for the numerical approximation:

\[ u^n = I\left(\frac{1 - (1-\theta)a\Delta t}{1 + \theta a\Delta t}\right)^n .\]

The formula follows from observing that each step of the \(\theta\)-rule multiplies the previous solution value by the constant amplification factor \(A = (1 - (1-\theta)a\Delta t)/(1 + \theta a\Delta t)\). Then we know that the implementation should produce numbers that agree with this formula to machine precision. The formula for \(u^n\) is known as an exact discrete solution of the problem and can be coded as
def exact_discrete_solution(n, I, a, theta, dt):
    """Return exact discrete solution of the numerical schemes."""
    dt = float(dt)  # avoid integer division
    A = (1 - (1-theta)*a*dt)/(1 + theta*dt*a)
    return I*A**n
A test function can evaluate this solution on a time mesh and check that the u values produced by the solver function do not deviate from it by more than a small tolerance:
def test_exact_discrete_solution():
    """Check that solver reproduces the exact discr. sol."""
    theta = 0.8; a = 2; I = 0.1; dt = 0.8
    Nt = int(8/dt)  # no of steps
    u, t = solver(I=I, a=a, T=Nt*dt, dt=dt, theta=theta)
    # Evaluate exact discrete solution on the mesh
    u_de = np.array([exact_discrete_solution(n, I, a, theta, dt)
                     for n in range(Nt+1)])
    # Find largest deviation
    diff = np.abs(u_de - u).max()
    tol = 1E-14
    success = diff < tol
    assert success
Among the important things to consider when constructing test functions is testing the effect of wrong input to the function being tested. In our solver function, for example, integer values of \(a\), \(\Delta t\), and \(\theta\) may cause unintended integer division. We should therefore add a test to make sure our solver function does not fall into this potential trap:
def test_potential_integer_division():
    """Choose variables that can trigger integer division."""
    theta = 1; a = 1; I = 1; dt = 2
    Nt = 4
    u, t = solver(I=I, a=a, T=Nt*dt, dt=dt, theta=theta)
    u_de = np.array([exact_discrete_solution(n, I, a, theta, dt)
                     for n in range(Nt+1)])
    diff = np.abs(u_de - u).max()
    assert diff < 1E-14
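The standard defense against this trap is to convert potentially integer arguments to float before they enter any arithmetic, as done at the top of exact_discrete_solution above. A minimal sketch of the same pattern in the solver function (our own reconstruction, not a verbatim copy of decay.py):

import numpy as np

def solver(I, a, T, dt, theta):
    """Solve u'=-a*u, u(0)=I, for t in (0,T] with steps of dt."""
    dt = float(dt)           # avoid integer division below
    Nt = int(round(T/dt))    # number of time intervals
    T = Nt*dt                # adjust T to fit a whole number of steps
    u = np.zeros(Nt+1)
    t = np.linspace(0, T, Nt+1)
    u[0] = I
    for n in range(0, Nt):
        # theta-rule step; the fraction is now computed in
        # floating point since dt is guaranteed to be float
        u[n+1] = (1 - (1-theta)*a*dt)/(1 + theta*dt*a)*u[n]
    return u, t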
The function read_command_line_positional extracts numbers from the command line. To test it, we must decide on a set of values for the input data, fill sys.argv accordingly, and check that we get the expected values:
def test_read_command_line_positional():
    # Decide on a data set of input parameters
    I = 1.6; a = 1.8; T = 2.2; theta = 0.5
    dt_values = [0.1, 0.2, 0.05]
    # Expected return from read_command_line_positional
    expected = [I, a, T, theta, dt_values]
    # Construct corresponding sys.argv array
    sys.argv = [sys.argv[0], str(I), str(a), str(T), 'CN'] + \
               [str(dt) for dt in dt_values]
    computed = read_command_line_positional()
    for expected_arg, computed_arg in zip(expected, computed):
        assert expected_arg == computed_arg
Note that sys.argv[0] is always the program name and that we have to copy that string from the original sys.argv array to the new one we construct in the test function. (Actually, this test function destroys the original sys.argv that Python fetched from the command line.)
Any numerical code writer should always be skeptical of the use of the exact equality operator == in test functions, since round-off errors often come into play. Here, however, we set some real values, convert them to strings, and convert back again to real numbers (of the same precision). This string-number conversion does not involve any finite precision arithmetic effects, so we can safely use == in tests. Note also that the last element in expected and computed is the list dt_values, and == works for comparing two lists as well.
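A quick interactive check confirms both claims for the kind of short decimal literals used here: the string round trip reproduces exactly the same number, and == compares lists element by element:

>>> float(str(1.6)) == 1.6
True
>>> [0.1, 0.2, 0.05] == [float(s) for s in '0.1 0.2 0.05'.split()]
True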
The function read_command_line_argparse can be verified with a test function that has the same setup as test_read_command_line_positional above. However, the construction of the command line is a bit more complicated. We find it convenient to construct the line as a string and then split the line into words to get the desired list sys.argv:
def test_read_command_line_argparse():
    I = 1.6; a = 1.8; T = 2.2; theta = 0.5
    dt_values = [0.1, 0.2, 0.05]
    # Expected return from read_command_line_argparse
    expected = [I, a, T, theta, dt_values]
    # Construct corresponding sys.argv array
    command_line = '%s --a %s --I %s --T %s --scheme CN --dt ' % \
                   (sys.argv[0], a, I, T)
    command_line += ' '.join([str(dt) for dt in dt_values])
    sys.argv = command_line.split()
    computed = read_command_line_argparse()
    for expected_arg, computed_arg in zip(expected, computed):
        assert expected_arg == computed_arg
Recall that the Python function zip enables iteration over several lists, tuples, or arrays at the same time.
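A small interactive example shows the pairing (in Python 2, zip returns a list of tuples; in Python 3 it returns an iterator, but iteration works the same way):

>>> zip([1.6, 1.8, 2.2], ['I', 'a', 'T'])
[(1.6, 'I'), (1.8, 'a'), (2.2, 'T')]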
Let silent test functions speak up during development
When you develop test functions in a module, it is common to use IPython for interactive experimentation:
In[1]: import decay
In[2]: decay.test_read_command_line_argparse()
Note that a working test function is completely silent! Many find it psychologically annoying to convince themselves that a completely silent function is doing the right things. It can therefore, during development of a test function, be convenient to insert print statements in the function to monitor that the function body is indeed executed. For example, one can print the expected and computed values in the terminal window:
def test_read_command_line_argparse():
    ...
    for expected_arg, computed_arg in zip(expected, computed):
        print expected_arg, computed_arg
        assert expected_arg == computed_arg
After performing this edit, we want to run the test again, but in IPython the module must first be reloaded (reimported):
In[3]: reload(decay)  # force new import
In[4]: decay.test_read_command_line_argparse()
1.6 1.6
1.8 1.8
2.2 2.2
0.5 0.5
[0.1, 0.2, 0.05] [0.1, 0.2, 0.05]
Now we clearly see the objects that are compared.
The test functions written for the nose and pytest frameworks are very straightforward and to the point, with no framework-required boilerplate code. We just write the statements we need to get the computations and comparisons done, before applying the required assert.
The classical way of implementing unit tests (which derives from the JUnit object-oriented tool in Java) leads to much more comprehensive implementations with a lot of boilerplate code. Python comes with a built-in module unittest for doing this type of classical unit tests. Although nose or pytest are much more convenient to use than unittest, class-based unit testing in the style of unittest has a very strong position in computer science and is so widespread in the software industry that even computational scientists should have an idea how such unit test code is written. A short demo of unittest is therefore included next. (Readers who are not familiar with object-oriented programming in Python may find the text hard to understand, but one can safely jump to the next section.)
Suppose we have a function double(x) in a module file mymod.py:

def double(x):
    return 2*x
Unit testing with the aid of the unittest module consists of writing a file test_mymod.py for testing the functions in mymod.py. The individual tests must be methods with names starting with test_ in a class derived from class TestCase in unittest. With one test method for the function double, the test_mymod.py file becomes
import unittest
import mymod

class TestMyCode(unittest.TestCase):
    def test_double(self):
        x = 4
        expected = 2*x
        computed = mymod.double(x)
        self.assertEqual(expected, computed)

if __name__ == '__main__':
    unittest.main()
The test is run by executing the test file test_mymod.py as a standard Python program. Unlike nose and pytest, unittest offers no equally flexible support for automatically locating and running all tests in all test files in a directory tree.
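That said, Python 2.7 and later ship a basic discovery mechanism with unittest that locates and runs tests in files matching test*.py below a start directory:

Terminal> python -m unittest discover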
We could use the basic assert statement as we did with nose and pytest functions, but those who write code based on unittest almost exclusively use the wide range of built-in assert functions such as assertEqual, assertNotEqual, and assertAlmostEqual, to mention some of them.
Translation of the test functions from the previous sections to unittest means making a new file test_decay.py with a test class TestDecay, where the stand-alone functions for nose/pytest now become methods in this class.
import unittest
import decay
import numpy as np

def exact_discrete_solution(n, I, a, theta, dt):
    ...

class TestDecay(unittest.TestCase):

    def test_exact_discrete_solution(self):
        theta = 0.8; a = 2; I = 0.1; dt = 0.8
        Nt = int(8/dt)  # no of steps
        u, t = decay.solver(I=I, a=a, T=Nt*dt, dt=dt, theta=theta)
        # Evaluate exact discrete solution on the mesh
        u_de = np.array([exact_discrete_solution(n, I, a, theta, dt)
                         for n in range(Nt+1)])
        diff = np.abs(u_de - u).max()  # largest deviation
        self.assertAlmostEqual(diff, 0, delta=1E-14)

    def test_potential_integer_division(self):
        ...
        self.assertAlmostEqual(diff, 0, delta=1E-14)

    def test_read_command_line_positional(self):
        ...
        for expected_arg, computed_arg in zip(expected, computed):
            self.assertEqual(expected_arg, computed_arg)

    def test_read_command_line_argparse(self):
        ...

if __name__ == '__main__':
    unittest.main()
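As before, the whole collection of tests is run by executing the file as a standard Python program:

Terminal> python test_decay.py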