Debugging in Python

This chapter is taken from book A Primer on Scientific Programming with Python by H. P. Langtangen, 4th edition, Springer, 2014.

Application of the recipe

Let us illustrate the points above in a specific programming problem: implementation of the Midpoint rule for numerical integration. The Midpoint rule for approximating an integral $ \int_a^b f(x)dx $ reads $$ \begin{equation} I = h\sum_{i=1}^{n} f(a + (i-\frac{1}{2})h), \quad h = {b-a\over n}\tp \tag{1} \end{equation} $$ We just follow the individual steps in the recipe to develop the code.

1. Understand the problem. In this problem we must understand how to program the formula (1). Observe that we do not need to understand how the formula is derived, because we do not apply the derivation in the program. What is important, is to notice that the formula is an approximation of an integral. Comparing the result of the program with the exact value of the integral will in general show a discrepancy. Whether we have an approximation error or a programming error is always difficult to judge. We will meet this difficulty below.

2. Work out examples. As a test case we choose to integrate $$ \begin{equation} f(x) = \sin^{-1}(x)\tp \tag{2} \end{equation} $$ between 0 and $ \pi $. From a table of integrals we find that this integral equals $$ \begin{equation} \left\lbrack x\sin^{-1}(x) + \sqrt{1-x^2}\,\,\right\rbrack_{0}^{\pi}\tp \tag{3} \end{equation} $$ The formula (1) gives an approximation to this integral, so the program will (most likely) print out a result different from (3). It would therefore be very helpful to construct a calculation where there are no approximation errors. Numerical integration rules usually integrate some polynomial of low order exactly. For the Midpoint rule it is obvious, if you understand the derivation of this rule, that a constant function will be integrated exactly. We therefore also introduce a test problem where we integrate $ g(x)=1 $ from 0 to 10. The answer should be exactly 10.

Input and output: The input to the calculations is the function to integrate, the integration limits $ a $ and $ b $, and the $ n $ parameter (number of intervals) in the formula (1). The output from the calculations is the approximation to the integral.

3. Decide on a user interface. We find it easiest at this beginning stage to program the two functions $ f(x) $ and $ g(x) $ directly in the program. We also specify the corresponding integration limits $ a $ and $ b $ in the program, but we read a common $ n $ for both integrals from the command line. Note that this is not a flexible user interface, but it suffices as a start for creating a working program. A much better user interface is to read $ f $, $ a $, $ b $, and $ n $ from the command line, which will be done later in a more complete solution to the present problem.

4. Make algorithms. Like most mathematical programming problems, also this one has a generic part and an application part. The generic part is the formula (1), which is applicable to an arbitrary function $ f(x) $. The implementation should reflect that we can specify any Python function f(x) and get it integrated. This principle calls for calculating (1) in a Python function where the input to the computation ($ f $, $ a $, $ b $, $ n $) are arguments. The function heading can look as integrate(f, a, b, n), and the value of (1) is returned.

The test part of the program consists of defining the test functions $ f(x) $ and $ g(x) $ and writing out the calculated approximations to the corresponding integrals.

A first rough sketch of the program can then be

def integrate(f, a, b, n):
    # compute integral, store in I
    return I

def f(x):
...

def g(x):
...

# test/application part:
n = sys.argv[1]
I = integrate(g, 0, 10,  n)
print "Integral of g equals %g" % I
I = integrate(f, 0, pi,  n)
# calculate and print out the exact integral of f

The next step is to make a detailed implementation of the integrate function. Inside this function we need to compute the sum (1). In general, sums are computed by a for loop over the summation index, and inside the loop we calculate a term in the sum and add it to an accumulation variable. Here is the algorithm in Python code:

s = 0
for i in range(1, n+1):
    s = s + f(a + (i-0.5)*h)
I = s*h

5. Look up information. Our test function $ f(x)=\sin^{-1}(x) $ must be evaluated in the program. How can we do this? We know that many common mathematical functions are offered by the math module. It is therefore natural to check if this module has an inverse sine function. The best place to look for Python modules is the Python Standard Library [2] documentation, which has a search facility. Typing math brings up a link to the math module, there we find math.asin as the function we need. Alternatively, one can use the command line utility pydoc and write pydoc math to look up all the functions in the module.

In this simple problem, we use very basic programming constructs and there is hardly any need for looking at similar examples to get started with the problem solving process. We need to know how to program a sum, though, via a for loop and an accumulation variable for the sum.

6. Write the program. Here is our first attempt to write the program. You can find the whole code in the file integrate_v1.py.

def integrate(f, a, b, n):
    s = 0
    for i in range(1, n):
        s += f(a + i*h)
    return s

def f(x):
return asin(x)

def g(x):
return 1

# Test/application part
n = sys.argv[1]
I = integrate(g, 0, 10,  n)
print "Integral of g equals %g" % I
I = integrate(f, 0, pi,  n)
I_exact = pi*asin(pi) - sqrt(1 - pi**2) - 1
print "Integral of f equals %g (exact value is %g)' % \
  (I, I_exact)

7. Run the program. We try a first execution from IPython

In [1]: run integrate_v1.py

Unfortunately, the program aborts with an error:

  File "integrate_v1.py", line 8
    return asin(x)
         ^
IndentationError: expected an indented block

We go to line 8 and look at that line and the surrounding code:

def f(x):
return asin(x)

Python expects that the return line is indented, because the function body must always be indented. By the way, we realize that there is a similar error in the g(x) function as well. We correct these errors:

def f(x):
    return asin(x)

def g(x):
    return 1

Running the program again makes Python respond with

  File "integrate_v1.py", line 24
    (I, I_exact)
               ^
SyntaxError: EOL while scanning single-quoted string

There is nothing wrong with line 24, but line 24 is a part of the statement starting on line 23:

print "Integral of f equals %g (exact value is %g)' % \ 
      (I, I_exact)

A SyntaxError implies that we have written illegal Python code. Inspecting line 23 reveals that the string to be printed starts with a double quote, but ends with a single quote. We must be consistent and use the same enclosing quotes in a string. Correcting the statement,

print "Integral of f equals %g (exact value is %g)" % \ 
      (I, I_exact)

and rerunning the program yields the output

Traceback (most recent call last):
  File "integrate_v1.py", line 18, in <module>
    n = sys.argv[1]
NameError: name 'sys' is not defined

Obviously, we need to import sys before using it. We add import sys and run again:

Traceback (most recent call last):
  File "integrate_v1.py", line 19, in <module>
    n = sys.argv[1]
IndexError: list index out of range

This is a very common error: we index the list sys.argv out of range because we have not provided enough command-line arguments. Let us use $ n=10 $ in the test and provide that number on the command line:

In [5]: run integrate_v1.py 10

We still have problems:

Traceback (most recent call last):
  File "integrate_v1.py", line 20, in <module>
    I = integrate(g, 0, 10,  n)
  File "integrate_v1.py", line 7, in integrate
    for i in range(1, n):
TypeError: range() integer end argument expected, got str.

It is the final File line that counts (the previous ones describe the nested functions calls up to the point where the error occurred). The error message for line 7 is very precise: the end argument to range, n, should be an integer, but it is a string. We need to convert the string sys.argv[1] to int before sending it to the integrate function:

n = int(sys.argv[1])

After a new edit-and-run cycle we have other error messages waiting:

Traceback (most recent call last):
  File "integrate_v1.py", line 20, in <module>
    I = integrate(g, 0, 10,  n)
  File "integrate_v1.py", line 8, in integrate
    s += f(a + i*h)
NameError: global name 'h' is not defined

The h variable is used without being assigned a value. From the formula (1) we see that $ h=(b-a)/n $, so we insert this assignment at the top of the integrate function:

def integrate(f, a, b, n):
    h = (b-a)/n
    ...

A new run results in a new error:

Integral of g equals 9
Traceback (most recent call last):
  File "integrate_v1.py", line 23, in <module>
    I = integrate(f, 0, pi,  n)
NameError: name 'pi' is not defined

Looking carefully at all output, we see that the program managed to call the integrate function with g as input and write out the integral. However, in the call to integrate with f as argument, we get a NameError, saying that pi is undefined. When we wrote the program we took it for granted that pi was $ \pi $, but we need to import pi from math to get this variable defined, before we call integrate:

from math import pi
I = integrate(f, 0, pi,  n)

The output of a new run is now

Integral of g equals 9
Traceback (most recent call last):
  File "integrate_v1.py", line 24, in <module>
    I = integrate(f, 0, pi,  n)
  File "integrate_v1.py", line 9, in integrate
    s += f(a + i*h)
  File "integrate_v1.py", line 13, in f
    return asin(x)
NameError: global name 'asin' is not defined

A similar error occurred: asin is not defined as a function, and we need to import it from math. We can either do a

from math import pi, asin

or just do the rough

from math import *

to avoid any further errors with undefined names from the math module (we will get one for the sqrt function later, so we simply use the last "import all" kind of statement).

There are still more errors:

Integral of g equals 9
Traceback (most recent call last):
  File "integrate_v1.py", line 24, in <module>
    I = integrate(f, 0, pi,  n)
  File "integrate_v1.py", line 9, in integrate
    s += f(a + i*h)
  File "integrate_v1.py", line 13, in f
    return asin(x)
ValueError: math domain error

Now the error concerns a wrong x value in the f function. Let us print out x:

def f(x):
    print x
    return asin(x)

The output becomes

Integral of g equals 9
0.314159265359
0.628318530718
0.942477796077
1.25663706144
Traceback (most recent call last):
  File "integrate_v1.py", line 25, in <module>
    I = integrate(f, 0, pi,  n)
  File "integrate_v1.py", line 9, in integrate
    s += f(a + i*h)
  File "integrate_v1.py", line 14, in f
    return asin(x)
ValueError: math domain error

We see that all the asin(x) computations are successful up to and including $ x=0.942477796077 $, but for $ x=1.25663706144 $ we get an error. A math domain error may point to a wrong $ x $ value for $ \sin^{-1}(x) $ (recall that the domain of a function specifies the legal $ x $ values for that function).

To proceed, we need to think about the mathematics of our problem: Since $ \sin (x) $ is always between $ -1 $ and $ 1 $, the inverse sine function cannot take $ x $ values outside the interval $ [-1,1] $. The problem is that we try to integrate $ \sin^{-1}(x) $ from 0 to $ \pi $, but only integration limits within $ [-1,1] $ make sense (unless we allow for complex-valued trigonometric functions). Our test problem is hence wrong from a mathematical point of view. We need to adjust the limits, say 0 to 1 instead of 0 to $ \pi $. The corresponding program modification reads

I = integrate(f, 0, 1,  n)

We run again and get

Integral of g equals 9
0
0
0
0
0
0
0
0
0
Traceback (most recent call last):
  File "integrate_v1.py", line 26, in <module>
    I_exact = pi*asin(pi) - sqrt(1 - pi**2) - 1
ValueError: math domain error

It is easy to go directly to the ValueError now, but one should always examine the output from top to bottom. If there is strange output before Python reports an error, there may be an error indicated by our print statements. This is not the case in the present example, but it is a good habit to start at the top of the output anyway. We see that all our print x statements inside the f function say that x is zero. This must be wrong - the idea of the integration rule is to pick $ n $ different points in the integration interval $ [0,1] $.

Our f(x) function is called from the integrate function. The argument to f, a + i*h, is seemingly always 0. Why? We print out the argument and the values of the variables that make up the argument:

def integrate(f, a, b, n):
    h = (b-a)/n
    s = 0
    for i in range(1, n):
        print a, i, h, a+i*h
        s += f(a + i*h)
    return s

Running the program shows that h is zero and therefore a+i*h is zero.

Why is h zero? We need a new print statement in the computation of h:

def integrate(f, a, b, n):
    h = (b-a)/n
    print b, a, n, h
    ...

The output shows that a, b, and n are correct. Now we have encountered a very common error in Python version 2 and C-like programming languages: integer division. The formula $ (1-0)/10=1/10 $ is zero according to integer division. The reason is that a and b are specified as 0 and 1 in the call to integrate, and 0 and 1 imply int objects. Then b-a becomes an int, and n is an int, causing an int/int division. We must ensure that b-a is float to get the right mathematical division in the computation of h:

def integrate(f, a, b, n):
    h = float(b-a)/n
    ...

Thinking that the problem with wrong $ x $ values in the inverse sine function is resolved, we may remove all the print statements in the program, and run again.

The output now reads

Integral of g equals 9
Traceback (most recent call last):
  File "integrate_v1.py", line 25, in <module>
    I_exact = pi*asin(pi) - sqrt(1 - pi**2) - 1
ValueError: math domain error

That is, we are back to the ValueError we have seen before. The reason is that asin(pi) does not make sense, and the argument to sqrt is negative. The error is simply that we forgot to adjust the upper integration limit in the computation of the exact result. This is another very common error. The correct line is

I_exact = 1*asin(1) - sqrt(1 - 1**2) - 1

We could have avoided the error by introducing variables for the integration limits, and a function for $ \int f(x)dx $ would make the code cleaner:

a = 0; b = 1
def int_f_exact(x):
    return x*asin(x) - sqrt(1 - x**2)
I_exact = int_f_exact(b) - int_f_exact(a)

Although this is more work than what we initially aimed at, it usually saves time in the debugging phase to do things this proper way.

Eventually, the program seems to work! The output is just the result of our two print statements:

Integral of g equals 9
Integral of f equals 5.0073 (exact value is 0.570796)

8. Verify the results. Now it is time to check if the numerical results are correct. We start with the simple integral of 1 from 0 to 10: the answer should be 10, not 9. Recall that for this particular choice of integration function, there is no approximation error involved (but there could be a small round-off error). Hence, there must be a programming error.

To proceed, we need to calculate some intermediate mathematical results by hand and compare these with the corresponding statements in the program. We choose a very simple test problem with $ n=2 $ and $ h=(10-0)/2=5 $. The formula (1) becomes $$ \begin{equation*} I = 5\cdot\left( 1 + 1\right) = 10\tp\end{equation*} $$ Running the program with $ n=2 $ gives

Integral of g equals 1

We insert some print statements inside the integrate function:

def integrate(f, a, b, n):
    h = float(b-a)/n
    s = 0
    for i in range(1, n):
        print 'i=%d, a+i*h=%g' % (i, a+i*h)
        s += f(a + i*h)
    return s

Here is the output:

i=1, a+i*h=5
Integral of g equals 1
i=1, a+i*h=0.5
Integral of f equals 0.523599 (exact value is 0.570796)

There was only one pass in the i loop in integrate. According to the formula, there should be $ n $ passes, i.e., two in this test case. The limits of i must be wrong. The limits are produced by the call range(1,n). We recall that such a call results in integers going from 1 up to n, but not including n. We need to include n as value of i, so the right call to range is range(1,n+1).

We make this correction and rerun the program. The output is now

i=1, a+i*h=5
i=2, a+i*h=10
Integral of g equals 2
i=1, a+i*h=0.5
i=2, a+i*h=1
Integral of f equals 2.0944 (exact value is 0.570796)

The integral of 1 is still not correct. We need more intermediate results!

In our quick hand calculation we knew that $ g(x)=1 $ so all the $ f(a+(i-\frac{1}{2})h) $ evaluations were rapidly replaced by ones. Let us now compute all the $ x $ coordinates $ a+(i-\frac{1}{2})h $ that are used in the formula: $$ \begin{equation*} i=1:\ a+(i-\frac{1}{2})h=2.5,\quad i=2:\ a+(i-\frac{1}{2})h=7.5\tp \end{equation*} $$ Looking at the output from the program, we see that the argument to g has a different value - and fortunately we realize that the formula we have coded is wrong. It should be a+(i-0.5)*h.

We correct this error and run the program:

i=1, a+(i-0.5)*h=2.5
i=2, a+(i-0.5)*h=7.5
Integral of g equals 2
...

Still the integral is wrong. At this point you may give up programming, but the more skills you pick up in debugging, the more fun it is to hunt for errors! Debugging is like reading an exciting criminal novel: the detective follows different ideas and tracks, but never gives up before the culprit is caught.

Now we read the code more carefully and compare expressions with those in the mathematical formula. We should, of course, have done this already when writing the program, but it is easy to get excited when writing code and hurry for the end. This ongoing story of debugging probably shows that reading the code carefully can save much debugging time. (Actually, being extremely careful with what you write, and comparing all formulas with the mathematics, may be the best way to get more spare time when taking a programming course!)

We clearly add up all the $ f $ evaluations correctly, but then this sum must be multiplied by $ h $, and we forgot that in the code. The return statement in integrate must therefore be modified to

    return s*h

Eventually, the output is

Integral of g equals 10
Integral of f equals 0.568484 (exact value is 0.570796)

and we have managed to integrate a constant function in our program! Even the second integral looks promising!

To judge the result of integrating the inverse sine function, we need to run several increasing $ n $ values and see that the approximation gets better. For $ n=2,10,100,1000 $ we get 0.550371, 0.568484, 0.570714, 0.570794, to be compared to the exact value 0.570796. (This is not the mathematically exact value, because it involves computations of $ \sin^{-1}(x) $, which is only approximately calculated by the asin function in the math module. However, the approximation error is very small ($ \sim 10^{-16} $).) The decreasing error provides evidence for a correct program, but it is not a strong proof. We should try out more functions. In particular, linear functions are integrated exactly by the Midpoint rule. We can also measure the speed of the decrease of the error and check that the speed is consistent with the properties of the Midpoint rule, but this is a mathematically more advanced topic.

The very important lesson learned from these debugging sessions is that you should start with a simple test problem where all formulas can be computed by hand. If you start out with $ n=100 $ and try to integrate the inverse sine function, you will have a much harder job with tracking down all the errors.

9. Use a debugger. Another lesson learned from these sessions is that we needed many print statements to see intermediate results. It is an open question if it would be more efficient to run a debugger and stop the code at relevant lines. In an edit-and-run cycle of the type we met here, we frequently need to examine many numerical results, correct something, and look at all the intermediate results again. Plain print statements are often better suited for this massive output than the pure manual operation of a debugger, unless one writes a program to automate the interaction with the debugger.

The correct code for the implementation of the Midpoint rule is found in integrate_v2.py. Some readers might be frightened by all the energy it took to debug this code, but this is just the nature of programming. The experience of developing programs that finally work is very awarding.

People only become computer programmers if they're obsessive about details, crave power over machines, and can bear to be told day after day exactly how stupid they are. Gregory J. E. Rawlins [3], computer scientist.

Refining the user interface

We briefly mentioned that the chosen user interface, where the user can only specify $ n $, is not particularly user friendly. We should allow $ f $, $ a $, $ b $, and $ n $ to be specified on the command line. Since $ f $ is a function and the command line can only provide strings to the program, we may use the StringFunction object from scitools.std to convert a string expression for the function to be integrated to an ordinary Python function. The other parameters should be easy to retrieve from the command line by looking up the sys.argv list. We enclose the input statements in a try-except block, here with a specific exception type IndexError (because an index in sys.argv out of bounds is the only type of error we expect to handle):

try:
    f_formula = sys.argv[1]
    a = eval(sys.argv[2])
    b = eval(sys.argv[3])
    n = int(sys.argv[4])
except IndexError:
    print 'Usage: %s f-formula a b n' % sys.argv[0]
    sys.exit(1)

Note that the use of eval allows us to specify a and b as pi or exp(5) or another mathematical expression.

With the input above we can perform the general task of the program:

from scitools.std import StringFunction
f = StringFunction(f_formula)
I = integrate(f, a, b, n)
print I

Writing a test function

Instead of having these test statements as a main program we follow the good habits and make a module with

the integrate function,
a test_integrate function for testing the integrate function's ability to exactly integrate linear functions,
a main function for reading data from the command line and calling integrate for the user's problem at hand.

Any module should also have a test block, as well as doc strings for the module itself and all functions.

The test_integrate function can perform a loop over some specified n values and check that the Midpoint rule integrates a linear function exactly. As always, we must be prepared for round-off errors, so "exactly" means errors less than (say) $ 10^{-14} $. The relevant code becomes

def test_integrate():
    """Check that linear functions are integrated exactly."""

    def g(x):
        return p*x + q   # general linear function

    def int_g_exact(x):  # integral of g(x)
        return 0.5*p*x**2 + q*x

    a = -1.2; b = 2.8    # "arbitrary" integration limits
    p = -2;   q = 10
    success = True        # True if all tests below are passed
    for n in 1, 10, 100:
        I = integrate(g, a, b, n)
        I_exact = int_g_exact(b) - int_g_exact(a)
        error = abs(I_exact - I)
        if error > 1E-14:
            success = False
    assert success

We have followed the programming standard that will make this test function automatically work with the nose test framework:

the name of the function starts with test_,
the function has no arguments,
checks of whether a test is passed or not are done with assert.

The assert success statement raises an AssertionError exception if success is false, otherwise nothing happens. The nose testing framework searches for functions whose name start with test_, execute each function, and record if an AssertionError is raised. It is overkill to use nose for small programs, but in larger projects with many functions in many files, nose can run all tests with a short command and write back a notification that all tests passed.

The main function is simply a wrapping of the main program given above. The test block may call or test_integrate function or main, depending on whether the user will test the module or use it:

if __name__ == '__main__':
    if sys.argv[1] == 'verify':
        verify()
    else:
        # Compute the integral specified on the command line
        main()

Here is a short demo computing $ \int_0^{2\pi}(\cos (x) + \sin(x))dx $ with the aid of the integrate.py file:

integrate.py 'cos(x)+sin(x)' 0 2*pi 10
-3.48786849801e-16