This chapter is taken from the book A Primer on Scientific Programming with Python by H. P. Langtangen, 4th edition, Springer, 2014.

Run any operating system command

The simplest way of running another program from Python is to use os.system:

import os, sys

cmd = 'python myprog.py 21 --mass 4'   # command to be run
failure = os.system(cmd)
if failure:
    print 'Execution of "%s" failed!\n' % cmd
    sys.exit(1)

The recommended way to run operating system commands is to use the subprocess module. The above command is equivalent to

import subprocess
cmd = 'python myprog.py 21 --mass 4'
failure = subprocess.call(cmd, shell=True)

# or
failure = subprocess.call(
            ['python', 'myprog.py', '21', '--mass', '4'])

The output of an operating system command can be stored in a string object:

try:
    output = subprocess.check_output(cmd, shell=True,
                                     stderr=subprocess.STDOUT)
except subprocess.CalledProcessError:
    print 'Execution of "%s" failed!\n' % cmd
    sys.exit(1)

# Process output
for line in output.splitlines():
    ...

The stderr argument ensures that the output string contains everything that the command cmd wrote to both standard output and standard error.

The constructions above are mainly used for running stand-alone programs. Any file or folder listing or manipulation should be done with the functionality in the os and shutil modules.
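A small, self-contained sketch of such file and folder manipulation with os and shutil (the file and folder names here are made up for illustration, and a temporary folder is used so the example cleans up after itself):

```python
import os
import shutil
import tempfile

folder = tempfile.mkdtemp()            # create a scratch folder
path = os.path.join(folder, 'data.txt')
with open(path, 'w') as f:
    f.write('some data\n')

print(os.listdir(folder))              # list the folder contents
shutil.copy(path, path + '.bak')       # copy a file
os.remove(path + '.bak')               # remove a file
shutil.rmtree(folder)                  # remove a whole folder tree
```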

Split file or folder name

Suppose data/file1.dat is a file path relative to the home folder /users/me, i.e., the complete path is $HOME/data/file1.dat in Unix. Python has tools for extracting the complete folder name /users/me/data, the basename file1.dat, and the extension .dat:

>>> path = os.path.join(os.environ['HOME'], 'data', 'file1.dat')
>>> path
'/users/me/data/file1.dat'
>>> foldername, basename = os.path.split(path)
>>> foldername
'/users/me/data'
>>> basename
'file1.dat'
>>> stem, ext = os.path.splitext(basename)
>>> stem
'file1'
>>> ext
'.dat'
>>> outfile = stem + '.out'
>>> outfile
'file1.out'
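The pieces extracted above can be glued together again with os.path.join, e.g., to place the output file in the same folder as the input file (the path below is just for illustration):

```python
import os

path = os.path.join('/users/me', 'data', 'file1.dat')
foldername, basename = os.path.split(path)
stem, ext = os.path.splitext(basename)

# Place the output file next to the input file
outpath = os.path.join(foldername, stem + '.out')
print(outpath)   # '/users/me/data/file1.out' on Unix
```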

Variable number of function arguments

Arguments to Python functions are of four types:

  1. positional arguments, where each argument has a name,
  2. keyword arguments, where each argument has a name and a default value,
  3. a variable number of positional arguments, where each argument has no name,
  4. a variable number of keyword arguments, where each argument has a name and a value.

The corresponding general function definition can be sketched as

def f(pos1, pos2, key1=val1, key2=val2, *args, **kwargs):

Here, pos1 and pos2 are positional arguments, key1 and key2 are keyword arguments, args is a tuple holding a variable number of positional arguments, and kwargs is a dictionary holding a variable number of keyword arguments. This document describes how to program with the args and kwargs variables and why these are handy in many situations.

Variable number of positional arguments

Let us start by making a function that takes an arbitrary number of arguments and computes their sum:

>>> def add(*args):
...     print 'args:', args
...     s = 0
...     for arg in args:
...         s = s + arg
...     return s
...
>>> add(1)
args: (1,)
1
>>> add(1,5,10)
args: (1, 5, 10)
16

We observe that args is a tuple and that all the arguments we provide in a call to add are stored in args.

Combination of ordinary positional arguments and a variable number of arguments is allowed, but the *args argument must appear after the ordinary positional arguments, e.g.,

def f(pos1, pos2, pos3, *args):

In each call to f we must provide at least three arguments. If more arguments are supplied in the call, these are collected in the args tuple inside the f function.
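A minimal illustration of this behavior (the function below is made up for the example):

```python
def f(pos1, pos2, pos3, *args):
    # The first three arguments are mandatory; any extra
    # arguments in the call are collected in the tuple args
    return pos1 + pos2 + pos3 + sum(args)

print(f(1, 2, 3))          # args is ()
print(f(1, 2, 3, 4, 5))    # args is (4, 5)
```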

Example

Consider a mathematical function with one independent variable \( t \) and a parameter \( v_0 \), as in \( y(t;v_0) = v_0t - \frac{1}{2}gt^2 \). A more general case with \( n \) parameters is \( f(x; p_1,\ldots,p_n) \). The Python implementation of such functions can take both the independent variable and the parameters as arguments: y(t, v0) and f(x, p1, p2, ..., pn). Suppose that we have a general library routine that operates on functions of one variable. The routine can, e.g., perform numerical differentiation, integration, or root finding. A simple example is a numerical differentiation function

def diff(f, x, h):
    return (f(x+h) - f(x))/h

This diff function cannot be used with functions f that take more than one argument. For example, passing a y(t, v0) function as f leads to the exception

TypeError: y() takes exactly 2 arguments (1 given)

A good solution to this problem is to make a class Y that has a __call__(self, t) method and that stores \( v_0 \) as an attribute. Here we shall describe an alternative solution that allows our y(t, v0) function to be used as is.

The idea is that we pass additional arguments for the parameters in the f function through the diff function. That is, we view the f function as f(x, *f_prms) in diff. Our diff routine can then be written as

def diff(f, x, h, *f_prms):
    print 'x:', x, 'h:', h, 'f_prms:', f_prms
    return (f(x+h, *f_prms) - f(x, *f_prms))/h

Before explaining this function in detail, we demonstrate that it works in an example:

def y(t, v0):
    g = 9.81
    return v0*t - 0.5*g*t**2

dydt = diff(y, 0.1, 1E-9, 3)  # t=0.1, h=1E-9, v0=3

The output from the call to diff becomes

x: 0.1 h: 1e-09 f_prms: (3,)

The point is that the v0 parameter, which we want to pass on to our y function, is now stored in f_prms. Inside the diff function, calling

f(x, *f_prms)

is the same as if we had written

f(x, f_prms[0], f_prms[1], ...)

That is, *f_prms in a call takes all the values in the tuple f_prms and places them after each other as positional arguments. In the present example with the y function, f(x, *f_prms) implies f(x, f_prms[0]), which for the current set of argument values in our example becomes a call y(0.1, 3).

For a function with many parameters,

from math import exp, sin

def G(x, t, A, a, w):
    return A*exp(-a*t)*sin(w*x)

the output from

dGdx = diff(G, 0.5, 1E-9, 0, 1, 1.5, 100)

becomes

x: 0.5 h: 1e-09 f_prms: (0, 1, 1.5, 100)

We pass here the arguments t, A, a, and w, in that sequence, as the last four arguments to diff, and all the values are stored in the f_prms tuple.

The diff function also works for a plain function f with one argument:

from math import sin
mycos = diff(sin, 0, 1E-9)

In this case, *f_prms becomes an empty tuple, and a call like f(x, *f_prms) is just f(x).

The use of a variable set of arguments for sending problem-specific parameters through a general library function, as we have demonstrated here with the diff function, is perhaps the most frequent use of *args-type arguments.
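As another illustration of the same technique, here is a hypothetical library routine for composite trapezoidal integration that forwards a variable set of parameters to f (the function name and implementation are just a sketch, not part of the book's source code):

```python
def trapezoidal(f, a, b, n, *f_prms):
    # Composite trapezoidal rule for f(x, *f_prms) on [a, b]
    # with n intervals; extra parameters are forwarded to f
    h = (b - a)/float(n)
    s = 0.5*(f(a, *f_prms) + f(b, *f_prms))
    for i in range(1, n):
        s += f(a + i*h, *f_prms)
    return h*s

def y(t, v0):
    g = 9.81
    return v0*t - 0.5*g*t**2

# Integrate y(t; v0=3) from t=0 to t=0.1; v0=3 travels through *f_prms
print(trapezoidal(y, 0, 0.1, 1000, 3))
```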

Variable number of keyword arguments

A simple test function

>>> def test(**kwargs):
...     print kwargs

exemplifies that kwargs is a dictionary inside the test function, and that we can pass any set of keyword arguments to test, e.g.,

>>> test(a=1, q=9, method='Newton')
{'a': 1, 'q': 9, 'method': 'Newton'}

We can combine an arbitrary set of positional and keyword arguments, provided all the keyword arguments appear at the end of the call:

>>> def test(*args, **kwargs):
...     print args, kwargs
...
>>> test(1,3,5,4,a=1,b=2)
(1, 3, 5, 4) {'a': 1, 'b': 2}

From the output we understand that all the arguments in the call where we provide a name and a value are treated as keyword arguments and hence placed in kwargs, while all the remaining arguments are positional and placed in args.

Example

We may extend the example in the section Variable number of positional arguments to make use of a variable number of keyword arguments instead of a variable number of positional arguments. Suppose all functions with parameters in addition to an independent variable take the parameters as keyword arguments. For example,

def y(t, v0=1):
    g = 9.81
    return v0*t - 0.5*g*t**2

In the diff function we transfer the parameters in the f function as a set of keyword arguments **f_prms:

def diff(f, x, h=1E-10, **f_prms):
    print 'x:', x, 'h:', h, 'f_prms:', f_prms
    return (f(x+h, **f_prms) - f(x, **f_prms))/h

In general, the **f_prms argument in a call

f(x, **f_prms)

implies that all the key-value pairs in f_prms are provided as keyword arguments:

f(x, key1=f_prms['key1'], key2=f_prms['key2'], ...)

In our special case with the y function and the call

dydt = diff(y, 0.1, h=1E-9, v0=3)

f(x, **f_prms) becomes y(0.1, v0=3). The output from diff is now

x: 0.1 h: 1e-09 f_prms: {'v0': 3}

showing explicitly that our v0=3 in the call to diff is placed in the f_prms dictionary.
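The equivalence of the two call forms can be verified directly in a small snippet:

```python
def y(t, v0=1):
    g = 9.81
    return v0*t - 0.5*g*t**2

f_prms = {'v0': 3}
# The two calls below are equivalent: **f_prms expands
# the dictionary into the keyword argument v0=3
print(y(0.1, **f_prms))
print(y(0.1, v0=3))
```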

The G function from the section Variable number of positional arguments can also have its parameters as keyword arguments:

def G(x, t=0, A=1, a=1, w=1):
    return A*exp(-a*t)*sin(w*x)

We can now make the call

dGdx = diff(G, 0.5, h=1E-9, t=0, A=1, w=100, a=1.5)

and view the output from diff,

x: 0.5 h: 1e-09 f_prms: {'A': 1, 'a': 1.5, 't': 0, 'w': 100}

to see that all the parameters get stored in f_prms. The h parameter can be placed anywhere in the collection of keyword arguments, e.g.,

dGdx = diff(G, 0.5, t=0, A=1, w=100, a=1.5, h=1E-9)

We can allow the f function of one variable and a set of parameters to have the general form f(x, *f_args, **f_kwargs). That is, the parameters can either be positional or keyword arguments. The diff function must take the arguments *f_args and **f_kwargs and transfer these to f:

def diff(f, x, h=1E-10, *f_args, **f_kwargs):
    print f_args, f_kwargs
    return (f(x+h, *f_args, **f_kwargs) -
            f(x,   *f_args, **f_kwargs))/h

This diff function gives the writer of an f function full freedom to choose positional and/or keyword arguments for the parameters. Here is an example of the G function where we let the t parameter be positional and the other parameters be keyword arguments:

def G(x, t, A=1, a=1, w=1):
    return A*exp(-a*t)*sin(w*x)

A call

dGdx = diff(G, 0.5, 1E-9, 0, A=1, w=100, a=1.5)

gives the output

(0,) {'A': 1, 'a': 1.5, 'w': 100}

showing that t is put in f_args and transferred as a positional argument to G, while A, a, and w are put in f_kwargs and transferred as keyword arguments. We remark that in the last call to diff, h and t must be treated as positional arguments, i.e., we cannot write h=1E-9 and t=0 unless all the arguments in the call are in the name=value form.

If f takes both *f_args and **f_kwargs arguments but no such arguments are needed in a call, f_args becomes an empty tuple and f_kwargs becomes an empty dictionary. The example

mycos = diff(sin, 0)

shows that the tuple and dictionary are indeed empty since diff just prints out

() {}

Therefore, a variable set of positional and keyword arguments can be incorporated in a general library function such as diff without any disadvantage, just the benefit that diff works with different types of f functions: parameters as global variables, parameters as additional positional arguments, parameters as additional keyword arguments, or parameters as instance variables.
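For instance, the final diff function also works when the parameters are instance variables, using the class-based solution mentioned earlier (the class below is a sketch for illustration):

```python
def diff(f, x, h=1E-10, *f_args, **f_kwargs):
    return (f(x+h, *f_args, **f_kwargs) -
            f(x,   *f_args, **f_kwargs))/h

# Parameters stored as instance variables in a callable class
class Y:
    def __init__(self, v0):
        self.v0 = v0

    def __call__(self, t):
        g = 9.81
        return self.v0*t - 0.5*g*t**2

y = Y(v0=3)
print(diff(y, 0.1))   # no extra arguments needed; v0 lives inside y
```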

The program varargs1.py in the src/varargs folder implements the examples in this document.

Evaluating program efficiency

Making time measurements

The term time has multiple meanings on a computer. The elapsed time or wall clock time is the time you can measure on a watch or wall clock, while CPU time is the amount of time the program keeps the central processing unit busy. The system time is the time spent on operating system tasks like I/O. The concept of user time is the difference between the CPU and system times. If your computer is occupied by many concurrent processes, the CPU time of your program might be very different from the elapsed time.

The time module

Python has a time module with some useful functions for measuring the elapsed time and the CPU time:

import time
e0 = time.time()     # elapsed time since the epoch
c0 = time.clock()    # total CPU time spent in the program so far
<do tasks...>
elapsed_time = time.time() - e0
cpu_time = time.clock() - c0

The epoch is the initial time, i.e., the time at which time.time() would return 0, which on Unix systems is 00:00:00 UTC on January 1, 1970. The time module also has numerous functions for nice formatting of dates and time, and the newer datetime module has more functionality and an improved interface. Although the timing has a finer resolution than seconds, one should construct test cases that last some seconds to obtain reliable results.
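A concrete, self-contained version of the timing pattern above (note that time.clock was removed in Python 3.8; time.process_time is the corresponding CPU time measurement in Python 3, and the loop is just an arbitrary task to time):

```python
import time

e0 = time.time()          # wall clock time at start
c0 = time.process_time()  # CPU time at start (time.clock in Python 2)

# Some work to be timed
s = 0.0
for i in range(1000000):
    s += i*0.5

elapsed_time = time.time() - e0
cpu_time = time.process_time() - c0
print('elapsed: %g s, CPU: %g s' % (elapsed_time, cpu_time))
```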

Using timeit from IPython

To measure the efficiency of a certain set of statements, an expression, or a function call, the code should be run a large number of times so the overall CPU time is of order seconds. Python's timeit module has functionality for running a code segment repeatedly. The simplest and most convenient way of using timeit is within an IPython shell. Here is a session comparing the efficiency of sin(1.2) versus math.sin(1.2):

In [1]: import math

In [2]: from math import sin

In [3]: %timeit sin(1.2)
10000000 loops, best of 3: 198 ns per loop

In [4]: %timeit math.sin(1.2)
1000000 loops, best of 3: 258 ns per loop

That is, looking up sin through the math prefix degrades the performance by a factor of \( 258/198\approx 1.3 \).

Any statement, including function calls, can be timed the same way. Timing of multiple statements is possible by using %%timeit. The timeit module can be used inside ordinary programs as demonstrated in the file pow_eff.py.
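A sketch of how the timeit module can be used inside an ordinary program to repeat the comparison above (the number of repetitions is arbitrary and the absolute timings depend on the machine):

```python
import timeit

# Time sin(1.2) with and without the math prefix;
# each timing runs the statement 100000 times
t1 = timeit.timeit('sin(1.2)', setup='from math import sin',
                   number=100000)
t2 = timeit.timeit('math.sin(1.2)', setup='import math',
                   number=100000)
print('sin(1.2):      %g s' % t1)
print('math.sin(1.2): %g s' % t2)
```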

Hardware information

Along with CPU time measurements it is often convenient to print out information about the hardware on which the experiment was done. Python has a module platform with information on the current hardware. The function scitools.misc.hardware_info applies the platform module and other modules to extract relevant hardware information. A sample call is

>>> import scitools.misc, pprint
>>> pprint.pprint(scitools.misc.hardware_info())
{'numpy.distutils.cpuinfo.cpu.info':
 [{'address sizes': '40 bits physical, 48 bits virtual',
   'bogomips': '4598.87',
   'cache size': '4096 KB',
   'cache_alignment': '64',
   'cpu MHz': '2299.435',
   ...
  }],
 'platform module':
  {'identifier': 'Linux-3.11.0-12-generic-x86_64-with-Ubuntu-13.10',
   'python build': ('default', 'Sep 19 2013 13:48:49'),
   'python version': '2.7.5+',
   'uname': ('Linux', 'hpl-ubuntu2-mac11', '3.11.0-12-generic',
             '#19-Ubuntu SMP Wed Oct 9 16:20:46 UTC 2013',
             'x86_64', 'x86_64')}}
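If SciTools is not available, some of the same information can be obtained directly from the platform module (the exact strings returned depend on the machine, and platform.processor() may be empty on some systems):

```python
import platform

print(platform.platform())         # e.g. 'Linux-...-x86_64-...'
print(platform.python_version())   # e.g. '2.7.5'
print(platform.machine())          # e.g. 'x86_64'
print(platform.processor())        # may be an empty string
```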

Profiling Python programs

A profiler computes the time spent in the various functions of a program. From the timings a ranked list of the most time-consuming functions can be created. This is an indispensable tool for detecting bottlenecks in the code, and you should always perform a profiling before spending time on code optimization. The golden rule is to first write an easy-to-understand program, then verify it, then profile it, and then think about optimization.

Premature optimization is the root of all evil.
Donald Knuth, computer scientist, 1938-.

Python 2.7 comes with two recommended profilers, implemented in the modules cProfile and profile. The section The Python Profilers in the Python Standard Library documentation [12] has a good introduction to the usage of these modules. The results produced by the modules are normally processed by a special statistics utility, pstats, developed for analyzing profiling results. The usage of the profile, cProfile, and pstats modules is straightforward, but somewhat tedious. The SciTools package therefore comes with a command scitools profiler that allows you to profile any program (say) m.py by just writing

Terminal> scitools profiler m.py c1 c2 c3

Here, c1, c2, and c3 are command-line arguments to m.py.

A sample output might read

    1082 function calls (728 primitive calls) in 17.890 CPU seconds

Ordered by: internal time
List reduced from 210 to 20 due to restriction <20>

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     5    5.850    1.170    5.850    1.170 m.py:43(loop1)
     1    2.590    2.590    2.590    2.590 m.py:26(empty)
     5    2.510    0.502    2.510    0.502 m.py:32(myfunc2)
     5    2.490    0.498    2.490    0.498 m.py:37(init)
     1    2.190    2.190    2.190    2.190 m.py:13(run1)
     6    0.050    0.008   17.720    2.953 funcs.py:126(timer)
...

In this test, loop1 is the most expensive function, using 5.85 seconds, which is to be compared with 2.59 seconds for the next most time-consuming function, empty. The tottime entry is the total time spent in a specific function, while cumtime reflects the total time spent in the function and all the functions it calls. We refer to the documentation of the profiling tools in the Python Standard Library documentation for detailed information on how to interpret the output.

The CPU time of a Python program typically increases by a factor of about five when run under the administration of the profile module. Nevertheless, the relative CPU times among the functions are not much affected by the profiler overhead.
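For completeness, here is a sketch of using cProfile and pstats directly from within a program rather than via scitools profiler (the work function is made up for illustration):

```python
import cProfile
import pstats
import io

def work():
    # An arbitrary CPU-bound task to profile
    s = 0
    for i in range(100000):
        s += i**2
    return s

pr = cProfile.Profile()
pr.enable()
work()
pr.disable()

# Collect the statistics into a string and show the
# five most time-consuming functions by internal time
stream = io.StringIO()
stats = pstats.Stats(pr, stream=stream)
stats.sort_stats('tottime').print_stats(5)
print(stream.getvalue())
```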

References

  1. Python Programming Language. http://python.org.
  2. T. E. Oliphant et al. NumPy Array Processing Package for Python, http://www.numpy.org.
  3. T. E. Oliphant. Python for Scientific Computing, Computing in Science & Engineering, 9, 2007.
  4. J. D. Hunter et al. Matplotlib: Software Package for 2D Graphics, http://matplotlib.org/.
  5. J. D. Hunter. Matplotlib: a 2D Graphics Environment, Computing in Science & Engineering, 9, 2007.
  6. F. Perez, B. E. Granger et al. IPython Software Package for Interactive Scientific Computing, http://ipython.org/.
  7. F. Perez and B. E. Granger. IPython: a System for Interactive Scientific Computing, Computing in Science & Engineering, 9, 2007.
  8. ScientificPython Software Package. http://starship.python.net/crew/hinsen.
  9. O. Certik et al. SymPy: Python Library for Symbolic Mathematics, http://sympy.org.
  10. E. Jones, T. E. Oliphant, P. Peterson et al. SciPy Scientific Computing Library for Python, http://scipy.org.
  11. H. P. Langtangen. Debugging in Python, http://hplgit.github.io/primer.html/doc/pub/debug.
  12. Python Software Foundation. The Python Standard Library, http://docs.python.org/2/library/.