$\newcommand{\tp}{\thinspace .}$

This chapter is taken from the book A Primer on Scientific Programming with Python by H. P. Langtangen, 5th edition, Springer, 2016.

Making modules

Sometimes you want to reuse a function from an old program in a new program. The simplest way to do this is to copy and paste the old source code into the new program. However, this is not good programming practice, because you then over time end up with multiple identical versions of the same function. When you want to improve the function or correct a bug, you need to remember to do the same update in all files with a copy of the function, and in real life most programmers fail to do so. You easily end up with a mess of different versions with different quality of basically the same code. Therefore, a golden rule of programming is to have one and only one version of a piece of code. All programs that want to use this piece of code must access one and only one place where the source code is kept. This principle is easy to implement if we create a module containing the code we want to reuse later in different programs.

When reading this, you probably know how to use a ready-made module. For example, if you want to compute the factorial $k!=k(k-1)(k-2)\cdots 1$, there is a function factorial in Python's math module that can help us out. The usage goes with the math prefix,

import math
value = math.factorial(5)

or without,

from math import factorial
# or: from math import *
value = factorial(5)

Now you shall learn how to make your own Python modules. There is hardly anything to learn, because you just collect all the functions that constitute the module in one file, say with name mymodule.py. This file is automatically a module, with name mymodule, and you can import functions from this module in the standard way. Let us make everything clear in detail by looking at an example.

Example: Interest on bank deposits

The classical formula for the growth of money in a bank reads $\begin{equation} A = A_0\left( 1 + {p\over 360\cdot 100}\right)^n, \tag{2} \end{equation}$ where $A_0$ is the initial amount of money, and $A$ is the present amount after $n$ days with $p$ percent annual interest rate. (The formula applies the convention that the rate per day is computed as $p/360$ , while $n$ counts the actual number of days the money is in the bank, see the Wikipedia entry Day count convention for explanation. There is a handy Python module datetime for computing the number of days between two dates.)
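As a small illustration of the remark about datetime, the following sketch computes $n$ as the number of days between two dates and inserts it into (2). The two dates, the amount A0, and the rate p are made-up values chosen for the example only.

```python
from datetime import date

# Hypothetical deposit and withdrawal dates (illustration only)
deposit = date(2016, 1, 1)
withdrawal = date(2018, 1, 1)

# Subtracting two date objects gives a timedelta; .days is the day count
n = (withdrawal - deposit).days   # 731, since 2016 is a leap year

A0 = 100.0   # assumed initial amount
p = 5        # assumed annual interest rate in percent
A = A0*(1 + p/(360.0*100))**n
print('n = %d days, A = %.2f' % (n, A))
```

Note that $n$ counts actual calendar days, while the daily rate still uses the 360-day convention from the formula.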

Equation (2) involves four parameters: $A$ , $A_0$ , $p$ , and $n$ . We may solve for any of these, given the other three: $\begin{align} A_0 &= A\left( 1 + {p\over 360\cdot 100}\right)^{-n}, \tag{3}\\ n &= \frac{\ln {A\over A_0}}{\ln \left( 1 + {p\over 360\cdot 100}\right)} , \tag{4}\\ p &= 360\cdot 100 \left(\left({A\over A_0}\right)^{1/n} - 1\right)\tp \tag{5} \end{align}$ Suppose we have implemented (2)-(5) in four functions:

from math import log as ln

def present_amount(A0, p, n):
    return A0*(1 + p/(360.0*100))**n

def initial_amount(A, p, n):
    return A*(1 + p/(360.0*100))**(-n)

def days(A0, A, p):
    return ln(A/A0)/ln(1 + p/(360.0*100))

def annual_rate(A0, A, n):
    return 360*100*((A/A0)**(1.0/n) - 1)

We want to make these functions available in a module, say with name interest, so that we can import functions and compute with them in a program. For example,

from interest import days
A0 = 1; A = 2; p = 5
n = days(A0, A, p)
years = n/365.0
print 'Money has doubled after %.1f years' % years

How to make the interest module is described next.

Collecting functions in a module file

To make a module of the four functions present_amount, initial_amount, days, and annual_rate, we simply open an empty file in a text editor and copy the program code for all the four functions over to this file. This file is then automatically a Python module provided we save the file under any valid filename. The extension must be .py, but the module name is only the base part of the filename. In our case, the filename interest.py implies a module name interest. To use the annual_rate function in another program we simply write, in that program file,

from interest import annual_rate

or we can write

from interest import *

to import all four functions, or we can write

import interest

and access individual functions as interest.annual_rate and so forth.

Test block

It is recommended to only have functions and not any statements outside functions in a module. The reason is that the module file is executed from top to bottom during the import. With function definitions only in the module file, and no main program, there will be no calculations or output from the import, just definitions of functions. This is the desirable behavior. However, it is often convenient to have tests or demonstrations in the module file, and then there is a need for a main program. Python allows a very fortunate construction to let the file act both as a module with function definitions only (and no main program) and as an ordinary program we can run, with functions and a main program.

This two-fold "magic" is realized by putting the main program after an if test of the form

if __name__ == '__main__':
    <block of statements>

The __name__ variable is automatically defined in any module and equals the module name if the module file is imported in another program, or __name__ equals the string '__main__' if the module file is run as a program. This implies that the <block of statements> part is executed if and only if we run the module file as a program. We shall refer to <block of statements> as the test block of a module.

Example on a test block in a minimalistic module

A very simple example will illustrate how this works. Consider a file mymod.py with the content

def add1(x):
    return x + 1

if __name__ == '__main__':
    print 'run as program'
    import sys
    print add1(float(sys.argv[1]))

We can import mymod as a module and make use of the add1 function:

>>> import mymod
>>> print mymod.add1(4)
5

During the import, the if test is false, and only the function definition is executed. However, if we run mymod.py as a program,

Terminal> mymod.py 5
run as program
6

the if test becomes true, and the print statements are executed.
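The two values of __name__ can also be inspected directly. The sketch below writes a one-line module to a temporary folder (the name name_demo is hypothetical), imports it, and then executes the same file as a program through the standard runpy module. It is written so that it runs under both Python 2.7 and Python 3.

```python
import importlib
import os
import runpy
import sys
import tempfile

# Write a tiny module (hypothetical name) that just records __name__
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'name_demo.py')
with open(path, 'w') as f:
    f.write('seen = __name__\n')

# Case 1: import the file as a module
sys.path.insert(0, folder)
module = importlib.import_module('name_demo')
print(module.seen)            # the module name, 'name_demo'

# Case 2: execute the same file as if it were run as a program
namespace = runpy.run_path(path, run_name='__main__')
print(namespace['seen'])      # the string '__main__'
```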

Tip on easy creation of a module.

If you have some functions and a main program in some program file, just move the main program to the test block. Then the file can act as a module, giving access to all the functions in other files, or the file can be executed from the command line, in the same way as the original program.

A test block in the `interest` module

Let us write a little main program for demonstrating the interest module in a test block. We read $p$ from the command line and write out how many years it takes to double an amount with that interest rate:

if __name__ == '__main__':
    import sys
    p = float(sys.argv[1])
    years = days(1, 2, p)/365.0
    print 'With p=%.2f it takes %.1f years to double' % (p, years)

Running the module file as a program gives this output:

Terminal> interest.py 2.45
With p=2.45 it takes 27.9 years to double

To test that the interest.py file also works as a module, invoke a Python shell and try to import a function and compute with it:

>>> from interest import present_amount
>>> present_amount(2, 5, 730)
2.2133983053266699

We have hence demonstrated that the file interest.py works both as a program and as a module.

Recommended practice in a test block.

It is a good programming habit to let the test block do one or more of three things:

provide information on how the module or program is used,
test if the module functions work properly,
offer interaction with users such that the module file can be applied as a useful program.

Instead of having a lot of statements in the test block, it is better to collect the statements in separate functions, which then are called from the test block.
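A minimal sketch of this layout, using the add1 function from the mymod.py example: the statements of the main program are collected in a function, here given the hypothetical name main, and the test block only calls it. The print calls use the parenthesized form so the sketch also runs under Python 3.

```python
import sys

def add1(x):
    return x + 1

def main(args):
    """Hypothetical main program; args is a list like sys.argv[1:]."""
    if len(args) != 1:
        print('Usage: mymod.py number')
        return 1
    try:
        x = float(args[0])
    except ValueError:
        print('Not a number: %s' % args[0])
        return 1
    print(add1(x))
    return 0

if __name__ == '__main__':
    main(sys.argv[1:])
```

Because main takes the argument list as a parameter, it can also be called from a test function with a hand-made list, without touching sys.argv.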

Verification of the module code

Functions that verify the implementation in a module should

have names starting with test_,
express the success or failure of a test through a boolean variable, say success,
run assert success, msg to raise an AssertionError with an optional message msg in case the test fails.

Adopting this style makes it trivial to let the tools pytest or nose automatically run through all our test_*() functions in all files in a folder tree. The document Unit testing with pytest and nose [7] contains a more thorough introduction to the pytest and nose testing frameworks for beginners.

Test functions are used for unit testing. This means that we identify some units of our software and write a dedicated test function for testing the behavior of each unit. A unit in the present example can be the interest module, but we could also think of the individual Python functions in interest as units. From a practical point of view, the unit is often defined as what we find appropriate to verify in a test function. For now it is convenient to test all functions in the interest.py file in the same test function, so the module becomes the unit.

A proper test function for verifying the functionality of the interest module, written in a way that is compatible with the pytest and nose testing frameworks, looks as follows:

def test_all_functions():
    # Compatible values
    A = 2.2133983053266699; A0 = 2.0; p = 5; n = 730
    # Given three of these, compute the remaining one
    # and compare with the correct value (in parentheses)
    A_computed  = present_amount(A0, p, n)
    A0_computed = initial_amount(A, p, n)
    n_computed  = days(A0, A, p)
    p_computed  = annual_rate(A0, A, n)

    def float_eq(a, b, tolerance=1E-12):
        """Return True if a == b within the tolerance."""
        return abs(a - b) < tolerance

    # Parentheses let the expression continue over several lines
    success = (float_eq(A_computed,  A)  and
               float_eq(A0_computed, A0) and
               float_eq(p_computed,  p)  and
               float_eq(n_computed,  n))
    msg = """Computations failed (correct answers in parentheses):
A=%g (%g)
A0=%g (%.1f)
n=%d (%d)
p=%g (%.1f)""" % (A_computed, A, A0_computed, A0,
                  n_computed, n, p_computed, p)
    assert success, msg
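If the individual functions are taken as units instead, each gets its own test function. The following is only a sketch of that alternative: two of the module's functions are repeated to make the example self-contained, and the test names and the chosen checks are our own assumptions, not part of the interest module.

```python
def present_amount(A0, p, n):
    return A0*(1 + p/(360.0*100))**n

def initial_amount(A, p, n):
    return A*(1 + p/(360.0*100))**(-n)

def test_present_amount_zero_days():
    # With n=0 days no interest is added, so A must equal A0
    A0 = 2.0
    A_computed = present_amount(A0, p=5, n=0)
    success = A_computed == A0
    msg = 'present_amount gave %g for n=0 (%g)' % (A_computed, A0)
    assert success, msg

def test_initial_amount_inverts_present_amount():
    # initial_amount should undo present_amount for the same p and n
    A0 = 2.0; p = 5; n = 730
    A = present_amount(A0, p, n)
    A0_computed = initial_amount(A, p, n)
    success = abs(A0_computed - A0) < 1E-12
    msg = 'initial_amount gave %g (%g)' % (A0_computed, A0)
    assert success, msg
```

With one test function per unit, a failure points directly at the function that misbehaves.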

We may require a single command-line argument test to run the verification. The test block can then be expressed as

if __name__ == '__main__':
    import sys
    if len(sys.argv) == 2 and sys.argv[1] == 'test':
        test_all_functions()
Getting input data

To make a useful program, we should allow setting three parameters on the command line and let the program compute the remaining parameter. For example, running the program as

Terminal> interest.py A0=1 A=2 n=1095

will lead to a computation of $p$, in this case for seeing the size of the annual interest rate if the amount is to be doubled after three years.

How can we achieve the desired functionality? Since variables are already introduced and "initialized" on the command line, we could grab this text and execute it as Python code, either as three different lines or with semicolon between each assignment. This is easy:

init_code = ''
for statement in sys.argv[1:]:
    init_code += statement + '\n'
exec(init_code)

(We remark that an experienced Python programmer would have created init_code by '\n'.join(sys.argv[1:]).) For the sample run above with A0=1 A=2 n=1095 on the command line, init_code becomes the string

A0=1
A=2
n=1095
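The join variant can be tried without a command line by letting a plain list stand in for sys.argv[1:]. In this sketch the assignments are executed in an explicit dictionary (the name namespace is our own choice, not something the module uses), which makes it easy to inspect what was defined and also works the same way under Python 2.7 and Python 3.

```python
# A list standing in for sys.argv[1:] in the sample run
args = ['A0=1', 'A=2', 'n=1095']

# One assignment per line, without an explicit loop
init_code = '\n'.join(args)
print(init_code)

# Execute the assignments in a separate dictionary (assumed name)
namespace = {}
exec(init_code, namespace)
print(namespace['A0'], namespace['A'], namespace['n'])
```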

Note that one cannot have spaces around the equal signs on the command line as this will break an assignment like A0 = 1 into three command-line arguments, which will give rise to a SyntaxError in exec(init_code). To tell the user about such errors, we execute init_code inside a try-except block:

try:
    exec(init_code)
except SyntaxError as e:
    print e
    print init_code
    sys.exit(1)
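The effect of spaces around an equal sign can be reproduced with a list that mimics what the shell would deliver for A0 = 1 A=2 n=1095. This is a sketch only: the list replaces sys.argv[1:], and the code is executed in an empty dictionary rather than in the program's own namespace.

```python
# "A0 = 1" arrives as three separate command-line arguments
args = ['A0', '=', '1', 'A=2', 'n=1095']
init_code = '\n'.join(args)

try:
    exec(init_code, {})
    failed = False
except SyntaxError as e:
    # The line consisting of a lone "=" is not valid Python
    failed = True
    print('SyntaxError: %s' % e)

print(failed)
```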

At this stage, our program has hopefully initialized three parameters in a successful way, and it remains to detect the remaining parameter to be computed. The following code does the work:

if 'A=' not in init_code:
    print 'A =', present_amount(A0, p, n)
elif 'A0=' not in init_code:
    print 'A0 =', initial_amount(A, p, n)
elif 'n=' not in init_code:
    print 'n =', days(A0, A , p)
elif 'p=' not in init_code:
    print 'p =', annual_rate(A0, A, n)

It may happen that the user of the program assigns a value to a parameter with the wrong name or forgets a parameter. In those cases we call one of our four functions with uninitialized arguments, and Python raises an exception. Therefore, we should embed the code above in a try-except block. An uninitialized variable will lead to a NameError exception, while another frequent error is illegal values in the computations, leading to a ValueError exception. It is also a good habit to collect all the code related to computing the remaining, fourth parameter in a function for separating this piece of code from other parts of the module file:

def compute_missing_parameter(init_code):
    try:
        exec(init_code)
    except SyntaxError as e:
        print e
        print init_code
        sys.exit(1)
    # Find missing parameter
    try:
        if 'A=' not in init_code:
            print 'A =', present_amount(A0, p, n)
        elif 'A0=' not in init_code:
            print 'A0 =', initial_amount(A, p, n)
        elif 'n=' not in init_code:
            print 'n =', days(A0, A , p)
        elif 'p=' not in init_code:
            print 'p =', annual_rate(A0, A, n)
    except NameError as e:
        print e
        sys.exit(1)
    except ValueError:
        print 'Illegal values in input:', init_code
        sys.exit(1)

If the user of the program fails to give any command-line arguments, we print a usage statement. Otherwise, we run a verification if the first command-line argument is test, or else the missing parameter computation (i.e., the useful main program):

_filename = sys.argv[0]
_usage = """
Usage: %s A=10 p=5 n=730
Program computes and prints the 4th parameter
(A, A0, p, or n)""" % _filename

if __name__ == '__main__':
    if len(sys.argv) == 1:
        print _usage
    elif len(sys.argv) == 2 and sys.argv[1] == 'test':
        test_all_functions()
    else:
        init_code = ''
        for statement in sys.argv[1:]:
            init_code += statement + '\n'
        compute_missing_parameter(init_code)

Executing user input can be dangerous.

Some purists would never demonstrate exec the way we do above. The reason is that our program tries to execute whatever the user writes. Consider

Terminal> interest.py 'import shutil; shutil.rmtree("/")'

This evil use of the program leads to an attempt to remove all files on the computer system (the same as writing rm -rf / in the terminal window!). However, for small private programs helping the program writer out with mathematical calculations, this potentially dangerous misuse is not so much of a concern (the user just does harm to his own computer anyway).
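When the risk does matter, one possible alternative is to avoid exec altogether and parse the name=value pairs by hand. The function below is a sketch of that different technique, not the approach used in interest.py; the name parse_parameters and the choice to store the values in a dictionary are assumptions made for this example.

```python
def parse_parameters(args):
    """Turn strings like 'A0=1' into a dict {'A0': 1.0} without exec."""
    legal_names = ('A', 'A0', 'p', 'n')
    params = {}
    for arg in args:
        name, sep, value = arg.partition('=')
        if sep != '=' or name not in legal_names:
            raise ValueError('Cannot interpret argument %r' % arg)
        # float raises ValueError if value is not a number
        params[name] = float(value)
    return params

print(parse_parameters(['A0=1', 'A=2', 'n=1095']))
```

Anything that is not a known parameter name followed by a number is rejected with a ValueError, so arbitrary statements on the command line are never executed.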

Doc strings in modules

It is also a good habit to include a doc string in the beginning of the module file. This doc string explains the purpose and use of the module:

"""
Module for computing with interest rates.
Symbols: A is present amount, A0 is initial amount,
n counts days, and p is the interest rate per year.

Given three of these parameters, the fourth can be
computed as follows:

    A  = present_amount(A0, p, n)
    A0 = initial_amount(A, p, n)
    n  = days(A0, A, p)
    p  = annual_rate(A0, A, n)
"""

You can run the pydoc program to see the documentation of the new module, containing the doc string above and a list of the functions in the module: just write pydoc interest in a terminal window.
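Doc strings are also available inside Python as the __doc__ attribute of a module or function. The sketch below demonstrates this with a small throwaway module written to a temporary folder; the module name doc_demo and its content are hypothetical and only serve to keep the example self-contained.

```python
import importlib
import os
import sys
import tempfile

# Create a small module file with a module and a function doc string
folder = tempfile.mkdtemp()
with open(os.path.join(folder, 'doc_demo.py'), 'w') as f:
    f.write('"""Module for demonstrating doc strings."""\n'
            'def add1(x):\n'
            '    """Return x + 1."""\n'
            '    return x + 1\n')

sys.path.insert(0, folder)
doc_demo = importlib.import_module('doc_demo')

print(doc_demo.__doc__)        # doc string of the module
print(doc_demo.add1.__doc__)   # doc string of the function
```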

Now the reader is recommended to take a look at the actual file interest.py to see all elements of a good module file at once: doc strings, a set of functions, a test function, a function with the main program, a usage string, and a test block.

Using modules

Let us further demonstrate how to use the interest.py module in programs. For illustration purposes, we make a separate program file, say with name doubling.py, containing some computations:

from interest import days

# How many days does it take to double an amount when the
# interest rate is p=1,2,3,...14?
for p in range(1, 15):
    years = days(1, 2, p)/365.0
    print 'p=%d%% implies %.1f years to double the amount' % \
          (p, years)

What gets imported by various import statements?

There are different ways to import functions from a module; let us explore these in an interactive session. The function call dir() will list all names we have defined, including imported names of variables and functions. Calling dir(m) will print the names defined inside a module with name m. First we start an interactive shell and call dir():

>>> dir()
['__builtins__', '__doc__', '__name__', '__package__']

These variables are always defined. Running the IPython shell will introduce several other standard variables too. Doing

>>> from interest import *
>>> dir()
['__builtins__', '__doc__', '__name__', '__package__',
 'annual_rate', 'compute_missing_parameter', 'days',
 'initial_amount', 'ln', 'present_amount', 'sys',
 'test_all_functions']

shows that we get our four functions imported, along with the helper functions compute_missing_parameter and test_all_functions, and the names ln and sys. The latter two are needed in the interest module, but not necessarily in our new program doubling.py.

The alternative import interest actually gives us access to more names in the module, namely also all variables and functions that start with an underscore:

>>> import interest
>>> dir(interest)
['__builtins__', '__doc__', '__file__', '__name__',
 '__package__', '_filename', '_usage', 'annual_rate',
 'compute_missing_parameter', 'days', 'initial_amount',
 'ln', 'present_amount', 'sys', 'test_all_functions']

It is a habit to use an underscore for all variables that are not to be included in a from interest import * statement. These variables can, however, be reached through interest._filename and interest._usage in the present example.

It would be best if a statement from interest import * just imported the four functions doing the computations of general interest in other programs. This can be achieved by deleting all unwanted names (among those without an initial underscore) at the very end of the module:

del sys, ln, compute_missing_parameter, test_all_functions

Instead of deleting variables and using initial underscores in names, it is in general better to specify the special variable __all__, which is used by Python to select functions to be imported in from interest import * statements. Here we can define __all__ to contain the four functions of main interest:

__all__ = ['annual_rate', 'days', 'initial_amount', 'present_amount']

Now we get

>>> from interest import *
>>> dir()
['__builtins__', '__doc__', '__name__', '__package__',
 'annual_rate', 'days', 'initial_amount', 'present_amount']
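The effect of __all__ can be checked in a script as well. In the sketch below, a throwaway module with the hypothetical name all_demo defines two functions and imports sys, but lists only one function in __all__. The star import is executed in a separate dictionary so that the imported names are easy to list.

```python
import os
import sys
import tempfile

# A small module where only 'days' is listed in __all__
folder = tempfile.mkdtemp()
with open(os.path.join(folder, 'all_demo.py'), 'w') as f:
    f.write("import sys\n"
            "__all__ = ['days']\n"
            "def days(): return 1\n"
            "def helper(): return 2\n")
sys.path.insert(0, folder)

# Run the star import in its own namespace and list what arrived
namespace = {}
exec('from all_demo import *', namespace)
imported = sorted(name for name in namespace
                  if not name.startswith('__'))
print(imported)    # only 'days'; helper and sys are left out
```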

How to make Python find a module file

The doubling.py program works well as long as it is located in the same folder as the interest.py module. However, if we move doubling.py to another folder and run it, we get an error:

Terminal> doubling.py
Traceback (most recent call last):
  File "doubling.py", line 1, in <module>
    from interest import days
ImportError: No module named interest

Unless the module file resides in the same folder, we need to tell Python where to find our module. Python looks for modules in the folders contained in the list sys.path. A little program

import sys, pprint
pprint.pprint(sys.path)

prints out all these predefined module folders. You can now do one of two things:

Place the module file in one of the folders in sys.path.
Include the folder containing the module file in sys.path.

There are two ways of doing the latter task. Alternative 1 is to explicitly insert a new folder name in sys.path in the program that uses the module:

modulefolder = '../../pymodules'
sys.path.insert(0, modulefolder)

(In this sample path, the slashes are Unix specific. On Windows, paths are normally written with backslashes, which in Python code calls for a raw string. A better solution is to express the path as os.path.join(os.pardir, os.pardir, 'pymodules'). This will work on all platforms.)

Python searches the folders in the sequence they appear in the sys.path list, so by inserting the folder name as the first list element we ensure that our module is found quickly, and in case there are other modules with the same name in other folders in sys.path, the one in modulefolder gets imported.
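Put together, the platform-independent version of Alternative 1 may look as follows. The folder name pymodules is just the sample name from the text; the folder does not need to exist for the statements to run.

```python
import os
import sys

# Two levels up, then into pymodules, with the right separator
modulefolder = os.path.join(os.pardir, os.pardir, 'pymodules')

# First position means this folder is searched before all others
sys.path.insert(0, modulefolder)
print(sys.path[0])
```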

Alternative 2 is to specify the folder name in the PYTHONPATH environment variable. All folder names listed in PYTHONPATH are automatically included in sys.path when a Python program starts. On Mac and Linux systems, environment variables like PYTHONPATH are set in the .bashrc file in the home folder, typically as

export PYTHONPATH=$HOME/software/lib/pymodules:$PYTHONPATH

if $HOME/software/lib/pymodules is the folder containing Python modules. On Windows, you launch Computer - Properties - Advanced System Settings - Environment Variables, click under System Variable, write in PYTHONPATH as variable name and the relevant folder(s) as value.
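To see that PYTHONPATH really ends up in sys.path, one can start a new Python process with the variable set and let it print its search path. This is a sketch under the assumption that launching a subprocess is allowed on your system; the temporary folder merely plays the role of the pymodules folder.

```python
import os
import subprocess
import sys
import tempfile

# A temporary folder standing in for the folder with our modules
folder = tempfile.mkdtemp()

# Copy the current environment and set PYTHONPATH for the child only
env = dict(os.environ, PYTHONPATH=folder)
code = 'import sys; print("\\n".join(sys.path))'
output = subprocess.check_output([sys.executable, '-c', code], env=env)

# The child process should list the folder among its sys.path entries
found = folder in output.decode().splitlines()
print(found)
```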

How to make Python run the module file

The description above concerns importing the module in a program located anywhere on the system. If we want to run the module file as a program, anywhere on the system, the operating system searches the PATH environment variable for the program name interest.py. It is therefore necessary to update PATH with the folder where interest.py resides.

On Mac and Linux systems this is done in .bashrc in the same way as for PYTHONPATH:

export PATH=$HOME/software/lib/pymodules:$PATH

On Windows, launch the dialog for setting environment variables as described above and find the PATH variable. It already has much content, so you add your new folder value either at the beginning or end, using a semicolon to separate the new value from the existing ones.

Distributing modules

Modules are usually useful pieces of software that others can take advantage of. Even though our simple interest module is of less interest to the world, we can illustrate how such a module is most effectively distributed to other users. The standard in Python is to distribute the module file together with a program called setup.py such that any user can just do

Terminal> sudo python setup.py install

to install the module in one of the directories in sys.path so that the module is immediately accessible anywhere, both for import in a Python program and for execution as a stand-alone program.

The setup.py file is in the case of one module file very short:

from distutils.core import setup
setup(name='interest',
      version='1.0',
      py_modules=['interest'],
      scripts=['interest.py'],
      )

The scripts= keyword argument can be dropped if the module is just to be imported and not run as a program as well. More module files can trivially be added to the list.

A user who runs setup.py install on an Ubuntu machine will see from the output that interest.py is copied to the system folders /usr/local/lib/python2.7/dist-packages and /usr/local/bin. The former folder is for module files, the latter for executable programs.

Remark.

Distributing a single module file can be done as shown, but if you have two or more module files that belong together, you should definitely create a package [8].

Making software available on the Internet

Distributing software today means making it available on one of the major project hosting sites such as GitHub or Bitbucket. You will develop and maintain the project files on your own computer(s), but frequently push the software out in the cloud such that others also get your updates. The mentioned sites have very strong support for collaborative software development.

Sign up for a GitHub account if you do not already have one. Go to your account settings and provide an SSH key (typically the file ~/.ssh/id_rsa.pub) such that you can communicate with GitHub without being prompted for your password.

To create a new project, click on New repository on the main page and fill out a project name. Click on the check button Initialize this repository with a README, and click on Create repository. The next step is to clone (copy) the GitHub repo (short for repository) to your own computer(s) and fill it with files. The typical clone command is

Terminal> git clone git@github.com:username/projname.git

where username is your GitHub username and projname is the name of the repo (project). The result of git clone is a directory projname. Go to this folder and add files. That is, copy setup.py and interest.py to the folder. It is good to also write a short README file explaining what the project is about. Run

Terminal> git add .
Terminal> git commit -am 'First registration of project files'
Terminal> git push origin master

The above git commands look cryptic, but these commands plus 2-3 more are the essence of how programmers today work on software projects, small or big. I strongly encourage you to learn more about version control systems and project hosting sites [9]. The tools are in nature like Dropbox and Google Drive, just much more powerful when you collaborate with others.

Your project files are now stored in the cloud at https://github.com/username/projname. Anyone can get the software by the listed git clone command you used above, or by clicking on the links for zip and tar files.

Every time you update the project files, you need to register the update at GitHub by

Terminal> git commit -am 'Description of the changes you made...'
Terminal> git push origin master

The files at GitHub are now synchronized with your local ones.

There is a bit more to be said here to get you up and running with this style of professional work [9], but the information above gives you at least a glimpse of how to put your software project in the cloud and open it up for others. The GitHub address for the particular interest module described above is https://github.com/hplgit/interest-primer.
