This chapter is taken from the book A Primer on Scientific Programming with Python by H. P. Langtangen, 5th edition, Springer, 2016.
Sometimes you want to reuse a function from an old program in a new program. The simplest way to do this is to copy and paste the old source code into the new program. However, this is not good programming practice, because over time you end up with multiple copies of the same function. When you want to improve the function or correct a bug, you need to remember to make the same update in all files containing a copy of the function, and in real life most programmers fail to do so. You easily end up with a mess of versions, of varying quality, of basically the same code. Therefore, a golden rule of programming is to have one and only one version of a piece of code. All programs that want to use this piece of code must access the one place where the source code is kept. This principle is easy to implement if we create a module containing the code we want to reuse later in different programs.
By now, you probably know how to use a ready-made module.
For example, if you want to compute the factorial \( k!=k(k-1)(k-2)\cdots 1 \),
there is a function factorial in Python's math module that can help us out.
The function can be used with the math prefix,
import math
value = math.factorial(5)
or without,
from math import factorial
# or: from math import *
value = factorial(5)
Now you shall learn how to make your own Python modules. There is
hardly anything to learn, because you just collect all the functions
that constitute the module in one file, say with name mymodule.py.
This file is automatically a module, with name mymodule, and you can
import functions from this module in the standard way. Let us make
everything clear in detail by looking at an example.
The classical formula for the growth of money in a bank reads
$$
\begin{equation}
A = A_0\left( 1 + {p\over 360\cdot 100}\right)^n,
\tag{2}
\end{equation}
$$
where \( A_0 \) is the initial amount of money, and \( A \) is the present amount
after \( n \) days with \( p \) percent annual interest rate.
(The formula applies the convention that the rate per day is computed as
\( p/360 \), while \( n \) counts the actual number of days the money is in
the bank; see the Wikipedia entry Day count convention for an explanation.
There is a handy Python module datetime for computing the number of days
between two dates.)
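A minimal sketch of that day count, assuming Python 3 and using made-up dates, amount, and rate: subtracting two datetime.date objects gives a timedelta whose days attribute can serve as the n in (2).

```python
from datetime import date

def present_amount(A0, p, n):
    """Formula (2): amount after n days with p percent annual interest."""
    return A0*(1 + p/(360.0*100))**n

# Hypothetical deposit period: 1 January to 31 December 2016
n = (date(2016, 12, 31) - date(2016, 1, 1)).days   # actual number of days
A = present_amount(1000, 2.5, n)                    # 1000 units at 2.5 percent
print('n = %d days, A = %.2f' % (n, A))
```

The datetime module takes care of month lengths and leap years, which is the tedious part of counting days by hand.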
Equation (2) involves four parameters: \( A \), \( A_0 \), \( p \), and \( n \). We may solve for any of these, given the other three:
$$
\begin{align}
A_0 &= A\left( 1 + {p\over 360\cdot 100}\right)^{-n}, \tag{3}\\
n &= \frac{\ln {A\over A_0}}{\ln \left( 1 + {p\over 360\cdot 100}\right)} , \tag{4}\\
p &= 360\cdot 100 \left(\left({A\over A_0}\right)^{1/n} - 1\right). \tag{5}
\end{align}
$$
Suppose we have implemented (2)-(5) in four functions:
from math import log as ln

def present_amount(A0, p, n):
    return A0*(1 + p/(360.0*100))**n

def initial_amount(A, p, n):
    return A*(1 + p/(360.0*100))**(-n)

def days(A0, A, p):
    # float() avoids integer division in Python 2 when A and A0 are ints
    return ln(float(A)/A0)/ln(1 + p/(360.0*100))

def annual_rate(A0, A, n):
    return 360*100*((float(A)/A0)**(1.0/n) - 1)
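Since (3)-(5) come from solving (2), the four functions should undo each other. A quick round-trip check, sketched for Python 3 with arbitrary values A0=1000, p=4, n=500 that are not taken from the text, makes this concrete:

```python
from math import log as ln

# Same four formulas as above; float() keeps the divisions exact in any Python
def present_amount(A0, p, n):
    return A0*(1 + p/(360.0*100))**n

def initial_amount(A, p, n):
    return A*(1 + p/(360.0*100))**(-n)

def days(A0, A, p):
    return ln(float(A)/A0)/ln(1 + p/(360.0*100))

def annual_rate(A0, A, n):
    return 360*100*((float(A)/A0)**(1.0/n) - 1)

# Start from three chosen values, compute A, then recover each input
A0, p, n = 1000.0, 4.0, 500
A = present_amount(A0, p, n)
print(initial_amount(A, p, n), days(A0, A, p), annual_rate(A0, A, n))
```

Each recovered value should agree with the chosen one up to rounding errors in the last few digits.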
We want to make these functions available in a module, say with
name interest, so that we can import functions and compute with them
in a program. For example,
from interest import days
A0 = 1; A = 2; p = 5
n = days(A0, A, p)
years = n/365.0
print 'Money has doubled after %.1f years' % years
How to make the interest module is described next.
To make a module of the four functions present_amount, initial_amount,
days, and annual_rate, we simply open an empty file in a text editor and
copy the program code for all the four functions over to this file.
This file is then automatically a Python module, provided we save it
under a valid filename. The extension must be .py, but the module name
is only the base part of the filename. In our case, the filename
interest.py implies a module name interest. To use the annual_rate
function in another program we simply write, in that program file,
from interest import annual_rate
or we can write
from interest import *
to import all four functions, or we can write
import interest
and access individual functions as interest.annual_rate and so forth.
It is recommended to have only functions, and no statements outside functions, in a module. The reason is that the module file is executed from top to bottom during the import. With only function definitions in the module file, and no main program, there will be no calculations or output from the import, just definitions of functions. This is the desirable behavior. However, it is often convenient to have tests or demonstrations in the module file, and then there is a need for a main program. Python offers a very fortunate construction that lets the file act both as a module with function definitions only (and no main program) and as an ordinary program we can run, with functions and a main program.
This two-fold "magic" is realized by putting the main program after an
if test of the form
if __name__ == '__main__':
    <block of statements>
The __name__ variable is automatically defined in any module and equals
the module name if the module file is imported in another program, or
__name__ equals the string '__main__' if the module file is run as a
program. This implies that the <block of statements> part is executed if
and only if we run the module file as a program. We shall refer to
<block of statements> as the test block of a module.
A very simple example will illustrate how this works. Consider a
file mymod.py with the content
def add1(x):
    return x + 1

if __name__ == '__main__':
    print 'run as program'
    import sys
    print add1(float(sys.argv[1]))
We can import mymod as a module and make use of the add1 function:
>>> import mymod
>>> print mymod.add1(4)
5
During the import, the if test is false, and only the function
definition is executed. However, if we run mymod.py as a program,
Terminal> python mymod.py 5
run as program
6.0
the if test becomes true, and the print statements are executed.
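To watch both roles from inside a single Python 3 session, the sketch below writes a throwaway module (the name mymod_demo and its role variable are invented for this illustration), imports it, and then executes the same file with runpy under the name '__main__', which mimics running the file as a program.

```python
import importlib
import os
import runpy
import sys
import tempfile

# Hypothetical throwaway module, similar in spirit to mymod.py
source = '''
def add1(x):
    return x + 1

if __name__ == '__main__':
    role = 'run as program'
else:
    role = 'imported as ' + __name__
'''

folder = tempfile.mkdtemp()
path = os.path.join(folder, 'mymod_demo.py')
with open(path, 'w') as f:
    f.write(source)

sys.path.insert(0, folder)
importlib.invalidate_caches()

mod = importlib.import_module('mymod_demo')      # test block is skipped
print(mod.role)                                  # imported as mymod_demo

ns = runpy.run_path(path, run_name='__main__')   # test block is executed
print(ns['role'])                                # run as program
```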
If you have some functions and a main program in some program file, just move the main program to the test block. Then the file can act as a module, giving access to all the functions in other files, or the file can be executed from the command line, in the same way as the original program.
Let us write a little main program for demonstrating the interest module
in a test block. We read \( p \) from the command line and write out how
many years it takes to double an amount with that interest rate:
if __name__ == '__main__':
    import sys
    p = float(sys.argv[1])
    years = days(1, 2, p)/365.0
    print 'With p=%.2f it takes %.1f years to double' % (p, years)
Running the module file as a program gives this output:
Terminal> python interest.py 2.45
With p=2.45 it takes 27.9 years to double
To test that the interest.py file also works as a module, invoke a
Python shell and try to import a function and compute with it:
>>> from interest import present_amount
>>> present_amount(2, 5, 730)
2.2133983053266699
We have hence demonstrated that the file interest.py works both as a
program and as a module.
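It is a good programming habit to let the test block do one or more of three things: provide information on how the module or program is used, test that the module functions work properly, and offer interaction with users so that the module file can be applied as a useful program.
Functions that verify the implementation in a module should have names
starting with test_, express the success or failure of the test through
a boolean variable, say success, and run assert success, msg to raise an
AssertionError with an optional message msg in case the test fails.
Adopting this convention makes it easy to let tools such as pytest or
nose automatically run all the test_*() functions in all files in a
folder tree.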
The document Unit testing with pytest and nose [7] contains a more
thorough introduction to the pytest and nose testing frameworks for
beginners.
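A minimal function following this convention might look as below. It is a sketch for Python 3; the add1 function is borrowed from the mymod.py example, and the message text is just an illustration.

```python
def add1(x):
    return x + 1

def test_add1():
    """Name starts with test_, result in success, checked by assert."""
    expected = 5
    computed = add1(4)
    success = computed == expected
    msg = 'add1(4) gave %s, expected %s' % (computed, expected)
    assert success, msg

test_add1()   # silent when the test passes

# A deliberately wrong expectation shows what a failure looks like
try:
    assert add1(4) == 6, 'add1(4) should be 6'
except AssertionError as e:
    print('AssertionError:', e)
```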
Test functions are used for unit testing. This means that we identify
some units of our software and write a dedicated test function for
testing the behavior of each unit. A unit in the present example can be
the interest module, but we could also think of the individual Python
functions in interest as units. From a practical point of view, the unit
is often defined as what we find appropriate to verify in a test
function. For now it is convenient to test all functions in the
interest.py file in the same test function, so the module becomes the
unit.
A proper test function for verifying the functionality of the interest
module, written in a way that is compatible with the pytest and nose
testing frameworks, looks as follows:
def test_all_functions():
    # Compatible values
    A = 2.2133983053266699; A0 = 2.0; p = 5; n = 730
    # Given three of these, compute the remaining one
    # and compare with the correct value (in parenthesis)
    A_computed = present_amount(A0, p, n)
    A0_computed = initial_amount(A, p, n)
    n_computed = days(A0, A, p)
    p_computed = annual_rate(A0, A, n)

    def float_eq(a, b, tolerance=1E-12):
        """Return True if a == b within the tolerance."""
        return abs(a - b) < tolerance

    success = float_eq(A_computed, A) and \
              float_eq(A0_computed, A0) and \
              float_eq(p_computed, p) and \
              float_eq(n_computed, n)
    msg = """Computations failed (correct answers in parenthesis):
    A=%g (%g)
    A0=%g (%.1f)
    n=%d (%d)
    p=%g (%.1f)""" % (A_computed, A, A0_computed, A0,
                      n_computed, n, p_computed, p)
    assert success, msg
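The hand-written float_eq uses an absolute tolerance. Since Python 3.5 the standard library offers math.isclose, which compares with a relative tolerance. The sketch below reworks the same test with it; the relative tolerance 1e-10 is an assumption chosen here, not a value from the text, and the four functions are repeated so the block runs on its own.

```python
import math
from math import log as ln

def present_amount(A0, p, n):
    return A0*(1 + p/(360.0*100))**n

def initial_amount(A, p, n):
    return A*(1 + p/(360.0*100))**(-n)

def days(A0, A, p):
    return ln(float(A)/A0)/ln(1 + p/(360.0*100))

def annual_rate(A0, A, n):
    return 360*100*((float(A)/A0)**(1.0/n) - 1)

def test_all_functions_isclose():
    # Same compatible values as in test_all_functions
    A = 2.2133983053266699; A0 = 2.0; p = 5; n = 730
    computed = {'A': present_amount(A0, p, n),
                'A0': initial_amount(A, p, n),
                'n': days(A0, A, p),
                'p': annual_rate(A0, A, n)}
    exact = {'A': A, 'A0': A0, 'n': n, 'p': p}
    # Relative tolerance 1e-10 is an assumption, not taken from the text
    failed = [name for name in exact
              if not math.isclose(computed[name], exact[name], rel_tol=1e-10)]
    success = not failed
    msg = 'Mismatch for %s: %s' % (failed, computed)
    assert success, msg

test_all_functions_isclose()
```

A relative tolerance scales with the size of the compared numbers, so the same value works whether the quantity is an amount near 2 or a day count near 730.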
We may require a single command-line argument test to run the
verification. The test block can then be expressed as
import sys

if __name__ == '__main__':
    if len(sys.argv) == 2 and sys.argv[1] == 'test':
        test_all_functions()
To make a useful program, we should allow setting three parameters on the command line and let the program compute the remaining parameter. For example, running the program as
Terminal> python interest.py A0=1 A=2 n=1095
will lead to a computation of \( p \), in this case showing how large the
annual interest rate must be for the amount to double in three years.
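For reference, the value such a run should report can be checked directly with formula (5). This Python 3 sketch repeats annual_rate so that it runs on its own:

```python
def annual_rate(A0, A, n):
    """Formula (5), the same computation as annual_rate above."""
    return 360*100*((float(A)/A0)**(1.0/n) - 1)

# Values from the sample command line: A0=1 A=2 n=1095
p = annual_rate(1, 2, 1095)
print('p = %.2f' % p)   # p = 22.80
```

An annual rate of roughly 22.8 percent is thus needed to double the money in 1095 days under the 360-day convention.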
How can we achieve the desired functionality? Since the variables are already introduced and "initialized" on the command line, we could grab this text and execute it as Python code, either as three different lines or with semicolons between the assignments. This is easy:
init_code = ''
for statement in sys.argv[1:]:
    init_code += statement + '\n'
exec(init_code)
(We remark that an experienced Python programmer would have created
init_code by '\n'.join(sys.argv[1:]).)
For the sample run above with A0=1 A=2 n=1095 on the command line,
init_code becomes the string
A0=1
A=2
n=1095
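A detail worth knowing when moving this code to Python 3: there, exec inside a function cannot create new local variables, so a Python 3 version would typically pass an explicit dictionary to exec and read the values back from it. A sketch of that, combined with the join idiom, with the arguments hard-coded instead of read from sys.argv:

```python
# Stand-in for sys.argv[1:] from the sample run
args = ['A0=1', 'A=2', 'n=1095']

init_code = '\n'.join(args)   # like the loop, but without a final newline
namespace = {}
exec(init_code, namespace)    # assigned names become keys in namespace
print(namespace['A0'], namespace['A'], namespace['n'])   # 1 2 1095
```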
Note that one cannot have spaces around the equal signs on the command
line, as this will break an assignment like A0 = 1 into three
command-line arguments, which will give rise to a SyntaxError in
exec(init_code).
To tell the user about such errors, we execute init_code inside a
try-except block:
try:
    exec(init_code)
except SyntaxError as e:
    print e
    print init_code
    sys.exit(1)
At this stage, our program has hopefully initialized three parameters successfully, and it remains to detect which parameter is missing and compute it. The following code does the work:
if 'A=' not in init_code:
    print 'A =', present_amount(A0, p, n)
elif 'A0=' not in init_code:
    print 'A0 =', initial_amount(A, p, n)
elif 'n=' not in init_code:
    print 'n =', days(A0, A, p)
elif 'p=' not in init_code:
    print 'p =', annual_rate(A0, A, n)
It may happen that the user of the program assigns a value to a
parameter with a wrong name or forgets a parameter. In those cases we
call one of our four functions with uninitialized arguments, and Python
raises an exception. Therefore, we should embed the code above in a
try-except block. An uninitialized variable will lead to a NameError
exception, while another frequent error is illegal values in the
computations, leading to a ValueError exception.
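Both kinds of failure are easy to provoke. The Python 3 sketch below mimics a user who forgot p (evaluating the call in the namespace filled by exec gives a NameError) and a user who gave a negative amount (the logarithm of a negative number gives a ValueError):

```python
from math import log as ln

def days(A0, A, p):
    return ln(float(A)/A0)/ln(1 + p/(360.0*100))

caught = []

# 1) Forgotten parameter: the user gave A0 and A but no p
ns = {}
exec('A0=1\nA=2', ns)
try:
    eval('days(A0, A, p)', {'days': days}, ns)
except NameError as e:
    caught.append('NameError')
    print('NameError:', e)

# 2) Illegal value: a negative amount makes the logarithm undefined
try:
    days(1, -2, 5)
except ValueError as e:
    caught.append('ValueError')
    print('ValueError:', e)
```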
It is also a good habit to collect all the code related to computing the
remaining, fourth parameter in a function, to separate this piece of
code from other parts of the module file:
def compute_missing_parameter(init_code):
    try:
        exec(init_code)
    except SyntaxError as e:
        print e
        print init_code
        sys.exit(1)

    # Find missing parameter
    try:
        if 'A=' not in init_code:
            print 'A =', present_amount(A0, p, n)
        elif 'A0=' not in init_code:
            print 'A0 =', initial_amount(A, p, n)
        elif 'n=' not in init_code:
            print 'n =', days(A0, A, p)
        elif 'p=' not in init_code:
            print 'p =', annual_rate(A0, A, n)
    except NameError as e:
        print e
        sys.exit(1)
    except ValueError:
        print 'Illegal values in input:', init_code
        sys.exit(1)
If the user of the program fails to give any command-line arguments, we
print a usage statement. Otherwise, we run a verification if the first
command-line argument is test, and else we run the missing parameter
computation (i.e., the useful main program):
_filename = sys.argv[0]
_usage = """
Usage: %s A=10 p=5 n=730
Program computes and prints the 4th parameter
(A, A0, p, or n)""" % _filename
if __name__ == '__main__':
    if len(sys.argv) == 1:
        print _usage
    elif len(sys.argv) == 2 and sys.argv[1] == 'test':
        test_all_functions()
    else:
        init_code = ''
        for statement in sys.argv[1:]:
            init_code += statement + '\n'
        compute_missing_parameter(init_code)
Some purists would never demonstrate exec the way we do above. The
reason is that our program tries to execute whatever the user writes.
Consider
Terminal> python interest.py 'import shutil; shutil.rmtree("/")'
This evil use of the program leads to an attempt to remove all files on
the computer system (the same as writing rm -rf / in the terminal
window!). However, for small private programs helping the program writer
out with mathematical calculations, this potentially dangerous misuse is
not much of a concern (the user just does harm to his own computer
anyway).
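If the exec risk does matter, one alternative (not the approach taken in interest.py) is to parse each name=value argument by hand, accept only the four known names, and require numeric values. The helper names parse_parameters and missing_parameter below are hypothetical, and the sketch assumes Python 3:

```python
def parse_parameters(args):
    """Turn ['A0=1', 'A=2', 'n=1095'] into {'A0': 1.0, 'A': 2.0, 'n': 1095.0}.

    Only the four known names are accepted and values must be numbers,
    so text given on the command line is never executed as code.
    """
    allowed = {'A', 'A0', 'n', 'p'}
    params = {}
    for arg in args:
        name, sep, value = arg.partition('=')
        if not sep or name not in allowed:
            raise ValueError('Illegal argument: %r' % arg)
        params[name] = float(value)   # raises ValueError for non-numbers
    return params

def missing_parameter(params):
    """Return the one name among A, A0, n, p that was not given."""
    missing = {'A', 'A0', 'n', 'p'} - set(params)
    if len(missing) != 1:
        raise ValueError('Give exactly three of A, A0, n, p')
    return missing.pop()

params = parse_parameters(['A0=1', 'A=2', 'n=1095'])
print(params, missing_parameter(params))

# The malicious argument from the text is rejected instead of executed
evil = ['import shutil; shutil.rmtree("/")']
try:
    parse_parameters(evil)
    rejected = False
except ValueError:
    rejected = True
print('malicious input rejected:', rejected)
```

The price is less flexibility: expressions such as A=2*1000 are no longer accepted, since only plain numbers pass through float().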
It is also a good habit to include a doc string in the beginning of the module file. This doc string explains the purpose and use of the module:
"""
Module for computing with interest rates.
Symbols: A is present amount, A0 is initial amount,
n counts days, and p is the interest rate per year.
Given three of these parameters, the fourth can be
computed as follows:
A = present_amount(A0, p, n)
A0 = initial_amount(A, p, n)
n = days(A0, A, p)
p = annual_rate(A0, A, n)
"""
You can run the pydoc program to see documentation of the new module,
containing the doc string above and a list of the functions in the
module: just write pydoc interest in a terminal window.
The reader is now encouraged to take a look at the actual file interest.py to see all elements of a good module file at once: doc strings, a set of functions, a test function, a function with the main program, a usage string, and a test block.
Let us further demonstrate how to use the interest.py module in
programs. For illustration purposes, we make a separate program file,
say with name doubling.py, containing some computations:
from interest import days

# How many years does it take to double an amount when the
# interest rate is p=1,2,3,...,14?
for p in range(1, 15):
    years = days(1, 2, p)/365.0
    print 'p=%d%% implies %.1f years to double the amount' % \
          (p, years)
There are different ways to import functions from a module, and let us
explore these in an interactive session. The function call dir() will
list all names we have defined, including imported names of variables
and functions. Calling dir(m) will list the names defined inside a
module with name m.
First we start an interactive shell and call dir():
>>> dir()
['__builtins__', '__doc__', '__name__', '__package__']
These variables are always defined. Running the IPython shell will
introduce several other standard variables too.
Doing
>>> from interest import *
>>> dir()
['__builtins__', '__doc__', '__name__', '__package__',
'annual_rate', 'compute_missing_parameter', 'days',
'initial_amount', 'ln', 'present_amount', 'sys',
'test_all_functions']
shows that we get our four functions imported, along with
compute_missing_parameter, test_all_functions, ln, and sys. The latter
four are needed in the interest module, but not necessarily in our new
program doubling.py.
The alternative import interest actually gives us access to more names
in the module, namely also all variables and functions that start with
an underscore:
>>> import interest
>>> dir(interest)
['__builtins__', '__doc__', '__file__', '__name__',
'__package__', '_filename', '_usage', 'annual_rate',
'compute_missing_parameter', 'days', 'initial_amount',
'ln', 'present_amount', 'sys', 'test_all_functions']
It is a common convention to start the names of all variables that are
not to be included in a from interest import * statement with an
underscore. These variables can, however, be reached through
interest._filename and interest._usage in the present example.
It would be best if a statement from interest import * just imported
the four functions doing the computations of general interest in other
programs. This can be achieved by deleting all unwanted names (among
those without an initial underscore) at the very end of the module:
del sys, ln, compute_missing_parameter, test_all_functions
Instead of deleting variables and using initial underscores in names,
it is in general better to specify the special variable __all__, which
Python uses to select the names to be imported in from interest import *
statements. Here we can define __all__ to contain the four functions of
main interest:
__all__ = ['annual_rate', 'days', 'initial_amount', 'present_amount']
Now we get
>>> from interest import *
>>> dir()
['__builtins__', '__doc__', '__name__', '__package__',
'annual_rate', 'days', 'initial_amount', 'present_amount']
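The effect of __all__ can be reproduced without the real interest.py. This Python 3 sketch writes a small hypothetical module, interest_all_demo, that lists only days in __all__, runs from ... import * into a fresh dictionary, and inspects which names arrived:

```python
import importlib
import os
import sys
import tempfile

# Hypothetical module with a helper that should stay out of 'import *'
source = '''
__all__ = ['days']
from math import log as ln

def days(A0, A, p):
    return ln(float(A)/A0)/ln(1 + p/(360.0*100))

def helper():
    return 'not exported'
'''

folder = tempfile.mkdtemp()
with open(os.path.join(folder, 'interest_all_demo.py'), 'w') as f:
    f.write(source)
sys.path.insert(0, folder)
importlib.invalidate_caches()

namespace = {}
exec('from interest_all_demo import *', namespace)
exported = sorted(name for name in namespace if name != '__builtins__')
print(exported)   # ['days']
```

Neither helper nor the imported ln leaks into the importing program, although both remain reachable as interest_all_demo.helper and interest_all_demo.ln after a plain import.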
The doubling.py program works well as long as it is located in the same
folder as the interest.py module. However, if we move doubling.py to
another folder and run it, we get an error:
Terminal> python doubling.py
Traceback (most recent call last):
File "doubling.py", line 1, in <module>
from interest import days
ImportError: No module named interest
Unless the module file resides in the same folder, we need to tell
Python where to find our module. Python looks for modules in the
folders contained in the list sys.path. A little program
import sys, pprint
pprint.pprint(sys.path)
prints out all these predefined module folders.
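You can now do one of two things: place the module file in one of the
folders in sys.path, or include the folder containing the module file in
sys.path. There are two ways of doing the latter. Alternative 1 is to
explicitly insert a new folder name in sys.path in the program that uses
the module: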
import sys
modulefolder = '../../pymodules'
sys.path.insert(0, modulefolder)
(The slashes in this sample path are Unix style. On Windows, paths are
normally written with backslashes, which are best placed inside a raw
string. A better solution is to express the path as
os.path.join(os.pardir, os.pardir, 'pymodules'), which works on all
platforms.)
Python searches the folders in the sequence they appear in the sys.path
list, so by inserting the folder name as the first list element we
ensure that our module is found quickly, and in case there are other
modules with the same name in other folders in sys.path, the one in
modulefolder gets imported.
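The precedence rule can be checked with two temporary folders that each contain a module with the same (hypothetical) name samename_demo. This is a Python 3 sketch; the folder placed first in sys.path wins:

```python
import importlib
import os
import sys
import tempfile

# Two hypothetical folders, each holding a module with the same name
folder_a = tempfile.mkdtemp()
folder_b = tempfile.mkdtemp()
for folder, label in [(folder_a, 'a'), (folder_b, 'b')]:
    with open(os.path.join(folder, 'samename_demo.py'), 'w') as f:
        f.write('origin = %r\n' % label)

sys.path.append(folder_b)        # b is searched last
sys.path.insert(0, folder_a)     # a is searched first
importlib.invalidate_caches()

import samename_demo
print(samename_demo.origin)      # a

# Portable version of the relative path used above
print(os.path.join(os.pardir, os.pardir, 'pymodules'))   # ../../pymodules on Unix
```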
Alternative 2 is to specify the folder name in the PYTHONPATH
environment variable. All folder names listed in PYTHONPATH are
automatically included in sys.path when a Python program starts. On Mac
and Linux systems, environment variables like PYTHONPATH are set in the
.bashrc file in the home folder, typically as
export PYTHONPATH=$HOME/software/lib/pymodules:$PYTHONPATH
if $HOME/software/lib/pymodules is the folder containing Python modules.
On Windows, you launch
Computer - Properties - Advanced System Settings - Environment Variables,
click New under System Variables, and write in PYTHONPATH as the
variable name and the relevant folder(s) as the value.
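Whether a folder given in PYTHONPATH really ends up in sys.path can be verified by starting a fresh interpreter with a modified environment. In this Python 3 sketch the folder is a temporary stand-in for a real module folder, and os.pathsep supplies the platform's separator (a colon on Mac and Linux, a semicolon on Windows):

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical module folder to be announced through PYTHONPATH
folder = tempfile.mkdtemp()

env = dict(os.environ)
old = env.get('PYTHONPATH')
env['PYTHONPATH'] = folder if not old else folder + os.pathsep + old

# Start a new interpreter and ask whether the folder is in its sys.path
code = 'import sys; print(%r in sys.path)' % folder
result = subprocess.run([sys.executable, '-c', code], env=env,
                        capture_output=True, text=True)
print(result.stdout.strip())   # True
```

Setting the variable in .bashrc or in the Windows dialog has the same effect, only permanently for every new Python process.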
The description above concerns importing the module in a program
located anywhere on the system. If we want to run the module file as a
program, anywhere on the system, the operating system searches the PATH
environment variable for the program name interest.py. It is therefore
necessary to update PATH with the folder where interest.py resides.
(On Mac and Linux, the file must also be executable, e.g., after
chmod a+x interest.py, and start with a line such as
#!/usr/bin/env python so that the system knows which interpreter to use.)
On Mac and Linux systems PATH is updated in .bashrc in the same way as
for PYTHONPATH:
export PATH=$HOME/software/lib/pymodules:$PATH
On Windows, launch the dialog for setting environment variables
as described above and find the PATH
variable. It already
has much content, so you add your new folder value either at
the beginning or end, using a semicolon to separate the new
value from the existing ones.
Modules are usually useful pieces of software that others can take
advantage of. Even though our simple interest module is of less interest
to the world, we can illustrate how such a module is most effectively
distributed to other users. The standard in Python is to distribute the
module file together with a program called setup.py such that any user
can just do
Terminal> sudo python setup.py install
to install the module in one of the directories in sys.path so that the
module is immediately accessible anywhere, both for import in a Python
program and for execution as a stand-alone program.
For a single module file, the setup.py file is very short:
from distutils.core import setup

setup(name='interest',
      version='1.0',
      py_modules=['interest'],
      scripts=['interest.py'],
      )
The scripts= keyword argument can be dropped if the module is just to be
imported and not run as a program as well. More module files can
trivially be added to the list.
A user who runs setup.py install on an Ubuntu machine will see from the
output that interest.py is copied to the system folders
/usr/local/lib/python2.7/dist-packages and /usr/local/bin. The former
folder is for module files, the latter for executable programs.
Distributing a single module file can be done as shown, but if you have two or more module files that belong together, you should definitely create a package [8].
Distributing software today means making it available on one of the major project hosting sites such as GitHub or Bitbucket. You will develop and maintain the project files on your own computer(s), but frequently push the software out to the cloud so that others also get your updates. The mentioned sites have very strong support for collaborative software development.
Sign up for a GitHub account if you do not already have one. Go to your
account settings and provide an SSH key (typically the file
~/.ssh/id_rsa.pub) so that you can communicate with GitHub without
being prompted for your password.
To create a new project, click on New repository on the main page and fill out a project name. Click on the check button Initialize this repository with a README, and click on Create repository. The next step is to clone (copy) the GitHub repo (short for repository) to your own computer(s) and fill it with files. The typical clone command is
Terminal> git clone git@github.com:username/projname.git
where username is your GitHub username and projname is the name of the
repo (project). The result of git clone is a directory projname. Go to
this folder and add files. That is, copy setup.py and interest.py to the
folder. It is good to also write a short README file explaining what the
project is about. Run
Terminal> git add .
Terminal> git commit -am 'First registration of project files'
Terminal> git push origin master
The above git commands look cryptic, but these commands plus 2-3 more
are the essence of how programmers today work on software projects,
small or big. I strongly encourage you to learn more about version
control systems and project hosting sites [9]. In nature, the tools
resemble Dropbox and Google Drive, just much more powerful when you
collaborate with others.
Your project files are now stored in the cloud at
https://github.com/username/projname. Anyone can get the software with a
git clone command like the one you used above, or by clicking on the
links for zip and tar files.
Every time you update the project files, you need to register the update at GitHub by
Terminal> git commit -am 'Description of the changes you made...'
Terminal> git push origin master
The files at GitHub are now synchronized with your local ones.
There is a bit more to be said here to get you up and running with this
style of professional work [9], but the information above gives you at
least a glimpse of how to put your software project in the cloud and
open it up for others.
The GitHub address for the particular interest module described above
is https://github.com/hplgit/interest-primer.