Variables, loops, lists, and arrays

Getting access to Python

Simple mathematical calculations can be done in plain Python, but for more advanced scientific computing, as we do here, several add-on packages are needed. Getting all software correctly installed used to be quite challenging, but today there are several easy-to-use approaches to get access to Python.

Mac and Windows

We recommend to download and install Anaconda, which is a free distribution of Python that comes with most of the packages you need for advanced scientific computing.

Ubuntu

Debian-based Linux systems, such as Ubuntu, can also use Anaconda, but it is more common to use the apt-get command-line tool or the Ubuntu installer to install a set of Debian packages. Here is a list of the packages you need to install for this introduction:

Terminal> sudo apt-get install python-pip python-setuptools \ 
  python-scipy python-sympy python-cython python-matplotlib \ 
  python-dev python-profiler pydb spyder imagemagick gedit vim \ 
  emacs python-mode git mercurial lib-avtools gnome-terminal

In addition, run

Terminal> pip install nose
Terminal> pip install pytest
Terminal> pip install ipython --upgrade
Terminal> pip install tornado --upgrade
Terminal> pip install pyzmq --upgrade

Web access

You can also access Python directly through a web browser without having it installed on your local computer. We refer to the document How to access Python for doing scientific computing for more details on such tools and also installations based on Anaconda and Ubuntu.

Mathematical example

We shall use a famous mathematical formula in our forthcoming programming examples. The formula tells how long distance \( s \) an object has moved in a time interval \( [0,t] \) if it starts out with a velocity \( v_0 \) and undergoes constant acceleration \( a \): $$ \begin{equation} s = v_0t + \frac{1}{2}at^2\thinspace . \tag{1} \end{equation} $$ We may view \( s \) as a function of \( t \): \( s(t) \), and also include the parameters in the notation: \( s(t;v_0,a) \).

A program for evaluating a formula

Here is a Python program for computing \( s \), given \( t=0.5 \), \( v_0=2 \), and \( a=0.2 \):

t = 0.5
v0 = 2
a = 0.2
s = v0*t + 0.5*a*t**2
print s

The program is pure text and must be placed in a pure text file using a text editor. Popular text editors are Gedit, Nano, Emacs, and Vim on Linux, TextWrangler on Mac OS X, and Notepad++ on Windows. Save the text to a program file whose name ends in .py, say distance.py.

The program is run in a terminal window (Command Prompt on Windows, Terminal application on Mac OS X, xterm or gnome-terminal on Linux):

Terminal> python distance.py
1.025

The result of the print statement is the number 1.025 in the terminal window.

As an alternative to writing programs in a text editor and executing them in a terminal window, you may use the Spyder graphical interface which gives a more Matlab-style environment to work in. Anaconda installations come with Spyder.

The distance.py program first contains four assignment statements where we assign numbers or the results of arithmetic expressions to variables. The variables have names coinciding with the mathematical notation used in the formula: t, v0, a, and s. You may think of variables in this programs just as variables in mathematics.

More technical details. A statement like t = 0.5 works as follows. First, the right-hand side is interpreted by Python as a real number and a float object containing the value 0.5 is created. Then the name t is defined as a reference to this object.

In the statement s = v0*t + 0.5*a*t**2, Python will first go to the right-hand side and observe that it is an arithmetic expression involving variables with known values (or names referring to existing objects). The arithmetic expression is calculated, resulting in a real number that is saved as a float object with value 1.025 in the computer's memory.

Everything in Python is an object of some type. Here, t, a, and s are float objects, representing real (floating-point) numbers, while v0 is an int object, representing the integer 2. There are many other types of objects: strings, lists, tuples, dictionaries, arrays, files, ...

Formatted output with text and numbers

For any object s in Python, print s will (normally) print the contents of s. However, sometimes we want to combine the content of an object with some text. Say we want to print s=1.025 rather than just the number. This is easily accomplished using printf syntax:

print 's=%g' % s

The output is specified as a string, enclosed in single or double quotes. Inside the string, there can be "slots" for numbers (or other objects), indicated by a percentage sign followed by a specification of kind of data that will be inserted at this place. In the string above, there is one such slot, %g, where the g implies a real number written as compactly as possible.

It is easy to control the number of decimals using printf syntax. Printing out s=1.03, i.e., s with two decimals, is done by

print 's=%.2f' % s

where the f signifies a decimal number and the preceding .2 means 2 decimals. Scientific notation, as in s=1.03E+00 (\( 1.03\cdot 10^0 \)), is specified as %.2E (2 decimals).

The printf syntax is available in numerous programming languages. Python also offers a related variant, called format string syntax:

print 's={s:.2f}'.format(s=s)

While loops

Suppose we want to make a table with two columns, one with \( t \) values and one with the corresponding \( s \) values. Now we have to repeat a lot of calculations with the formula (1). This is easily done with a loop. There are two types of loops in Python: while loops and for loops.

Let the \( t \) values go from 0 to 2 in increments of 0.1. The following program applies a while loop:

v0 = 2
a = 0.2
dt = 0.1  # Increment
t = 0     # Start value
while t <= 2:
    s = v0*t + 0.5*a*t**2
    print t, s
    t = t + dt

The result of running this program in a terminal window is

Terminal> python while.py
0 0.0
0.1 0.201
0.2 0.404
0.3 0.609
0.4 0.816
0.5 1.025
0.6 1.236
0.7 1.449
0.8 1.664
0.9 1.881
1.0 2.1
1.1 2.321
1.2 2.544
1.3 2.769
1.4 2.996
1.5 3.225
1.6 3.456
1.7 3.689
1.8 3.924
1.9 4.161

So, how do we interpret the contents of this program? First we initialize four variables: v0, a, dt, and t. Everything after # on a line is a comment and does not affect what happens in the program, but is meant to be of help for a human reading the program. Then comes the while loop:

while condition:
    <intented statement>
    <intented statement>
    <intented statement>

Observe the colon at the end of the while line. The set of indented statements are repeated as long as the expression or variable condition evaluates to True. In the present case, the condition is the boolean expression t <= 2, so as long as the value is less than or equal to 2, t <= 2 evaluates to True, otherwise it evaluates to False.

Error in the code. According to the code, the last pass in the while loop should correspond to \( t=2 \), but looking at the output, we see that the last print statement has \( t=1.9 \). The next test in the while condition involves \( t=2 \) and the boolean condition is expected to be 2 == 2. However, it seems that the condition is False since the computations for \( t=2 \) are not printed. Why do we experience this behavior?

The Python Online Tutor is a great tool to examine the program flow. Consider this little loop run in the Python Online Tutor (view in Chrome):

The Python Online Tutor executes each statement and displays the contents of variables such that you can track the program flow and the evolution of variables.

So, why is not a printed for 1.2? That is, why does a == 1.2 evaluate to True when a is (expected to be) 2? We must look at the accuracy of a to investigate this question and write it out with 16 decimals in scientific notation (printf specification %.16E):

The problem is that da is not exactly 0.4, but contains a small round-off error. Doing a = a + da then results in a slightly inaccurate a. When the exact a should reach 1.2, the a in the program has a an error and equals 1.2000000000000002. Obviously the test for exact equality, a == 1.2, becomes False, and the loop is terminated.

Rule: use exact equality == when comparing real numbers. Always compare real numbers using the difference and a tolerance:

a = 1.2
b = 0.4 + 0.4 + 0.4
boolean_condition1 = a == b              # False
tol = 1E-14
boolean_condition2 = abs(a - b) < tol    # True

The Python Online Tutor is ideal for demonstrating program flow and contents of variables in small and simple programs. However, you should use a real debugger instead when searching for errors in real programs.

Lists

The table created in the previous section has two columns of data. We could store all the numbers in each column in a list object. A list is just a collection of objects, here numbers, in a given sequence. For example,

L = [-1, 1, 8.0]

is a list of three numbers. Lists are enclosed in square brackets and may contain any type of objects separated by commas. Here we mix a filename (string), a real number, and an integer:

L = ['mydata.txt', 3.14, 10]

The different list elements can be reached via indexing: L[0] is the first element, L[1] is the second, and L[-1] is the last element. Here is an interactive Python shell where we can write Python statements and examine the contents of variables as we perform various operations:

>>> L = ['mydata.txt', 3.14, 10]
>>> print L[0]
mydata.txt
>>> print L[1]
3.14
>>> del L[0]  # delete the first element
>>> print L
[3.14, 10]
>>> print len(L)  # length of L
2
>>> L.append(-1)  # add -1 at the end of the list
>>> print L
[3.14, 10, -1]

Let us store the numbers in the previous table in lists, one for each column. We can start with empty lists [] and use append to add a new element to the lists in each pass of the while loop. Thereafter, we can run a new while loop and print the contents of the lists in a nice, tabular fashion:

v0 = 2
a = 0.2
dt = 0.1  # Increment
t = 0
t_values = []
s_values = []
while t <= 2:
    s = v0*t + 0.5*a*t**2
    t_values.append(t)
    s_values.append(s)
    t = t + dt
print s_values  # Just take a look at a created list

# Print a nicely formatted table
i = 0
while i <= len(t_values)-1:
    print '%.2f  %.4f' % (t_values[i], s_values[i])
    i += 1   # Same as i = i + 1

The output looks like

[0.0, 0.201, 0.404, 0.6090000000000001, 0.8160000000000001,
 1.025, 1.236, 1.4489999999999998, 1.664, 1.8809999999999998,
 2.0999999999999996, 2.3209999999999997, ...]
0.00  0.0000
0.10  0.2010
0.20  0.4040
0.30  0.6090
0.40  0.8160
0.50  1.0250
0.60  1.2360
0.70  1.4490
...

Note that print s_values here leads to output with many decimals and small round-off errors. To get complete control of the formatting of real numbers in the table, we use the printf syntax.

Lists come with a lot of functionality. See the Python Tutorial for many more examples.

For loops

A for loop is used for visiting elements in a list, one by one:

>>> L = [1, 4, 8, 9]
>>> for e in L:
...     print e
...
1
4
8
9

The variable e is successively set equal to the elements in the list, in the right order. Note the colon at the end of the for line. The statements to be executed in each pass of the loop (here only print e) must be indented. When e is set equal to the last element and all indented statements are executed, the loop is over, and the program flow continues with the next statement that is not indented. Try the following code out in the Python Online Tutor:

A for loop over the valid indices in a list is created by

for i in range(len(somelist)):
    # Work with somelist[i]

The range function returns a list of integers: range(a, b, s) returns the integers a, a+s, a+2*s, ... up to but not including b. Just writing range(b) implies a=0 and s=1, so range(len(somelist)) returns [0, 1, 2].

For loops over real numbers. The for i in range(...) construction can only run a loop over integers. If you need a loop over real values, you have to either create a list with the values first, or use a formula where each value is generated through an integer counter:

# Need loop over 0, 0.1, 0.2, ..., 1
values = []
for i in range(11):
    values.append(i*0.1)

# Shorter version using list comprehension (same as the loop above)
values = [i*0.1 for i in range(11)]

for value in values:
    # work with value

We can now rewrite our program that used lists and while loops to use for loops instead:

v0 = 2
a = 0.2
dt = 0.1  # Increment
t_values = []
s_values = []
n = int(round(2/dt)) + 1  # No of t values
for i in range(n):
    t = i*dt
    s = v0*t + 0.5*a*t**2
    t_values.append(t)
    s_values.append(s)
print s_values  # Just take a look at a created list

# Make nicely formatted table
for t, s in zip(t_values, s_values):
    print '%.2f  %.4f' % (t, s)

# Alternative implementation
for i in range(len(t_values)):
    print '%.2f  %.4f' % (t_values[i], s_values[i])

Observe that we have here used a slightly different technique for computing the \( t \) values inside the first loop: now we set \( t \) as \( i\Delta t \), where \( \Delta t \) (dt in the code) is the increment (0.1) between each \( t \) value. The computation of n, the number of \( t \) values, makes use of round to make a correct mathematical rounding to the nearest integer (and int makes an integer object out of the rounded real number). (In an interval \( [a,b] \) divided into subintervals of equal length \( \Delta t \), there will be \( 1 + (b-a)/\Delta t \) points in total.)

Running through multiple lists simultaneously is done with the zip construction:

for e1, e2, e3, ... in zip(list1, list2, list3, ...):

One may instead create a for loop over all the legal index values instead and index each array,

for i in range(len(list1)):
    e1 = list1[i]
    e2 = list2[i]
    ...

Arrays

Lists are useful for collecting a set of numbers or other objects in a single variable. Arrays are much like lists, but tailored for collection of numbers. The primary advantage of arrays is that you can use them very efficiently and conveniently in mathematical computations, but the downside is that an array has (in practice) a fixed length and all elements must be of the same type. This is usually no important restriction in scientific computations.

Much of Python's functionality are coded in modules. To use such functionality, one must import the relevant modules. Arrays, for example, are available in the numpy module, which must be imported. Functions or variables in the module must be prefixed by numpy. This will be demonstrated below. Here is an example involving basic operations with arrays:

>>> L = [1, 4, 10.0]    # List of numbers
>>> import numpy
>>> a = numpy.array(L)  # Make corresponding array
>>> print a
[  1.   4.  10.]
>>> print a[1]
4.0
>>> print a.dtype       # Data type of an element
float64
>>> b = 2*a + 1
>>> print b
[  3.   9.  21.]

Note that all elements in the a array are of float type (because one element in L was float). Arithmetic expressions such as 2*a+1 work with a as array, but not as list. In fact, we can pass arrays to mathematical functions:

>>> c = numpy.log(a)
>>> print c
[ 0.          1.38629436  2.30258509]

The numpy module has a lot of very useful utilities. To create \( n+1 \) uniformly distributed coordinates in an interval \( [a,b] \), stored in an array, one can use linspace:

t = numpy.linspace(a, b, n+1)

This construction makes it easy to create arrays for the \( t \) and \( s \) values in our tables:

import numpy
v0 = 2
a = 0.2
dt = 0.1  # Increment
n = int(round(2/dt)) + 1  # No of t values

t_values = numpy.linspace(0, 2, n+1)
s_values = v0*t + 0.5*a*t**2

# Make nicely formatted table
for t, s in zip(t_values, s_values):
    print '%.2f  %.4f' % (t, s)

Mathematical functions

Python offers access to all standard mathematical functions such as \( \sin x \), \( \cos x \), \( \tan x \), \( \sinh x \), \( \cosh x \), \( \tanh x \), all their inverses (called asin(x), asinh(x), and so forth), \( e^x \) (exp(x)), \( \ln x \) (log(x)), and \( x! \) (factorial(x)). However, one has to import a module to get access to these functions. For scalars (single numbers) the relevant module is math:

>>> import math
>>> print math.sin(math.pi)
1.2246467991473532e-16

which shows that the sine function is only approximate (to 16 digits). Many prefer to write mathematical expressions without the math prefix:

from math import sin, pi
print sin(pi)

# Or import everything from math
from math import *
print sin(pi), log(e), tanh(0.5)

The numpy module contains sine, cosine, and other mathematical functions that work on scalars as well as arrays.

Import of numpy. To get Python code that is as similar to Matlab as possible, one would do a "start import" of everything,

from numpy import *
x = linspace(0, 1, 101)
y = exp(-x)*sin(pi*x)

However, in the Python community it has been a culture to use a prefix np as abbreviation for numpy:

import numpy as np
x = np.linspace(0, 1, 101)
y = np.exp(-x)*np.sin(np.pi*x)

Our convention is to use the np prefix for typical Matlab functions, but skip the prefix when working with mathematical functions like `exp(x)*sin(pi*x)`to get a one-to-one correspondence between formulas in the program and in the mathematical description of the problem.

import numpy as np
from numpy import sin, exp, pi
x = np.linspace(0, 1, 101)
y = exp(-x)*sin(pi*x)

Plotting

We can easily make a graph of a function \( s(t) \) using the module matplotlib. The technique is to compute an array of \( t \) values and a corresponding array of function values \( s(t) \). Plotting programs will draw straight lines between the points on the curve, so a sufficient number of points are needed to give the impression of a smooth curve. Our \( s(t) \) function is plotted by the following code:

import numpy as np
import matplotlib.pyplot as plt

v0 = 0.2
a = 2
n = 21  # No of t values for plotting

t = np.linspace(0, 2, n+1)
s = v0*t + 0.5*a*t**2

plt.plot(t, s)
plt.savefig('myplot.png')
plt.show()

The plotfile myplot.png looks like





Matlab users may prefer to do

from numpy import *
from matplotlib.pyplot import *

such that they can use linspace and plot without any prefix, just as in Matlab.

Two curves can easily be plotted, this time also with labels on the axis and a box with legends such that we can distinguish the two curves:

import numpy as np
import matplotlib.pyplot as plt

v0 = 0.2
a = 2
n = 21  # No of t values for plotting

t = np.linspace(0, 2, n+1)
s = v0*t + 0.5*a*t**2

plt.plot(t, s)
plt.savefig('myplot.png')
plt.show()