$$ \newcommand{\tp}{\thinspace .} $$

This chapter is taken from the book A Primer on Scientific Programming with Python by H. P. Langtangen, 5th edition, Springer, 2016.

Arrays in Python programs

This section introduces array programming in Python, but first we create some lists and show how arrays differ from lists.

Using lists for collecting function data

Suppose we have a function \( f(x) \) and want to evaluate this function at a number of \( x \) points \( x_0,x_1,\ldots,x_{n-1} \). We could collect the \( n \) pairs \( (x_i,f(x_i)) \) in a list, or we could collect all the \( x_i \) values, for \( i=0,\ldots,n-1 \), in a list and all the associated \( f(x_i) \) values in another list. The following interactive session demonstrates how to create these three types of lists:

>>> def f(x):
...     return x**3       # sample function
...
>>> n = 5                 # no of points along the x axis
>>> dx = 1.0/(n-1)        # spacing between x points in [0,1]
>>> xlist = [i*dx for i in range(n)]
>>> ylist = [f(x) for x in xlist]
>>> pairs = [[x, y] for x, y in zip(xlist, ylist)]
Here we have used list comprehensions for achieving compact code. Make sure that you understand what is going on in these list comprehensions (if not, try to write the same code using standard for loops and appending new list elements in each pass of the loops).

The list elements consist of objects of the same type: any element in pairs is a list of two float objects, while any element in xlist or ylist is a float. Lists are more flexible than that, because an element can be an object of any type, e.g.,

mylist = [2, 6.0, 'tmp.pdf', [0,1]]
Here mylist holds an int, a float, a string, and a list. This combination of diverse object types makes up what is known as heterogeneous lists. We can also easily remove elements from a list or add new elements anywhere in the list. This flexibility of lists is in general convenient to have as a programmer, but in cases where the elements are of the same type and the number of elements is fixed, arrays can be used instead. The benefits of arrays are faster computations, less memory demands, and extensive support for mathematical operations on the data. Because of greater efficiency and mathematical convenience, arrays will be used to a large extent in this document. The great use of arrays is also prominent in other programming environments such as MATLAB, Octave, and R, for instance. Lists will be our choice instead of arrays when we need the flexibility of adding or removing elements or when the elements may be of different object types.

Basics of numerical Python arrays

An array object can be viewed as a variant of a list, but with the following assumptions and features:

We have two remarks to the above list. First, there is actually an object type called array in standard Python, but this data type is not so efficient for mathematical computations, and we will not use it in this document. Second, the number of elements in an array can be changed, but at a substantial computational cost.

The following text lists some important functionality of NumPy arrays. A more comprehensive treatment is found in the excellent NumPy Tutorial, NumPy User Guide, NumPy Reference, Guide to NumPy, and NumPy for MATLAB Users, all accessible at scipy.org.

The standard import statement for Numerical Python reads

import numpy as np
To convert a list r to an array, we use the array function from numpy:

a = np.array(r)
To create a new array of length n, filled with zeros, we write

a = np.zeros(n)
The array elements are of a type that corresponds to Python's float type. A second argument to np.zeros can be used to specify other element types, e.g., int. A similar function,

a = np.zeros_like(c)
generates an array of zeros where the length is that of the array c and the element type is the same as those in c. Arrays with more than one index are treated in the section Higher-dimensional arrays.

Often one wants an array to have \( n \) elements with uniformly distributed values in an interval \( [p,q] \). The numpy function linspace creates such arrays:

a = np.linspace(p, q, n)

Array elements are accessed by square brackets as for lists: a[i]. Slices also work as for lists, for example, a[1:-1] picks out all elements except the first and the last, but contrary to lists, a[1:-1] is not a copy of the data in a. Hence,

b = a[1:-1]
b[2] = 0.1
will also change a[3] to 0.1. A slice a[i:j:s] picks out the elements starting with index i and stepping s indices at the time up to, but not including, j. Omitting i implies i=0, and omitting j implies j=n if n is the number of elements in the array. For example, a[0:-1:2] picks out every two elements up to, but not including, the last element, while a[::4] picks out every four elements in the whole array.

Remarks on importing NumPy.

The statement

import numpy as np
with subsequent prefixing of all NumPy functions and variables by np, has evolved as a standard syntax in the Python scientific computing community. However, to make Python programs look closer to MATLAB and ease the transition to and from that language, one can do

from numpy import *
to get rid of the prefix (this is evolved as the standard in interactive Python shells). This author prefers mathematical functions from numpy to be written without the prefix to make the formulas as close as possible to the mathematics. So, \( f(x)=\sinh(x-1)\sin(w t) \) would be coded as

from numpy import sinh, sin

def f(x):
    return sinh(x-1)*sin(w*t)
or one may take the less recommended lazy approach from numpy import * and fill up the program with a lot of functions and variables from numpy.

Computing coordinates and function values

With these basic operations at hand, we can continue the session from the previous section and make arrays out of the lists xlist and ylist:

>>> import numpy as np
>>> x2 = np.array(xlist)      # turn list xlist into array x2
>>> y2 = np.array(ylist)
>>> x2
array([ 0.  ,  0.25,  0.5 ,  0.75,  1.  ])
>>> y2
array([ 0.      ,  0.015625,  0.125   ,  0.421875,  1.      ])
Instead of first making a list and then converting the list to an array, we can compute the arrays directly. The equally spaced coordinates in x2 are naturally computed by the np.linspace function. The y2 array can be created by np.zeros, to ensure that y2 has the right length len(x2), and then we can run a for loop to fill in all elements in y2 with f values:

>>> n = len(xlist)
>>> x2 = np.linspace(0, 1, n)
>>> y2 = np.zeros(n)
>>> for i in xrange(n):
...     y2[i] = f(x2[i])
...
>>> y2
array([ 0.      ,  0.015625,  0.125   ,  0.421875,  1.      ])
Note that we here in the for loop have used xrange instead of range. The former is faster for long loops because it avoids generating and storing a list of integers, it just generates the values one by one. Hence, we prefer xrange over range for loops over long arrays. In Python version 3.x, range is the same as xrange.

Creating an array of a given length is frequently referred to as allocating the array. It simply means that a part of the computer's memory is marked for being occupied by this array. Mathematical computations will often fill up most of the computer's memory by allocating long arrays.

We can shorten the previous code by creating the y2 data in a list comprehension, but list comprehensions produce lists, not arrays, so we need to transform the list object to an array object:

>>> x2 = np.linspace(0, 1, n)
>>> y2 = np.array([f(xi) for xi in x2])
Nevertheless, there is a much faster way of computing y2 as the next paragraph explains.

Vectorization

Loops over very long arrays may run slowly. A great advantage with arrays is that we can get rid of the loops and apply f directly to the whole array:

>>> y2 = f(x2)
>>> y2
array([ 0.      ,  0.015625,  0.125   ,  0.421875,  1.      ])
The magic that makes f(x2) work builds on the vector computing concepts from the section Vector arithmetics and vector functions. Instead of calling f(x2) we can equivalently write the function formula x2**3 directly.

The point is that numpy implements vector arithmetics for arrays of any dimension. Moreover, numpy provides its own versions of mathematical functions like cos, sin, exp, log, etc., which work for array arguments and apply the mathematical function to each element. The following code, computes each array element separately:

from math import sin, cos, exp
import numpy as np
x = np.linspace(0, 2, 201)
r = np.zeros(len(x))
for i in xrange(len(x)):
    r[i] = sin(np.pi*x[i])*cos(x[i])*exp(-x[i]**2) + 2 + x[i]**2
while here is a corresponding code that operates on arrays directly:

r = np.sin(np.pi*x)*np.cos(x)*np.exp(-x**2) + 2 + x**2
Many will prefer to see such formulas without the np prefix:

from numpy import sin, cos, exp, pi
r = sin(pi*x)*cos(x)*exp(-x**2) + 2 + x**2
An important thing to understand is that sin from the math module is different from the sin function provided by numpy. The former does not allow array arguments, while the latter accepts both real numbers and arrays.

Replacing a loop like the one above, for computing r[i], by a vector/array expression like sin(x)*cos(x)*exp(-x**2) + 2 + x**2, is called vectorization. The loop version is often referred to as scalar code. For example,

import numpy as np
import math
x = np.zeros(N);  y = np.zeros(N)
dx = 2.0/(N-1) # spacing of x coordinates
for i in range(N):
    x[i] = -1 + dx*i
    y[i] = math.exp(-x[i])*x[i]
is scalar code, while the corresponding vectorized version reads

x = np.linspace(-1, 1, N)
y = np.exp(-x)*x
We remark that list comprehensions,

x = array([-1 + dx*i for i in range(N)])
y = array([np.exp(-xi)*xi for xi in x])
result in scalar code because we still have explicit, slow Python for loops operating on scalar quantities. The requirement of vectorized code is that there are no explicit Python for loops. The loops required to compute each array element are performed in fast C or Fortran code in the numpy package.

Most Python functions intended for a scalar argument x, like

def f(x):
    return x**4*exp(-x)
automatically work for an array argument x:

x = np.linspace(-3, 3, 101)
y = f(x)
provided that the exp function in the definition of f accepts an array argument. This means that exp must have been imported as from numpy import * or explicitly as from numpy import exp. One can, of course, prefix exp as in np.exp, at the loss of a less attractive mathematical syntax in the formula.

When a Python function f(x) works for an array argument x, we say that the function f is vectorized. Provided that the mathematical expressions in f involve arithmetic operations and basic mathematical functions from the math module, f will be automatically vectorized by just importing the functions from numpy instead of math. However, if the expressions inside f involve if tests, the code needs a rewrite to work with arrays. The section Piecewisely defined functions presents examples where we have to do special actions in order to vectorize functions.

Vectorization is very important for speeding up Python programs that perform heavy computations with arrays. Moreover, vectorization gives more compact code that is easier to read. Vectorization is particularly important for statistical simulations, as demonstrated in the document Random numbers and simple games [1].