This chapter is taken from the book A Primer on Scientific Programming with Python by H. P. Langtangen, 5th edition, Springer, 2016.
This section introduces array programming in Python, but first we create some lists and show how arrays differ from lists.
Suppose we have a function \( f(x) \) and want to evaluate this function at a number of \( x \) points \( x_0,x_1,\ldots,x_{n-1} \). We could collect the \( n \) pairs \( (x_i,f(x_i)) \) in a list, or we could collect all the \( x_i \) values, for \( i=0,\ldots,n-1 \), in a list and all the associated \( f(x_i) \) values in another list. The following interactive session demonstrates how to create these three types of lists:
>>> def f(x):
... return x**3 # sample function
...
>>> n = 5 # no of points along the x axis
>>> dx = 1.0/(n-1) # spacing between x points in [0,1]
>>> xlist = [i*dx for i in range(n)]
>>> ylist = [f(x) for x in xlist]
>>> pairs = [[x, y] for x, y in zip(xlist, ylist)]
Here we have used list comprehensions for achieving compact code. Make
sure that you understand what is going on in these list comprehensions
(if not, try to write the same code using standard for loops and
appending new list elements in each pass of the loops).
The list elements consist of objects of the same type: any element in
pairs is a list of two float objects, while any element in xlist
or ylist is a float. Lists are more flexible than that, because an
element can be an object of any type, e.g.,
mylist = [2, 6.0, 'tmp.pdf', [0,1]]
Here mylist holds an int, a float, a string, and a list. This
combination of diverse object types makes up what is known as
heterogeneous lists. We can also easily remove elements from a list
or add new elements anywhere in the list. This flexibility of lists
is in general convenient to have as a programmer, but in cases where
the elements are of the same type and the number of elements is fixed,
arrays can be used instead. The benefits of arrays are faster
computations, less memory demands, and extensive support for
mathematical operations on the data. Because of greater efficiency
and mathematical convenience, arrays will be used to a large extent in
this document. The great use of arrays is also prominent in other
programming environments such as MATLAB, Octave, and R, for instance.
Lists will be our choice instead of arrays when we need the
flexibility of adding or removing elements or when the elements may be
of different object types.
An array object can be viewed as a variant of a list, but with the following assumptions and features:
import statements, is numpy.numpy, a wide range of mathematical operations can be done directly on complete arrays, thereby removing the need for loops over array elements. This is commonly called vectorization %or array computing and may cause a dramatic speed-up of Python programs. Vectorization makes use of the vector computing concepts from the section Vector arithmetics and vector functions.array in standard Python, but this data type is
not so efficient for mathematical computations, and we will not use it
in this document. Second, the number of elements in an array can be
changed, but at a substantial computational cost.
The following text lists some important functionality of NumPy arrays. A more comprehensive treatment is found in the excellent NumPy Tutorial, NumPy User Guide, NumPy Reference, Guide to NumPy, and NumPy for MATLAB Users, all accessible at scipy.org.
The standard import statement for Numerical Python reads
import numpy as np
To convert a list r to an array,
we use the array function
from numpy:
a = np.array(r)
To create a new array of length n, filled with zeros, we write
a = np.zeros(n)
The array elements are of a type that corresponds to Python's
float type. A second argument to np.zeros can be used to
specify other element types, e.g., int.
A similar function,
a = np.zeros_like(c)
generates an array of zeros where the length is that of the array c
and the element type is the same as those in c. Arrays with more
than one index are treated in the section Higher-dimensional arrays.
Often one wants an array to have \( n \) elements with uniformly
distributed values in an interval \( [p,q] \). The numpy function
linspace creates such arrays:
a = np.linspace(p, q, n)
Array elements are accessed by square brackets as for lists:
a[i]. Slices also work as for lists, for example, a[1:-1] picks
out all elements except the first and the last, but contrary to lists,
a[1:-1] is not a copy of the data in a. Hence,
b = a[1:-1]
b[2] = 0.1
will also change a[3] to 0.1. A slice a[i:j:s] picks out the
elements starting with index i and stepping s indices at the time
up to, but not including, j. Omitting i implies i=0, and
omitting j implies j=n if n is the number of elements in the
array. For example, a[0:-1:2] picks out every two elements up to,
but not including, the last element, while a[::4] picks out every
four elements in the whole array.
The statement
import numpy as np
with subsequent prefixing of all NumPy functions and variables by np,
has evolved as a standard syntax in the Python scientific computing
community. However, to make Python programs look closer to MATLAB
and ease the transition to and from that language, one can do
from numpy import *
to get rid of the prefix (this is evolved as the standard in
interactive Python shells). This author prefers mathematical
functions from numpy to be written without the prefix to
make the formulas as close as possible to the mathematics.
So, \( f(x)=\sinh(x-1)\sin(w t) \) would be coded as
from numpy import sinh, sin
def f(x):
return sinh(x-1)*sin(w*t)
or one may take the less recommended
lazy approach from numpy import * and fill up
the program with a lot of functions and variables from numpy.
With these basic operations at hand, we can continue the session from
the previous section and make arrays out of the lists xlist and
ylist:
>>> import numpy as np
>>> x2 = np.array(xlist) # turn list xlist into array x2
>>> y2 = np.array(ylist)
>>> x2
array([ 0. , 0.25, 0.5 , 0.75, 1. ])
>>> y2
array([ 0. , 0.015625, 0.125 , 0.421875, 1. ])
Instead of first making a list and then converting the list to an
array, we can compute the arrays directly. The equally spaced
coordinates in x2 are naturally computed by the np.linspace
function. The y2 array can be created by np.zeros, to ensure that
y2 has the right length len(x2), and then we can run a for loop
to fill in all elements in y2 with f values:
>>> n = len(xlist)
>>> x2 = np.linspace(0, 1, n)
>>> y2 = np.zeros(n)
>>> for i in xrange(n):
... y2[i] = f(x2[i])
...
>>> y2
array([ 0. , 0.015625, 0.125 , 0.421875, 1. ])
Note that we here in the for loop have used xrange instead of
range. The former is faster for long loops because it avoids
generating and storing a list of integers, it just generates the
values one by one. Hence, we prefer xrange over range for loops
over long arrays. In Python version 3.x, range is the same as
xrange.
Creating an array of a given length is frequently referred to as allocating the array. It simply means that a part of the computer's memory is marked for being occupied by this array. Mathematical computations will often fill up most of the computer's memory by allocating long arrays.
We can shorten the previous code by creating the y2 data in a list
comprehension, but list comprehensions produce lists, not arrays, so
we need to transform the list object to an array object:
>>> x2 = np.linspace(0, 1, n)
>>> y2 = np.array([f(xi) for xi in x2])
Nevertheless, there is a much faster way of computing y2 as the next
paragraph explains.
Loops over very long arrays may run slowly. A great advantage with
arrays is that we can get rid of the loops and apply f directly to
the whole array:
>>> y2 = f(x2)
>>> y2
array([ 0. , 0.015625, 0.125 , 0.421875, 1. ])
The magic that makes f(x2) work builds on the vector computing
concepts from the section Vector arithmetics and vector functions. Instead of calling
f(x2) we can equivalently write the function formula x2**3
directly.
The point is that numpy implements vector arithmetics for arrays
of any dimension. Moreover, numpy provides its own versions of
mathematical functions like cos, sin, exp, log, etc., which
work for array arguments and apply the mathematical function to each
element. The following code, computes each array element separately:
from math import sin, cos, exp
import numpy as np
x = np.linspace(0, 2, 201)
r = np.zeros(len(x))
for i in xrange(len(x)):
r[i] = sin(np.pi*x[i])*cos(x[i])*exp(-x[i]**2) + 2 + x[i]**2
while here is a corresponding code that operates on arrays directly:
r = np.sin(np.pi*x)*np.cos(x)*np.exp(-x**2) + 2 + x**2
Many will prefer to see such formulas without the np prefix:
from numpy import sin, cos, exp, pi
r = sin(pi*x)*cos(x)*exp(-x**2) + 2 + x**2
An important thing to understand is that sin from the math module
is different from the sin function provided by numpy. The former
does not allow array arguments, while the latter accepts both real
numbers and arrays.
Replacing a loop like the one above, for computing r[i], by a
vector/array expression like sin(x)*cos(x)*exp(-x**2) + 2 + x**2, is
called vectorization. The loop version is often referred to as
scalar code. For example,
import numpy as np
import math
x = np.zeros(N); y = np.zeros(N)
dx = 2.0/(N-1) # spacing of x coordinates
for i in range(N):
x[i] = -1 + dx*i
y[i] = math.exp(-x[i])*x[i]
is scalar code, while the corresponding vectorized version reads
x = np.linspace(-1, 1, N)
y = np.exp(-x)*x
We remark that list comprehensions,
x = array([-1 + dx*i for i in range(N)])
y = array([np.exp(-xi)*xi for xi in x])
result in scalar code because we still have explicit, slow Python
for loops operating on scalar quantities. The requirement of
vectorized code is that there are no explicit Python for loops. The
loops required to compute each array element are performed in fast C
or Fortran code in the numpy package.
Most Python functions intended for a scalar argument x, like
def f(x):
return x**4*exp(-x)
automatically work for an array argument x:
x = np.linspace(-3, 3, 101)
y = f(x)
provided that the exp function in the definition of f accepts an
array argument. This means that exp must have been imported as
from numpy import * or explicitly as from numpy import exp. One
can, of course, prefix exp as in np.exp, at the loss of a less
attractive mathematical syntax in the formula.
When a Python function f(x) works for an array argument x, we say
that the function f is vectorized. Provided that the mathematical
expressions in f involve arithmetic operations and basic
mathematical functions from the math module, f will be
automatically vectorized by just importing the functions from numpy
instead of math. However, if the expressions inside f involve
if tests, the code needs a rewrite to work with arrays. The section Piecewisely defined functions presents examples where we have to do special
actions in order to vectorize functions.
Vectorization is very important for speeding up Python programs that perform heavy computations with arrays. Moreover, vectorization gives more compact code that is easier to read. Vectorization is particularly important for statistical simulations, as demonstrated in the document Random numbers and simple games [1].