This chapter is taken from the book A Primer on Scientific Programming with Python by H. P. Langtangen, 5th edition, Springer, 2016.
This section introduces array programming in Python, but first we create some lists and show how arrays differ from lists.
Suppose we have a function \( f(x) \) and want to evaluate this function at a number of \( x \) points \( x_0,x_1,\ldots,x_{n-1} \). We could collect the \( n \) pairs \( (x_i,f(x_i)) \) in a list, or we could collect all the \( x_i \) values, for \( i=0,\ldots,n-1 \), in a list and all the associated \( f(x_i) \) values in another list. The following interactive session demonstrates how to create these three types of lists:
>>> def f(x):
... return x**3 # sample function
...
>>> n = 5 # no of points along the x axis
>>> dx = 1.0/(n-1) # spacing between x points in [0,1]
>>> xlist = [i*dx for i in range(n)]
>>> ylist = [f(x) for x in xlist]
>>> pairs = [[x, y] for x, y in zip(xlist, ylist)]
Here we have used list comprehensions for achieving compact code. Make
sure that you understand what is going on in these list comprehensions
(if not, try to write the same code using standard for
loops and
appending new list elements in each pass of the loops).
The list elements consist of objects of the same type: any element in
pairs
is a list of two float
objects, while any element in xlist
or ylist
is a float
. Lists are more flexible than that, because an
element can be an object of any type, e.g.,
mylist = [2, 6.0, 'tmp.pdf', [0,1]]
Here mylist
holds an int
, a float
, a string, and a list. This
combination of diverse object types makes up what is known as
heterogeneous lists. We can also easily remove elements from a list
or add new elements anywhere in the list. This flexibility of lists
is in general convenient to have as a programmer, but in cases where
the elements are of the same type and the number of elements is fixed,
arrays can be used instead. The benefits of arrays are faster
computations, less memory demands, and extensive support for
mathematical operations on the data. Because of greater efficiency
and mathematical convenience, arrays will be used to a large extent in
this document. The great use of arrays is also prominent in other
programming environments such as MATLAB, Octave, and R, for instance.
Lists will be our choice instead of arrays when we need the
flexibility of adding or removing elements or when the elements may be
of different object types.
An array object can be viewed as a variant of a list, but with the following assumptions and features:
import
statements, is numpy
.numpy
, a wide range of mathematical operations can be done directly on complete arrays, thereby removing the need for loops over array elements. This is commonly called vectorization %or array computing and may cause a dramatic speed-up of Python programs. Vectorization makes use of the vector computing concepts from the section Vector arithmetics and vector functions.array
in standard Python, but this data type is
not so efficient for mathematical computations, and we will not use it
in this document. Second, the number of elements in an array can be
changed, but at a substantial computational cost.
The following text lists some important functionality of NumPy arrays. A more comprehensive treatment is found in the excellent NumPy Tutorial, NumPy User Guide, NumPy Reference, Guide to NumPy, and NumPy for MATLAB Users, all accessible at scipy.org.
The standard import statement for Numerical Python reads
import numpy as np
To convert a list r
to an array,
we use the array
function
from numpy
:
a = np.array(r)
To create a new array of length n
, filled with zeros, we write
a = np.zeros(n)
The array elements are of a type that corresponds to Python's
float
type. A second argument to np.zeros
can be used to
specify other element types, e.g., int
.
A similar function,
a = np.zeros_like(c)
generates an array of zeros where the length is that of the array c
and the element type is the same as those in c
. Arrays with more
than one index are treated in the section Higher-dimensional arrays.
Often one wants an array to have \( n \) elements with uniformly
distributed values in an interval \( [p,q] \). The numpy
function
linspace
creates such arrays:
a = np.linspace(p, q, n)
Array elements are accessed by square brackets as for lists:
a[i]
. Slices also work as for lists, for example, a[1:-1]
picks
out all elements except the first and the last, but contrary to lists,
a[1:-1]
is not a copy of the data in a
. Hence,
b = a[1:-1]
b[2] = 0.1
will also change a[3]
to 0.1
. A slice a[i:j:s]
picks out the
elements starting with index i
and stepping s
indices at the time
up to, but not including, j
. Omitting i
implies i=0
, and
omitting j
implies j=n
if n
is the number of elements in the
array. For example, a[0:-1:2]
picks out every two elements up to,
but not including, the last element, while a[::4]
picks out every
four elements in the whole array.
import numpy as np
with subsequent prefixing of all NumPy functions and variables by np
,
has evolved as a standard syntax in the Python scientific computing
community. However, to make Python programs look closer to MATLAB
and ease the transition to and from that language, one can do
from numpy import *
to get rid of the prefix (this is evolved as the standard in
interactive Python shells). This author prefers mathematical
functions from numpy
to be written without the prefix to
make the formulas as close as possible to the mathematics.
So, \( f(x)=\sinh(x-1)\sin(w t) \) would be coded as
from numpy import sinh, sin
def f(x):
return sinh(x-1)*sin(w*t)
or one may take the less recommended
lazy approach from numpy import *
and fill up
the program with a lot of functions and variables from numpy
.
With these basic operations at hand, we can continue the session from
the previous section and make arrays out of the lists xlist
and
ylist
:
>>> import numpy as np
>>> x2 = np.array(xlist) # turn list xlist into array x2
>>> y2 = np.array(ylist)
>>> x2
array([ 0. , 0.25, 0.5 , 0.75, 1. ])
>>> y2
array([ 0. , 0.015625, 0.125 , 0.421875, 1. ])
Instead of first making a list and then converting the list to an
array, we can compute the arrays directly. The equally spaced
coordinates in x2
are naturally computed by the np.linspace
function. The y2
array can be created by np.zeros
, to ensure that
y2
has the right length len(x2)
, and then we can run a for
loop
to fill in all elements in y2
with f
values:
>>> n = len(xlist)
>>> x2 = np.linspace(0, 1, n)
>>> y2 = np.zeros(n)
>>> for i in xrange(n):
... y2[i] = f(x2[i])
...
>>> y2
array([ 0. , 0.015625, 0.125 , 0.421875, 1. ])
Note that we here in the for
loop have used xrange
instead of
range
. The former is faster for long loops because it avoids
generating and storing a list of integers, it just generates the
values one by one. Hence, we prefer xrange
over range
for loops
over long arrays. In Python version 3.x, range
is the same as
xrange
.
Creating an array of a given length is frequently referred to as allocating the array. It simply means that a part of the computer's memory is marked for being occupied by this array. Mathematical computations will often fill up most of the computer's memory by allocating long arrays.
We can shorten the previous code by creating the y2
data in a list
comprehension, but list comprehensions produce lists, not arrays, so
we need to transform the list object to an array object:
>>> x2 = np.linspace(0, 1, n)
>>> y2 = np.array([f(xi) for xi in x2])
Nevertheless, there is a much faster way of computing y2
as the next
paragraph explains.
Loops over very long arrays may run slowly. A great advantage with
arrays is that we can get rid of the loops and apply f
directly to
the whole array:
>>> y2 = f(x2)
>>> y2
array([ 0. , 0.015625, 0.125 , 0.421875, 1. ])
The magic that makes f(x2)
work builds on the vector computing
concepts from the section Vector arithmetics and vector functions. Instead of calling
f(x2)
we can equivalently write the function formula x2**3
directly.
The point is that numpy
implements vector arithmetics for arrays
of any dimension. Moreover, numpy
provides its own versions of
mathematical functions like cos
, sin
, exp
, log
, etc., which
work for array arguments and apply the mathematical function to each
element. The following code, computes each array element separately:
from math import sin, cos, exp
import numpy as np
x = np.linspace(0, 2, 201)
r = np.zeros(len(x))
for i in xrange(len(x)):
r[i] = sin(np.pi*x[i])*cos(x[i])*exp(-x[i]**2) + 2 + x[i]**2
while here is a corresponding code that operates on arrays directly:
r = np.sin(np.pi*x)*np.cos(x)*np.exp(-x**2) + 2 + x**2
Many will prefer to see such formulas without the np
prefix:
from numpy import sin, cos, exp, pi
r = sin(pi*x)*cos(x)*exp(-x**2) + 2 + x**2
An important thing to understand is that sin
from the math
module
is different from the sin
function provided by numpy
. The former
does not allow array arguments, while the latter accepts both real
numbers and arrays.
Replacing a loop like the one above, for computing r[i]
, by a
vector/array expression like sin(x)*cos(x)*exp(-x**2) + 2 + x**2
, is
called vectorization. The loop version is often referred to as
scalar code. For example,
import numpy as np
import math
x = np.zeros(N); y = np.zeros(N)
dx = 2.0/(N-1) # spacing of x coordinates
for i in range(N):
x[i] = -1 + dx*i
y[i] = math.exp(-x[i])*x[i]
is scalar code, while the corresponding vectorized version reads
x = np.linspace(-1, 1, N)
y = np.exp(-x)*x
We remark that list comprehensions,
x = array([-1 + dx*i for i in range(N)])
y = array([np.exp(-xi)*xi for xi in x])
result in scalar code because we still have explicit, slow Python
for
loops operating on scalar quantities. The requirement of
vectorized code is that there are no explicit Python for
loops. The
loops required to compute each array element are performed in fast C
or Fortran code in the numpy
package.
Most Python functions intended for a scalar argument x
, like
def f(x):
return x**4*exp(-x)
automatically work for an array argument x
:
x = np.linspace(-3, 3, 101)
y = f(x)
provided that the exp
function in the definition of f
accepts an
array argument. This means that exp
must have been imported as
from numpy import *
or explicitly as from numpy import exp
. One
can, of course, prefix exp
as in np.exp
, at the loss of a less
attractive mathematical syntax in the formula.
When a Python function f(x)
works for an array argument x
, we say
that the function f
is vectorized. Provided that the mathematical
expressions in f
involve arithmetic operations and basic
mathematical functions from the math
module, f
will be
automatically vectorized by just importing the functions from numpy
instead of math
. However, if the expressions inside f
involve
if
tests, the code needs a rewrite to work with arrays. The section Piecewisely defined functions presents examples where we have to do special
actions in order to vectorize functions.
Vectorization is very important for speeding up Python programs that perform heavy computations with arrays. Moreover, vectorization gives more compact code that is easier to read. Vectorization is particularly important for statistical simulations, as demonstrated in the document Random numbers and simple games [1].