$$ \newcommand{\Oof}[1]{\mathcal{O}(#1)} \newcommand{\F}{\boldsymbol{F}} \newcommand{\J}{\boldsymbol{J}} \newcommand{\x}{\boldsymbol{x}} \renewcommand{\c}{\boldsymbol{c}} $$




Lists and tuples - alternatives to arrays

We have seen that a group of numbers may be stored in an array that we may treat as a whole, or element by element. Python offers another way of organizing data, called a list, which is widely used, at least in non-numerical contexts.

A list is quite similar to an array in many ways, but there are pros and cons to consider. For example, the number of elements in a list is allowed to change, whereas arrays have a fixed length that must be known at the time of memory allocation. Elements in a list can be of different types, i.e., you may mix integers, floats and strings, whereas elements in an array must be of the same type. In general, lists provide more flexibility than arrays do. On the other hand, arrays give faster computations than lists, making arrays the prime choice unless the flexibility of lists is needed. Arrays also require less memory, and there is a lot of ready-made code for various mathematical operations on them. Vectorization requires arrays to be used.

The range() function that we used above in our for loop actually returns a list. If you, for example, write range(5) at the prompt in ipython, you get [0, 1, 2, 3, 4] in return, i.e., a list with 5 numbers. In a for loop, the line for i in range(5) makes i take on each of the numbers \( 0, 1, 2, 3, 4 \) in turn, as we saw above. Writing, e.g., x = range(5), gives a list by the name x, containing those five numbers. These numbers may now be accessed (e.g., as x[2], which contains the number 2) and used in computations just as we saw for array elements. As with arrays, indices run from \( 0 \) to \( n - 1 \), where \( n \) is the number of elements in the list. You may convert a list L to an array by writing x = array(L).
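The indexing rules just described can be tried out in a short sketch. The list(...) wrapper is an addition that is not in the text above: it is redundant in Python 2, where range already returns a list, but it makes the lines behave the same under Python 3, where range returns a separate range object. The variable names are chosen for illustration only.

```python
# Build a list holding the integers 0, 1, 2, 3, 4.
# list() is an added safeguard so the code also runs under Python 3.
x = list(range(5))

n = len(x)         # number of elements in the list
first = x[0]       # legal indices start at 0 ...
last = x[n - 1]    # ... and end at n - 1
third = x[2]       # the element with index 2

print('n = %d, first = %d, third = %d, last = %d' % (n, first, third, last))
```

Trying x[n] instead of x[n - 1] would stop the program with an IndexError, since n is one past the last legal index.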

A list may also be created by simply writing, e.g.,

x = ['hello', 4, 3.14, 6]

giving a list where x[0] contains the string hello, x[1] contains the integer 4, etc. We may add and/or delete elements anywhere in the list as shown in the following example.

x = ['hello', 4, 3.14, 6]
x.insert(0, -2) # x then becomes [-2, 'hello', 4, 3.14, 6]
del x[3]        # x then becomes [-2, 'hello', 4, 6]
x.append(3.14)  # x then becomes [-2, 'hello', 4, 6, 3.14]

Note the ways of writing the different operations here. Using append() always adds the new element at the end of the list. If you like, you may create an empty list as x = [] before you enter a loop that appends element by element. If you need to know the length of the list, you get the number of elements from len(x), which in our case is 5 after appending 3.14 above. This function is handy if you want to traverse all list elements by index, since range(len(x)) gives you all legal indices. Note that many more operations on lists are possible than those shown here.
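As a minimal sketch of the pattern just described, the lines below start from an empty list, fill it inside a loop with append(), and then traverse it by index. The list name squares and the values stored in it are made up for the illustration. Each print call is written with parentheses and a single argument so that the lines run under both Python 2 and Python 3.

```python
# Start from an empty list and append one element in each pass of the loop
squares = []
for i in range(4):
    squares.append(i**2)

# range(len(squares)) gives every legal index: 0, 1, ..., len(squares) - 1
for i in range(len(squares)):
    print('squares[%d] = %d' % (i, squares[i]))
```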

Previously, we saw how a for loop may run over array elements. When we want to do the same with a list in Python, we may do it as this little example shows,

x = ['hello', 4, 3.14, 6]
for e in x:
    print 'x element: ', e
print 'This was all the elements in the list x'

This is how it is usually done in Python, and we see that e runs over the elements of x directly, avoiding the need for indexing. Be aware, however, that when loops are written like this, you cannot change any element in x by "changing" e. That is, writing e += 2 will not change anything in x, since e can only be used to read (as opposed to overwrite) the list elements.
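The difference between reading through e and overwriting through an index may be seen in the following sketch. It uses a list of integers (not the mixed list above, where e += 2 would fail for the string), and the name unchanged is introduced only to record the state of the list after the first loop.

```python
x = [1, 2, 3]

# Attempt 1: "changing" the loop variable leaves the list untouched
for e in x:
    e += 2
unchanged = list(x)   # copy of x as it looks after the first loop

# Attempt 2: assigning through an index overwrites the list elements
for i in range(len(x)):
    x[i] += 2

print('after first loop:  %s' % unchanged)
print('after second loop: %s' % x)
```

So when the elements themselves are to be updated, looping over the indices with range(len(x)) is one way to do it.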

There is a special construct in Python that allows you to run through all elements of a list, do the same operation on each, and store the new elements in another list. It is referred to as list comprehension and may be demonstrated as follows.

List_1 = [1, 2, 3, 4]
List_2 = [e*10 for e in List_1]

This will produce a new list by the name List_2, containing the elements 10, 20, 30 and 40, in that order. Notice the syntax within the brackets for List_2: the part for e in List_1 signals that e is to successively be each of the list elements in List_1, and for each e, the next element in List_2 is created by computing e*10. More generally, the syntax may be written as

List_2 = [E(e) for e in List_1]

where E(e) means some expression involving e.
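To make the general form concrete, the sketch below lets E be an ordinary function and compares the list comprehension with an explicit loop that uses append(). The particular expression in E, and the name List_3, are hypothetical choices made only for this illustration.

```python
from math import sqrt

def E(e):
    """Hypothetical expression involving e, chosen only for illustration."""
    return sqrt(e) + 1

List_1 = [1, 4, 9, 16]

# List comprehension: one new element E(e) for each e in List_1
List_2 = [E(e) for e in List_1]

# The same result built with an explicit loop and append()
List_3 = []
for e in List_1:
    List_3.append(E(e))

print('List_2 = %s' % List_2)
print('List_3 = %s' % List_3)
```

The two lists come out identical, so the comprehension may be read as a compact way of writing the loop.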

In some cases, it is necessary to run through two (or more) lists at the same time. Python has a handy function called zip for this purpose. An example of how to use zip is provided in the code file_handling.py below.
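Before turning to that file, here is a minimal sketch of zip on its own. The two short lists and the name pairs are made up for the illustration. In each pass of the loop, xi and yi hold the elements that share the same index in x and y.

```python
x = [1.0, 2.0, 3.5]
y = [3.44, 4.8, 6.61]

# zip pairs the first elements, then the second elements, and so on
pairs = []
for xi, yi in zip(x, y):
    pairs.append((xi, yi))
    print('x = %g, y = %g' % (xi, yi))
```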

We should also briefly mention tuples, which are very much like lists, the main difference being that tuples cannot be changed. To a newcomer, it may seem strange that such "constant lists" could ever be preferable to lists. However, the property of being constant is a good safeguard against unintentional changes. Also, it is quicker for Python to handle data in a tuple than in a list, which contributes to faster code. With the data from above, we may create a tuple and print the content by writing

x = ('hello', 4, 3.14, 6)
for e in x:
    print 'x element: ', e
print 'This was all the elements in the tuple x'

Trying insert or append for the tuple gives an error message (because it cannot be changed), stating that the tuple object has no such attribute.
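The error may be provoked on purpose, as in the sketch below, which catches it with try and except so that the program continues. The try-except construction and the flag names are additions for the illustration. The second test, overwriting an element by index, is not mentioned in the text above. It fails with a different error type (TypeError), because it goes through indexing and not through a missing method.

```python
x = ('hello', 4, 3.14, 6)

# Reading works just as for lists
second = x[1]
n = len(x)

# A tuple has no append method, so this raises AttributeError
try:
    x.append(3.14)
    append_failed = False
except AttributeError as err:
    append_failed = True
    print('append failed: %s' % err)

# Overwriting an element is not allowed either and raises TypeError
try:
    x[0] = 'goodbye'
    assign_failed = False
except TypeError as err:
    assign_failed = True
    print('assignment failed: %s' % err)
```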

Reading from and writing to files

Input data for a program often come from files and the results of the computations are often written to file. To illustrate basic file handling, we consider an example where we read \( x \) and \( y \) coordinates from two columns in a file, apply a function \( f \) to the \( y \) coordinates, and write the results to a new two-column data file. The first line of the input file is a heading that we can just skip:

# x and y coordinates
1.0  3.44
2.0  4.8
3.5  6.61
4.0  5.0

The relevant Python lines for reading the numbers and writing out a similar file are given in the file file_handling.py:

filename = 'tmp.dat'
infile = open(filename, 'r')  # Open file for reading
line = infile.readline()      # Read first line
# Read x and y coordinates from the file and store in lists
x = []
y = []
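for line in infile:
    words = line.split()      # Split line into words
    x.append(float(words[0])) # Convert first word to float, store in x
    y.append(float(words[1])) # Convert second word to float, store in y
infile.close()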

# Transform y coordinates
from math import log

def f(y):
    return log(y)

for i in range(len(y)):
    y[i] = f(y[i])

# Write out x and y to a two-column file
filename = 'tmp_out.dat'
outfile = open(filename, 'w')  # Open file for writing
outfile.write('# x and y coordinates\n')
for xi, yi in zip(x, y):
    outfile.write('%10.5f %10.5f\n' % (xi, yi))
outfile.close()
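The same read-transform-write steps may also be sketched with Python's with statement, which closes a file automatically when the indented block ends. This variant is an alternative and not the way file_handling.py is written. To keep the sketch self-contained, it first writes a small input file into a temporary folder. The folder, the names in_name and out_name, and the shortened data set are assumptions made only for this illustration.

```python
import os
import tempfile
from math import log

# Create a small input file so the sketch is self-contained
# (same layout as the sample data shown earlier, but only two data lines)
folder = tempfile.mkdtemp()
in_name = os.path.join(folder, 'tmp.dat')
out_name = os.path.join(folder, 'tmp_out.dat')
with open(in_name, 'w') as outfile:
    outfile.write('# x and y coordinates\n1.0  3.44\n2.0  4.8\n')

# Read: skip the heading, then convert the two words on each line to floats
x = []
y = []
with open(in_name, 'r') as infile:
    infile.readline()                   # skip the heading line
    for line in infile:
        words = line.split()
        x.append(float(words[0]))
        y.append(log(float(words[1])))  # transform y while reading

# Write: the with block closes the file when the block ends
with open(out_name, 'w') as outfile:
    outfile.write('# x and y coordinates\n')
    for xi, yi in zip(x, y):
        outfile.write('%10.5f %10.5f\n' % (xi, yi))

# Read the result back and show it
with open(out_name, 'r') as infile:
    lines = infile.readlines()
print(''.join(lines))
```

Here the transformation is applied while reading, which saves the separate loop over y. Whether that is preferable depends on whether the untransformed values are needed later.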

Such a file, with a comment line and numbers in tabular format, is very common, so numpy has functionality to ease reading and writing. Here is the same example using the loadtxt and savetxt functions in numpy for tabular data (file file_handling_numpy.py):

filename = 'tmp.dat'
import numpy
data = numpy.loadtxt(filename, comments='#')
x = data[:,0]
y = data[:,1]
data[:,1] = numpy.log(y)  # insert transformed y back in array
filename = 'tmp_out.dat'
outfile = open(filename, 'w')  # open file for writing
outfile.write('# x and y coordinates\n')
numpy.savetxt(outfile, data, fmt='%10.5f')
outfile.close()