Array or list-like objects with text or other (fixed-valued) Python objects as indices are called dictionaries. They are very useful for storing general collections of objects in a single data structure. The table below displays some of the most important dictionary operations.
Construction | Meaning |
a = {} | initialize an empty dictionary |
a = {'point': [0,0.1], 'value': 7} | initialize a dictionary |
a = dict(point=[2,7], value=3) | initialize a dictionary w/string keys |
a.update(b) | add/update key-value pairs from b in a |
a.update(key1=value1, key2=value2) | add/update key-value pairs in a |
a['hide'] = True | add new key-value pair to a |
a['point'] | get value corresponding to key point |
for key in a: | loop over keys in unknown order |
for key in sorted(a): | loop over keys in alphabetic order |
'value' in a | True if string value is a key in a |
del a['point'] | delete a key-value pair from a |
list(a.keys()) | list of keys |
list(a.values()) | list of values |
len(a) | number of key-value pairs in a |
isinstance(a, dict) | is True if a is a dictionary |
Some of the most useful functionalities in a string object s
are
listed below.
Split the string into substrings separated by delimiter
:
words = s.split(delimiter)
Join elements in a list of strings:
newstring = delimiter.join(words[i:j])
Extract a substring:
substring = s[2:n-4]
Replace a substring substr
by new a string replacement
:
s_new = s.replace(substr, replacement)
Check if a substring is contained within another string:
if 'some text' in s:
...
Find the index where some text starts:
index = s.find(text)
if index == -1:
print 'Could not find "%s" in "%s" (text, s)
else:
substring = s[index:] # strip off chars before text
Extend a string:
s += another_string # append at the end
s = another_string + s # append at the beginning
Check if a string contains whitespace only:
if s.isspace():
...
Note: you cannot change the characters in a string like you can change elements in a list (a string is in this sense like a tuple). You have to make a new string:
>>> filename = 'myfile1.txt'
>>> filename[6] = '2'
Traceback (most recent call last):
...
TypeError: 'str' object does not support item assignment
>>> filename.replace('1', '2')
'myfile2.txt'
>>> filename[:6] + '2' + filename[7:] # 'myfile' + '2' + '.txt'
'myfile2.txt'
Internet files can be downloaded if we know their URL:
import urllib
url = 'http://www.some.where.net/path/thing.html'
urllib.urlretrieve(url, filename='thing.html')
The downloaded information is put in the local file thing.html
in the current working folder.
Alternatively, we can open the URL as a file object:
webpage = urllib.urlopen(url)
HTML files are often messy to interpret by string operations.
The important computer science topics in this document are
We have a file containing information about the courses that students have
taken.
The file format consists of blocks with student data, where each
block starts with the student's name (Name:
), followed by
the courses that the student has taken. Each course line starts with the
name of the course, then comes the semester when the exam was taken,
then the size of the course in terms of credit points,
and finally the grade is listed (letters A
to F
).
Here is an example of a file with three student entries:
Name: John Doe
Astronomy 2003 fall 10 A
Introductory Physics 2003 fall 10 C
Calculus I 2003 fall 10 A
Calculus II 2004 spring 10 B
Linear Algebra 2004 spring 10 C
Quantum Mechanics I 2004 fall 10 A
Quantum Mechanics II 2005 spring 10 A
Numerical Linear Algebra 2004 fall 5 E
Numerical Methods 2004 spring 20 C
Name: Jan Modaal
Calculus I 2005 fall 10 A
Calculus II 2006 spring 10 A
Introductory C++ Programming 2005 fall 15 D
Introductory Python Programming 2006 spring 5 A
Astronomy 2005 fall 10 A
Basic Philosophy 2005 fall 10 F
Name: Kari Nordmann
Introductory Python Programming 2006 spring 5 A
Astronomy 2005 fall 10 D
Our problem consists of reading this file into a dictionary data
with the
student name as key and a list of courses as value.
Each element in the list of courses is a dictionary holding the
course name, the semester, the credit points, and the grade.
A value in the data
dictionary may look as
'Kari Nordmann': [{'credit': 5,
'grade': 'A',
'semester': '2006 spring',
'title': 'Introductory Python Programming'},
{'credit': 10,
'grade': 'D',
'semester': '2005 fall',
'title': 'Astronomy'}],
Having the data
dictionary, the next task is to print out the
average grade of each student.
We divide the problem into two major tasks: loading the file data into
the data
dictionary, and computing the average grades.
These two tasks are naturally placed in two functions.
We need to have a strategy for reading the file and interpreting the
contents. It will be natural to read the file line by line, and for
each line check if this is a line containing a new student's name,
a course information line, or a blank line.
In the latter case we jump to the next pass in the loop. When a
new student name is encountered, we initialize a new entry in the
data
dictionary to an empty list. In the case of a line about
a course, we must interpret the contents on that line, which we
postpone a bit.
We can now sketch the algorithm described above in terms of some unfinished Python code, just to get the overview:
def load(studentfile):
infile = open(studentfile, 'r')
data = {}
for line in infile:
i = line.find('Name:')
if i != -1:
# line contains 'Name:', extract the name.
...
elif line.isspace(): # Blank line?
continue # Yes, go to next loop iteration.
else:
# This must be a course line, interpret the line.
...
infile.close()
return data
If we find 'Name:'
as a substring in line
, we must extract
the name. This can be done by the substring
line[i+5:]
. Alternatively, we can split the line with respect to
colon and strip off the first word:
words = line.split(':')
name = ' '.join(words[1:])
We have chosen the former strategy of extracting the name as a substring in the final program.
Each course line is naturally split into words for extracting information:
words = line.split()
The name of the course consists of a number of words, but we do not know
how many.
Nevertheless, we know that the final words contain the semester, the credit
points, and the grade. We can hence count from the right and extract
information, and when we are finished with the semester information,
the rest of the words
list holds the words in the name of the course.
The code goes as follows:
grade = words[-1]
credit = int(words[-2])
semester = ' '.join(words[-4:-2])
course_name = ' '.join(words[:-4])
data[name].append({'title': course_name,
'semester': semester,
'credit': credit,
'grade': grade})
This code is a good example of the usefulness of split and join operations when extracting information from a text.
Now to the second task of computing the average grade. Since the grades are letters we cannot compute with them. A natural way to proceed is to convert the letters to numbers, compute the average number, and then convert that number back to a letter. Conversion between letters and numbers is easily represented by a dictionary:
grade2number = {'A': 5, 'B': 4, 'C': 3, 'D': 2, 'E': 1, 'F': 0}
To convert from numbers to grades, we construct the inverse dictionary:
number2grade = {}
for grade in grade2number:
number2grade[grade2number[grade]] = grade
In the computation of the average grade we should use a weighted
sum such that larger courses count more than smaller courses.
The weighted mean value of a set of numbers \( r_i \) with weights \( w_i \),
\( i=0,\ldots,n-1 \), is given by
$$
\begin{equation*} {\sum_{i=0}^{n-1} w_ir_i\over\sum_{i=0}^{n-1} w_i}\tp\end{equation*}
$$
This weighted mean value must then be rounded to the nearest integer,
which can be used as key in number2grade
to find the corresponding
grade expressed as a letter.
The weight \( w_i \) is naturally taken as the number of credit points
in the course with grade \( r_i \).
The whole process is performed by the following function:
def average_grade(data, name):
sum = 0; weights = 0
for course in data[name]:
weight = course['credit']
grade = course['grade']
sum += grade2number[grade]*weight
weights += weight
avg = sum/float(weights)
return number2grade[round(avg)]
The complete code is found in the file students.py. Running this program gives the following output of the average grades:
John Doe: B
Kari Nordmann: C
Jan Modaal: C
One feature of the students.py
code is that the output of the names
are sorted after the last name. How can we accomplish that?
A straight for name in data
loop will visit the keys in an unknown
(random) order. To visit the keys in alphabetic order, we must
use
for name in sorted(data):
This default sort will sort with respect to the first character in
the name
strings. We want a sort according to the last part of
the name. A tailored sort function can then be
written.
In this function we
extract the last word in the names and compare them:
def sort_names(name1, name2):
last_name1 = name1.split()[-1]
last_name2 = name2.split()[-1]
if last_name1 < last_name2:
return -1
elif last_name1 > last_name2:
return 1
else:
return 0
We can now pass on sort_names
to the sorted
function to
get a sequence that is sorted with respect to the last word in the students'
names:
for name in sorted(data, sort_names):
print '%s: %s' % (name, average_grade(data, name))