I’m sure you’re all familiar with tuples, lists, and dictionaries, right? Let’s do a quick tour nonetheless.
Tuples are all over the place. For example, this code for swapping two numbers implicitly uses tuples:
>>> a = 5
>>> b = 6
>>> a, b = b, a
>>> print a == 6, b == 5
True True
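Here’s a minimal sketch of what the swap is doing behind the scenes: the right-hand side is packed into a tuple, which is then unpacked into the names on the left. The helper min_and_max is a made-up name, included only to show the same mechanism returning two values from a function.

```python
# The swap works because "b, a" builds a tuple, which is then unpacked.
a = 5
b = 6

packed = (b, a)      # the tuple that "b, a" builds implicitly
a, b = packed        # unpacking assigns element by element

# The same mechanism lets a function hand back several values at once.
def min_and_max(values):   # hypothetical helper, for illustration only
    return min(values), max(values)

lowest, highest = min_and_max([3, 1, 4, 1, 5])
```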
That’s about all I have to say about tuples.
I use lists and dictionaries all the time. They’re the two greatest inventions of mankind, at least as far as Python goes. With lists, it’s just easy to keep track of stuff:
>>> x = []
>>> x.append(5)
>>> x.extend([6, 7, 8])
>>> x
[5, 6, 7, 8]
>>> x.reverse()
>>> x
[8, 7, 6, 5]
It’s also easy to sort. Consider this set of data:
>>> y = [ ('IBM', 5), ('Zil', 3), ('DEC', 18) ]
The sort method runs cmp on pairs of tuples, and cmp compares tuples element by element, so the list ends up sorted on the first element of each tuple:
>>> y.sort()
>>> y
[('DEC', 18), ('IBM', 5), ('Zil', 3)]
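To make the element-by-element comparison concrete, here is a small sketch with made-up data in which two tuples share the same first element, so the second element has to break the tie.

```python
# Tuples compare element by element: the first elements decide the
# order, and later elements are consulted only to break ties.
pairs = [('IBM', 5), ('DEC', 18), ('IBM', 2)]   # made-up data with a tie
pairs.sort()

# 'DEC' sorts before 'IBM'; the two 'IBM' entries tie on the name,
# so their numbers (2 and 5) decide which comes first.
first_names = [name for name, count in pairs]
```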
Often it’s handy to sort tuples on a different tuple element, and there are several ways to do that. I prefer to provide my own comparison function:
>>> def sort_on_second(a, b):
...     return cmp(a[1], b[1])
...
>>> y.sort(sort_on_second)
>>> y
[('Zil', 3), ('IBM', 5), ('DEC', 18)]
Note that here I’m using the built-in cmp function (which is what sort uses by default: y.sort() is equivalent to y.sort(cmp)) to compare the second elements of the two tuples.
This kind of function is really handy for sorting dictionaries by value, as I’ll show you below.
(For a more in-depth discussion of sorting options, check out the Sorting HowTo.)
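One of the other ways to sort on the second element is a key function. The sketch below assumes a Python version with the key argument and functools.cmp_to_key (2.7 or 3.x). Since Python 3 has no built-in cmp, the comparison function spells out the same negative/zero/positive result by hand.

```python
from functools import cmp_to_key
from operator import itemgetter

y = [('IBM', 5), ('Zil', 3), ('DEC', 18)]

# key= calls the function once per element and sorts on the results;
# itemgetter(1) picks out the second element of each tuple.
by_second = sorted(y, key=itemgetter(1))

# An old-style comparison function can still be used via cmp_to_key.
def sort_on_second(a, b):
    # Same contract as cmp: negative, zero, or positive.
    return (a[1] > b[1]) - (a[1] < b[1])

by_second_cmp = sorted(y, key=cmp_to_key(sort_on_second))
```

Both calls leave y itself untouched and return a new list. Use y.sort(key=...) if you want to sort in place.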
On to dictionaries!
Your basic dictionary is just a hash table that takes keys and returns values:
>>> d = {}
>>> d['a'] = 5
>>> d['b'] = 4
>>> d['c'] = 18
>>> d
{'a': 5, 'c': 18, 'b': 4}
>>> d['a']
5
You can also initialize a dictionary using the dict type to create a dict object:
>>> e = dict(a=5, b=4, c=18)
>>> e
{'a': 5, 'c': 18, 'b': 4}
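For comparison, here is a short sketch of three ways to build the same dictionary. The variable names are mine, chosen for illustration.

```python
# Three spellings that build equal dictionaries.
from_literal = {'a': 5, 'b': 4, 'c': 18}
from_keywords = dict(a=5, b=4, c=18)
from_pairs = dict([('a', 5), ('b', 4), ('c', 18)])

# The keyword form only works when every key is a valid identifier;
# the list-of-pairs form accepts any hashable key, e.g. tuples.
tuple_keys = dict([((0, 0), 'origin'), ((1, 0), 'east')])
```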
Dictionaries have a few really neat features that I use pretty frequently. For example, let’s collect (key, value) pairs where we potentially have multiple values for each key. That is, given a file containing this data,
a 5
b 6
d 7
a 2
c 1
suppose we want to keep all the values? If we just did it the simple way,
>>> d = {}
>>> for line in open('data/keyvalue.txt'):
...     key, value = line.split()
...     d[key] = int(value)
...
we would lose all but the last value for each key:
>>> d
{'a': 2, 'c': 1, 'b': 6, 'd': 7}
You can collect all the values by using get:
>>> d = {}
>>> for line in open('data/keyvalue.txt'):
...     key, value = line.split()
...     l = d.get(key, [])
...     l.append(int(value))
...     d[key] = l
...
>>> d
{'a': [5, 2], 'c': [1], 'b': [6], 'd': [7]}
The key point here is that d.get(k, default) returns d[k] if k is already in d; otherwise, it returns default. So, the first time each key is seen, l is set to an empty list; the value is appended to this list, and then the list is stored under that key.
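Two closely related spellings of the same collect-all-the-values idea are setdefault and collections.defaultdict. This is only a sketch: the lines list is a stand-in for the contents of data/keyvalue.txt so that the example runs without the file.

```python
from collections import defaultdict

# Stand-in for the lines of data/keyvalue.txt (assumed contents).
lines = ['a 5', 'b 6', 'd 7', 'a 2', 'c 1']

# setdefault returns d[key] if present; otherwise it stores the default
# under that key and returns it, folding the get/assign pair into one call.
d = {}
for line in lines:
    key, value = line.split()
    d.setdefault(key, []).append(int(value))

# defaultdict(list) creates the empty list automatically on first access.
dd = defaultdict(list)
for line in lines:
    key, value = line.split()
    dd[key].append(int(value))
```

Which one to use is mostly a matter of taste. The one catch with defaultdict is that merely looking up a missing key creates an entry for it.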
(There are tons of little tricks like the ones above, but these are the ones I use the most; see the Python Cookbook for an endless supply!)
Now let’s try combining some of the sorting stuff above with dictionaries. This time, our contrived problem is that we’d like to sort the keys in the dictionary d that we just loaded, but rather than sorting by key we want to sort by the sum of the values for each key.
First, let’s define a sort function:
>>> def sort_by_sum_value(a, b):
...     sum_a = sum(a[1])
...     sum_b = sum(b[1])
...     return cmp(sum_a, sum_b)
...
Now apply it to the dictionary items:
>>> items = d.items()
>>> items
[('a', [5, 2]), ('c', [1]), ('b', [6]), ('d', [7])]
>>> items.sort(sort_by_sum_value)
>>> items
[('c', [1]), ('b', [6]), ('a', [5, 2]), ('d', [7])]
And voilà, you have your list of (key, values) items sorted by summed values; the keys are just the first element of each item.
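The same sort can be sketched with a key function, which also runs on Python 3, where cmp-style sort arguments are gone. The helper name sum_of_values is mine, and the dictionary is typed in directly rather than loaded from the file.

```python
# The dictionary built earlier, written out by hand for this sketch.
d = {'a': [5, 2], 'c': [1], 'b': [6], 'd': [7]}

def sum_of_values(item):   # hypothetical helper name
    key, values = item
    return sum(values)

# Sort the (key, values) pairs by the sum of each value list.
items = sorted(d.items(), key=sum_of_values)

# Pull out just the keys, now ordered by summed values.
# Note that 'a' and 'd' both sum to 7; sorted() is stable, so tied
# items keep whatever relative order d.items() gave them.
keys = [key for key, values in items]
sums = [sum(values) for key, values in items]
```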
As I said, there are tons and tons of cute little tricks that you can do with dictionaries. I think they’re incredibly powerful.