Python itertools cheat sheet

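The itertools module includes a set of functions for working with iterable data. Iterator-based code uses less memory than code that builds lists: because data is not produced from the iterator until it is needed, not all of the data has to be stored in memory at the same time. The examples below target Python 2; in Python 3, izip(), imap() and ifilter() are replaced by the built-in zip(), map() and filter(), ifilterfalse() is named filterfalse(), and xrange() is range().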

The count() function returns an iterator that produces consecutive integers indefinitely. The first number can be passed as an argument (the default is zero).

from itertools import *

# izip() stops at the shortest input, so the endless count() is safe here
for i in izip(count(1), ['a', 'b', 'c']):
    print(i),

# (1, 'a') (2, 'b') (3, 'c')
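
As a rough Python 3 counterpart (an assumption, since the examples here are written for Python 2), the built-in zip() takes the place of izip(). The sketch also uses the optional second argument of count(), the step, which the text above does not cover. The names numbered and evens are illustrative only.

```python
from itertools import count, islice

# zip() stops at the shortest input, so the infinite count() is safe here.
numbered = list(zip(count(1), ['a', 'b', 'c']))
print(numbered)  # [(1, 'a'), (2, 'b'), (3, 'c')]

# count(start, step): take the first four values of 0, 2, 4, ...
evens = list(islice(count(0, 2), 4))
print(evens)  # [0, 2, 4, 6]
```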

The cycle() function returns an iterator that repeats the contents of the iterable it is given indefinitely. Since it has to remember the entire contents of the input iterator, it may consume quite a bit of memory if the iterator is long.

from itertools import *

for i, item in izip(xrange(7), cycle(['a', 'b', 'c'])):
    print('%s. %s;' % (i, item)),

# 0. a; 1. b; 2. c; 3. a; 4. b; 5. c; 6. a;
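
Another way to bound cycle() is islice(), which cuts the endless stream off after a fixed number of items. This is a minimal sketch assuming Python 3; the name first_seven is illustrative.

```python
from itertools import cycle, islice

# islice() caps the otherwise endless cycle at seven items.
first_seven = list(islice(cycle(['a', 'b', 'c']), 7))
print(first_seven)  # ['a', 'b', 'c', 'a', 'b', 'c', 'a']
```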

The repeat() function returns an iterator that produces the same value each time it is accessed. It keeps going forever unless the optional second argument limits the number of repetitions.

from itertools import *

for i in repeat('a', 5):
    print(i),

# a a a a a
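
A common use of repeat() is to supply a constant argument to a mapped function. The sketch below assumes Python 3, where the built-in map() stops at the shortest input; the name squares is illustrative.

```python
from itertools import repeat

# With no count, repeat(2) is endless; map() stops when range(5) runs out.
squares = list(map(pow, range(5), repeat(2)))
print(squares)  # [0, 1, 4, 9, 16]
```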

The chain() function takes several iterables as arguments and returns a single iterator that produces their contents one after another, as though they came from a single source.

from itertools import *

for i in chain([5, 6, 7], [14, 15, 16]):
    print(i),

# 5 6 7 14 15 16
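
When the inputs are already collected in one iterable, the alternate constructor chain.from_iterable() can flatten them without unpacking. A minimal sketch assuming Python 3; the sample data and the names nested and flat are illustrative.

```python
from itertools import chain

nested = [[5, 6, 7], [14, 15, 16], [20]]

# from_iterable() takes one iterable of iterables instead of separate arguments.
flat = list(chain.from_iterable(nested))
print(flat)  # [5, 6, 7, 14, 15, 16, 20]
```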

izip() returns an iterator that combines the elements of several iterators into tuples.

from itertools import *

for i in izip([5, 6, 7], [14, 15, 16]):
    print(i),

# (5, 14) (6, 15) (7, 16)
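
When the inputs differ in length, the combined iterator stops at the shortest one. The sketch below assumes Python 3, where the built-in zip() plays the role of izip(), and contrasts it with zip_longest(), which pads the shorter input instead. The names short and padded are illustrative.

```python
from itertools import zip_longest

# zip() drops the unmatched 7; zip_longest() pads it with fillvalue.
short = list(zip([5, 6, 7], [14, 15]))
padded = list(zip_longest([5, 6, 7], [14, 15], fillvalue=0))
print(short)   # [(5, 14), (6, 15)]
print(padded)  # [(5, 14), (6, 15), (7, 0)]
```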

The tee() function returns several independent iterators (defaults to 2) based on a single original input. The iterators returned by tee() can be used to feed the same set of data into multiple algorithms to be processed in parallel.

from itertools import *

r = islice(count(), 5)
i1, i2 = tee(r)

print('i1: %s' % list(i1))
print('i2: %s' % list(i2))

# i1: [0, 1, 2, 3, 4]
# i2: [0, 1, 2, 3, 4]
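
One well-known use of tee() is walking a sequence in overlapping pairs. The helper below is a sketch of that idea under Python 3, not something from the text above, and pairwise is a name chosen for illustration. Note that once tee() has been called, the original iterator should not be advanced directly, or the copies will miss items.

```python
from itertools import tee

def pairwise(iterable):
    """Yield overlapping pairs: s -> (s0, s1), (s1, s2), ..."""
    a, b = tee(iterable)
    next(b, None)  # advance the second copy by one item
    return zip(a, b)

pairs = list(pairwise([0, 1, 2, 3, 4]))
print(pairs)  # [(0, 1), (1, 2), (2, 3), (3, 4)]
```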

The imap() function returns an iterator that calls a function on the values in the input iterators and returns the results. It works like the built-in map(), except that it stops when any input iterator is exhausted (instead of inserting None values to completely consume all inputs).

from itertools import *

print('Doubles:'),
for i in imap(lambda x: 2 * x, xrange(5)):
    print(i),

# Doubles: 0 2 4 6 8
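
The stop-at-the-shortest behavior is easiest to see with two inputs of different lengths. This sketch assumes Python 3, where the built-in map() already behaves the way imap() is described above; the name products is illustrative.

```python
from itertools import count
from operator import mul

# count(1) never ends, so the three-item list decides when mapping stops.
products = list(map(mul, count(1), [10, 20, 30]))
print(products)  # [10, 40, 90]
```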

The dropwhile() function returns an iterator that produces elements of the input iterator after a condition becomes False for the first time. dropwhile() does not filter every item of the input; after the condition is false the first time, all remaining items in the input are returned.

from itertools import *

def should_drop(x):
    print 'Testing:', x
    return (x < 1)

for i in dropwhile(should_drop, [ -1, 0, 1, 2, -2 ]):
    print 'Yielding:', i

# Testing: -1
# Testing: 0
# Testing: 1
# Yielding: 1
# Yielding: 2
# Yielding: -2    
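
A typical use is skipping a leading block, such as header comments, while keeping everything after it untouched. The sketch assumes Python 3; the sample lines and the name body are made up for illustration.

```python
from itertools import dropwhile

lines = ['# header', '# more header', 'data 1', '# not dropped', 'data 2']

# Only the leading run of comment lines is dropped; the later one survives.
body = list(dropwhile(lambda line: line.startswith('#'), lines))
print(body)  # ['data 1', '# not dropped', 'data 2']
```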

The opposite of dropwhile() is takewhile(). It returns an iterator that yields items from the input iterator as long as the test function returns True, and stops at the first item for which it returns False.
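
A minimal sketch of takewhile(), assuming Python 3 and reusing the input list from the dropwhile() example; the name taken is illustrative.

```python
from itertools import takewhile

# Stops at 1, the first item that fails the test; the trailing -2 is never reached.
taken = list(takewhile(lambda x: x < 1, [-1, 0, 1, 2, -2]))
print(taken)  # [-1, 0]
```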

ifilter() returns an iterator that works like the built-in filter() does for lists, including only items for which the test function returns True.

from itertools import *

def check_item(x):
    return (x < 1)

for i in ifilter(check_item, [ -1, 0, 1, 2, -2 ]):
    print('Yielding: %s' % i)

# Yielding: -1
# Yielding: 0
# Yielding: -2

ifilter() is different from dropwhile() in that every item is tested before it is returned.

ifilterfalse() returns an iterator that includes only items where the test function returns False.
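
A short sketch using the Python 3 name for this function, filterfalse(), with the same check_item() test as in the ifilter() example; the name rejected is illustrative.

```python
from itertools import filterfalse

def check_item(x):
    return x < 1

# Keeps exactly the items that the ifilter() example discarded.
rejected = list(filterfalse(check_item, [-1, 0, 1, 2, -2]))
print(rejected)  # [1, 2]
```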

The groupby() function returns an iterator that produces sets of values organized by a common key. It is useful for splitting up the results of a large data source. It takes an iterable and a key function; consecutive items whose key values are equal are grouped together. Because only adjacent items are grouped, the input usually needs to be sorted by the same key first.

from itertools import *
from operator import itemgetter

movies_by_years = [('2013', ['Lone Survivor', 'About Time']),
                   ('2011', ['Intouchables']),
                   ('2010', ['The Next Three Days'])]

for year, items in groupby(movies_by_years, itemgetter(0)):
    print(year)
    # each year appears once, so the group holds a single (year, movies) tuple
    movies = list(items)[0][1]
    for movie in movies:
        print('\t%s' % movie)

# 2013
#   Lone Survivor
#   About Time
# 2011
#   Intouchables
# 2010
#   The Next Three Days

Another example:

for key, igroup in groupby(xrange(12), lambda x: x // 5):
    print key, list(igroup)

# 0 [0, 1, 2, 3, 4]
# 1 [5, 6, 7, 8, 9]
# 2 [10, 11]
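
Because groupby() only merges adjacent items, unsorted input can produce the same key more than once. The sketch below assumes Python 3 and uses made-up sample words; the names unsorted_keys and by_letter are illustrative.

```python
from itertools import groupby

words = ['apple', 'bob', 'avocado', 'banana', 'cherry']

def first_letter(word):
    return word[0]

# Without sorting, 'a' and 'b' each show up as two separate groups.
unsorted_keys = [key for key, _ in groupby(words, key=first_letter)]
print(unsorted_keys)  # ['a', 'b', 'a', 'b', 'c']

# Sorting by the same key first yields one group per letter.
by_letter = {key: list(group)
             for key, group in groupby(sorted(words, key=first_letter),
                                       key=first_letter)}
print(by_letter)
```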

The compress() function returns an iterator that filters elements from data, returning only those whose corresponding element in selectors evaluates to True. It stops when either the data or the selectors iterable has been exhausted.

from itertools import *

letters = compress('ABCDEF', [1,0,1,0,1,1])

for letter in letters:
    print(letter),

# A C E F
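
The selectors do not have to be written by hand; they can be computed from the data or from a parallel sequence. A minimal sketch assuming Python 3, with illustrative sample values and names.

```python
from itertools import compress

data = [3, -1, 4, -1, 5]

# Build one boolean selector per element, then keep the True positions.
selectors = [x > 0 for x in data]
positives = list(compress(data, selectors))
print(positives)  # [3, 4, 5]
```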
