Lazy sequences in Kotlin

Sequences in Kotlin is a data structure using lazy evaluation. Kotlin’s sequences work the same way as streams from Java 8. The reason behind reinventing the wheel here instead of reusing Java’s streams was to support them on all platforms, even those that do not support Java 8 (primarily Android).

The concept of lazy evaluation is all about evaluating expressions only if and when it becomes necessary at runtime. This is in contrast to eager evaluation, where every expression is eagerly evaluated even if the result or part of it is never used.

The main benefit of lazy evaluation is that it can improve performance when working with large collections or when performing expensive operations on them when only part of the results are actually required. This performance benefit stems from two things: avoiding unnecessary computations and avoiding the creation of list objects to hold intermediate results.

Following listing shows a simple example in which animals could be a normal collection (list, set, or map) using eager evaluation or a sequence using lazy evaluation. How these would behave differently is explained below, but the way you can work with them using higher­order functions is the same in either case.

// 'animals' stores the strings "Dog", "Cat", "Chicken", "Frog"
// 'animals' may be an eager collection or a lazy sequence but usage is the same
animals.filter { it.startsWith("C") }
    .map { "$it starts with a 'C'" }
    .take(1)

This code filters the animals to keep only those starting with a "C" maps these to a different string, and finally takes the first result. Let’s explore how this code behaves in both eager evaluation and lazy evaluation:

  • In eager evaluation, if animals is a collection or an array, the code would first filter all four elements and then store the intermediate result in a newly created list object. After that, it performs the map on each element of this intermediate result and produces another intermediate list. Finally, it takes the first element from the mapped list. In total, two intermediate objects are created, and four filter operations and two map operations are performed - even though ultimately just one element is used.
  • In lazy evaluation, if animals is a sequence, each element goes through the function chain one by one. Thus, it would first filter out "Dog" because it doesn’t start with a "C" and immediately proceed with the next element. It then moves on to "Cat", which passes the filter. Then, it maps this element and calls take(1). With this, the query is fulfilled and no more operations are performed—the last two elements are not touched.

There are three ways to get hold of a sequence. First, there is a helper function sequenceOf, consistent with the helpers that create collections.

val sequence = sequenceOf(­5, 0, 5)
println(sequence.joinToString())

Second, when you already have a (potentially large) collection, you can transform this collection into a sequence by calling asSequence. For instance, imagine you had a list of all large cities in the world and want to print those cities whose names start with "W".

// Transforms eager list into a lazy sequence
val output = cities.asSequence() 
    .filter { it.startsWith("W") }
    .map { "City: $it" }
    .joinToString()

The third way in which lazy sequences may be used is to create a sequence from the get­-go instead of transforming an existing collection into a sequence. For this, the helper function generateSequence is used, which takes in a seed element and a function to calculate the next element based on its predecessor.

val naturalNumbers = generateSequence(0) { it + 1 } 
val integers = generateSequence(0) { if (it > 0) ­it else ­it + 1 }

The first line uses zero as the seed element to start the sequence. Each next element adds one to its predecessor, resulting in the sequence 0, 1, 2, 3, ... of natural numbers. The second line takes it one step further to define the sequence 0, 1, ­1, 2, ­2, 3, ­3, ... of all integers. This shows that each element in the sequence is calculated based on the previous one, and only when necessary.

With this, you defined your first infinite sequence. This is only possible due to lazy evaluation; there is no way to actually store a data structure of all natural numbers in memory.

To illustrate the difference between lazy sequences and eager collections, let us look again at the minimal example that performs a chain of functions on a list of cities.

val cities = listOf("Washington", "Houston", "Seattle", "Worcester", "San Francisco")

cities.filter { println("filter: $it"); it.startsWith("W") } 
    .map { println("map: $it"); "City: $it" } 
    .take(2) 

If using normal collections, each function in the chain is fully evaluated, the intermediate result is stored in an object, and only then the next function is executed.

  1. First, filter iterates over the whole collection, leaving only Washington and Worcester. These are stored in a new list as an intermediate result.
  2. Then, map is called on this intermediate list of two elements, mapping each to a string with a "City: " prefix. Another list is created to store this next intermediate result.
  3. Finally, take returns another newly created list as the result.

Thus, this produces the output due to eager evaluation.

filter: Washington
filter: Houston
filter: Seattle
filter: Worcester
filter: San Francisco
map: Washington
map: Worcester

With sequences, the processing is very different. Each element goes through the chain one by one.

cities.asSequence()
    .filter { println("filter: $it"); it.startsWith("W") }
    .map { println("map: $it"); "City: $it" }
    .take(2) 
    .toList()

Here, a call to asSequence is included and also a call to toList as the terminal operation. Without such a terminal operation, no computation would be performed due to the laziness.

filter: Washington
map: Washington 
filter: Houston
filter: Seattle
filter: Worcester
map: Worcester

First, you can see that each element goes through the chain of functions one by one. Thus, there is no need to store intermediate results. Second, it becomes obvious that map is only performed on elements passing the filter (this is also the case in eager evaluation). Lastly, due to taking only two elements, the last element (San Francisco) is not processed at all. This is different from eager evaluation and is the second reason why sequences can improve performance.

Now it becomes clear why sequences tend to improve performance for large collections or when performing expensive operations that are partly unnecessary.