Creating an Iterator in Rust

Tutorial 09 Mar 2021 19 minutes read

When I woke up today, I thought, what a great day to start a blog! So here we are. Before we take off, just a short introduction: I’m Ludwig, I’m a CS student from Germany, and I love Rust. Since this blog is about Rust, I hope you do too!

This post is about a core concept in Rust, iterators. If you don’t know what iterators are, please read the chapter about iterators in the Rust book first.

The collection type

Iterators usually iterate over some sort of collection. Our collection type is a tree:

enum Node<Item> {
    Leaf(Item),
    Children(Vec<Node<Item>>),
}

Simple, right? A tree node is either a Leaf, in which case it contains an Item, or a list of child nodes.

A traverse method

We want to traverse (i.e. iterate over) this kind of tree depth-first. This means that when a node has multiple children, we first traverse the first child and all its descendants before moving on to the second child. This is easy to implement with a recursive algorithm:

impl<It> Node<It> {
    fn traverse(&self) {
        match self {
            Node::Leaf(_) => {}
            Node::Children(children) => {
                for node in children {
                    node.traverse();
                }
            }
        }
    }
}

If we want to do something with each item, we can pass a closure to the function:

impl<It> Node<It> {
    fn traverse(&self, f: impl Fn(&It)) {
        match self {
            Node::Leaf(item) => {
                f(item);
            }
            Node::Children(children) => {
                for node in children {
                    node.traverse(&f);
                }
            }
        }
    }
}

This allows us to iterate over the items. However, Node still doesn’t implement the Iterator trait, which would be useful because it provides helper methods such as filter, fold and collect.

As we will see shortly, implementing this trait is quite tricky in this case. The reason for this is that the Iterator trait provides external iteration, whereas our traverse method provides internal iteration.

External and internal iteration

Internal iteration means that a closure is passed to a function, which calls the closure for every element. This means that the iterator function has full control over the iteration.

External iteration on the other hand means that there’s a struct with a method to get the next element. This means that the code using the iterator controls the iteration. It can pause the iteration, do something else, pass the iterator to another function and maybe resume it later. External iteration is therefore very powerful and flexible.

Implementing the Iterator trait

To be able to externally iterate over the tree, we need to implement the Iterator trait. It looks roughly like this:

trait Iterator {
    type Item;

    fn next(&mut self) -> Option<Self::Item>;
}

The iterator trait is usually not implemented for a collection directly. Instead, a new type is created that wraps the collection:

struct NodeIter<'a, It>(&'a Node<It>);
Why not implement Iterator for the collection?

This great question was asked recently on Reddit. One reason is, if the collection implemented the Iterator type directly, the collection could be modified during iteration, which is is error-prone and usually undesired. It would also make it impossible to have multiple iterators that iterate over the same collection simultaneously and independently from each other, because the Iterator::next() method borrows the iterator mutably.

More importantly, though, iterators usually need additional state to keep track of the position of the next item in the collection. This state needs to be updated in every iteration, which doesn’t work if the collection is behind an immutable reference and the state is part of the collection.

If the iterator is a separate type, it can be mutable even though the collection is immutable. You’ll see how this works in a moment, just read on!

As the immutable reference in the NodeIter type indicates, it can only iterate over immutable references of the items. To get an instance of this iterator, we add a .iter() method:

impl<It> Node<It> {
    fn iter(&self) -> NodeIter<'_, It> {
        NodeIter(self)
    }
}

Now we can start with the actual implementation!

impl<'a, It> Iterator for NodeIter<'a, It> {
    type Item = &'a It;

    fn next(&mut self) -> Option<Self::Item> {
        todo!()
    }
}

Now there’s a problem: Since Iterator provides external iteration, we have to produce one item at a time. This means that we can’t use the simple recursive algorithm we used in the traverse method. Instead, we have to keep track of the state of the iterator manually:

struct NodeIter<'a, It> {
    children: &'a [Node<It>],
    parent: Option<Box<NodeIter<'a, It>>>,
}

The children field contains the remaining children of a node, the parent field is the iterator of the parent node, if present. It must be wrapped in a Box because a struct in Rust can’t contain itself without indirection—​otherwise, it would be impossible to compute its size on the stack.

So how does this work? When we create the iterator, we put the node into the children slice and set parent to None:

impl<It> Node<It> {
    fn iter(&self) -> NodeIter<'_, It> {
        NodeIter {
            children: std::slice::from_ref(self),
            parent: None,
        }
    }
}

When the iterator is advanced, we first check if children is empty. If that’s the case, we try to continue iterating the parent node. If there is no parent node, we return None.

If children is not empty, we remove the first child and check its variant. If it is a Node::Leaf, we return its content; if it is a Node::Children, we create a new iterator for the children. The parent field is set to self, and self is replaced with the newly created iterator:

use std::mem;

impl<'a, It> Iterator for NodeIter<'a, It> {
    type Item = &'a It;

    fn next(&mut self) -> Option<Self::Item> {
        match self.children.get(0) {
            None => match self.parent.take() {
                Some(parent) => {
                    // continue with the parent node
                    *self = *parent;
                    self.next()
                }
                None => None,
            },
            Some(Node::Leaf(item)) => {
                self.children = &self.children[1..];
                Some(item)
            }
            Some(Node::Children(children)) => {
                self.children = &self.children[1..];

                // start iterating the child trees
                *self = NodeIter {
                    children: children.as_slice(),
                    parent: Some(Box::new(mem::take(self))),
                };
                self.next()
            }
        }
    }
}

This doesn’t work yet, because mem::take() requires that NodeIter implements Default. But this can be fixed easily:

impl<It> Default for NodeIter<'_, It> {
    fn default() -> Self {
        NodeIter { children: &[], parent: None }
    }
}
The mem::take() function

mem::take() replaces a mutable reference with its default value and returns the previous value. The previous value is effectively moved out of the reference. We use it here to convert &mut self to an owned value, because parent must be owned.

Now let’s see if the iterator works!

Testing

To check if it works, we can write a unit test:

#[test]
fn test_borrowing_iterator() {
    let tree = Node::Children(vec![
        Node::Leaf(5),
        Node::Leaf(4),
        Node::Children(vec![
            Node::Leaf(3),
            Node::Leaf(2),
            Node::Children(vec![]),
        ]),
        Node::Children(vec![
            Node::Children(vec![
                Node::Children(vec![Node::Leaf(1)]),
                Node::Leaf(0),
            ]),
        ]),
    ]);

    let nums: Vec<i32> = tree.iter().copied().collect();
    assert_eq!(nums, vec![5, 4, 3, 2, 1, 0]);
}
> cargo test -q

running 1 test
.
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

That looks reassuring!

Adding features

Now that we have a working iterator, let’s see how we can improve it. First, let’s check if we can implement more iterator methods to make it more efficient!

Size hint

Every iterator has a size hint, to help the collect methods decide how much memory to allocate when collecting into something like a Vec. By default the lower bound of the size hint is 0, so the collect method might have to re-allocate a few times. This is still better than setting the size hint too high, because that would waste memory.

Unfortunately, we don’t know how many elements a Node contains, and calculating the number of elements would be expensive, so we’ll skip the size_hint method.

FusedIterator

Sometimes it’s useful to ensure that after the iterator produces None for the first time, it will only produce None values. Iterators with this property are called fused iterators, and any iterator can be converted to a fused iterator with the .fused() method.

However, if we implement the FusedIterator trait for our iterator, calling the .fused() method is more efficient, because it has a specialized implementation for types that implement this trait. So let’s add it:

use std::iter::FusedIterator;

impl<It> FusedIterator for NodeIter<'_, It> {}

That’s it!

IntoIterator

This trait doesn’t make the iterator more efficient, just more ergonomic. Implementing IntoIterator for &Node<T> makes it possible to use a node in a for loop without having to write .iter() explicitly:

impl<'a, It> IntoIterator for &'a Node<It> {
    type Item = &'a It;

    type IntoIter = NodeIter<'a, It>;

    fn into_iter(self) -> Self::IntoIter {
        self.iter()
    }
}

Let’s try it out:

#[test]
fn test_borrowing_for_loop() {
    let tree = Node::Leaf(42);

    for &node in &tree {
        let _: i32 = node;
    }
}

And…​ it compiles! 🎉

An owned iterator

We can also implement an iterator that consumes the tree and produces the items as owned values. To implement this iterator, we can copy-paste the borrowed iterator and make a few adjustments:

struct NodeIntoIter<It> {
    // we use a VecDeque because it allows
    // removing elements from the front efficiently [1]
    children: VecDeque<Node<It>>,
    parent: Option<Box<NodeIntoIter<It>>>,
}

impl<It> Default for NodeIntoIter<It> {
    fn default() -> Self {
        NodeIntoIter {
            children: Default::default(),
            parent: None,
        }
    }
}

impl<It> Iterator for NodeIntoIter<It> {
    type Item = It;

    fn next(&mut self) -> Option<Self::Item> {
        match self.children.pop_front() {
            None => match self.parent.take() {
                Some(parent) => {
                    // continue with the parent node
                    *self = *parent;
                    self.next()
                }
                None => None,
            },
            Some(Node::Leaf(item)) => Some(item),
            Some(Node::Children(children)) => {
                // start iterating the child trees
                *self = NodeIntoIter {
                    children: children.into(),
                    parent: Some(Box::new(mem::take(self))),
                };
                self.next()
            }
        }
    }
}

Now let’s implement IntoIterator for Node, so we can use it:

impl<It> IntoIterator for Node<It> {
    type Item = It;

    type IntoIter = NodeIntoIter<It>;

    fn into_iter(self) -> Self::IntoIter {
        let mut children = VecDeque::with_capacity(1);
        children.push_back(self);

        NodeIntoIter {
            children,
            parent: None,
        }
    }
}

Don’t forget to test it:

> cargo test -q

running 4 tests
....
test result: ok. 4 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

A mutable iterator

Collections usually also have an iterator for mutating the items. They can be particularly tricky to implement safely, because you have to ensure that no part of the iterator is ever borrowed mutably multiple times.

But because this blog post is already way too long, I leave this part as an exercise to the reader. 😛

How to borrow multiple things from a slice mutably?

Getting multiple mutable references into a slice isn’t easy. One way is to create a mutable iterator with .iter_mut(). Also there’s a number of methods to help you out:

If you got stuck implementing this yourself, you may take a peek at this playground.

What about DoubleEndedIterator?

DoubleEndedIterator is a trait implemented by iterators that can consume items from both ends. However, we won’t implement this trait for our iterators, because it would make them much more complicated. And who needs that trait anyway? 😉

Fin

You should now be able to implement iterators for tree-like data structures.

All code is available in this playground. Discussion on Reddit.

If you have suggestions what topics I should cover next, please file a bug in the issue tracker. Also file a bug if you have questions or want some things explained in more detail, or if you found a mistake.

I will write posts regularly from now on. If you enjoyed this post, please subscribe to the atom feed (see at the bottom) and share it with your friends! Until next time!


1. I was told that you can use a std::vec::IntoIter instead of a VecDeque, which is more efficient, and also more idiomatic. In the same way, a std::slice::Iter can be used for the reference iterator.