enum Node<Item> {
Leaf(Item),
Children(Vec<Node<Item>>),
}
Creating an Iterator in Rust
When I woke up today, I thought, what a great day to start a blog! So here we are. Before we take off, just a short introduction: I’m Ludwig, I’m a CS student from Germany, and I love Rust. Since this blog is about Rust, I hope you do too!
This post is about a core concept in Rust, iterators. If you don’t know what iterators are, please read the chapter about iterators in the Rust book first.
The collection type
Iterators usually iterate over some sort of collection. Our collection type is a tree:
Simple, right? A tree node is either a Leaf
, in which case it contains an Item
, or a list of child nodes.
A traverse
method
We want to traverse (i.e. iterate over) this kind of tree depth-first. This means that when a node has multiple children, we first traverse the first child and all its descendants before moving on to the second child. This is easy to implement with a recursive algorithm:
impl<It> Node<It> {
fn traverse(&self) {
match self {
Node::Leaf(_) => {}
Node::Children(children) => {
for node in children {
node.traverse();
}
}
}
}
}
If we want to do something with each item, we can pass a closure to the function:
impl<It> Node<It> {
fn traverse(&self, f: impl Fn(&It)) {
match self {
Node::Leaf(item) => {
f(item);
}
Node::Children(children) => {
for node in children {
node.traverse(&f);
}
}
}
}
}
This allows us to iterate over the items. However, Node
still doesn’t implement the Iterator
trait, which would be useful because it provides helper methods such as filter
, fold
and collect
.
As we will see shortly, implementing this trait is quite tricky in this case. The reason for this is that the Iterator
trait provides external iteration, whereas our traverse
method provides internal iteration.
External and internal iteration
Internal iteration means that a closure is passed to a function, which calls the closure for every element. This means that the iterator function has full control over the iteration. External iteration on the other hand means that there’s a struct with a method to get the next element. This means that the code using the iterator controls the iteration. It can pause the iteration, do something else, pass the iterator to another function and maybe resume it later. External iteration is therefore very powerful and flexible. |
Implementing the Iterator
trait
To be able to externally iterate over the tree, we need to implement the Iterator
trait. It looks roughly like this:
trait Iterator {
type Item;
fn next(&mut self) -> Option<Self::Item>;
}
The iterator trait is usually not implemented for a collection directly. Instead, a new type is created that wraps the collection:
struct NodeIter<'a, It>(&'a Node<It>);
Why not implement
Iterator for the collection?This great question was asked recently on Reddit. One reason is, if the collection implemented the More importantly, though, iterators usually need additional state to keep track of the position of the next item in the collection. This state needs to be updated in every iteration, which doesn’t work if the collection is behind an immutable reference and the state is part of the collection. If the iterator is a separate type, it can be mutable even though the collection is immutable. You’ll see how this works in a moment, just read on! |
As the immutable reference in the NodeIter
type indicates, it can only iterate over immutable references of the items. To get an instance of this iterator, we add a .iter()
method:
impl<It> Node<It> {
fn iter(&self) -> NodeIter<'_, It> {
NodeIter(self)
}
}
Now we can start with the actual implementation!
impl<'a, It> Iterator for NodeIter<'a, It> {
type Item = &'a It;
fn next(&mut self) -> Option<Self::Item> {
todo!()
}
}
Now there’s a problem: Since Iterator
provides external iteration, we have to produce one item at a time. This means that we can’t use the simple recursive algorithm we used in the traverse
method. Instead, we have to keep track of the state of the iterator manually:
struct NodeIter<'a, It> {
children: &'a [Node<It>],
parent: Option<Box<NodeIter<'a, It>>>,
}
The children
field contains the remaining children of a node, the parent
field is the iterator of the parent node, if present. It must be wrapped in a Box
because a struct in Rust can’t contain itself without indirection—otherwise, it would be impossible to compute its size on the stack.
So how does this work? When we create the iterator, we put the node into the children
slice and set parent
to None
:
impl<It> Node<It> {
fn iter(&self) -> NodeIter<'_, It> {
NodeIter {
children: std::slice::from_ref(self),
parent: None,
}
}
}
When the iterator is advanced, we first check if children
is empty. If that’s the case, we try to continue iterating the parent node. If there is no parent node, we return None
.
If children
is not empty, we remove the first child and check its variant. If it is a Node::Leaf
, we return its content; if it is a Node::Children
, we create a new iterator for the children. The parent
field is set to self
, and self
is replaced with the newly created iterator:
use std::mem;
impl<'a, It> Iterator for NodeIter<'a, It> {
type Item = &'a It;
fn next(&mut self) -> Option<Self::Item> {
match self.children.get(0) {
None => match self.parent.take() {
Some(parent) => {
// continue with the parent node
*self = *parent;
self.next()
}
None => None,
},
Some(Node::Leaf(item)) => {
self.children = &self.children[1..];
Some(item)
}
Some(Node::Children(children)) => {
self.children = &self.children[1..];
// start iterating the child trees
*self = NodeIter {
children: children.as_slice(),
parent: Some(Box::new(mem::take(self))),
};
self.next()
}
}
}
}
This doesn’t work yet, because mem::take()
requires that NodeIter
implements Default
. But this can be fixed easily:
impl<It> Default for NodeIter<'_, It> {
fn default() -> Self {
NodeIter { children: &[], parent: None }
}
}
The
mem::take() function
|
Now let’s see if the iterator works!
Testing
To check if it works, we can write a unit test:
#[test]
fn test_borrowing_iterator() {
let tree = Node::Children(vec![
Node::Leaf(5),
Node::Leaf(4),
Node::Children(vec![
Node::Leaf(3),
Node::Leaf(2),
Node::Children(vec![]),
]),
Node::Children(vec![
Node::Children(vec![
Node::Children(vec![Node::Leaf(1)]),
Node::Leaf(0),
]),
]),
]);
let nums: Vec<i32> = tree.iter().copied().collect();
assert_eq!(nums, vec![5, 4, 3, 2, 1, 0]);
}
> cargo test -q
running 1 test
.
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
That looks reassuring!
Adding features
Now that we have a working iterator, let’s see how we can improve it. First, let’s check if we can implement more iterator methods to make it more efficient!
Size hint
Every iterator has a size hint, to help the collect
methods decide how much memory to allocate when collecting into something like a Vec
. By default the lower bound of the size hint is 0, so the collect
method might have to re-allocate a few times. This is still better than setting the size hint too high, because that would waste memory.
Unfortunately, we don’t know how many elements a Node
contains, and calculating the number of elements would be expensive, so we’ll skip the size_hint
method.
FusedIterator
Sometimes it’s useful to ensure that after the iterator produces None
for the first time, it will only produce None
values. Iterators with this property are called fused iterators, and any iterator can be converted to a fused iterator with the .fused()
method.
However, if we implement the FusedIterator
trait for our iterator, calling the .fused()
method is more efficient, because it has a specialized implementation for types that implement this trait. So let’s add it:
use std::iter::FusedIterator;
impl<It> FusedIterator for NodeIter<'_, It> {}
That’s it!
IntoIterator
This trait doesn’t make the iterator more efficient, just more ergonomic. Implementing IntoIterator
for &Node<T>
makes it possible to use a node in a for
loop without having to write .iter()
explicitly:
impl<'a, It> IntoIterator for &'a Node<It> {
type Item = &'a It;
type IntoIter = NodeIter<'a, It>;
fn into_iter(self) -> Self::IntoIter {
self.iter()
}
}
Let’s try it out:
#[test]
fn test_borrowing_for_loop() {
let tree = Node::Leaf(42);
for &node in &tree {
let _: i32 = node;
}
}
And… it compiles! 🎉
An owned iterator
We can also implement an iterator that consumes the tree and produces the items as owned values. To implement this iterator, we can copy-paste the borrowed iterator and make a few adjustments:
struct NodeIntoIter<It> {
// we use a VecDeque because it allows
// removing elements from the front efficiently [1]
children: VecDeque<Node<It>>,
parent: Option<Box<NodeIntoIter<It>>>,
}
impl<It> Default for NodeIntoIter<It> {
fn default() -> Self {
NodeIntoIter {
children: Default::default(),
parent: None,
}
}
}
impl<It> Iterator for NodeIntoIter<It> {
type Item = It;
fn next(&mut self) -> Option<Self::Item> {
match self.children.pop_front() {
None => match self.parent.take() {
Some(parent) => {
// continue with the parent node
*self = *parent;
self.next()
}
None => None,
},
Some(Node::Leaf(item)) => Some(item),
Some(Node::Children(children)) => {
// start iterating the child trees
*self = NodeIntoIter {
children: children.into(),
parent: Some(Box::new(mem::take(self))),
};
self.next()
}
}
}
}
Now let’s implement IntoIterator
for Node
, so we can use it:
impl<It> IntoIterator for Node<It> {
type Item = It;
type IntoIter = NodeIntoIter<It>;
fn into_iter(self) -> Self::IntoIter {
let mut children = VecDeque::with_capacity(1);
children.push_back(self);
NodeIntoIter {
children,
parent: None,
}
}
}
Don’t forget to test it:
> cargo test -q
running 4 tests
....
test result: ok. 4 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
A mutable iterator
Collections usually also have an iterator for mutating the items. They can be particularly tricky to implement safely, because you have to ensure that no part of the iterator is ever borrowed mutably multiple times.
But because this blog post is already way too long, I leave this part as an exercise to the reader. 😛
How to borrow multiple things from a slice mutably?
Getting multiple mutable references into a slice isn’t easy.
One way is to create a mutable iterator with |
If you got stuck implementing this yourself, you may take a peek at this playground.
What about DoubleEndedIterator
?
DoubleEndedIterator
is a trait implemented by iterators that can consume items from both ends. However, we won’t implement this trait for our iterators, because it would make them much more complicated. And who needs that trait anyway? 😉
Fin
You should now be able to implement iterators for tree-like data structures.
All code is available in this playground. Discussion on Reddit.
If you have suggestions what topics I should cover next, please file a bug in the issue tracker. Also file a bug if you have questions or want some things explained in more detail, or if you found a mistake.
I will write posts regularly from now on. If you enjoyed this post, please subscribe to the atom feed (see at the bottom) and share it with your friends! Until next time!
std::vec::IntoIter
instead of a VecDeque
, which is more efficient, and also more idiomatic. In the same way, a std::slice::Iter
can be used for the reference iterator.