Rusts Module System Explained

Tutorial 28 Mar 2021 18 minutes read

The Rust programming language can be confusing for beginners, and the module system is one part that causes frustration particularly often. There are quite a few blog posts out there trying to explain the module system in a simple way, but I often have the feeling that they over-simplify things. So here’s my take—​a more detailed explanation of the module system.

This post assumes that you can write at least a “hello world” Rust program. It’s a rather long read, so get comfortable, maybe with a cup of tea, hot chocolate, or whatever your heart desires 😊

Why do modules exist?

Modules give your code structure: Dividing your code into modules is like dividing your house into several rooms: Each room has a different purpose, and rooms can be locked for privacy.

Floor plan
Figure 1. Floor plan. Source (modified; license)

The module tree

Modules are structured in a hierarchy, the module tree, which is similar to the file tree of the source files. There are two kinds of modules: Inline modules and β€œnormal” modules:

mod inline {
    // content of the module
}

mod normal;
// the content is in another file

These are functionally equivalent. If the content of an inline module is very long, move it to another file, to keep the code neat and manageable.

When the module is not inline, Rust looks for the content of the module in another file, either module_name.rs or module_name/mod.rs. It might seem odd that we have to declare modules explicitly (unlike in Python, where modules are inferred from the file system). However, there are good reasons for this, as we’ll see later.

Like every tree, the module tree has a root. This is the file lib.rs in case of a library crate, or the file main.rs in case of a binary crate [1].

Submodules

Unfortunately, Rust is not the most consistent language when it comes to modules: There are two different ways to structure a module tree, and they can be mixed within the same crate.

Say we have a library crate with a module parent, which contains a sub-module child:

└─ library root
   └─ parent
      └─ child

The crate root is in a lib.rs file in the src directory. However, the parent module can be either in a parent.rs file next to lib.rs, or in a mod.rs file in a parent directory:

File tree A
β”œβ”€ Cargo.toml
└─ src/
   β”œβ”€ lib.rs
   β”œβ”€ parent.rs  // parent module
   └─ parent/
      └─ child.rs
File tree B
β”œβ”€ Cargo.toml
└─ src/
   β”œβ”€ lib.rs
   └─ parent/
      β”œβ”€ mod.rs  // parent module
      └─ child.rs

It doesn’t really matter which way you go, just do what you prefer. I use the first way (“File tree A”), since it’s easier to add sub-modules. For example, if you want to add a submodule to child, you just need to create a folder and a new file, and add a mod declaration:

 β”œβ”€ Cargo.toml
 └─ src/
    β”œβ”€ lib.rs
    β”œβ”€ parent.rs
    └─ parent/
       β”œβ”€ child.rs
      └─ child/
          └─ grand_child.rs

The path of a module can also be specified explicitly with the #[path] attribute, but this is rarely used in practice.

An example

Hopefully this will make more sense once you see an example. Here’s the module structure of a library crate:

β”œβ”€ Cargo.toml
└─ src/
   β”œβ”€ lib.rs
   β”œβ”€ foo.rs
   β”œβ”€ bar.rs
   └─ bar/
      └─ baz.rs
lib.rs
// root module

pub mod foo;
pub mod bar;
bar.rs
pub mod baz;

pub use baz::*;
pub use crate::foo::Answer;
foo.rs
mod answer {
    pub struct Answer(pub i32);
}

pub use answer::Answer;
baz.rs
use super::Answer;

pub fn answer() -> Answer {
    Answer(42)
}

No worries if you don’t understand everything here! All the concepts that are used here will be explained. You can look at this example later and see if you understand everything. For now, do you know what the module tree is?

See solution
└─ library root  /src/lib.rs
   β”œβ”€ foo        /src/foo.rs
   β”‚  └─ answer  /src/foo.rs
   └─ bar        /src/bar.rs
      └─ baz     /src/bar/baz.rs

Items and paths

A module contains items. Items are

  • Functions

  • Types (structs, enums, unions, type aliases)

  • Traits

  • Impl blocks

  • Macros

  • Constants and statics

  • Extern blocks

  • Extern crates

  • Imports

  • Modules

  • Associated items (not important right now)

You can refer to items by their path. For example, the path foo::bar::Baz refers to the Baz item within the bar item within the foo item. Paths are usually relative: To use foo::bar::Baz, the foo item must be available in the current scope; absolute paths (starting at the root module) are prefixed with crate::. A super:: path segment changes to the parent module (similar to ../ in the file system).

Imports are used to shorten paths. Instead of having to write foo::bar::Baz every time, we can write use foo::bar::Baz; once. This brings the item into scope, so we can refer to it with the much shorter path Baz.

Changes to paths in the 2018 edition

Prior to the 2018 edition, absolute paths started with just :: instead of crate::. In the 2018 edition, this syntax is still available, but it’s not recommended and can only be used for external crates.

In the 2015 edition, imports were always absolute, even when they weren’t prefixed with ::. This was fixed in the 2018 edition for more consistency.

The 2018 edition also changed how external crates are used: In the 2015 edition, to use an external crate, an extern crate declaration was needed. This is no longer required in most cases: We can just put dependencies in our Cargo.toml, and use them right away.

Visibility

Visibility, or privacy, is the concept of making parts of a module inaccessible from other modules. Things that are only accessible in the same module are called private, and things that are accessible everywhere are called public.

This concept exists in many programming languages. However, in most object-oriented languages, the privacy boundary is the class, whereas in Rust, the modules are privacy boundaries.

In Rust, most things are private by default. To make something public, the pub keyword is written before it. This makes the item accessible everywhere:

lib.rs
mod foo {  (1)
    pub mod bar {
        struct Baz;
    }
    // use bar::Baz;  (2)
}

use foo::bar;  (3)
1 This declares a private module, so it can only be used within this root module. It can’t be accessed from another crate.
2 If we uncommented this, it would fail to compile. Baz is private, therefore it can only be used within the bar module.
3 The module bar can be used here, because it is declared as public. This is somewhat counter-intuitive, since the foo module is private. But when a module is private, it can still be accessed within its direct parent module, since a module is just like any other item.

Encapsulation

When designing an API, there are often invariants that need to be preserved. An invariant is a property that never changes. For example, a struct might contain a value that is supposed to always be within the interval [0; 360):

pub struct Angle(pub f32);

Let’s write a new function that validates this invariant, and a getter for the value:

impl Angle {
    pub fn new(value: f32) -> Self {
        Angle(value.rem_euclid(360.0)) (1)
    }

    pub fn value(&self) -> f32 {
        self.0
    }
}
1 rem_euclid calculates the least nonnegative remainder of self (mod rhs).

By ensuring that the angle is always in [0; 360), we can implement operations such as equality (where 0Β° == 360Β°) very easily. But wait! Since the field is public, a user of the API can create an Angle object without calling the new function, or modify it without checking the invariant.

By making the field private, the struct’s implementation details are hidden. This is called encapsulation: Within this module, we still have to take special care that the invariant is preserved, but if the code is correct, the public API is impossible to use incorrectly.

Fine-grained visibility

Items can be private or public. However, there are also visibilities in-between: Most notably, an item can be declared as pub(crate). This means that it is visible within the current crate, but not outside. With pub(super), an item is visible within the parent module. With pub(in path), visibility can also be limited to any other module as well:

pub(crate) mod foo {
    pub(super) fn bar() {}
    pub(in crate::foo) struct Baz;
}

When something is visible in one module, it is also visible in all its child modules. It still needs to be imported (or referred to with its path) though:

struct Foo;
// Foo is visible in this module

mod inner {
    use super::Foo;
    // Foo is also visible here!
}

Visibilities overview

pub

The item is visible everywhere

pub(crate)

The item is visible in the current crate

pub(super)

The item is visible in the parent module

pub(in some::path)

The item is visible in the specified path. The path must refer to an ancestor module of the item.

pub(self)

The item is private, i.e. visible only in the same module. This is equivalent to omitting the visibility entirely.

Exports

With pub use declarations, items can be re-exported from a different module than the one they were declared in. A re-exported item has multiple paths that refer to the same thing. For example:

lib.rs
pub mod answer {
    pub const ANSWER: i32 = 42;
}
pub use answer::ANSWER;

Now ANSWER can be referred to as either crate::ANSWER or crate​::answer::ANSWER. However, not every path is always reachable. Take, for example:

lib.rs
mod answer {
    pub const ANSWER: i32 = 42;
}
pub use answer::ANSWER;

crate​::answer::ANSWER is public, but it can’t be used from outside the crate, because the answer module is private. Only the re-export crate::ANSWER can be used from outside the crate.

Common pitfalls

The module tree must be built manually.

There’s no implicit mapping from the file system tree to the module tree: We need to declare all modules with the mod keyword.

Don’t confuse visibility with reachability.

The visibility of an item is like an upper bound, it can’t be increased with re-exports. For example, we can’t re-export a private struct outside of its module.

However, a public item might not be reachable from outside the crate, if it’s in a private module and isn’t publicly re-exported. To make an item available in the crate root, it’s not enough to make it public; we also need to make it reachable from the crate root.

Don’t confuse visibility with availability.

Visibility means that we are principally allowed to use an item somewhere. It doesn’t mean that the item is available, i.e. in scope, so we might still have to import it (or refer to it with its path).

Special cases

There are a few language constructs that don’t adhere to the same rules as everything else:

Enum variants and fields

Enum variants and variant fields are always public, and it’s not possible to make them private. Therefore we should be careful when exposing enums publicly, because changing the variants or fields later is not backwards compatible.

You can add the #[non_exhaustive] attribute to an enum to allow adding more variants later. This mean that the enum can’t be matched exhaustively; we’ll always need to add a wildcard match arm (_ => {}).

Sometimes it’s a good idea to wrap an enum in a struct to hide the implementation details:

pub struct Foo(FooImpl); // FooImpl is private

enum FooImpl {
    // ...
}

Also, when an enum variant has multiple fields, it’s usually better to put them in a separate struct, so it’s possible to make the fields private or make the struct non_exhaustive:

// Instead of this:
pub enum Foo {
    Variant {
        field: i32,
        other_field: bool,
    }
}

// do this:
pub enum Foo {
    Variant(FooVariant),
}
pub struct FooVariant {
    field: i32,
    other_field: bool,
}

Macros

Declarative macros (the ones that are declared with macro_rules!) behave more like local variables within a function than like items in some regards. For example, they can be shadowed, they have to be declared before they can be used, and they don’t need to be explicitly imported in child modules [2].

And, they can’t be declared public. The #[macro_export] attribute can be added to a macro, which exports it publicly at the crate root. This can be undesirable, however, if it’s not supposed to be part of the public API; there is no equivalent of pub(crate) for macros.

One workaround for this is to put our macros in a module and annotate the module with #[macro_use]. The module should be the first module declaration in the crate root. This ensures that the macros can be used everywhere in our crate, but not outside of the crate. Not the most elegant solution, but it works.

Why are modules declared explicitly?

I promised to explain why modules have to be declared explicitly. There are a few reasons:

  1. With the #[path] attribute, a module can be located in a different directory, or have a different name than the file.

  2. Module declarations can have a visibility, e.g. pub(crate) mod foo;

  3. Sometimes there are files which you don’t want to include in the module tree.

    For example, a crate with both a library and a binary target usually contains a lib.rs file for the library and a main.rs file for the binary. Submodules are stored in the same directory, but some modules are only needed by the library, and some only by the binary. By specifying the modules explicitly, you can include only the necessary modules in each file.

Fin

I hope you liked this post! Please let me know if you found this article useful; were there any things that were unclear or confusing? I’ll try to improve it over time.

Discussion on Reddit. You can also open an issue in the issue tracker.

Until next time!


1. A crate can also have multiple targets (library, binary, example, test, and benchmark targets), in which case each target has its own root. You can read more about this here.
2. This is called “textual scoping”. Actually, macros can have both a textual scope (like local variables) and a path-based scope (like items); the rules for this are quite complicated.