Error handling in Rust

Some error handling strategies are more equal than others.

Type-erased errors

Box<dyn std::error::Error>¹ certainly has its uses. It's very convenient if the API consumer² genuinely does not care what an error was, only that there was an error. If the reactive action the API consumer needs to perform when an error occurs is exactly the same regardless of what the error actually was, then Box<dyn Error> is perfectly fine. It's easy to reach for Box<dyn Error> because getting ? to work on all error types in your function for free is very attractive from a convenience standpoint. However, once you need to do something specific when a specific error occurs, you should no longer be using Box<dyn Error>.

To handle an error inside a Box<dyn Error>, you must know the exact concrete type of the error you intend to handle, because you have to downcast to it. With generic code, this can be difficult, especially if you're new to Rust. The compiler can provide almost no useful diagnostics about whether you're downcasting to the right type. You also have to keep track of which concrete errors can be inside the Box<dyn Error> yourself, so that you don't accidentally try to handle an error that will never occur there. Box<dyn Error> does not encode a function's failure modes in the type system, so you lose the potential for exhaustive error handling. That makes it hard to have confidence in a program's robustness, because it becomes extremely easy to overlook easily handled errors and turn them into fatal problems for program functionality.
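
To make the downcasting problem concrete, here is a minimal sketch using only the standard library. The names `EmptyInput`, `parse_port`, and `describe` are made up for illustration. Note that nothing in the signature of `parse_port` tells the caller which types are worth downcasting to.

```rust
use std::error::Error;
use std::fmt;
use std::num::ParseIntError;

/// Hypothetical error type, for illustration only.
#[derive(Debug)]
struct EmptyInput;

impl fmt::Display for EmptyInput {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("input was empty")
    }
}

impl Error for EmptyInput {}

/// Nothing in this signature says which errors can come out.
fn parse_port(input: &str) -> Result<u16, Box<dyn Error>> {
    if input.is_empty() {
        return Err(Box::new(EmptyInput));
    }
    // `?` boxes the `ParseIntError` for us.
    Ok(input.trim().parse::<u16>()?)
}

/// Shows how a caller has to react to each failure.
fn describe(input: &str) -> &'static str {
    match parse_port(input) {
        Ok(_) => "ok",
        Err(e) if e.downcast_ref::<EmptyInput>().is_some() => "empty",
        Err(e) if e.downcast_ref::<ParseIntError>().is_some() => "not a port number",
        // A downcast to a type that `parse_port` never returns would still
        // compile, and a forgotten type silently ends up in this arm.
        Err(_) => "unhandled",
    }
}

fn main() {
    for input in ["", "http", "8080"] {
        println!("{:?} -> {}", input, describe(input));
    }
}
```

The fallback arm is mandatory here, so the compiler has no way to tell you that you missed a case or invented one.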

Something else that came to my attention is what to do when you're writing a library whose errors are caused by errors defined in your dependencies, i.e. crates you don't control. One school of thought is to never expose the concrete type of your dependency's error, and instead return an enum variant containing simply Box<dyn Error>. If your users care about the specifics of that dependency's error, they can add your dependency to their own dependencies, then downcast to the concrete type in the code using your library. This way, if you ever make a semver-incompatible upgrade to that dependency, you don't break downstream compilations.

I think this is an incredibly bad idea, because while it doesn't break at compile time, it does break at runtime. The downcast will no longer work, since types from two versions of the same crate are not the same type. This sort of behavior seems antithetical to Rust; we have a borrow checker for a reason. Specifically, if your error handling path is important and needs to run whenever that error occurs, this could be extremely costly in terms of data corruption or lost capital or just time spent trying to debug why the heck your code stopped working when you changed nothing (other than running cargo update, which may even be done automatically by your CI pipeline). The alternative I propose is to simply not do this, and instead expose the concrete type directly.
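
The failure mode can be simulated without any real crates. In the sketch below, the modules `dep_v1` and `dep_v2` are stand-ins for two semver-incompatible versions of the same hypothetical dependency, and `LibError`, `library_call`, and `should_retry` are likewise invented for illustration.

```rust
use std::error::Error;

// Stand-in for the old version of a hypothetical dependency.
mod dep_v1 {
    use std::{error::Error, fmt};

    #[derive(Debug)]
    pub struct TimeoutError;

    impl fmt::Display for TimeoutError {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            f.write_str("timed out")
        }
    }

    impl Error for TimeoutError {}
}

// Stand-in for the new version: same name, same shape, different type.
mod dep_v2 {
    use std::{error::Error, fmt};

    #[derive(Debug)]
    pub struct TimeoutError;

    impl fmt::Display for TimeoutError {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            f.write_str("timed out")
        }
    }

    impl Error for TimeoutError {}
}

/// The library hides its dependency's error type behind a box.
#[derive(Debug)]
enum LibError {
    Dependency(Box<dyn Error>),
}

/// The library after upgrading from v1 to v2 of its dependency.
fn library_call() -> Result<(), LibError> {
    Err(LibError::Dependency(Box::new(dep_v2::TimeoutError)))
}

/// Application code written against v1. It still compiles after the upgrade.
fn should_retry() -> bool {
    match library_call() {
        Err(LibError::Dependency(e)) => e.downcast_ref::<dep_v1::TimeoutError>().is_some(),
        Ok(()) => false,
    }
}

fn main() {
    // The timeout still happens, but the retry path is never taken.
    println!("retry after timeout? {}", should_retry());
}
```

If `LibError` had exposed the concrete type instead, the application would have failed to compile after the upgrade, which is the outcome you actually want.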

Strongly typed errors

The³ alternative to Box<dyn Error> is to create a custom enum with a variant for each error type. For example, you might have variants like Deserialize(serde_json::Error), Http(reqwest::Error), and maybe an "unknown" variant⁴ if exhaustiveness is infeasible. If you're an API consumer, the bar for "infeasible" is as low as "I don't need to handle this anywhere so I'm not going to make a variant for it". But, as soon as you do need to handle it, you need to make a variant for it. If you're not an API consumer, you should aim to be exhaustive. There are some cases where this is unreasonable, but those situations are rare and, as such, this exception likely does not apply to you.

With error enums, knowing which errors are possible is now absolutely trivial: all you need to do is look at the variants of the enum. The compiler can also give vastly more helpful diagnostics this way, since it will be able to follow the type system around to ensure that you're handling all cases, and that you're not inventing cases that will never actually happen. No longer do you need to rely on possibly-stale manual documentation or have to read the entire call tree to determine what the failure modes are⁵.
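
As a small sketch of what that buys you, consider the following. `PercentError`, `parse_percent`, and `percent_or_default` are hypothetical names chosen for this example. The `match` has no wildcard arm, so adding a variant later turns every unhandled call site into a compile error.

```rust
use std::num::ParseIntError;

/// Every way `parse_percent` can fail, visible at a glance.
#[derive(Debug)]
enum PercentError {
    Parse(ParseIntError),
    OutOfRange(u32),
}

fn parse_percent(input: &str) -> Result<u8, PercentError> {
    let value: u32 = input.trim().parse().map_err(PercentError::Parse)?;
    if value > 100 {
        return Err(PercentError::OutOfRange(value));
    }
    // The range check above guarantees that the value fits in a `u8`.
    Ok(value as u8)
}

/// Handles each failure mode differently, with no catch-all arm.
fn percent_or_default(input: &str) -> u8 {
    match parse_percent(input) {
        Ok(percent) => percent,
        // Values that are merely too large get clamped.
        Err(PercentError::OutOfRange(_)) => 100,
        // Garbage input falls back to zero.
        Err(PercentError::Parse(_)) => 0,
    }
}

fn main() {
    for input in ["42", "250", "abc"] {
        println!("{:?} -> {}", input, percent_or_default(input));
    }
}
```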

Another advantage of using enums is allowing multiple errors of the same type to have different semantics. Maybe you need to load two files (std::io::Error), but you need to do something different based on which file failed to load. With enums, you can simply create two variants, one for each behavior. With Box<dyn Error>, this is not possible⁶.
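
Here is one way that could look. The scenario (a config file that is optional and a license file that is not) and the names `StartupError` and `is_recoverable` are assumptions made for the sake of the example.

```rust
use std::io;

/// Both variants wrap the same inner type but mean different things.
#[derive(Debug)]
enum StartupError {
    /// The config file could not be read; defaults are acceptable.
    ReadConfig(io::Error),
    /// The license file could not be read; the program cannot continue.
    ReadLicense(io::Error),
}

/// Decides whether startup can continue after a failure.
fn is_recoverable(error: &StartupError) -> bool {
    match error {
        StartupError::ReadConfig(_) => true,
        StartupError::ReadLicense(_) => false,
    }
}

fn main() {
    let config = StartupError::ReadConfig(io::Error::from(io::ErrorKind::NotFound));
    let license = StartupError::ReadLicense(io::Error::from(io::ErrorKind::NotFound));

    println!("{:?} recoverable: {}", config, is_recoverable(&config));
    println!("{:?} recoverable: {}", license, is_recoverable(&license));
}
```

At the call site, `map_err(StartupError::ReadConfig)` or `map_err(StartupError::ReadLicense)` picks the variant, since a single From impl can't know which file was being read.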

Now, the downside: you must manually implement std::error::Error for your new custom error enum. This means Display, Debug, and all the From impls so ? is still ergonomic. Luckily, the thiserror crate provides a procedural macro that allows you to derive all of those traits. When you use the #[from] attribute to generate From impls, it even correctly implements std::error::Error::source() for you! This makes acquiring detailed error messages (e.g. for logging) using nothing but (ostensibly) the standard library very easy.
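
For reference, this is roughly the boilerplate you would write by hand without the derive. It is my own sketch of equivalent code, not the actual output of thiserror, and `ConfigError` and `parse_port` are hypothetical.

```rust
use std::error::Error;
use std::fmt;
use std::num::ParseIntError;

#[derive(Debug)]
enum ConfigError {
    InvalidPort(ParseIntError),
    MissingKey(&'static str),
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::InvalidPort(_) => f.write_str("invalid port number"),
            Self::MissingKey(key) => write!(f, "missing key `{}`", key),
        }
    }
}

impl Error for ConfigError {
    // Exposes the inner error so callers can walk the chain.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            Self::InvalidPort(inner) => Some(inner),
            Self::MissingKey(_) => None,
        }
    }
}

// Lets `?` convert a `ParseIntError` into a `ConfigError`.
impl From<ParseIntError> for ConfigError {
    fn from(inner: ParseIntError) -> Self {
        Self::InvalidPort(inner)
    }
}

fn parse_port(value: Option<&str>) -> Result<u16, ConfigError> {
    let value = value.ok_or(ConfigError::MissingKey("port"))?;
    Ok(value.parse::<u16>()?)
}

fn main() {
    println!("{:?}", parse_port(Some("8080")));
    println!("{:?}", parse_port(Some("eighty")));
    println!("{:?}", parse_port(None));
}
```

Every new variant means touching the Display impl, the source() impl, and possibly a From impl, which is exactly the repetition the derive macro takes off your hands.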

Custom error types

There are some rules about creating custom error types that you should follow in order to create the best possible error messages for your users and fellow developers.

If your error has an inner error, or your error is caused by another error, you must pick exactly one of the following options for each inner error:
1. Return the inner error when Error::source() is called.
  
  With thiserror, this means using the #[from] or #[source] attributes. Generally, reach for #[from] first unless it fails to compile, and in that situation, switch to #[source]. Without a From impl, you can use Result<T, E>::map_err() to convert the inner error into your custom error type.
2. Include the inner error's message as part of your own error's.
  
  With thiserror, this means using {0} in your #[error("...")], assuming the variant is a tuple variant with the inner error stored in the zeroth tuple item.

Picking exactly one prevents an error message from containing needlessly duplicated information. The options are listed in order of preference, the first one being much more common. If you're not sure which to do, just pick number 1.
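
The sketch below shows both options next to a type that breaks the rule by doing both. All type names are hypothetical, and `render` is a simplified stand-in for any code that prints an error followed by its sources.

```rust
use std::error::Error;
use std::fmt;

/// Innermost error of the chain.
#[derive(Debug)]
struct InvalidSql;

impl fmt::Display for InvalidSql {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("invalid SQL")
    }
}

impl Error for InvalidSql {}

/// Option 1: the inner error is reachable through `source()` only.
#[derive(Debug)]
struct ViaSource(InvalidSql);

impl fmt::Display for ViaSource {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("failed to execute SQL statement")
    }
}

impl Error for ViaSource {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.0)
    }
}

/// Option 2: the inner message is embedded and no source is reported.
#[derive(Debug)]
struct Inline(InvalidSql);

impl fmt::Display for Inline {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "failed to execute SQL statement ({})", self.0)
    }
}

impl Error for Inline {}

/// Breaks the rule: embeds the inner message and reports it as a source.
#[derive(Debug)]
struct Both(InvalidSql);

impl fmt::Display for Both {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "failed to execute SQL statement ({})", self.0)
    }
}

impl Error for Both {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.0)
    }
}

/// Joins an error and all of its sources with ": ".
fn render(error: &dyn Error) -> String {
    let mut out = error.to_string();
    let mut source = error.source();
    while let Some(inner) = source {
        out.push_str(": ");
        out.push_str(&inner.to_string());
        source = inner.source();
    }
    out
}

fn main() {
    println!("{}", render(&ViaSource(InvalidSql)));
    println!("{}", render(&Inline(InvalidSql)));
    println!("{}", render(&Both(InvalidSql)));
}
```

The first two lines each mention "invalid SQL" once. The third mentions it twice, which is the duplication the rule is meant to prevent.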

The human-readable part of your error message should not include the : character. What I mean by "human-readable part" is that if, for example, your error message happens to include JSON, then don't worry about it. Just don't use : in #[error("...")] strings, basically.

The reason for this is that : is commonly used to indicate causality on a single line, for example:

failed to create user: failed to execute SQL statement: invalid SQL

Each of these three messages would come from a separate concrete error type, each one wrapped inside an enum variant of the error that precedes it.

Display impls for types that implement std::error::Error should not start with a capital letter, except in special cases such as a message that begins with an initialism. For example, this looks inconsistent and gross:

failed to create user: Failed to execute SQL statement: Invalid SQL

Don't use sentence-ending punctuation in error messages. Your error may not be the last one in the chain.

Displaying error messages

Stick the following code into src/error.rs in your project and add thiserror = "1" to your dev dependencies:

use std::{
    error::Error,
    fmt::{self, Display, Formatter},
    iter,
};

/// Wraps any [`Error`][e] type so that [`Display`][d] includes its sources
///
/// # Examples
///
/// If `Foo` has a source of `Bar`, and `Bar` has a source of `Baz`, then the
/// formatted output of `Chain(&Foo)` will look like this:
///
/// ```
/// # use crate::error::Chain; // YOU WILL NEED TO CHANGE THIS
/// # use thiserror::Error;
/// # #[derive(Debug, Error)]
/// # #[error("foo")]
/// # struct Foo(#[from] Bar);
/// # #[derive(Debug, Error)]
/// # #[error("bar")]
/// # struct Bar(#[from] Baz);
/// # #[derive(Debug, Error)]
/// # #[error("baz")]
/// # struct Baz;
/// # fn try_foo() -> Result<(), Foo> { Err(Foo(Bar(Baz))) }
/// match try_foo() {
///     Ok(foo) => {
///         // Do something with foo
///         # drop(foo);
///         # unreachable!()
///     }
///     Err(e) => {
///         assert_eq!(
///             format!("foo error: {}", Chain(&e)),
///             "foo error: foo: bar: baz"
///         );
///     }
/// }
/// ```
///
/// [e]: Error
/// [d]: Display
#[derive(Debug)]
pub(crate) struct Chain<'a>(pub &'a dyn Error);

impl<'a> Display for Chain<'a> {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)?;

        let mut source = self.0.source();

        source
            .into_iter()
            .chain(iter::from_fn(|| {
                source = source.and_then(Error::source);
                source
            }))
            .try_for_each(|source| write!(f, ": {}", source))
    }
}

This representation is most useful for logging, but could be easily adapted for other uses as well. Once the result of this tracking issue lands in stable, this should be even more ergonomic. Especially if they implement this person's suggestion, which is almost exactly what's written above.

A note about panicking

Panicking is not an error handling strategy; panicking is panicking. You should only resort to panicking when an illegal state has been reached, or for convenience when it's not possible to prove that this state will never occur with the type system. If there is no possible way to recover and further program execution wouldn't make any sense, would be dangerous, or would invoke undefined behavior, then you can panic. If you have proven out-of-band that this state is unreachable, then you can panic. This also applies to working with Option, Result, and other "maybe a value" types: you should only unwrap (aka get the value you want and panic if it's not there) if the alternative is an illegal state. I could probably fill another blog post with specific examples of when to or when not to unwrap maybe types, so for now I'm just going to leave it at that.
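
As one small illustration of that distinction, compare parsing input that comes from a user with parsing a literal that is known to be valid. The function names are made up for this sketch.

```rust
use std::net::{AddrParseError, IpAddr};

/// User-supplied input can be wrong, so the failure goes back to the caller.
fn parse_user_address(input: &str) -> Result<IpAddr, AddrParseError> {
    input.trim().parse()
}

/// The literal is known to be valid, so a failure here would be a bug in
/// this program rather than a condition anyone could handle.
fn loopback() -> IpAddr {
    "127.0.0.1"
        .parse()
        .expect("hardcoded loopback address is valid")
}

fn main() {
    println!("{:?}", parse_user_address("not an address"));
    println!("{:?}", parse_user_address(" 10.0.0.1 "));
    println!("{}", loopback());
}
```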

Footnotes

¹

Henceforth Box<dyn Error>. I'm eliding extra constraints like + Send + Sync + 'static since they're not strictly relevant. Also, you can/should substitute in its other analogs such as the anyhow crate.

²

The "API consumer" is the person who will need to handle the error. If you are writing an application, you are an API consumer. If you are writing a library (at least, the public interface), you are not an API consumer. If your codebase is large and you work with other people, you might also want to consider the people you're working with as API consumers when writing new fallible code.

³

An? I genuinely can't think of a third option.

⁴

If you're an API consumer, using an Unknown(Box<dyn Error>) variant straight up is fine as long as you've taken the rest of this blog post into consideration. If you're not an API consumer, this is somewhat more complicated. If your code is the one creating the error and an API consumer is expected to handle it, you may need either Unknown, Unknown(Box<dyn Error>), or some other such variant with an opaque inner type. If you're defining the contract for a function (say, with a trait, or something to do with closures), you should use either T directly (no enum) or an Other(T) variant (where T is a generic parameter) so that your API consumer can decide what to do, instead of having their hand be forced.
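
A sketch of the Other(T) idea, using a hypothetical `retry` helper that takes a closure. The helper adds one failure mode of its own and passes the caller's error type through untouched.

```rust
/// Error for a hypothetical `retry` helper, generic over the caller's error.
#[derive(Debug, PartialEq)]
enum RetryError<E> {
    /// `max_attempts` was zero, so the operation never ran.
    NoAttempts,
    /// The caller's own error from the final attempt.
    Other(E),
}

/// Runs `operation` until it succeeds or `max_attempts` is used up.
fn retry<T, E>(
    max_attempts: u32,
    mut operation: impl FnMut() -> Result<T, E>,
) -> Result<T, RetryError<E>> {
    let mut last_error = None;
    for _ in 0..max_attempts {
        match operation() {
            Ok(value) => return Ok(value),
            Err(error) => last_error = Some(error),
        }
    }
    Err(last_error.map_or(RetryError::NoAttempts, RetryError::Other))
}

fn main() {
    let mut calls = 0;
    let result: Result<u32, RetryError<&str>> = retry(3, || {
        calls += 1;
        if calls < 3 { Err("busy") } else { Ok(7) }
    });
    println!("{:?}", result);
}
```

Because the closure's error type is a generic parameter, the API consumer keeps a concrete type to match on instead of having to downcast.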

⁵

Like you would with a language with an explicit concept of exceptions that don't require type annotations or cannot be type-annotated (*cough* Python), or with poorly written Rust.

⁶

Unless you abuse the newtype pattern. But isn't the whole point of using Box<dyn Error> to not care about errors? Having to create a new type sounds like you care about errors. Also, as a new Rust user, good luck figuring out why you need to downcast to two different types that aren't the type you want to get the type you want.

Charles' Blog