Using verifiers in tests considered harmful

Don't test X when you care about Y.


A verifier is something that lets a test inspect (some of) the calls made while a function-under-test runs, in order to assert things like "X function was called Y times", "X function was called with Y and Z arguments (and returned W)", and so on. Verifiers are generally combined with mocks, but can be used without them.

For example, let's say a test needs to be written for a function that does the equivalent of mkdir -p foo/bar/baz. Someone using verifiers would first need to mock the standard library's "create directory" function, to avoid the side effect of actually creating directories. Then, they might assert that the "create directory" function was called 3 times, perhaps with the expected arguments.
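As a rough sketch, such a verifier-style test might look like the following in Python. The make_dirs function and its (root, relative) signature are hypothetical, made up purely for illustration; unittest.mock supplies both the mock and the verifier.

```python
import os
from unittest import mock


def make_dirs(root, relative):
    """Hypothetical function-under-test: like `mkdir -p`, rooted at `root`."""
    current = root
    for part in relative.split("/"):
        current = os.path.join(current, part)
        try:
            os.mkdir(current)
        except FileExistsError:
            pass  # mkdir -p tolerates directories that already exist


def test_make_dirs_with_verifier():
    # Mock: replace os.mkdir so that nothing touches the disk.
    with mock.patch("os.mkdir") as fake_mkdir:
        make_dirs("root", "foo/bar/baz")

    # Verifier: assert on how the mock was called, not on any outcome.
    assert fake_mkdir.call_count == 3
    fake_mkdir.assert_has_calls([
        mock.call(os.path.join("root", "foo")),
        mock.call(os.path.join("root", "foo", "bar")),
        mock.call(os.path.join("root", "foo", "bar", "baz")),
    ])
```

Note that nothing in this test ever checks that a directory exists. It only checks which function was asked to create one.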

Later on, it turns out that creating directories is a major bottleneck in the application for whatever reason, and so this logic needs to be rewritten using io_uring. This probably means pulling in a new dependency that provides its own io_uring-aware "create directory" function. Once the refactoring is completed, the test written above fails, even though the function-under-test still has exactly the same behavior. The mock and verifier applied in the original test are not aware of this new way of creating directories, so the test fails only because it relied on implementation details of the original code.
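To make the failure concrete, here is a small simulation in Python. It is only a sketch: fast_mkdir is a hypothetical stand-in for the new dependency's function (it does not use io_uring at all), and make_dirs with its (root, relative) signature is likewise made up. The one property that matters is that fast_mkdir does not go through the os.mkdir attribute that the old test patches.

```python
import os
import tempfile
from unittest import mock

# Hypothetical stand-in for a dependency's io_uring-aware "create directory"
# function. It keeps its own reference to the real implementation, so
# patching os.mkdir has no effect on it.
_real_mkdir = os.mkdir


def fast_mkdir(path):
    _real_mkdir(path)


def make_dirs(root, relative):
    """Refactored function-under-test: same behavior, new primitive."""
    current = root
    for part in relative.split("/"):
        current = os.path.join(current, part)
        if not os.path.isdir(current):
            fast_mkdir(current)


def run_old_verifier_test():
    """Return (mkdir calls the verifier saw, whether the directories exist)."""
    with tempfile.TemporaryDirectory() as root:
        with mock.patch("os.mkdir") as fake_mkdir:
            make_dirs(root, "foo/bar/baz")
        created = os.path.isdir(os.path.join(root, "foo", "bar", "baz"))
        return fake_mkdir.call_count, created


print(run_old_verifier_test())  # prints (0, True)
```

The verifier saw zero calls, so an assertion expecting 3 would fail, yet the directories were created exactly as desired. In this simulation the mock also no longer intercepts anything, so the side effect it was meant to prevent now happens for real.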

This situation is both frustrating and avoidable. I don't have any advice for handling the frustration, but I do have advice for not inflicting it. Instead of testing that the standard library's "create directory" function was called 3 times (mocked or not), run the function-under-test without mocking away its side effects, then read back the directories on disk and assert that they were created as desired. This way, the exact methodology for creating these side effects is free to change as needed, and the tests will pass if and only if the desired observable behavior is maintained.
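A minimal sketch of this kind of test in Python, assuming the same hypothetical make_dirs(root, relative) function-under-test and using a temporary directory so the real side effects stay contained:

```python
import os
import tempfile


def make_dirs(root, relative):
    """Hypothetical function-under-test: like `mkdir -p`, rooted at `root`."""
    current = root
    for part in relative.split("/"):
        current = os.path.join(current, part)
        try:
            os.mkdir(current)
        except FileExistsError:
            pass  # mkdir -p tolerates directories that already exist


def test_make_dirs_creates_directories():
    # No mocks: let the side effects happen inside a throwaway directory.
    with tempfile.TemporaryDirectory() as root:
        make_dirs(root, "foo/bar/baz")

        # Read the result back from disk and assert on what is observable.
        assert os.path.isdir(os.path.join(root, "foo"))
        assert os.path.isdir(os.path.join(root, "foo", "bar"))
        assert os.path.isdir(os.path.join(root, "foo", "bar", "baz"))

        # Like mkdir -p, running it a second time must also succeed.
        make_dirs(root, "foo/bar/baz")
```

The test never mentions os.mkdir. The body of make_dirs could be replaced with a single os.makedirs(..., exist_ok=True) call, or with some io_uring-backed equivalent, and the test would not need to change.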

The important differences between these two ways of writing this test are that the verifiers method:

  • fails spuriously and becomes a maintenance burden when the implementation changes
  • does not guarantee that the function-under-test causes the desired (side) effect

While the other method:

  • is resilient to changes in the implementation details
  • does guarantee that the function-under-test causes the desired (side) effect