
Making high test coverage mandatory might be harmful

People often suggest that making high test coverage mandatory will lead to a better product.

From what I've seen and heard, I don't agree with that idea. Here is why.

The myth that high test coverage makes the product better

The idea that high test coverage makes the product better is easy to arrive at: we test the product to ensure that it works as expected. If test coverage is below 100%, part of the code is not exercised by any test, and there may be bugs in the areas that are not covered.

This much I can agree with. What I do not agree with is the next seemingly logical leap: with 100% test coverage, we are sure that there are no bugs.

If this last idea were true, it would be unprofessional to ship anything with less than 100% coverage. Unfortunately, this leap in thinking has a major flaw.

What test coverage really is

Test coverage is a tool to surface which parts of the code have, or have not, been run while the tests were executed.

We are sure that code not covered by any test is untested. In that sense, yes, there might be bugs in that code.

The problem is that there may just as well be bugs in the code that is covered by tests. Knowing that the code has been run during the tests is not enough to know that it is bug-free.

To illustrate the point, I will take an extreme example.

An extreme example

Imagine that we are tasked with enforcing the quality of the product. After reading a bit on the topic, we fall into the trap of thinking that test coverage is a good metric for this. So we enforce a minimum test coverage of 95%: below that, the build breaks and the teams are blocked.

And we don't want to hear them complain: we are already giving them the lowest limit we find acceptable, 95%, rather than enforcing our real goal of 100%.
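
For concreteness, here is how such a gate is typically wired into the build. A minimal sketch using coverage.py, assuming a Python codebase (most other stacks have an equivalent):

    coverage run -m pytest            # run the test suite under coverage
    coverage report --fail-under=95   # exits with a non-zero status below
                                      # 95%, which breaks the build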

Faced with this arbitrary limit, the team now has to reach that coverage to be able to deliver its work. So they write unit tests for every function, covering every line of code and every branch. But they never check what the functions return.

The problem is that, even if the coverage is now over 95%, the product might still not work as expected. All the code has been run, but the tests are useless.
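
A minimal sketch of what such tests look like, with a hypothetical discount() function and pytest-style tests: every line and both branches are executed, so coverage reports 100%, yet nothing checks a single result.

    def discount(price, rate):
        # bug: a rate above 1 should be rejected, not silently reinterpreted
        if rate > 1:
            rate = rate / 100
        return price - price * rate

    def test_discount():
        # both branches run, so line and branch coverage are 100%,
        # but there is no assertion: the bug above goes unnoticed
        discount(100, 0.2)
        discount(100, 20)

The test passes, the coverage gate is satisfied, and discount() could return anything at all.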

This is an extreme example. One could argue that this would never happen, or that a team doing it should be fired.

A more probable example

Most of the time, teams would not game the system that blatantly; they would do their best to actually test the product.

Imagine the same scenario, but this time the team's tests check what every function returns. This still does not guarantee that the product works as expected.

If we do that, the only thing we know is that all functions do what they are expected to do in isolation. Nothing guarantees that they behave correctly when plugged together. And yet the coverage is 100%.
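
A minimal sketch of this failure mode, with two hypothetical functions that each pass their own unit test but disagree about units when composed:

    def parse_timeout(config):
        # returns the configured timeout in seconds
        return int(config["timeout"])

    def schedule_retry(delay_ms):
        # expects a delay in milliseconds
        return f"retry in {delay_ms} ms"

    def test_parse_timeout():
        assert parse_timeout({"timeout": "30"}) == 30  # passes

    def test_schedule_retry():
        assert schedule_retry(30000) == "retry in 30000 ms"  # passes

    # Every line is covered and every return value is asserted, yet the
    # production composition schedule_retry(parse_timeout(config))
    # schedules a retry after 30 milliseconds instead of 30 seconds.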

Even with integration tests, some branches may be run without being tested. That is not a problem in itself: the test may perfectly verify what it was written for, while other code is inevitably executed along the way without being the target of any assertion. Nevertheless, that incidentally executed code is marked as run and increases the coverage. If no other test actually verifies it, part of the code ends up untested yet marked as covered.
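
A minimal sketch of this effect, with a hypothetical cached lookup. The test below genuinely verifies the header it was written for, and it passes; along the way it inevitably executes the (buggy) cache branch of get_user, which is then marked as covered, so nothing prompts anyone to test it directly.

    _cache = {}

    def get_user(user_id, fetch):
        # bug: should check `user_id in _cache`; instead it returns
        # whichever user happened to be cached first
        if _cache:
            return next(iter(_cache.values()))
        user = fetch(user_id)
        _cache[user_id] = user
        return user

    def render_header(user_id, fetch):
        # shows the user's name in the menu and in the banner
        menu = get_user(user_id, fetch)["name"]
        banner = get_user(user_id, fetch)["name"]
        return f"menu:{menu}|banner:{banner}"

    def test_render_header():
        # really tests the header, which is its purpose, and passes;
        # the cache branch runs as a side effect and is marked covered
        fetch = lambda i: {"id": i, "name": "Ada"}
        assert render_header(1, fetch) == "menu:Ada|banner:Ada"

Coverage now reports the cache branch as covered, yet asking get_user for any second user would wrongly return Ada.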

The point is: test coverage only tells us whether code has been run, not that it works as intended.

So isn't test coverage useless?

I actually think that test coverage is a great tool. But I also think that it's a really poor metric.

As a tool, it tells us which parts of the code have not been run during the tests. That gives us the opportunity to ask whether tests are missing, and to add them if necessary.
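
With coverage.py, for instance, this is a report one reads rather than a number one gates on (a sketch, assuming a Python project):

    coverage run -m pytest   # run the suite under coverage
    coverage report -m       # the "Missing" column lists the lines that
                             # were never run, prompting the question of
                             # whether a test is missing there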

But as a metric, I think it creates a counterproductive incentive. It can push us to write poorer tests just to meet the threshold when we are in a rush to deliver.

And poor tests are worse than no tests at all. They make test coverage useless even as a tool, leaving it good for nothing but a poorly chosen metric: because of the poor tests, we no longer even know which parts of the product are truly untested.
