On tests and testing

Hello, gentle reader. Please allow me to rock your entire human face by dropping this truth bomb:

Tests are about more than testing.

“Obviously!” you yell, your riposte ringing out triumphantly against what is seemingly a rather anemic statement. The truth bomb has failed to detonate. Your entire human face remains unrocked.

If you’ve ever practiced TDD, you know that tests are a great place to write constraints, requirements, and other bits and bobs that help you drive the design of your production code.

If you’ve ever had to make a change in a legacy system, you know that a test can represent a safety harness, allowing you to cordon off a section of code and work within it reasonably comfortably, safe from misstep.

If you’ve ever dabbled in BDD, you know that tests can represent full business requirements, human-readable and robust in their description, allowing you to better communicate with stakeholders and shorten time to validation.

These are all solid examples of tests whose utility extends beyond simple “testing”, and many people use tests in this way.

But! Have you ever considered… that how you structure your tests could impact production code habitability?

The journey to mistrusting mocks

I first encountered a facet of this gem a few years ago when I read that using mocks in tests is a smell. It was an article by Arlo Belshee, and I would link to it, but for this particular argument it’s more noise than signal. I rejected the idea because obviously mocks are important (they aren’t) and Arlo was obviously being intentionally provocative (maybe, although he had a point), but the idea stuck with me, occasionally poking its head out of my subconscious and asking “why the mock there, chief?”

The next facet I encountered was an article by Uncle Bob[1] called TDD Harms Architecture, which I recommend you read in its entirety. The crux is that some developers will default to a 1-to-1 mapping (in object-oriented space) of class to test. One class file, one test file, with the name of one directly mappable to the name of the other.

“What’s wrong with that?” you ask, your face slightly more rocked than it was previously. Nothing. You won’t have to go to confession (or whatever brand of absolution-seeking ritual is your go-to) and declare your tightly coupled test structure to object structure. “Three Hail Marys and do the Gilded Rose kata in Clojure.”

But there are … “reactions” that need to be taken into account. I put “reactions” in quotes because I was originally going to use “consequences”, but that word felt too judgey and that’s not what I’m going for here. The primary reaction is that your tests will likely be infused with mocks.

Which may sound innocuous, but…

The actual problem with mocks

When testing at the unit level and using mocks, you’re essentially imbuing a unit of code with some level of importance or permanence. Unless your team has established a very clear norm of impermanence and granted everyone explicit permission to delete tests/units as necessary, the simple presence of that unit test creates some discomfort around altering or eliminating the unit[2]. Additionally, you’re establishing that This Unit is dependent on This Other Unit (which is provided as a mock). You’re codifying a fake constraint.

“‘Fake constraint’ is both a loaded term and a great band name!” you retort with a shaky voice. That human face of yours is starting to look somewhat rocked. You’re right, of course, but it doesn’t invalidate my point. The test has introduced the idea of the dependency between two units of code. If someone changes the relationship of those two units, they will have test failures.

I’ll say that again: anyone who modifies the relationship between two units will encounter test failures.

And that’s the actual problem. Any non-trivial refactoring is going to break tests, and not in a way that’s usually super easy to deal with (depending on stack, mocking framework, etc.). Some of our Java tests using Mockito have some really twisty stuff around expectations and verifications, and understanding those things is often difficult enough without moving dependencies around as part of a refactor.
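To make that concrete, here is a minimal sketch in plain Java. Everything in it is hypothetical and invented for illustration: the `ShippingRates` collaborator, the two `Checkout` versions, and the hand-rolled `RecordingRates` mock, which stands in for what a mocking framework's call verification would do. It is not taken from our codebase. The refactor from V1 to V2 changes only how the unit talks to its collaborator, not the result it produces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class MockCouplingSketch {

    /** Hypothetical collaborator: shipping cost in cents for a weight in grams. */
    interface ShippingRates {
        long costFor(long grams);
    }

    /** "Real" implementation: a flat 2 cents per gram (illustrative only). */
    static class FlatRates implements ShippingRates {
        public long costFor(long grams) { return grams * 2; }
    }

    /** Hand-rolled mock: records each call so a test can verify the interaction. */
    static class RecordingRates implements ShippingRates {
        final List<Long> calls = new ArrayList<>();
        public long costFor(long grams) {
            calls.add(grams);
            return 0; // canned value; the interaction check never looks at it
        }
    }

    /** The unit under test, in two versions that return the same totals. */
    interface Checkout {
        long shippingTotal(List<Long> itemWeightsGrams);
    }

    /** Before the refactor: asks the collaborator once per item. */
    static class CheckoutV1 implements Checkout {
        private final ShippingRates rates;
        CheckoutV1(ShippingRates rates) { this.rates = rates; }
        public long shippingTotal(List<Long> itemWeightsGrams) {
            long total = 0;
            for (long grams : itemWeightsGrams) total += rates.costFor(grams);
            return total;
        }
    }

    /** After the refactor: sums the weights and asks the collaborator once. */
    static class CheckoutV2 implements Checkout {
        private final ShippingRates rates;
        CheckoutV2(ShippingRates rates) { this.rates = rates; }
        public long shippingTotal(List<Long> itemWeightsGrams) {
            long totalGrams = 0;
            for (long grams : itemWeightsGrams) totalGrams += grams;
            return rates.costFor(totalGrams);
        }
    }

    /** Interaction-style check: passes only if the mock saw one call per item. */
    static boolean interactionTestPasses(Function<ShippingRates, Checkout> factory) {
        RecordingRates mock = new RecordingRates();
        factory.apply(mock).shippingTotal(List.of(100L, 250L));
        return mock.calls.equals(List.of(100L, 250L));
    }

    /** Behavior-style check: wires the real collaborator and looks only at the result. */
    static boolean behaviorTestPasses(Function<ShippingRates, Checkout> factory) {
        return factory.apply(new FlatRates()).shippingTotal(List.of(100L, 250L)) == 700L;
    }

    public static void main(String[] args) {
        System.out.println("V1 interaction: " + interactionTestPasses(CheckoutV1::new)); // true
        System.out.println("V1 behavior:    " + behaviorTestPasses(CheckoutV1::new));    // true
        System.out.println("V2 interaction: " + interactionTestPasses(CheckoutV2::new)); // false
        System.out.println("V2 behavior:    " + behaviorTestPasses(CheckoutV2::new));    // true
    }
}
```

The interaction check fails on V2 even though nothing a caller can observe has changed. The only thing that moved is the relationship between the two units, which is exactly what the mock had quietly written into the test.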

Let’s say you have a super busy class with five or six dependencies (because “legacy”) and you recognize that it also has two very clear responsibilities – you can separate this class into two classes! That’s great! … except first you have to look through the class’s test file and figure out the intent behind the mock interactions, which might not be straightforward, then break your test file into two separate test files to mimic the 1-to-1 testing structure, discern which tests reflect which responsibilities (with potential decoupling there too), and …

That is a lot of overhead. I disengaged just writing that paragraph – how can I be expected to remain focused long enough to actually refactor that nonsense?

How often do I pass up a refactoring opportunity due to that complexity? A scarier question: how often do I do that without noticing I’m doing it?

The final facet

After reading Uncle Bob’s article above I massaged my own testing strategy toward more of an integration-heavy approach. In our Spring middleware application, for example, I’ll use MockMvc to call into our real REST endpoints under test, which are wired through a real service layer (among other concerns) all the way to a real broker that calls out to another REST dependency. In the test, the external REST service is mocked with MockRestServiceServer, and those are the only two mocks. One representing the entity calling into our app and one representing the external dependency. Everything in between is the Real Deal.

Our integration tests represent vertical slices of our application, such as “I want to create an account” or “I want to change my password”.
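The shape of such a test can be sketched without Spring at all. The plain-Java version below is an assumption-laden stand-in, not our real code: calling `AccountController` directly plays the role that MockMvc plays in our tests, `FakeExternalAccountsApi` plays the role of MockRestServiceServer, and every class name, path, and status code is made up for illustration. The point it tries to show is that only the outbound edge is faked, while the controller, service, and broker are the real objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class VerticalSliceSketch {

    /** Outbound edge (hypothetical): the only thing the test replaces. */
    interface ExternalAccountsApi {
        int post(String path, String body); // returns an HTTP-like status code
    }

    /** Fake for the outbound edge: records requests, answers with a canned status. */
    static class FakeExternalAccountsApi implements ExternalAccountsApi {
        final List<String> requests = new ArrayList<>();
        int nextStatus = 201;
        public int post(String path, String body) {
            requests.add(path + " " + body);
            return nextStatus;
        }
    }

    /** Real broker: translates a domain call into a call on the external API. */
    static class AccountBroker {
        private final ExternalAccountsApi api;
        AccountBroker(ExternalAccountsApi api) { this.api = api; }
        boolean create(String email) {
            return api.post("/accounts", "email=" + email) == 201;
        }
    }

    /** Real service: owns normalization and validation. */
    static class AccountService {
        private final AccountBroker broker;
        AccountService(AccountBroker broker) { this.broker = broker; }
        boolean createAccount(String email) {
            String normalized = email.trim().toLowerCase(Locale.ROOT);
            if (!normalized.contains("@")) {
                throw new IllegalArgumentException("invalid email");
            }
            return broker.create(normalized);
        }
    }

    /** Real controller: maps outcomes to HTTP-like status codes. */
    static class AccountController {
        private final AccountService service;
        AccountController(AccountService service) { this.service = service; }
        int postAccount(String email) {
            try {
                return service.createAccount(email) ? 201 : 502;
            } catch (IllegalArgumentException e) {
                return 400;
            }
        }
    }

    /** Wires the whole slice with real objects, faking only the outbound edge. */
    static AccountController wire(ExternalAccountsApi api) {
        return new AccountController(new AccountService(new AccountBroker(api)));
    }

    public static void main(String[] args) {
        FakeExternalAccountsApi external = new FakeExternalAccountsApi();
        AccountController app = wire(external);

        // "I want to create an account": drive the inbound edge, observe both edges.
        System.out.println(app.postAccount("  Ada@Example.com ")); // 201
        System.out.println(external.requests); // [/accounts email=ada@example.com]

        // Invalid input is rejected before anything reaches the external API.
        System.out.println(app.postAccount("not-an-email")); // 400
        System.out.println(external.requests.size());        // 1
    }
}
```

Because the assertions only touch the two edges, the service and broker in the middle could be merged, split, or renamed without this test noticing, as long as the slice still does its job.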

This arrangement has given us immense freedom in refactoring, and has led to some of the cleanest code I’ve written in years. I always feel good walking away from these integration-style tests.

And that was where I stopped. I evangelized the approach within the team and the local development ecosystem but I never really voiced my opinion more broadly.

The article that prodded me into being more vocal was written by Erik Dietrich late in 2017, and its title captured my attention at the time. The short version is that he chose 100 fairly active GitHub repositories and did some analysis. The higher the coverage metrics were for a codebase, the more that metrics like “cyclomatic complexity” and “method length” increased, and higher numbers there mean a less habitable codebase.

Paraphrasing, there is a negative correlation between number of unit tests and code habitability.

When I read his findings (with a grain of salt, as it was one study with a small sample size) it didn’t take much introspection to find that, yes, I was hesitant to refactor code in heavily mocked codebases because the barrier to change was way out of proportion to the benefits received at the micro level. If a teeny-tiny (yet non-trivial) refactor takes so much effort, why would I make small adjustments? And by the time the adjustment size offsets the cost of dealing with the mocks, who could justify that kind of investment from a business perspective?

And with that unconscious bias, thus gradually went habitability.

Our biggest offenders were our JavaScript codebases, because the community we were working with valued locked-down implementations that were 100% covered and (in some cases) mutation tested. Scores of mocks. If you didn’t get it right the first time, the cost of refactoring was so high that teams typically just wrote a new thing from scratch.

Brighter, less coupled future

But it’s not a bleak picture. Here are my takeaways:

  1. Test structure impacts how comfortable you are refactoring code, and your comfort in refactoring impacts how habitable your code is and will continue to be.
  2. If you’re using too many mocks for internal dependencies, you should evaluate whether you’re testing yourself into a locked-down implementation.
  3. If your team is hesitant to ditch a 1-to-1 production/test structure, listen to their concerns. If they’re worried about coverage[3], look into adding a coverage analysis tool to your build pipeline. If they’re worried about easily finding tests that correspond with a given class, look at establishing a vertical code organization scheme[4] and making it clear which tests impact which verticals.
  4. Just try one integration-level test, then take an hour and refactor the production code, undo, refactor a different way, undo, etc. — just to experience the freedom.

What about the unit tests I’ve already written? Well, if they’re useful, I keep them until they break. If they’re broken, I delete them and focus on integration-level testing. If I feel that a particular unit is useful enough and stable enough to warrant injecting it as a dependency everywhere, it still might not be worth keeping it around. In the rare event that its contract changes, I’ll have to manually update the tests everywhere I mocked it. If I avoid the mock and use the object directly (especially if its usage cross-cuts several verticals) and its contract changes, I’ll have plenty of Real Legitimate Test Failures to guide me toward a habitable resolution.

So, get out there and experiment. Figure out what works for you in your context. And you might want to get your human face looked at by a face doctor, as it seems quite thoroughly rocked.


Oh, and here’s Arlo’s article. Click around, he’s got some great stuff.

  1. After I wrote an initial draft and shopped it around I saw a spate of tweets between Robert Martin (Uncle Bob) and some others in the software development community about the term “craftsman”, a word that Uncle Bob clings to and states is “gender neutral”. Given how abysmal our industry is at representing and empowering marginalized persons I would have hoped that Martin would recognize that he doesn’t control how the word is received and that it may be time for a more inclusive term. I apologize to those who would rather not see sources from Martin and I’m working to discourage people from using “craftsmanship” to describe what we do. That said, the linked article was one that helped me see that 1-to-1 mapping of tests to source files can be extraordinarily harmful to long-term codebase health.
  2. Our Director of Software Engineering and I have been having an intermittent conversation about integration-vs-unit, and even he, with his years of experience and explicit TDD training in which the instructor said it’s okay and even expected to delete tests, still feels uncomfortable deleting tests. The struggle is real.
  3. I have been scolded before in code reviews because the new class I created didn’t have a corresponding test file, thus it must not be tested. Code coverage metrics can show that those lines of production code are indeed covered by tests, which may allay that specific fear.
  4. Vertical code organization is all about feature-specific grouping instead of layer-specific grouping. For example, in my web-based middleware application we might have a /controllers directory or a /services directory. A better organization might involve /addUser or /forgotPassword directories, each with controllers and services as necessary.