Saturday, January 9, 2021

Let's talk about unit testing...

Since I've wandered back to the employment market, I've had to go through a lot of interview processes. From the very (ridiculously) large to the small, I'm basically being deluged with a slew of new acronyms that I was not deluged with a decade ago when I was last interviewing. And what that basically reinforces for me is that software development continues to be a hype-driven field, with everyone tightly embracing the latest buzzword, because obviously software used to be hard because we weren't doing it this way...

Personally, it would be nice if, instead of thinking of a cute new buzzword for something we've all known for 40 years already, people would just devote energy to writing better code. Education, practice, peer collaboration -- these create better code. Not pinning notecards around the office and telling everyone you're an Aglete now.

And why do we want to create better code? I sort of feel like this message is often lost - and without it you do have to wonder exactly WHY you are building a house of cards out of floppy disks once a week (although people don't when they have the cool buzzword of Habitatience to direct them). But the reason to create better code is so that we spend less time making the code work. It's about making software reliable, and to at least minimal degrees, predictable - and these are things that bugs are not. 

Anyway, unit testing is still pretty big, though of course the only right way to unit test is to use someone else's unit test framework, and write standalone blocks of code that run and pass the tests automatically. If you aren't using FTest, you clearly aren't testing at all.

Let me be clear up front - these little test functions are valuable, just rarely in the way that the proponents think. So let's just over-simplify first what I'm talking about.

Basically, the idea is that the developer writes test functions that can be executed one-by-one via a framework. These test functions are intended to exercise the code that has been written, and verify the results are correct. When you're done, you usually get a nice pretty report that can be framed on your wall or turned in to your teacher for extra marks. They show you did the due diligence and prove that your code works!

Or do they? Did you catch the clue? Encyclopedia Brown did.

The creator of the unit tests for a piece of code is usually the developer of that piece of code. Indeed, for some low level functions nobody else could. (Although outside of the scope of this rant, it would be very reasonable for the designer or even the test group to create high level unit tests to verify /function/... but this never happens.) Anyway, the problems with this are several:

First, the developer is testing their own understanding of what the function does. They are not necessarily testing what the function is supposed to do. Indeed, they usually write code that tests that the code they wrote does what they wrote it to do -- in essence they are testing the compiler, not the program. Modern compilers are not infallible, but they are generally good enough that we don't need to test their code generation as a general rule.

Secondly, this is a huge opportunity for a rookie trap. Novice programmers usually only test that a function does what the function is supposed to do. That is, they don't think to test if the function correctly handles bad situations, like invalid inputs. This is a huge hole and often means that half the function is unexercised -- or that the function has no error handling at all. But it will still pass the unit test.

Thirdly, this becomes a sort of a black box test. Similar to the comment above, there's no way to verify that every line of code in the function has been exercised. In fact, it's not even certain that the function behaved the way the developer intended -- only that the output, whatever it is, matched whatever criteria the unit test developer asked for. (And this can range from detailed to very, very basic, but it's still restricted only to the final output.) Correct result for a single input doesn't guarantee correct operation. There is such a thing as dumb luck!

But there is value to these tests. Because they can be (and usually are) run by automatic build scripts, they are fantastic high level validations that a code change didn't fundamentally break anything. Of course, for this to be true, unit tests need to be peer reviewed and they need to include as many cases as are necessary to test ALL paths within the function being tested.

But what about the third point? While meeting the second point more or less addresses it, there is a variable not taken into account: time. What do I mean by that? I mean that in any project large enough that the developers are using automatic build tools with unit tests, that the code is not static. It is being changed, often rapidly. That's why the automatic tools are trying to help.

However, once created by person A, person B modifying a function that already exists rarely goes looking to update the unit test -- particularly if they did not change the function's purpose. However, the unit test was created so that the inputs passed tested all code paths. Now there are new code paths. You no longer know that the unit test is testing everything.

"Well, we'll just tell people to update the unit tests," you exclaim. "Case dismissed, nice try, but that's it."

Hah, I reply. Hah. Good luck.

Look, nobody sets out to be a sloppy or lazy developer, not even many of the cases I've inferred in my rants. But people forget things, they are usually on a tight schedule, and the most heinous of all, their manager usually tells them to "worry about that later". After all, the unit test exists, so that box is checked, and there's no point spending more money on updating it after it already exists. What are we supposed to do, fill in the box? It's already checked!

So look, just assume that your automated tests are going to fall out of date until you hire a new gung-ho intern who finds it, or the original dev adds a new feature and goes to update the unit test they wrote. They are still useful as a regression test - in fact awesomely do. Having unit tests on complex code I've written has saved me a few times. But what do you do between gung-ho interns?

Even if you don't have an automated build tool or haven't got around to implementing your unit test framework yet, the developers can still perform manual unit testing. Stop grinding your teeth - it's not as bad as you think. You have Visual Studio, Eclipse, or GDB, right? Quit your whining. In my day we did unit tests by changing the screen color and we liked it.

It's actually really simple. The developer simply steps through the new code. Modern debuggers allow you to set the program counter and both observe and change variables in real time -- meaning that a developer can walk through all the possible paths of their new function in a matter of minutes without even needing to simulate the real world cases that would trip up every case. This is especially helpful when some of the cases are technically "impossible" (a programmer should never write "impossible" without quotation marks, hardware is involved). Inputs can be changed, the code can be walked through, and then the program counter can be set right back to the last branch and tried again.

It's true that this can take a while if a lot of code is written, and naturally you still need to run the real world tests (to see if it actually works, as opposed to theoretically works), but this is guaranteed to be faster than writing and testing the unit tests. Oh yeah, you missed that part, didn't you? You also have to test your unit tests actually work.

The worst unit test case I ever saw tested a full library of conversion functions by passing 0 to the base one and verifying that 0 came back out. As one might expect, 0 was a special case in this function. The other conversions actually contained off-by-one in about half the cases (and confused bits for bytes in several others - this was hardware based). But the unit test checkbox was marked, verifying that the software was correct, and more importantly, the unit test passed. It wasn't till we tried to use it that things went wrong.

So, I recommend both. Have the developer step through their code. Let's call it the Stepalicious Step. Then after it works, write unit tests as regression tests so that your build server feels like it's contributing. But make sure unit tests are considered first tier code, and go through your usual peer review phase, to avoid only checking the easy case.

"Oh yes, we do Agile, Regression testing, and Stepalicious." Oh, it's no dumber sounding than trusting your source code to a Git...