Monday, June 14, 2021

Technical Debt is...

 Every time you ignore an error message and say "I'll worry about that if I see it again", you've added technical debt.

Every time you don't check a function's error return and say "We'll add that after it's all working", you've added technical debt.

Every time a function doesn't work, and you just add a comment that says "This doesn't work" instead of investigating why, you've added technical debt.

Every time you fail to test a function at all, and don't check if it's actually working, you've added technical debt.

Saw all of this tonight... ;)

A lot of people think that technical debt is a function of long term code, of systems that have become weighted down with hacks and patches. Some people think it's a deliberate choice, and therefore you can come back around and clean it up easily enough.

But you can't. It's all lost. You've no idea when that error will pop up again, but it will probably be in front of a customer when you think everything's working. The system will malfunction on Thursdays but you completely forgot that the main interface library that you used had an error return "FAIL_ITS_THURSDAY" that you didn't add a handler for -- now you have to launch a full debug process to find what's wrong (and since you didn't start troubleshooting till Friday, it'll take you a week.) You can't figure out why the system isn't processing fractals because you forgot that the fractal generate function has a tiny little comment that says "// This doesn't work, come back to it later (mb)" - hope you can find it!

Technical debt costs twice what you paid to put it into the system in the first place. And worse, it's often passed on to someone who wasn't there for the original design, so they've never heard of the library that hates Thursdays, only ones that hate Mondays. They'll need three times as long to find the issue. The customer doesn't care that you meant to come back to investigate that error code, they only care that it didn't work. And everyone at E3 was waiting to see your cool new fractal generator, not a grey box that pulses slightly.

Why would you want to deliberately pay all that? Most of the time, the developer of these debts could have fixed them in minutes. Pay the minutes, don't get so caught up in the excitement that you pay the debts later.

Wednesday, May 26, 2021

You're Doing It Wrong - AI

 AI will definitely destroy society. But not the way you think.

Disclaimer: I'm not a machine learning expert, and I know some will disagree strongly. Get your own blog, ya smarmy bastard. ;)

AI, or more correctly in most cases Machine Learning, is increasingly being used in difficult, abstract problems to give us a yes/no answer to questions even an expert would have trouble with.

There are also, as a result, countless stories of how AFTER training, it was discovered that these machines were skewed. They examined irrelevant details to come up with the answer, or they revealed biases in the dataset (which, honestly, people should have seen long before even starting.)

So machine learning, to give a very simple and mostly wrong description, is the act of wiring up a set of inputs (say, pixels of an image) to a set of outputs (say, "cat", "dog", "martian", "amoeba") through a chain of configurable evaluators. These evaluators, which are not called that by anyone with training in the field, are analogous to neurons in your brain.

The idea is, you show the inputs a picture of a cat, and tell it "cat". The machine tries a few combinations of settings and decides which one gave it "cat" most consistently. You show it a "dog" and it does the same thing, trying to remember the settings for "cat". Repeat for "martian" and "amoeba". Then repeat the whole process a couple of million times with different pictures randomly selected from the internet. The neurons slowly hone in on a collection of settings that generally produce the right output from all on those millions of inputs.

So you're done! You fed your electronic brain five million images, and it classified them with 99% accuracy! Hooray!

Now you give it a picture of a Martian it has never seen before. "Cat", it tells you confidently.

Well... um.. okay, cats have four legs and Martians only three, but, we're only 99% perfect. How about this lovely photo of an amoeba devouring a spore?

"Cat. 99.9% certainty."

"That's not a cat," you reply. 

You offer up a beautiful painting made in memorial of a lost canine friend. "Amoeba."

Frustrated, you offer up a cheezburger meme. "Cat," the AI correctly responds.

Relieved, you sit back and accidentally send it a set of twelve stop signs and one bicycle. "Martian."

So what the heck is going on?

Well, first off, you got your 5 million photos from the internet, so it was 80% cats. Thus the AI ended up with a configuration set that favors cats. It decided that abstract blobs and unrealistic strokes, much like the brush strokes in your canine painting, looked a lot like the background of slides on which amoeba were found - it didn't learn anything about amoeba themselves. And tall, thin objects were clearly Martians, since you didn't teach it about anything else that was tall and thin.

Now, machine learning, even in the primitive form we have today, has some value. In very narrow fields it's possible to give a machine enough information that the outputs start to make sense. But the problem is that these narrow field successes have led to trying over and over to apply it to broader questions - questions which are often difficult even for human experts with far more reasoning power.

There are two big problems with machine learning. The first is that in real life, you would never actually know why it made those mistakes. The neuron training sequence is relatively opaque and there are few opportunities to debug incorrect answers. It's a big opaque box even to the people who built it.

The second is data curation. When you create such large datasets, it's very hard - nearly impossible - to ensure it's a good data set. There must be NO details that you don't want the AI to look at. If you are differentiating species, then no backgrounds, no artistic details, even different lightning can be locked in as a differentiator. The AI has NO IDEA what the real world looks like, so it doesn't unconsciously filter out details like we do. To the machine EVERY detail is critical. If I give it a cat on a red background and a dog on a blue background, it is very likely to determine that all animals on a red background are cats, because that is easier to determine than the subtle shape difference. 

The dataset must also be all-encompassing. If you leave anything out, than that anything does not exist to the AI, and so providing it that anything automatically means it must be one of the other things. The brain can not choose "never seen before"... at least with most traditional training methods. At best you might get a low confidence score.

Finally, the dataset must be appropriately balanced. There may be cases where a skew is the right answer... for instance, a walking bird in the Antarctic is more likely a penguin than an emu, but if you are classifying people then you need to make sure the dataset contains a good representation in equal proportions of everyone. Sounds pretty hard, doesn't it? Yeah, that's the whole point. It's hard.

And that's a point I've made over and over again. Computing good hard like grammar. People are always looking for shortcuts, and they never work as well as expected. Not only is machine learning being seen as a huge shortcut to hard problems, but people are taking shortcuts creating the machine, and getting poor results. This shouldn't be a surprise. If you know the dataset is incomplete, why are you surprised that the machine doesn't work right? You're supposed to be smart. ;)

The real problem of all this is that people still think if a computer says it, it must be true. This is despite the daily experience with their cell phones, smart TVs, game systems and PCs all being buggy, malfunctioning pieces of crap, somehow the big mainframes at the mega-corporations (which generally don't exist anymore and the ones you are thinking of have less power than your smart watch), somehow those machines get it right.

So as machine learning continues to be used to classify people for risk, recognize people on the street, call out people for debt, etc, people are going to be negatively impacted by the poor training the machines received.

Computers are stupid. They are stupider than the stupidest person you've ever had to work with. They are stupider than your neighbor's yappy dog down the street who barks at the snow. They are stupider than those dumb ants who walk right into the ant trap over and over again. Computers do not understand the world and have no filter for what is relevant and what is not. Don't trust them to tell you what's true.

Friday, January 29, 2021

Complexity - or - You're DEFINITELY Doing it Wrong

Hey, I'm employed again! You know what that means - MORE RANTS!

Any new position is always connected to learning about new systems that you didn't see before - or in this case - that I deliberately steered away from before. And the base takeaway of the last couple weeks is "people love complexity".

From layering systems on top of Git to creating an ecosystem with an entire glossary of new terminology, some people just feel a system isn't worth doing if it isn't layered with system on top of system on top of system. Unfortunately Linux as a platform heavily endorses this approach with a huge library of easily obtainable layers.

I once made the joke that building a project under Linux is like playing an old graphical Sierra adventure game. You need to get a magic potion to save the Princess, but the witch demands you bring her an apple. The orchard can't give you an apple unless you bring them some fertilizer for the trees. The farmer has fertilizer, and he'll trade you for a new lamp for his barn. The lamp maker would love to help, but he's all out of kerosene... and so on for the duration of the quest. Much the same under Linux, just replace the quest items with the next package you need, which depends on the next package, which depends on the next package... sometimes I wonder if anyone wrote any actual code, or if they all just call each other in an infinite loop until someone accidentally reaches Linus' original 286 kernel, which does all the actual work...

So anyway, yeah, if you need to invent a glossary of terms to describe all the new concepts you are introducing to the world of computing, then you are probably not a revolutionary - you are probably over-complicating something we've all been doing for better than half a century. Do you really need 1GB of support tools to generate an HTML page?

It's one thing I loved about embedded, it hadn't reached the point of being powerful enough to support all these layers yet. But those days are rapidly ending. The Raspberry PI Pico is a $4 embedded board powerful enough to generate digital video streams by bitbanging IO. Memory and performance isn't much of a concern anymore.

But let me end on a positive note - unusual for me, I know. Some of these packages produce amazing results and even I'm glad to see them out there. But for Pete's sake, consider whether you really need to add another layer on top of those packages - what are you actually adding? Seriously, poor Pete.

... if I could add a bit from Hitchhiker's Guide to the Galaxy...

"Address the chair!"
"It's just a rock!"
"Well, call it a chair!"
"Why not call it a rock?"

Saturday, January 9, 2021

Let's talk about unit testing...

Since I've wandered back to the employment market, I've had to go through a lot of interview processes. From the very (ridiculously) large to the small, I'm basically being deluged with a slew of new acronyms that I was not deluged with a decade ago when I was last interviewing. And what that basically reinforces for me is that software development continues to be a hype-driven field, with everyone tightly embracing the latest buzzword, because obviously software used to be hard because we weren't doing it this way...

Personally, it would be nice if, instead of thinking of a cute new buzzword for something we've all known for 40 years already, people would just devote energy to writing better code. Education, practice, peer collaboration -- these create better code. Not pinning notecards around the office and telling everyone you're an Aglete now.

And why do we want to create better code? I sort of feel like this message is often lost - and without it you do have to wonder exactly WHY you are building a house of cards out of floppy disks once a week (although people don't when they have the cool buzzword of Habitatience to direct them). But the reason to create better code is so that we spend less time making the code work. It's about making software reliable, and to at least minimal degrees, predictable - and these are things that bugs are not. 

Anyway, unit testing is still pretty big, though of course the only right way to unit test is to use someone else's unit test framework, and write standalone blocks of code that run and pass the tests automatically. If you aren't using FTest, you clearly aren't testing at all.

Let me be clear up front - these little test functions are valuable, just rarely in the way that the proponents think. So let's just over-simplify first what I'm talking about.

Basically, the idea is that the developer writes test functions that can be executed one-by-one via a framework. These test functions are intended to exercise the code that has been written, and verify the results are correct. When you're done, you usually get a nice pretty report that can be framed on your wall or turned in to your teacher for extra marks. They show you did the due diligence and prove that your code works!

Or do they? Did you catch the clue? Encyclopedia Brown did.

The creator of the unit tests for a piece of code is usually the developer of that piece of code. Indeed, for some low level functions nobody else could. (Although outside of the scope of this rant, it would be very reasonable for the designer or even the test group to create high level unit tests to verify /function/... but this never happens.) Anyway, the problems with this are several:

First, the developer is testing their own understanding of what the function does. They are not necessarily testing what the function is supposed to do. Indeed, they usually write code that tests that the code they wrote does what they wrote it to do -- in essence they are testing the compiler, not the program. Modern compilers are not infallible, but they are generally good enough that we don't need to test their code generation as a general rule.

Secondly, this is a huge opportunity for a rookie trap. Novice programmers usually only test that a function does what the function is supposed to do. That is, they don't think to test if the function correctly handles bad situations, like invalid inputs. This is a huge hole and often means that half the function is unexercised -- or that the function has no error handling at all. But it will still pass the unit test.

Thirdly, this becomes a sort of a black box test. Similar to the comment above, there's no way to verify that every line of code in the function has been exercised. In fact, it's not even certain that the function behaved the way the developer intended -- only that the output, whatever it is, matched whatever criteria the unit test developer asked for. (And this can range from detailed to very, very basic, but it's still restricted only to the final output.) Correct result for a single input doesn't guarantee correct operation. There is such a thing as dumb luck!

But there is value to these tests. Because they can be (and usually are) run by automatic build scripts, they are fantastic high level validations that a code change didn't fundamentally break anything. Of course, for this to be true, unit tests need to be peer reviewed and they need to include as many cases as are necessary to test ALL paths within the function being tested.

But what about the third point? While meeting the second point more or less addresses it, there is a variable not taken into account: time. What do I mean by that? I mean that in any project large enough that the developers are using automatic build tools with unit tests, that the code is not static. It is being changed, often rapidly. That's why the automatic tools are trying to help.

However, once created by person A, person B modifying a function that already exists rarely goes looking to update the unit test -- particularly if they did not change the function's purpose. However, the unit test was created so that the inputs passed tested all code paths. Now there are new code paths. You no longer know that the unit test is testing everything.

"Well, we'll just tell people to update the unit tests," you exclaim. "Case dismissed, nice try, but that's it."

Hah, I reply. Hah. Good luck.

Look, nobody sets out to be a sloppy or lazy developer, not even many of the cases I've inferred in my rants. But people forget things, they are usually on a tight schedule, and the most heinous of all, their manager usually tells them to "worry about that later". After all, the unit test exists, so that box is checked, and there's no point spending more money on updating it after it already exists. What are we supposed to do, fill in the box? It's already checked!

So look, just assume that your automated tests are going to fall out of date until you hire a new gung-ho intern who finds it, or the original dev adds a new feature and goes to update the unit test they wrote. They are still useful as a regression test - in fact awesomely do. Having unit tests on complex code I've written has saved me a few times. But what do you do between gung-ho interns?

Even if you don't have an automated build tool or haven't got around to implementing your unit test framework yet, the developers can still perform manual unit testing. Stop grinding your teeth - it's not as bad as you think. You have Visual Studio, Eclipse, or GDB, right? Quit your whining. In my day we did unit tests by changing the screen color and we liked it.

It's actually really simple. The developer simply steps through the new code. Modern debuggers allow you to set the program counter and both observe and change variables in real time -- meaning that a developer can walk through all the possible paths of their new function in a matter of minutes without even needing to simulate the real world cases that would trip up every case. This is especially helpful when some of the cases are technically "impossible" (a programmer should never write "impossible" without quotation marks, hardware is involved). Inputs can be changed, the code can be walked through, and then the program counter can be set right back to the last branch and tried again.

It's true that this can take a while if a lot of code is written, and naturally you still need to run the real world tests (to see if it actually works, as opposed to theoretically works), but this is guaranteed to be faster than writing and testing the unit tests. Oh yeah, you missed that part, didn't you? You also have to test your unit tests actually work.

The worst unit test case I ever saw tested a full library of conversion functions by passing 0 to the base one and verifying that 0 came back out. As one might expect, 0 was a special case in this function. The other conversions actually contained off-by-one in about half the cases (and confused bits for bytes in several others - this was hardware based). But the unit test checkbox was marked, verifying that the software was correct, and more importantly, the unit test passed. It wasn't till we tried to use it that things went wrong.

So, I recommend both. Have the developer step through their code. Let's call it the Stepalicious Step. Then after it works, write unit tests as regression tests so that your build server feels like it's contributing. But make sure unit tests are considered first tier code, and go through your usual peer review phase, to avoid only checking the easy case.

"Oh yes, we do Agile, Regression testing, and Stepalicious." Oh, it's no dumber sounding than trusting your source code to a Git...