Tuesday, September 7, 2021

Finally Breaking Through Is Always Nice!

I've been fighting a small problem for the last two days, and I finally just figured it out. It's such a relief to finally have it make sense that I decided to write it up before I even coded the fix, while it was still fresh. It was a small issue, but in a very large project, which made debugging difficult.

Back in the beginning of this whole pandemic I published version 2 of my VGM compression toolset, this one much more flexible and able to import many more formats. Last month someone finally used it. Yes, retro development is sure rewarding! Anyway...

This guy ported all the tunes from the NES version of Smurfs over to the TI, and released individual programs with my Quickplayer, which wraps a tune with a standalone player and lets you add some text. It was sweet, but awkward... and since he asked if anyone could make a cartridge, I wrote a tool that would take a folder full of program images, and build a loader cart with a menu.

This was a good start, but I decided that I also wanted to take all the visualizers I've done over the years (four of them), and make them available in the Quickplayer too. I adapted all of them to fit in the 8k low memory block on the TI, and ported them to Coleco as well. Then I coded a couple of flags that could be externally set that could be used to find a selection program for chaining (with random play as my intent.) It took some fiddling, but that all came together eventually.

So before diving into the issues that the random player gave me, let's recap...

First, we have the music player libraries themselves.

They are wrapped around a song in a standalone player program -- there are FIVE player programs (four with visualizers, and one just text like I originally released). Plus the program that does the actual wrapping and generates the final code.

Then we have a tool that packages these programs up, and auto-generates a loader program for them (so, the tool, and the loader program).

And we have a menu that selects which loader will be executed. So we're up to 9 programs, and then I wrote the randomizer.

I expected this part to be easy, because I had carefully compartmentalized everything. To explain a bit, the TI file system loads program files in individual 8k chunks. In addition, the TI's memory map splits the RAM - an 8k block at 0x2000 is where I store the player, and a 24k block at 0xA000 gets the music data. So, the player was always in a separate copy block from the data.

The autogenerated code thus always looks a lot like this:

   60B6  0200  li   R0,>6004    Ball
   60BA  0201  li   R1,>6000              
   60BE  0202  li   R2,>2000              
   60C2  0203  li   R3,>0cb4              
   60C6  06A0  bl   @>619c      (Copy data)          

   60CA  0200  li   R0,>6006    Data      
   60CE  0201  li   R1,>6000              
   60D2  0202  li   R2,>a000              
   60D6  0203  li   R3,>058a              
   60DA  06A0  bl   @>619c                

   60DE  06A0  bl   @>600c      Trampoline to start          
   60E2  2000  data >2000 Start address

There's a bit of an extra wrinkle in there. The TI cartridge memory space is only 8k, so we're also paging 8k banks. So, R0 is loaded with a page reference, R1 is a source address, R2 is a destination address, and R3 is the number of words to copy. 619C is a subroutine that banks the page in and does the copy. The first destination address is also the start address, and so it's also stored in the data at the end.
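
As a rough model of what each autogenerated loader carries, here is a small Python sketch. The names (CopyRecord, Loader, run_loader) are made up for illustration and are not part of the real tool; the numbers are the ones from the Ball listing above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CopyRecord:
    page: int    # R0: cartridge page reference
    src: int     # R1: source address in the 8k cartridge window
    dst: int     # R2: destination address in RAM
    words: int   # R3: number of words to copy


@dataclass(frozen=True)
class Loader:
    name: str
    program: CopyRecord   # first copy: the player program
    data: CopyRecord      # second copy: the music data
    start: int            # word stored after the trampoline call


# Values taken from the Ball listing.
BALL = Loader(
    name="Ball",
    program=CopyRecord(0x6004, 0x6000, 0x2000, 0x0CB4),
    data=CopyRecord(0x6006, 0x6000, 0xA000, 0x058A),
    start=0x2000,
)


def run_loader(loader):
    """Describe the steps in the order the loader performs them."""
    steps = []
    for rec in (loader.program, loader.data):
        steps.append(
            f"page >{rec.page:04X}: copy >{rec.words:04X} words "
            f">{rec.src:04X} -> >{rec.dst:04X}"
        )
    steps.append(f"start at >{loader.start:04X}")
    return steps


for line in run_loader(BALL):
    print(line)
```

The point to notice is that the start address is stored separately at the very end, even though it is the same value as the program's destination address.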

This all worked pretty well, so when I did the randomizer, I decided to have it pull the copy data for the first part, which is always the program, do the copy itself, and then randomly jump to the data copy for a different one. This would allow the music and visualizers to mix and match, so to speak.

It tended to work... sometimes. It was very random. Of course, I did introduce two random numbers. I was able to try hard coded values through the debugger and generate combinations that worked and combinations that didn't.

If you have already worked out why, damn you. You should have told me yesterday and saved me some time! ;)

If not, my next clue was that when it didn't work, it was jumping to incorrect addresses - addresses that tended to correlate with different programs than the one that appeared to be loaded in RAM. But out of all five player programs, I only used two start addresses, >2000 and >2100. So there was always a chance of it working.

A sensible fellow would have sorted it at this point, but it took me just a little longer!

I took a lot of notes about start and load addresses, observed patterns, stepped through the code, and probably the fifth time I stepped through a broken start it suddenly clicked.

This is your last chance to tell me what it was! ;)

So yeah. The program is copied in the FIRST block, and the FIRST block contains the start address. JUMPING to the data copy in the last block means that we would use the start address contained at the end there -- which may or may not be for the code we actually loaded, since we are playing mix and match.

To clarify, here's the loader for Piano:

   616E  0200  li   R0,>600e   Piano <-- no page increment 
   6172  0201  li   R1,>6b0e              
   6176  0202  li   R2,>2100              
   617A  0203  li   R3,>0883              
   617E  06A0  bl   @>619c                

   6182  0200  li   R0,>6010   Data         
   6186  0201  li   R1,>6000              
   618A  0202  li   R2,>a000              
   618E  0203  li   R3,>058a              
   6192  06A0  bl   @>619c                

   6196  06A0  bl   @>600c     Tramp      
   619A  2100  data >2100      Start address

Note the different start address!! So, if we copy the Ball program, then jump to the data copy for Piano, we'll end up starting at >2100, in the middle of the Ball code!
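
To make the mix-and-match failure concrete, here is a tiny Python simulation. It is only a sketch: the function names are hypothetical, and fixed_start shows one possible fix (taking the start address from the loader the program came from), which is not necessarily the fix that ended up in the real code. The addresses come from the two listings above.

```python
# Load and start addresses taken from the Ball and Piano loader listings.
LOADERS = {
    "Ball": {"program_dst": 0x2000, "start": 0x2000},
    "Piano": {"program_dst": 0x2100, "start": 0x2100},
}


def buggy_start(player, song):
    """Copy `player`'s program, then jump into `song`'s data copy.

    Execution falls through to the trampoline at the end of the song's
    loader, so the start address used belongs to the song's loader.
    """
    return LOADERS[song]["start"]


def fixed_start(player, song):
    """One possible fix: use the start address stored with the player."""
    return LOADERS[player]["start"]


for player in LOADERS:
    for song in LOADERS:
        loaded_at = LOADERS[player]["program_dst"]
        jumped_to = buggy_start(player, song)
        status = "ok" if jumped_to == loaded_at else "MISMATCH"
        print(
            f"{player} player + {song} song: loaded >{loaded_at:04X}, "
            f"jumped >{jumped_to:04X} {status}"
        )
```

With only two distinct start addresses across the players, a fair share of the random combinations line up by accident, which is why it "tended to work... sometimes".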

Of course.. the "right" way to do this wouldn't be hacking around your own code... you'd store the player programs in one place, and the music in another... but I wanted to be able to work with programs anyone generated -- plus I expected it to be faster to do. ;) (It probably wasn't!)

The lesson is - beware of red herrings. Validate your assumptions. And for goodness sake, pay attention when you're doing hacky things like running your own code out of order, even if it was on purpose!

Monday, June 14, 2021

Technical Debt is...

Every time you ignore an error message and say "I'll worry about that if I see it again", you've added technical debt.

Every time you don't check a function's error return and say "We'll add that after it's all working", you've added technical debt.

Every time a function doesn't work, and you just add a comment that says "This doesn't work" instead of investigating why, you've added technical debt.

Every time you fail to test a function at all, and don't check if it's actually working, you've added technical debt.

Saw all of this tonight... ;)

A lot of people think that technical debt is a function of long term code, of systems that have become weighted down with hacks and patches. Some people think it's a deliberate choice, and therefore you can come back around and clean it up easily enough.

But you can't. It's all lost. You've no idea when that error will pop up again, but it will probably be in front of a customer when you think everything's working. The system will malfunction on Thursdays but you completely forgot that the main interface library that you used had an error return "FAIL_ITS_THURSDAY" that you didn't add a handler for -- now you have to launch a full debug process to find what's wrong (and since you didn't start troubleshooting till Friday, it'll take you a week.) You can't figure out why the system isn't processing fractals because you forgot that the fractal generate function has a tiny little comment that says "// This doesn't work, come back to it later (mb)" - hope you can find it!

Technical debt costs twice what you paid to put it into the system in the first place. And worse, it's often passed on to someone who wasn't there for the original design, so they've never heard of the library that hates Thursdays, only ones that hate Mondays. They'll need three times as long to find the issue. The customer doesn't care that you meant to come back to investigate that error code, they only care that it didn't work. And everyone at E3 was waiting to see your cool new fractal generator, not a grey box that pulses slightly.

Why would you want to deliberately pay all that? Most of the time, the developer of these debts could have fixed them in minutes. Pay the minutes, don't get so caught up in the excitement that you pay the debts later.
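
For what "paying the minutes" can look like, here is a small Python sketch. Everything in it is hypothetical: the Status codes, send_frame and send_frame_checked are stand-ins invented for this example, not a real library.

```python
from enum import Enum


class Status(Enum):
    """Hypothetical return codes from the interface library."""
    OK = 0
    FAIL_ITS_THURSDAY = 1
    FAIL_UNKNOWN = 2


def send_frame(day):
    """Stand-in for the library call; it fails on Thursdays."""
    return Status.FAIL_ITS_THURSDAY if day == "Thursday" else Status.OK


def send_frame_checked(day):
    """Check the return value now, not 'after it's all working'."""
    status = send_frame(day)
    if status is Status.OK:
        return "sent"
    if status is Status.FAIL_ITS_THURSDAY:
        return "deferred until Friday"
    # Anything we did not plan for is reported loudly, never ignored.
    raise RuntimeError(f"unhandled status: {status.name}")


print(send_frame_checked("Monday"))
print(send_frame_checked("Thursday"))
```

The handler is a few lines, and the final raise means an error code nobody planned for shows up at the call site instead of a week later.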



Wednesday, May 26, 2021

You're Doing It Wrong - AI

AI will definitely destroy society. But not the way you think.

Disclaimer: I'm not a machine learning expert, and I know some will disagree strongly. Get your own blog, ya smarmy bastard. ;)

AI, or more correctly in most cases Machine Learning, is increasingly being used in difficult, abstract problems to give us a yes/no answer to questions even an expert would have trouble with.

There are also, as a result, countless stories of how AFTER training, it was discovered that these machines were skewed. They examined irrelevant details to come up with the answer, or they revealed biases in the dataset (which, honestly, people should have seen long before even starting.)

So machine learning, to give a very simple and mostly wrong description, is the act of wiring up a set of inputs (say, pixels of an image) to a set of outputs (say, "cat", "dog", "martian", "amoeba") through a chain of configurable evaluators. These evaluators, which are not called that by anyone with training in the field, are analogous to neurons in your brain.

The idea is, you show the inputs a picture of a cat, and tell it "cat". The machine tries a few combinations of settings and decides which one gave it "cat" most consistently. You show it a "dog" and it does the same thing, trying to remember the settings for "cat". Repeat for "martian" and "amoeba". Then repeat the whole process a couple of million times with different pictures randomly selected from the internet. The neurons slowly home in on a collection of settings that generally produce the right output from all of those millions of inputs.

So you're done! You fed your electronic brain five million images, and it classified them with 99% accuracy! Hooray!

Now you give it a picture of a Martian it has never seen before. "Cat", it tells you confidently.

Well... um.. okay, cats have four legs and Martians only three, but, we're only 99% perfect. How about this lovely photo of an amoeba devouring a spore?

"Cat. 99.9% certainty."

"That's not a cat," you reply. 

You offer up a beautiful painting made in memorial of a lost canine friend. "Amoeba."

Frustrated, you offer up a cheezburger meme. "Cat," the AI correctly responds.

Relieved, you sit back and accidentally send it a set of twelve stop signs and one bicycle. "Martian."

So what the heck is going on?

Well, first off, you got your 5 million photos from the internet, so it was 80% cats. Thus the AI ended up with a configuration set that favors cats. It decided that abstract blobs and unrealistic strokes, much like the brush strokes in your canine painting, looked a lot like the background of slides on which amoeba were found - it didn't learn anything about amoeba themselves. And tall, thin objects were clearly Martians, since you didn't teach it about anything else that was tall and thin.

Now, machine learning, even in the primitive form we have today, has some value. In very narrow fields it's possible to give a machine enough information that the outputs start to make sense. But the problem is that these narrow field successes have led to trying over and over to apply it to broader questions - questions which are often difficult even for human experts with far more reasoning power.

There are two big problems with machine learning. The first is that in real life, you would never actually know why it made those mistakes. The neuron training sequence is relatively opaque and there are few opportunities to debug incorrect answers. It's a big opaque box even to the people who built it.

The second is data curation. When you create such large datasets, it's very hard - nearly impossible - to ensure it's a good dataset. There must be NO details that you don't want the AI to look at. If you are differentiating species, then no backgrounds, no artistic details, even different lighting can be locked in as a differentiator. The AI has NO IDEA what the real world looks like, so it doesn't unconsciously filter out details like we do. To the machine EVERY detail is critical. If I give it a cat on a red background and a dog on a blue background, it is very likely to determine that all animals on a red background are cats, because that is easier to determine than the subtle shape difference.
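
The red-background trap can be shown with a toy learner. This Python sketch is nothing like a real neural network: it is a one-feature rule picker, and the data, feature names and functions are all made up. It only illustrates how the easiest feature wins when it happens to match the labels.

```python
# Each sample: (background, ear_shape, label). All values are made up.
train = [
    ("red", "pointy", "cat"),
    ("red", "pointy", "cat"),
    ("red", "floppy", "cat"),    # a cat with folded ears
    ("blue", "floppy", "dog"),
    ("blue", "floppy", "dog"),
    ("blue", "pointy", "dog"),   # a dog with pointy ears
]
FEATURES = {"background": 0, "ear_shape": 1}


def fit_rule(samples, index):
    """Map each value of one feature to its most common label."""
    votes = {}
    for sample in samples:
        votes.setdefault(sample[index], []).append(sample[2])
    return {value: max(set(v), key=v.count) for value, v in votes.items()}


def accuracy(rule, samples, index):
    """Fraction of samples the rule labels correctly."""
    return sum(rule.get(s[index]) == s[2] for s in samples) / len(samples)


def train_stump(samples):
    """Pick the single feature that best explains the training labels."""
    best = max(
        FEATURES,
        key=lambda f: accuracy(
            fit_rule(samples, FEATURES[f]), samples, FEATURES[f]
        ),
    )
    return best, fit_rule(samples, FEATURES[best])


feature, rule = train_stump(train)


def predict(sample):
    """Classify a sample using only the feature the learner picked."""
    return rule[sample[FEATURES[feature]]]


print("learned feature:", feature)
# A pointy-eared cat photographed on a blue background:
print("prediction:", predict(("blue", "pointy", None)))
```

Background predicts the training labels perfectly while ear shape is only right four times out of six, so the learner keys on the background and calls the blue-background cat a dog.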

The dataset must also be all-encompassing. If you leave anything out, then that anything does not exist to the AI, and so providing it that anything automatically means it must be one of the other things. The brain can not choose "never seen before"... at least with most traditional training methods. At best you might get a low confidence score.
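
Here is a minimal Python sketch of why "never seen before" is not on the menu, assuming the common setup where the final layer is a softmax over a fixed list of classes. The raw scores are invented for the example.

```python
import math

CLASSES = ["cat", "dog", "martian", "amoeba"]


def softmax(scores):
    """Turn raw scores into probabilities that always sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores):
    """Return the most probable known class and its probability."""
    probs = softmax(scores)
    best = max(range(len(CLASSES)), key=lambda i: probs[i])
    return CLASSES[best], probs[best]


# Made-up raw scores for a stop sign, which is none of the four classes.
label, confidence = classify([0.1, 0.3, 2.0, 0.2])
print(label, round(confidence, 2))
```

Whatever goes in, the probabilities are spread across the four known labels and one of them wins. The best you can do with this setup is treat a low winning probability as a hint that something is off.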

Finally, the dataset must be appropriately balanced. There may be cases where a skew is the right answer... for instance, a walking bird in the Antarctic is more likely a penguin than an emu, but if you are classifying people then you need to make sure the dataset contains a good representation in equal proportions of everyone. Sounds pretty hard, doesn't it? Yeah, that's the whole point. It's hard.
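
To see why balance matters, consider this Python sketch with hypothetical label counts for a dataset that is 80% cats. The "classifier" is deliberately silly: it ignores the image and always answers "cat".

```python
from collections import Counter

# Hypothetical label counts for a dataset that is 80% cats.
labels = ["cat"] * 80 + ["dog"] * 10 + ["martian"] * 5 + ["amoeba"] * 5


def always_cat(_image):
    """A 'classifier' that never looks at its input."""
    return "cat"


predictions = [always_cat(None) for _ in labels]
accuracy = sum(p == t for p, t in zip(predictions, labels)) / len(labels)

# Per-class recall: of the true members of each class, how many were found?
totals = Counter(labels)
hits = Counter(t for p, t in zip(predictions, labels) if p == t)
recall = {cls: hits[cls] / totals[cls] for cls in totals}

print(f"overall accuracy: {accuracy:.0%}")
print("per-class recall:", recall)
```

The headline accuracy looks respectable while every class except "cat" is missed entirely, which is why a single accuracy figure on a skewed dataset says very little.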

And that's a point I've made over and over again. Computing good hard like grammar. People are always looking for shortcuts, and they never work as well as expected. Not only is machine learning being seen as a huge shortcut to hard problems, but people are taking shortcuts creating the machine, and getting poor results. This shouldn't be a surprise. If you know the dataset is incomplete, why are you surprised that the machine doesn't work right? You're supposed to be smart. ;)

The real problem of all this is that people still think if a computer says it, it must be true. Despite their daily experience of cell phones, smart TVs, game systems and PCs all being buggy, malfunctioning pieces of crap, they believe that somehow the big mainframes at the mega-corporations get it right (those mainframes generally don't exist anymore, and the ones you are thinking of have less power than your smart watch).

So as machine learning continues to be used to classify people for risk, recognize people on the street, call out people for debt, etc, people are going to be negatively impacted by the poor training the machines received.

Computers are stupid. They are stupider than the stupidest person you've ever had to work with. They are stupider than your neighbor's yappy dog down the street who barks at the snow. They are stupider than those dumb ants who walk right into the ant trap over and over again. Computers do not understand the world and have no filter for what is relevant and what is not. Don't trust them to tell you what's true.


Friday, January 29, 2021

Complexity - or - You're DEFINITELY Doing it Wrong

Hey, I'm employed again! You know what that means - MORE RANTS!

Any new position is always connected to learning about new systems that you didn't see before - or in this case - that I deliberately steered away from before. And the base takeaway of the last couple weeks is "people love complexity".

From layering systems on top of Git to creating an ecosystem with an entire glossary of new terminology, some people just feel a system isn't worth doing if it isn't layered with system on top of system on top of system. Unfortunately Linux as a platform heavily endorses this approach with a huge library of easily obtainable layers.

I once made the joke that building a project under Linux is like playing an old graphical Sierra adventure game. You need to get a magic potion to save the Princess, but the witch demands you bring her an apple. The orchard can't give you an apple unless you bring them some fertilizer for the trees. The farmer has fertilizer, and he'll trade you for a new lamp for his barn. The lamp maker would love to help, but he's all out of kerosene... and so on for the duration of the quest. Much the same under Linux, just replace the quest items with the next package you need, which depends on the next package, which depends on the next package... sometimes I wonder if anyone wrote any actual code, or if they all just call each other in an infinite loop until someone accidentally reaches Linus' original 386 kernel, which does all the actual work...

So anyway, yeah, if you need to invent a glossary of terms to describe all the new concepts you are introducing to the world of computing, then you are probably not a revolutionary - you are probably over-complicating something we've all been doing for better than half a century. Do you really need 1GB of support tools to generate an HTML page?

It's one thing I loved about embedded: it hadn't reached the point of being powerful enough to support all these layers yet. But those days are rapidly ending. The Raspberry Pi Pico is a $4 embedded board powerful enough to generate digital video streams by bitbanging IO. Memory and performance aren't much of a concern anymore.

But let me end on a positive note - unusual for me, I know. Some of these packages produce amazing results and even I'm glad to see them out there. But for Pete's sake, consider whether you really need to add another layer on top of those packages - what are you actually adding? Seriously, poor Pete.

... if I could add a bit from Hitchhiker's Guide to the Galaxy...

"Address the chair!"
"It's just a rock!"
"Well, call it a chair!"
"Why not call it a rock?"


Saturday, January 9, 2021

Let's talk about unit testing...

Since I've wandered back to the employment market, I've had to go through a lot of interview processes. From the very (ridiculously) large to the small, I'm basically being deluged with a slew of new acronyms that I was not deluged with a decade ago when I was last interviewing. And what that basically reinforces for me is that software development continues to be a hype-driven field, with everyone tightly embracing the latest buzzword, because obviously software used to be hard because we weren't doing it this way...

Personally, it would be nice if, instead of thinking of a cute new buzzword for something we've all known for 40 years already, people would just devote energy to writing better code. Education, practice, peer collaboration -- these create better code. Not pinning notecards around the office and telling everyone you're an Aglete now.

And why do we want to create better code? I sort of feel like this message is often lost - and without it you do have to wonder exactly WHY you are building a house of cards out of floppy disks once a week (although people don't when they have the cool buzzword of Habitatience to direct them). But the reason to create better code is so that we spend less time making the code work. It's about making software reliable, and to at least minimal degrees, predictable - and these are things that bugs are not. 

Anyway, unit testing is still pretty big, though of course the only right way to unit test is to use someone else's unit test framework, and write standalone blocks of code that run and pass the tests automatically. If you aren't using FTest, you clearly aren't testing at all.

Let me be clear up front - these little test functions are valuable, just rarely in the way that the proponents think. So let's just over-simplify first what I'm talking about.

Basically, the idea is that the developer writes test functions that can be executed one-by-one via a framework. These test functions are intended to exercise the code that has been written, and verify the results are correct. When you're done, you usually get a nice pretty report that can be framed on your wall or turned in to your teacher for extra marks. They show you did the due diligence and prove that your code works!

Or do they? Did you catch the clue? Encyclopedia Brown did.

The creator of the unit tests for a piece of code is usually the developer of that piece of code. Indeed, for some low level functions nobody else could. (Although outside of the scope of this rant, it would be very reasonable for the designer or even the test group to create high level unit tests to verify /function/... but this never happens.) Anyway, the problems with this are several:

First, the developer is testing their own understanding of what the function does. They are not necessarily testing what the function is supposed to do. Indeed, they usually write code that tests that the code they wrote does what they wrote it to do -- in essence they are testing the compiler, not the program. Modern compilers are not infallible, but they are generally good enough that we don't need to test their code generation as a general rule.

Secondly, this is a huge opportunity for a rookie trap. Novice programmers usually only test that a function does what the function is supposed to do. That is, they don't think to test if the function correctly handles bad situations, like invalid inputs. This is a huge hole and often means that half the function is unexercised -- or that the function has no error handling at all. But it will still pass the unit test.
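
As a small illustration of the trap, here is a Python sketch with a made-up function, parse_percent, and two plain assert-based tests (no framework, to keep it self-contained). The first test is the one that usually gets written; the second covers the half of the function that the first never touches.

```python
def parse_percent(text):
    """Parse '0'..'100' into an int, rejecting anything else."""
    if not isinstance(text, str) or not text.isdigit():
        raise ValueError(f"not a whole number: {text!r}")
    value = int(text)
    if value > 100:
        raise ValueError(f"out of range: {value}")
    return value


def test_happy_path():
    # The only test a novice tends to write.
    assert parse_percent("42") == 42


def test_bad_inputs():
    # Every one of these must be rejected with ValueError.
    for bad in ["", "abc", "-5", "101", None, "4.2"]:
        try:
            parse_percent(bad)
        except ValueError:
            continue
        raise AssertionError(f"accepted bad input: {bad!r}")


test_happy_path()
test_bad_inputs()
print("all tests passed")
```

If both error checks were deleted from parse_percent, test_happy_path would still pass; only test_bad_inputs would notice.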

Thirdly, this becomes a sort of a black box test. Similar to the comment above, there's no way to verify that every line of code in the function has been exercised. In fact, it's not even certain that the function behaved the way the developer intended -- only that the output, whatever it is, matched whatever criteria the unit test developer asked for. (And this can range from detailed to very, very basic, but it's still restricted only to the final output.) Correct result for a single input doesn't guarantee correct operation. There is such a thing as dumb luck!

But there is value to these tests. Because they can be (and usually are) run by automatic build scripts, they are fantastic high level validations that a code change didn't fundamentally break anything. Of course, for this to be true, unit tests need to be peer reviewed and they need to include as many cases as are necessary to test ALL paths within the function being tested.

But what about the third point? While meeting the second point more or less addresses it, there is a variable not taken into account: time. What do I mean by that? I mean that in any project large enough that the developers are using automatic build tools with unit tests, the code is not static. It is being changed, often rapidly. That's why the automatic tools are trying to help.

However, once a unit test has been created by person A, person B modifying the function it covers rarely goes looking to update that test -- particularly if they did not change the function's purpose. The unit test was created so that the inputs passed tested all code paths. Now there are new code paths. You no longer know that the unit test is testing everything.

"Well, we'll just tell people to update the unit tests," you exclaim. "Case dismissed, nice try, but that's it."

Hah, I reply. Hah. Good luck.

Look, nobody sets out to be a sloppy or lazy developer, not even many of the cases I've implied in my rants. But people forget things, they are usually on a tight schedule, and the most heinous of all, their manager usually tells them to "worry about that later". After all, the unit test exists, so that box is checked, and there's no point spending more money on updating it after it already exists. What are we supposed to do, fill in the box? It's already checked!

So look, just assume that your automated tests are going to fall out of date until you hire a new gung-ho intern who finds it, or the original dev adds a new feature and goes to update the unit test they wrote. They are still useful as a regression test - in fact awesomely so. Having unit tests on complex code I've written has saved me a few times. But what do you do between gung-ho interns?

Even if you don't have an automated build tool or haven't got around to implementing your unit test framework yet, the developers can still perform manual unit testing. Stop grinding your teeth - it's not as bad as you think. You have Visual Studio, Eclipse, or GDB, right? Quit your whining. In my day we did unit tests by changing the screen color and we liked it.

It's actually really simple. The developer simply steps through the new code. Modern debuggers allow you to set the program counter and both observe and change variables in real time -- meaning that a developer can walk through all the possible paths of their new function in a matter of minutes without even needing to simulate the real world cases that would trigger every path. This is especially helpful when some of the cases are technically "impossible" (a programmer should never write "impossible" without quotation marks; hardware is involved). Inputs can be changed, the code can be walked through, and then the program counter can be set right back to the last branch and tried again.

It's true that this can take a while if a lot of code is written, and naturally you still need to run the real world tests (to see if it actually works, as opposed to theoretically works), but this is guaranteed to be faster than writing and testing the unit tests. Oh yeah, you missed that part, didn't you? You also have to test your unit tests actually work.

The worst unit test case I ever saw tested a full library of conversion functions by passing 0 to the base one and verifying that 0 came back out. As one might expect, 0 was a special case in this function. The other conversions actually contained off-by-one in about half the cases (and confused bits for bytes in several others - this was hardware based). But the unit test checkbox was marked, verifying that the software was correct, and more importantly, the unit test passed. It wasn't till we tried to use it that things went wrong.
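
Here is a Python sketch in the spirit of that story. It is not the original code: the bits-to-bytes conversion, its bug and both tests are invented for illustration.

```python
def bits_to_bytes_buggy(bits):
    """Bytes needed to hold `bits` bits; wrong whenever bits % 8 != 0."""
    if bits == 0:          # the special case the weak test happens to hit
        return 0
    return bits // 8       # bug: rounds down instead of up


def bits_to_bytes(bits):
    """Correct version: round up to a whole byte."""
    return (bits + 7) // 8


def weak_test(convert):
    """The checkbox test: pass 0 in, expect 0 back."""
    return convert(0) == 0


def stronger_test(convert):
    """Cover the boundaries on both sides of a byte."""
    cases = {0: 0, 1: 1, 7: 1, 8: 1, 9: 2, 16: 2, 17: 3}
    return all(convert(bits) == want for bits, want in cases.items())


print("buggy:  ", weak_test(bits_to_bytes_buggy),
      stronger_test(bits_to_bytes_buggy))
print("correct:", weak_test(bits_to_bytes), stronger_test(bits_to_bytes))
```

The weak test passes for both versions, so it says nothing about which one you have. Only the test with boundary values tells them apart.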

So, I recommend both. Have the developer step through their code. Let's call it the Stepalicious Step. Then after it works, write unit tests as regression tests so that your build server feels like it's contributing. But make sure unit tests are considered first tier code, and go through your usual peer review phase, to avoid only checking the easy case.

"Oh yes, we do Agile, Regression testing, and Stepalicious." Oh, it's no dumber sounding than trusting your source code to a Git...