Wednesday, May 26, 2021

You're Doing It Wrong - AI

AI will definitely destroy society. But not the way you think.

Disclaimer: I'm not a machine learning expert, and I know some will disagree strongly. Get your own blog, ya smarmy bastard. ;)

AI, or more correctly in most cases Machine Learning, is increasingly being applied to difficult, abstract problems to give us a yes/no answer to questions even an expert would have trouble with.

There are also, as a result, countless stories of how, AFTER training, it was discovered that these machines were skewed. They examined irrelevant details to come up with the answer, or they revealed biases in the dataset (which, honestly, people should have seen long before even starting).

So machine learning, to give a very simple and mostly wrong description, is the act of wiring up a set of inputs (say, pixels of an image) to a set of outputs (say, "cat", "dog", "martian", "amoeba") through a chain of configurable evaluators. These evaluators, which are not called that by anyone with training in the field, are analogous to neurons in your brain.

The idea is, you show the inputs a picture of a cat, and tell it "cat". The machine tries a few combinations of settings and keeps the ones that gave it "cat" most consistently. You show it a "dog" and it does the same thing, trying to remember the settings for "cat". Repeat for "martian" and "amoeba". Then repeat the whole process a couple of million times with different pictures randomly selected from the internet. The neurons slowly home in on a collection of settings that generally produces the right output for all of those millions of inputs.
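If you want to see the shape of that loop without any of the heavy machinery, here's a toy sketch in Python (using numpy). It is nowhere near a real network - one layer of "settings" instead of millions, and random noise standing in for photos - but the show-it, tell-it, nudge-the-settings cycle is the same idea:

    import numpy as np

    # Toy stand-in: random pixel vectors instead of photos, and a single layer
    # of "settings" (weights) instead of a deep chain of evaluators.
    rng = np.random.default_rng(0)
    classes = ["cat", "dog", "martian", "amoeba"]
    n_samples, n_pixels = 1000, 32 * 32
    images = rng.random((n_samples, n_pixels))          # fake pixels
    labels = rng.integers(0, len(classes), n_samples)   # the answers we tell it

    weights = np.zeros((n_pixels, len(classes)))

    def softmax(scores):
        scores = scores - scores.max(axis=1, keepdims=True)
        e = np.exp(scores)
        return e / e.sum(axis=1, keepdims=True)

    # Show it a picture, tell it the answer, nudge the settings. Repeat.
    for step in range(200):
        guesses = softmax(images @ weights)
        answers = np.eye(len(classes))[labels]
        nudge = images.T @ (guesses - answers) / n_samples
        weights -= 0.5 * nudge

    # Afterwards it will happily classify anything you hand it.
    mystery = rng.random((1, n_pixels))
    print(classes[int(np.argmax(mystery @ weights))])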

So you're done! You fed your electronic brain five million images, and it classified them with 99% accuracy! Hooray!

Now you give it a picture of a Martian it has never seen before. "Cat", it tells you confidently.

Well... um... okay, cats have four legs and Martians only three, but we're only 99% perfect. How about this lovely photo of an amoeba devouring a spore?

"Cat. 99.9% certainty."

"That's not a cat," you reply. 

You offer up a beautiful painting made in memorial of a lost canine friend. "Amoeba."

Frustrated, you offer up a cheezburger meme. "Cat," the AI correctly responds.

Relieved, you sit back and accidentally send it a set of twelve stop signs and one bicycle. "Martian."

So what the heck is going on?

Well, first off, you got your 5 million photos from the internet, so it was 80% cats. Thus the AI ended up with a configuration set that favors cats. It decided that abstract blobs and unrealistic strokes, much like the brush strokes in your canine painting, looked a lot like the background of slides on which amoeba were found - it didn't learn anything about amoeba themselves. And tall, thin objects were clearly Martians, since you didn't teach it about anything else that was tall and thin.

Now, machine learning, even in the primitive form we have today, has some value. In very narrow fields it's possible to give a machine enough information that the outputs start to make sense. But the problem is that these narrow field successes have led to trying over and over to apply it to broader questions - questions which are often difficult even for human experts with far more reasoning power.

There are two big problems with machine learning. The first is that in real life, you would never actually know why it made those mistakes. The neuron training sequence is relatively opaque and there are few opportunities to debug incorrect answers. It's a black box even to the people who built it.

The second is data curation. When you create such large datasets, it's very hard - nearly impossible - to ensure it's a good dataset. There must be NO details that you don't want the AI to look at. If you are differentiating species, then no backgrounds, no artistic details; even different lighting can be locked in as a differentiator. The AI has NO IDEA what the real world looks like, so it doesn't unconsciously filter out details like we do. To the machine EVERY detail is critical. If I give it a cat on a red background and a dog on a blue background, it is very likely to determine that all animals on a red background are cats, because that is easier to determine than the subtle shape difference.
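Here's a toy illustration of that trap, again in Python. The "photos" are just two hand-made features - background colour and shape - and in the training data the background lines up perfectly with the label, the way it would if every cat photo came from the same red couch. The features and numbers are invented for the example, but the behaviour is the point:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 500
    # Two hand-made features per "photo": is the background red, and does the
    # shape look feline. Background colour matches the label perfectly; the
    # shape signal is there, but noisy.
    is_cat = rng.integers(0, 2, n)
    background_red = is_cat.astype(float)                         # perfectly correlated
    feline_shape = np.clip(is_cat + rng.normal(0, 0.8, n), 0, 1)  # noisy but honest
    X = np.column_stack([background_red, feline_shape])

    # Minimal logistic regression trained by gradient descent.
    w, b = np.zeros(2), 0.0
    for _ in range(2000):
        p = 1 / (1 + np.exp(-(X @ w + b)))
        w -= 0.1 * X.T @ (p - is_cat) / n
        b -= 0.1 * np.mean(p - is_cat)

    print("weight on background colour:", round(float(w[0]), 2))
    print("weight on shape:            ", round(float(w[1]), 2))

    # A dog photographed on a red background now gets called a cat.
    dog_on_red = np.array([1.0, 0.1])
    print("P(cat) for a dog on red:", round(float(1 / (1 + np.exp(-(dog_on_red @ w + b)))), 2))

The model leans on the background weight because it is the easier signal, and a dog on a red background comes back as a cat - exactly the shortcut described above.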

The dataset must also be all-encompassing. If you leave anything out, then that anything does not exist to the AI, and so providing it that anything automatically means it must be one of the other things. The brain cannot choose "never seen before"... at least with most traditional training methods. At best you might get a low confidence score.
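You can see this in the last step of most classifiers, the softmax, which squeezes the raw scores into probabilities over only the classes it was taught. The scores below are made up, but the point holds for any values: something is always the winner, even for an input unlike anything in the training set.

    import numpy as np

    def softmax(scores):
        e = np.exp(scores - scores.max())
        return e / e.sum()

    classes = ["cat", "dog", "martian", "amoeba"]
    # Made-up raw scores for something the network has never seen - a
    # submarine, say. Nothing matches well, but one score is always the
    # biggest, and softmax turns "least bad" into what looks like confidence.
    scores = np.array([2.1, -0.3, 0.4, -1.0])
    probs = softmax(scores)
    print(classes[int(np.argmax(probs))], f"{probs.max():.0%}")
    # The probabilities are forced to add up to 1 across the classes it knows.
    # There is no bucket for "no idea" unless you deliberately build one.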

Finally, the dataset must be appropriately balanced. There may be cases where a skew is the right answer... for instance, a walking bird in the Antarctic is more likely a penguin than an emu. But if you are classifying people, then you need to make sure the dataset represents everyone in equal proportions. Sounds pretty hard, doesn't it? Yeah, that's the whole point. It's hard.
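One common patch - a patch, not a cure - is to weight each example inversely to how common its class is, so the rare stuff counts for more during training. A quick sketch with invented counts:

    import numpy as np

    # Invented counts: a scrape of the internet that came back 80% cats.
    labels = np.array(["cat"] * 800 + ["dog"] * 150 + ["martian"] * 40 + ["amoeba"] * 10)
    classes, counts = np.unique(labels, return_counts=True)

    # "Balanced" weighting: total / (number of classes * class count),
    # so rare classes count for more when the error is tallied up.
    weights = len(labels) / (len(classes) * counts)
    for name, count, weight in zip(classes, counts, weights):
        print(f"{name:8s} count={count:4d} weight={weight:.2f}")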

And that's a point I've made over and over again. Computing good hard like grammar. People are always looking for shortcuts, and they never work as well as expected. Not only is machine learning being seen as a huge shortcut to hard problems, but people are taking shortcuts creating the machine, and getting poor results. This shouldn't be a surprise. If you know the dataset is incomplete, why are you surprised that the machine doesn't work right? You're supposed to be smart. ;)

The real problem in all this is that people still think that if a computer says it, it must be true. Despite their daily experience with cell phones, smart TVs, game systems and PCs all being buggy, malfunctioning pieces of crap, they assume the big mainframes at the mega-corporations (which generally don't exist anymore, and the ones you are thinking of have less power than your smart watch) somehow get it right.

So as machine learning continues to be used to classify people for risk, recognize people on the street, call out people for debt, and so on, people are going to be negatively impacted by the poor training the machines received.

Computers are stupid. They are stupider than the stupidest person you've ever had to work with. They are stupider than your neighbor's yappy dog down the street who barks at the snow. They are stupider than those dumb ants who walk right into the ant trap over and over again. Computers do not understand the world and have no filter for what is relevant and what is not. Don't trust them to tell you what's true.