Thursday, October 11, 2018

I'm Doing it Wrong: Learning New Skills

We all need to learn at some point in our lives. If we're computer people, we pretty much need to learn all our lives.

Some people read, some take classes, increasingly over the years, I hit my head against a problem over and over again until the problem or my head breaks. I call this "being stupid". But it's made my head pretty hard.

Most recently, I was finally doing my first VHDL, for the Dragon's Lair cartridge that I've previously mentioned. I was pretty pleased that my understanding of the basics got the basic bank switching latch working pretty easily. And then someone on Twitter asked whether it would support GROM too, so that the cartridge would work on a 2.2 console.

I didn't see why, after all, nobody still willingly uses a 2.2 console, which was notorious in the TI-99 community as being TI's effort to prevent AtariSoft cartridges from working with their machines. Literally, they changed the machine to lockout AtariSoft. Since AtariSoft made some of the best games, and this move made them drop the TI-99 entirely, people were not impressed.

But this person noted that they grew up on a 2.2 console, and were always annoyed that none of the good games worked on their machine. Thus, seeing this new licensed title work on a v2.2 would just feel good. Given that nostalgia is the only sensible reason to still be messing with these computers 40 years after they were released, I had to agree.

I chose a really small CPLD - in fact the smallest I could get away with, so it was a tight fit. Eventually I did get reading the GROM bus, and an internal 8-bit address counter which would provide 256 bytes of GROM -- more than enough to bootstrap the ROM part of the cartridge. However, every time I put the address increment into the design, it would work in simulation but lock up the TI. Sometimes immediately, sometimes at random, but it didn't work.

As I noted - I am brand new to VHDL. I actually HAVE read a book cover to cover (a couple of times - and the smart version of me would give you the title as it was really nice, but this is the dumb version of me. Ask in a comment and I'll look it up later). But practice is different than reality. Further, searching Google for VHDL questions was even worse than searching Google for C questions. There were fewer hits and the answers were typically even less useful (though still usually of the 'why do you want to do that?' vein...)

But I was stuck. Now that I was this close, I knew it was possible and I couldn't release the idea. I WOULD have GROM support. I was prepared to go as low as 16 bytes if I needed to (I was pretty sure less than that wouldn't be enough for the header and startup code).

When you get stuck, it's good to look at things from a different point of view. I thought to myself, "I have GROM working if I don't increment. And I've always told anyone foolish enough to listen that the GPL interpreter doesn't rely on auto-increment anyway, it sets the address for every byte. So maybe I don't NEED to increment for a simple boot..."

So I looked into it. The code that builds the selection menu ("PRESS 1 FOR TI BASIC, 2 FOR DRAGONS LAIR") used the GPL 'MOVE' command, which absolutely does set the address for every single byte. But the code that scans for programs to display actually was coded in the assembly language ROM side of the system.

The cartridge header has two pointers that matter in this case. The first points to the list of programs on the cartridge, the second points to a particular program's boot address. When the ROM code builds the list of programs, it did use auto-increment on those 2-byte pointers.

After thinking about it for a while... I realized that even in 256 bytes, I had tons of room to spare. So I created pointers with repeated bytes:

CARTRIDGE BASE: 0x8000 (not a pointer, just where we are starting)
PROGRAM LIST: 0x8181 (since both bytes had to be the same, this was the earliest I could start)
BOOT ADDRESS: 0x9191 (due to only 256 bytes being decoded, this is 16 bytes later, leaving room for the program name)

So due to the address register size, the real values accessed would be 0x8081 and 0x8091, but the system didn't need to know that!

Then I wrote a quick little GPL program that cleared the screen, loaded the real cartridge ROM vector, and jumped to it.

I was thrilled when I rebooted the TI and my menu option appeared perfectly! Then I selected it, and the screen filled with a strange character and the system locked up.

It didn't take much digging to see what was wrong. The first command in GPL for clearing the screen was "ALL 32", which means to fill the screen with character 32, which is of course, the space. The opcode for ALL is 7. Running in the emulator showed that the screen was filled with character 7, and a dig through the source code to the GPL interpreter (one nice thing about 40 years later is how much has been documented!), and I learned that my assertion about the interpreter not using increment was ALMOST right. Increment WAS used for fetching the arguments to instructions.

Well bugger... I spent an evening trying to work around this. I looked for exploits in the interpreter, I looked for clever opcode abuse that might let me get assembly control, I looked for single byte opcodes that might be useful and I even looked for pairs of bytes that might be interpreted as useful opcodes. Ultimately, I had to conclude that this approach was a dead end, and the only way was to make the CPLD work correctly.

At this point I began to hammer on the code. Change, upload, test, observe. Change, upload, test, observe - for hours at a time. I attached my logic analyzer to various combinations, I ran simulations (which always worked, of course), all to squeeze out just a single bit of new information for my head.

I envision this stage much like grinding in a dungeon crawler game. I keep going up against the boss, and he keeps sending me back to the village inn to recover, but each time I get a tiny bit of XP. I come back a little smarter, and keep trying.

At one point - I got it! Almost. The increment was working but the address was off by one. Confident that I knew what I was doing, I decided to try a different architecture to solve that. The new architecture failed but in my over-confidence, I had failed to backup the almost working code. Sure enough, I had forgotten what I did. Stupid.

Another week of hammering on it led to no results, and so finally I asked for help. Another stupid, and I was worried about showing off my HDL, so I should have asked earlier. Anyway, my friend didn't spot anything immediately wrong, but we ran through some examples on the whiteboard. I gained double-digit XP for that session.

Then I looked at a similar project (but on a larger device). This one did things slightly differently from me, and I had shied away from the approach because it took more logic than what I had been trying (and didn't fit). But I decided to declare a loss, and just port his project.

So I did, stripping out all the pieces that didn't apply to me. This compiled and even fit, but when I tried it - it locked up the console just the same. (I still need to test if that person's full project works on my troubled console, if not, I have some XP I can share with him now...) Anyway, because it didn't "just work", I went back to mine.

By this point I had been starting to learn how to optimize a little bit, and I reworked my code to use a similar increment concept to what this other project did. This was costing me in gates, so I disabled the ROM side to allow me space to increment (removing 14 latched bits saves some amount of logic!) And after a few iterations, GROM was working! Fully! I was so thrilled, but the boss was not defeated yet!

I added the ROM side back in, and sure enough, I was over. But only by ONE macrocell. As we all know, every program can be optimized by one instruction (and has one bug, meaning every program can be optimized to a single instruction which doesn't work). Surely every circuit can be optimized by one component! (No, but bear with me). I noticed the toolchain complaining about the complexity of my delay mechanism...

I'm proud of this, cause this one part earned me the last XP I needed to beat the boss and get a fully working CPLD.

I had a simple mechanism to control my gating of the GROM to the TI bus, because it didn't seem to like it when I was too quick on OR off the bus. I had two signals controlled by the GROM clock... when the TI asked for GROM, first one went high, then on the next cycle the other one did (and gated the memory). When the TI released the bus, they released in the reverse order. This seemed to keep GROM stable. (My previous GROM work used an 8MHz AVR. For all its speed, it's still slow compared to hardware.)

Anyway, I did this using a sequence of IF statements, one for up and one for down. I realized that the tool might be creating a fairly complex circuit to support that exact behaviour, and wondered if I could simplify it. I drew a truth table and realized that sure enough, there was a combitorial relationship. I replaced the two if...then...else blocks with two simple AND/OR statements (still in the clock process). This freed up the one macrocell that I needed.

So that's it.. (Hopefully - my other stupid is declaring success before testing it on other consoles!) I can finally move on to the software. Ask for help early, don't be dumb like me. Even if the help itself can't be provided, you gain knowledge by both asking smart questions and listening to the answers. Eventually that's how you get to the end.

I posted a video of it running here: https://www.youtube.com/watch?v=W4yCPdeB4nc