Monday, February 13, 2017

Object Oriented Programming - How We Got Here

While reading "The Mythical Man Month" (which I didn't know was about programming), I was struck by the number of valid points in this book, written in 1975 to help manage software products, that are still ignored or blatantly flouted today.
 
I have a bigger project where I hope to distill the most important points I see from that. Later.
 
But the book I picked up is a later edition with a 1986 addendum titled "No Silver Bullet". In this article is a section entitled "Object-Oriented Programming - Will a Brass Bullet Do?" And in that section is a single paragraph that enlightened me entirely as to how we got where we are with Object Oriented Programming.
 
I'd like to comment on that paragraph.
 
Now this was written in the infancy of the very concept of Object Oriented Programming, and it's musing about why the concept had not yet caught on as well as the author thought it should. And so he attempts to describe some of the goals of Object Oriented Programming. And it's in this one paragraph, beginning with "one view", that I can see the germs of thought that have today mutated into rampaging plagues across most of software development. Seriously, I used to wonder.
 
One view of object-oriented programming is that it is a discipline that enforces modularity and clean interfaces. A second view emphasizes encapsulation, the fact that one cannot see, much less design, the inner structure of the pieces. Another view emphasizes inheritance, with its concomitant hierarchical structure of classes, with virtual functions. Yet another view emphasizes strong abstract data-typing, with its assurance that a particular data-type will be manipulated only by operations proper to it.
 
Every one of those features is actually pretty good in the original intent - that is - that it is used where it makes sense. The problem is that these guidelines have been mutated in many programs into absolute laws. You absolutely may not access data inside another class. You don't need to see how a class was written, let alone have the right to modify it. Everything is inherited from something else - whether it makes any sense or not (the number of times I've had a basic data type with an inheritance chain six or more classes deep is no longer amusing to me, but rather depressing). And I've literally worked on a project where I was not allowed to store public data in a central database because the database, which existed in the software already, didn't support strong data typing. That was the reason.
 
The point of Object Oriented Programming was to make it faster and easier to develop pieces of software and bring those pieces together.
 
Modularity exists so that a component can be developed and tested in isolation. It makes no sense whatever to make a class modular if you still need other classes to make it work. That's not modular anymore, and you probably should consider whether those should be merged into one object, rather than an incestuous mess. And for what it's worth, bool is not a modular class. Don't wrap bool.
 
Encapsulation is a tricky one to grasp. It's stated so plainly - one cannot see the inner structure of the pieces. But good encapsulation requires two things: a good design and enough runtime to prove that the design actually is good. If you enforce encapsulation to the point of "nobody looks at the code and therefore nobody can change the code" from day one, all that will happen is you will end up with workarounds for missing, obtuse, or broken functionality. Worse, you'll probably try to code for every conceivable case, most of which aren't what people actually want to use, in hopes no changes will be needed. The project will be more complicated and less stable. I've seen people enforce this rule to the point where they are doing this with their own objects. Encapsulation is for stable code, not development code. And you don't need to encapsulate bool. Don't wrap bool.
 
Inheritance is one of the most powerful features of Object-Oriented Programming and frankly, one of the few features I actually really like. But you inherit where it makes sense. In most cases your inheritance chain should not be any more complicated than the example in most text books -- that being a base class extended to one level. In rare cases you may need two levels for certain objects (but certainly not all of them) and in equally rare cases it may make sense to have multiple inheritance (but certainly not all of them). Good planning goes a long way here. Going nuts with inheritance leads to complicated, incestuous code that is difficult to debug, difficult to modify (without breaking something else), difficult to implement and difficult to document. It's also poorly performing in many cases and in cases where it's not, harder to predict what the code will do. You don't need to start with basic classes like a wrapper around bool and inherit from there. Don't wrap bool.
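As a rough illustration of the one-level hierarchy described above, here is a minimal sketch. The Shape, Rect and Circle names are invented for the example and come from no particular project.

```cpp
#include <memory>
#include <vector>

// Hypothetical base class: one interface, no deeper ancestry.
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

// Each concrete type extends the base exactly one level.
struct Rect : Shape {
    double width, height;
    Rect(double w, double h) : width(w), height(h) {}
    double area() const override { return width * height; }
};

struct Circle : Shape {
    double radius;
    explicit Circle(double r) : radius(r) {}
    double area() const override { return 3.141592653589793 * radius * radius; }
};

// Callers only ever see the base class, so mixed shapes can share a container.
double total_area(const std::vector<std::unique_ptr<Shape>>& shapes) {
    double sum = 0.0;
    for (const auto& s : shapes) {
        sum += s->area();
    }
    return sum;
}
```

Nothing here needs a second level, and nothing inherits from a wrapped primitive.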
 
Strong Abstract Data-Typing was meant to get away from the admittedly sloppy practice of casting objects in C and hoping you got it right. This feature alone is a good reason to port C code to C++, even if nothing else changes (you'll be surprised where you screwed up but it worked anyway ;) ). But it doesn't mean you need to wrap every type of data you want to use in a custom object just so the data-typing will protect your function calls. (In fact, in many cases passing different types of data around is a better job for classes with a common base class and utilizing inheritance...). But simply put, if you have several true or false items, you don't need to wrap bool in different classes to make sure you pass the right kind of bool to the right function. Bool is a bool. Don't wrap bool.
 
That's all I really wanted to say. I learned a bit about when modern programming missed the left turn in Albuquerque. It was roughly thirty-one years ago. We have GPS now, let's figure out where North is and start getting back on track.
 
 
 
 
 
 

Monday, December 19, 2016

C++ "MetaProgramming" and Why C++ Should Die

I saw a pretty awesome accomplishment on Twitter today - this fellow wrote a little raytracer that does all its calculations at compile time.
 
 
Pretty impressive, even if it takes a while. It's all done with templates and "metaprogramming", which is a fancy term used to excuse the complexity of programming both the computer AND the compiler.
 
No, I don't like it. And it's a great example of why C++ is done and needs to die.
 
I've loved C++ for a long time. I've encouraged friends learning in University without realizing the true pain they were experiencing. You see, I've been blissfully ignorant of how far things had gone for a long time. For the most part I ignored C++11 and C++14, until a recent project forced me into the deep end, and I got the O'Reilly book out and started reading.
 
I was pretty horrified, in general, but we'll focus on this particular aspect.
 
So the article above is about this fellow learning these new concepts with an ambitious and fairly impressive task, inspired more or less by this example:
 
Take this example:
 
template<int base,int power>
struct Pow
{
    const static int result = Pow<base,power-1>::result*base;
};

template<int base>
struct Pow<base,0>
{
    const static int result = 1;
};

int main(int argc,char *argv[])
{
    return Pow<5,2>::result;
}
 
If we look into the assembly file produced we just see the constant 25 being written to a register.
 
 
 
mov eax, 25
 
Your formatting is crap, Blogger.
 
Anyway, so what happens there is that main() invokes a template (Pow<5,2>), which generates a recursive chain of structures, each containing the next power, until the power is zero and the final template is invoked. The compiler runs through all this, and the final result is that single assembly instruction that generates a single const value "25" (5^2 = 25).
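To make that chain concrete, the sketch below repeats the same Pow template and spells out each instantiation the compiler has to create for Pow<5,2>. The static_assert lines are my addition, not part of the original example, and they need C++11 or later.

```cpp
// Same recursive template as the example above.
template<int base, int power>
struct Pow {
    static const int result = Pow<base, power - 1>::result * base;
};

// Specialization that terminates the recursion at power == 0.
template<int base>
struct Pow<base, 0> {
    static const int result = 1;
};

// Each line below names one link of the chain behind Pow<5,2>.
static_assert(Pow<5, 0>::result == 1,  "base case");
static_assert(Pow<5, 1>::result == 5,  "Pow<5,0>::result * 5");
static_assert(Pow<5, 2>::result == 25, "Pow<5,1>::result * 5");
```

Three separate structures get instantiated to produce one integer.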
 
Fans of this style of programming point at the amazing efficiency of this resulting code as a major win. It's so much faster, they will tell you, than running the code the old fashioned way. But I call bullshit. Because in "the old way" we wouldn't have done that anyway. Not if performance mattered. Do you know what we'd do? Hell, this goes all the way back to "C", not even one plus!
 
const static int result = 25;
 
"Oh! Oh! Oh!" cries the peanut gallery. "But the computer didn't calculate that for you!!"
 
Of course it did. We did it offline. Or we did it in a separate program. Or we used a calculator. Or we did it at startup and cached the value. Or in the worst case, maybe we used a code generator. (Deliberately ignoring the fact that this example is very simple and didn't need code at all).
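For comparison, here is a minimal sketch of the "did it at startup and cached the value" option from that list. The ipow and cached_result names are hypothetical, made up for this illustration.

```cpp
// Plain runtime integer power; no templates involved.
int ipow(int base, int power) {
    int result = 1;
    for (int i = 0; i < power; ++i) {
        result *= base;
    }
    return result;
}

// Computed once, on the first call, then reused: a function-local static
// is initialized the first time control passes through its declaration.
int cached_result() {
    static const int result = ipow(5, 2);
    return result;
}
```

A typo in either function produces an ordinary one-line compiler error rather than an instantiation trace.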
 
But! Isn't using a code generator for the most complex cases exactly what we did here? It's just built into the language now, isn't it?
 
Well, yes and no.
 
Yes, you essentially used a code generator to calculate the problem and reduce the code to the important single constant. But this is about the most complicated, difficult to debug way of doing it that I could have imagined.
 
First of all, you just littered your namespace with three different Pow structures. You didn't need the other two, but the compiler did, and they exist. It was a lot more expensive for the compiler to calculate all the other structures and then decide what was really needed than just about any other technique would have been, which means your compile time is increased (substantially, in fact, depending on how many Pow's you need and how deep they have to go!). And suppose you made a typo in the base,0 template? Well, then your error output is going to have to reflect the entire chain. In this case, it's a short chain of just three entities, and the error is a single line per entity, since it's very simple.
 
$ gcc test.cpp -otest
test.cpp:10: error: `into' does not name a type
test.cpp: In instantiation of `Pow<5, 1>':
test.cpp:4:   instantiated from `Pow<5, 2>'
test.cpp:15:   instantiated from here
test.cpp:4: error: `result' is not a member of `Pow<5, 0>'
 
But real life templates tend not to be so simple. And because of the nature of the templates, you can trigger errors simply by specifying the wrong type of argument to a parameter (for instance, forgetting to std::move can break some templates). The result can be pages of template chain errors, making troubleshooting difficult. And indeed, that is what our experimenter found:
 
This was the first time I had tinkered with metaprogramming so it was pretty hard at first, basically once one template evaluation fails you get a huge chain of thousands of failures.
 
The entire direction of the language's development seems to have shifted towards programming the compiler to generate constants for you at compile time. This is a good thing and we've often done it in the past, but it's not ALWAYS the right answer, and today it's being taken to ridiculous extremes. Some of the things I've seen mean that the compile-time ray tracer isn't even that outrageous to me; I've seen things attempted that feel on the same level of complexity. And I don't believe that we should be doing that.
 
Why not?
 
Well, this impacts you in several ways:
 
-compile time is longer. How many of those complicated template chains result in the single constant that the example above shows? You've built the code hundreds of times and never changed that value, have you? Make it a damned constant and save the time.
-typos in the code are MUCH harder to understand. If you've done std or boost template programming, you already know what I mean. If you haven't yet, you will. If you're a god who never makes mistakes, go back to cartoon land. This costs time - a simple typo that may be as simple as a missing modifier goes from a 10 second change to a minute or more, just to determine what line the error actually occurred on. I know people who switch to a different, non-production compiler for testing, just because the error messages are less verbose (meaning an entire compile phase is wasted). This time adds up substantially.
-learning time is longer. If you're using your own complex template chains (in addition to std, boost, or other common ones), then you have a larger and deeper codebase for a new developer to have to come to terms with -- and the two issues above are not going to help with that. Since most developers on most projects are thrown in with little more than an incomplete wiki and a promise to get around to guidance, you'd think that simple, easy to follow code would have some value.
 
I'm reminded of an old quote by DadHacker (http://www.dadhacker.com/blog/):
 
The future of computing is its own past, mashed-up and remixed by young'uns who have yet to fear the dark corners, the places where us old farts went in with similar bushy-tailed attitudes and came out with ashen-faced, eyes barn-door wide and with fifty new words for "pucker." Heed us. The stove is hot if you touch it. The stove is not only hot, it will incinerate your soul. At some point you will want to make pancakes or wash dishes for a living rather than run another build or merge another check-in or fix another bug...
-Dadhacker
 

Wednesday, November 23, 2016

Rock Band Guitar Overdrive Update

I'd been having some trouble with Overdrive on my very abused Rock Band guitars -- to the point where one of them (an original RB1 unit) all but quit working altogether.
 
I took it apart to see what could be done, and was surprised to see that the tilt sensor actually used little metal balls in a can (this is why the guitars rattle when you shake them). It used two of the sensors wired in series, probably to better filter false positives caused by vibration.
 
Sensors used were similar to this: https://www.adafruit.com/product/173
 
 
Some testing suggested that one of the sensors was barely responding at all anymore, so to get it going, I shorted one of the sensors out, so that only the other one was needed to trigger. This only sort of worked and got us through the evening.
 
I ordered a set of mercury tilt switches to replace them. I got a set of 10 little ones from Amazon for $6, so I could install two in each of my guitars. https://www.amazon.com/gp/product/B00M1PNBTE/ref=oh_aui_detailpage_o01_s01?ie=UTF8&psc=1
 
 
Since this is the internet, obligatory warning. Mercury is a toxic metal that can be absorbed. If you break one of these, it will be hard to safely clean up. Don't bother if you don't know why that matters.
 
I had two guitars I needed to update - a Rock Band 1 and a Rock Band 2 -- there are obvious external differences, and a number of internal ones. I'm only interested in the tilt switches here. In this case, Rock Band 2 upgraded the tilt sensor. It still uses the ball bearing type of sensor, but it used two larger sensors (heavier balls), and it wired them in parallel instead of series (so that EITHER switch could trigger it). It then went one further than that and added a port to the side of the guitar so you could plug in an external foot switch for overdrive - this is also wired in parallel.
 
Testing showed that the ball-based tilt sensors worked, but the connection was iffy. It was bouncy and imperfect. The mercury tilt switch, by comparison, works by immersing two contacts in conductive liquid metal. The connection, compared to the ball switches, was pretty much perfect and very low resistance with no bounce.
 
So for the Rock Band 1 guitar, it was a straight remove-and-replace. With the switches being so much better I wanted to keep the series connection so that just shaking or bouncing the guitar was less likely to generate an accidental overdrive. I then bent the leads to get the approximate angle I wanted the switches to trip at.
 
For Rock Band 2's guitar, the switches were wired in parallel. Again, because I liked the idea of the series connection providing resistance against false triggers, I tied two of the leads together and wired that up, after insulating the PCB to prevent shorts. This gave me a series connection on that guitar as well.
 

When I went to install them into the guitar, I hit another small snag... the normal orientation of the guitar meant that the switches lay flat instead of tilting, which made them trigger too easily.
 
Fortunately, the boards mount by means of a slot and are held in with screws with a very wide head. Friction meant I could just lay the board flat on the mount and screw it into place like that -- this worked fine.
 
 
And there we go! Hooked it all up and it seems to be working just fine! I probably should have replaced the reed switches while I was in there, but we'll do that next time. ;)
 
 

Sunday, November 6, 2016

The Programmer is not an end user

As I continue to modernize my skillset into C++11 and C++14, as well as pick up side toys like Unity, I'm more and more noticing a really disturbing trend. It took me a while to figure out what it was that was bugging me so much, and tonight I finally realized - the programmer is being treated like an end-user.

Phrases like "you don't need to know about that" and "don't worry about that" are backed up with massively complex templates that hide the actual behavior of an object. I've been going through the O'Reilly book on C++11/14, and the number of times it warns that the same line of code can have drastically different effects due to context (because of the complex templates and code features backing them) has exceeded my capacity to remember. Even the simple code editor is getting into the game with concepts like code folding, whose sole job it is to hide code not currently being worked on from the programmer doing the work.

Why? Is it really that distracting? (Actually the conspiracy theorist in me suspects it was first done so that really long files could be managed in really poorly written editors... but that's a different rant!)

The programmer is the first and often the ONLY person who can view the code and tell you what it's going to do. He or she is your first AND last line of defense. Why in the name of all things Boolean would you hide details from that person and leave them in the state of "I don't know"?

It was code collapsing that triggered me to write tonight. I've known about it forever, I even know people who use it. I chose not to, because I prefer to understand what I'm working on. But only tonight did I realize it was probably the simplest and most insidious of this deliberate attempt to "dumb down" the act of writing code, and why it was bad.

So why is that? Even if you are not working on a piece of code, the act of reading through it as you skim past can catch bugs. This has happened to me many times - especially in group projects. You see it, it catches your eye, you go "oh my god!" and you fix it, BEFORE the customer finds it.

Or, you collapse all the code except the one little function you've chosen to put your blinders onto, and the bug goes unnoticed until it takes out the internet because nobody ever bothered to look at it, despite hundreds of eyes passing over that function. Yeah, tell me that's never happened. ;)

People: LOOK AT THE CODE. That's your JOB as a programmer, for Pete's sake. Keep things simple enough to actually understand, and TEST that code. Don't say "oh, that's Test Group's problem". No, Test Group's job is black box testing - to make sure the ultimate product as a whole passes a reasonable set of tests and does what is REQUIRED. YOUR job, as programmer, is to test every code path you write to ensure it does what you INTENDED. The two are not always the same, that's why you HAVE a different group.

Rant rant rant...

Saturday, September 17, 2016

Atari Jaguar Programming Causes Brain Damage - Confirmation

Well, not necessarily. But after a few hours last night I had a crazy headache that I ended up getting out of bed to take Advil for. ;)
 
Anyway, I took a break from work and from my pending TI project, both of which are breaking my self-esteem at alarming rates, to wrap up a support project for my Atari Jaguar cartridge boards. When I first tested them, I attempted to burn a slightly modified version of Tempest (just some text string edits) -- only to find it didn't boot.
 
I realized after thinking it through that Jaguar carts are tested against an MD5 sum (to avoid modification and bad contacts). So the MD5 hash would need to be updated too. That's not unusual, many game systems had a checksum or such to confirm the game would work. The problem on the Jaguar is that the hash is buried in the proprietarily encrypted portion of the boot header - so I'd need to re-encrypt it too.
 
That's not the end of the world... the tools were discovered years ago and are out there. To date I'd used the Atari ST encryption tool to create a "fast boot" header for the Skunkboard... and learned a bit there. One interesting thing about that project was the discovery that the Jaguar CD subverted the boot process - our first pass didn't even work plugged into a Jag CD.
 
Atari Jaguar cartridges start with an encrypted boot header, broken into 65 byte blocks. Each block is encrypted with a full 520-bit key (I kind of wonder if that didn't violate export restrictions back in the mid-90s? I could Google but I won't...). It takes about half a second to decrypt one block, and the normal cartridge has ten of them. The code is decrypted into the GPU, where it is then executed. The code runs an MD5 hash on the cartridge, compares it to the one that was embedded, and if all looks good, it writes the magic value 0x03D0DEAD to the first GPU RAM address and exits. On exit, the BIOS checks for the magic value, and boots the cart if it sees it, or red-screens if it doesn't.
 
There is a small complication in altering this code in that the Encryption tool writes several values to fixed addresses before it encrypts the boot - specifically it stores the MD5 hash, some state information, and the first and last address of the cart. So our Skunkboard boot needed to be tolerant of that (we just left the areas unused).
 
00F035AC: MOVEI   $00F03566,R00    (9800) ; address to pass ROM check
00F035B2: MOVEI   $03D0DEAD,R04    (9804) ; magic value to unlock 68k
00F035B8: JUMP    (R00)            (D000) ; go do it!
00F035BA: NOP                      (E400) ; delay slot
 
That was literally it - we just wrote the magic value and jumped back to the startup code to handle the return. I got the code encrypted by hex-editing the Atari ST program and re-running it.
 
The CD unit works a little differently. Intentionally or just because they could (it's not clear to me why), they leave the GPU busy on a little VLM demo (courtesy of Yak!), and decrypt these blocks into the DSP instead. The DSP is a nearly-identical processor to the GPU, so okay, that's cute. But then the CD BIOS makes several hard-coded fix-ups to absolute addresses in the decrypted code (without checking if it's the code that it expects). Then it jumps past the first part of the decrypted code into a later entry point to do the MD5 hash.
 
So when we updated the Skunkboard boot for the CD unit, first we had to add a second block (because the jump point is past the first block), meaning we went from half a second to a full second boot. Then we had to document and avoid the manual patch areas. Finally, we had to be able to run on both the GPU and the DSP, despite them having different address bases. But, our code was extremely simple, so it wasn't hard to make it fit. We borrowed some Atari code to handle the device independence (although, since we duplicated the work, it ended up not being necessary), and it was fine.
 
 MOVEI #$00FFF000,R1  ; AND mask for address
 MOVEI #$00000EEC,R2  ; Offset to chip control register
 MOVEI #$03D0DEAD,R4  ; magic value for proceeding
 MOVE PC,R0           ; get the PC to determine DSP or GPU
 AND R1,R0            ; Mask out the relevant bits
 STORE R4,(R0)        ; write the code
 SUB R2,R0            ; Get control register (G_CTRL or D_CTRL)
 MOVEQ #0,R3          ; Clear R3 for code below
GAMEOVR:
 JR GAMEOVR           ; wait for it to take effect
 STORE R3,(R0)        ; stop the GPU
 
; Need an offset of $48 - this data is overwritten by the encrypt tool
; with the MD5 sum.
 NOP
 NOP
 MOVEI #$0,R0
 MOVEI #$0,R0
 MOVEI #$0,R0
 MOVEI #$0,R0
 MOVEI #$0,R0
 MOVEI #$0,R0
 
; JagCD entry point (same for now)
Main:
 ; There is a relocation at $4A that we can't touch
 MOVEI #$0,R0         ; dummy value
 ; real boot starts here
 MOVEI #$00FFF000,R1  ; AND mask for address
 MOVEI #$0,R0         ; This movei is hacked by the encryption tool
 MOVEI #$0,R0         ; This movei is hacked by the encryption tool
 MOVEI #$00000EEC,R2  ; Offset to chip control register
 MOVEI #$03D0DEAD,R4  ; magic value for proceeding
 MOVE PC,R0           ; get the PC to determine DSP or GPU
 AND R1,R0            ; Mask out the relevant bits
 STORE R4,(R0)        ; write the code
 SUB R2,R0            ; Get control register (G_CTRL or D_CTRL)
 MOVEQ #0,R3          ; Clear R3 for code below
GAMEOVR2:
 JR GAMEOVR2          ; wait for it to take effect
 STORE R3,(R0)        ; stop the DSP
Despite the extra code for device independence (which, as I noted, was unnecessary in the end), it's still pretty much the same. I actually released a 'makefastboot' tool which would prepend this header to any cart to make it boot in 1 second instead of 5.
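The address arithmetic in that listing can be checked on a PC. This is only a sketch in C++: the mask and offset come straight from the listing, while the GPU and DSP addresses in the assertions are from my reading of the Jaguar memory map and should be treated as assumptions. The function names and the sample DSP program counter are hypothetical.

```cpp
#include <cstdint>

// AND mask from the listing: strips the PC down to the start of local RAM.
constexpr std::uint32_t kRamMask = 0x00FFF000;
// Offset from the listing: distance back from local RAM to the control register.
constexpr std::uint32_t kCtrlOffset = 0x00000EEC;

// Where the magic value gets written (first address of GPU or DSP RAM).
constexpr std::uint32_t ram_base(std::uint32_t pc) {
    return pc & kRamMask;
}

// G_CTRL or D_CTRL, depending on which processor the PC belongs to.
constexpr std::uint32_t ctrl_reg(std::uint32_t pc) {
    return ram_base(pc) - kCtrlOffset;
}

// Assumed memory map: GPU RAM at $F03000 with G_CTRL at $F02114,
// DSP RAM at $F1B000 with D_CTRL at $F1A114.
static_assert(ram_base(0x00F035AC) == 0x00F03000, "GPU RAM base");
static_assert(ctrl_reg(0x00F035AC) == 0x00F02114, "G_CTRL");
static_assert(ctrl_reg(0x00F1B0AC) == 0x00F1A114, "D_CTRL (made-up DSP PC)");
```

The same few instructions land on the right registers for either processor because both keep the control register the same distance below their local RAM.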
 
So, we come back to today. I wanted to update the above tool to not only add the fast boot to my new carts, but to add a simple checksum that could be externally updated, disabled, etc. I figured "how much code can a checksum take? Should fit easily." I updated the patching tool to calculate and write the checksum, as well as let me set the cartridge width and speed parameters all in one.
 
Well... it turns out that when you have only 20-30 bytes available, and a true RISC instruction set (not one of those fully populated instruction sets that people call RISC today because it didn't have 3D acceleration 30 years ago), a checksum function gets a little tight!
 
But, after some tight maneuvering and some lessons from the local contortionist, I got the code to fit. The winning realization was that I did not need to store the 0x03D0DEAD value manually in the memory, OR test if the checksum worked. I just added that magic value to the checksum in the header itself. That way, all the loop had to do was subtract bytes from the checksum, and if everything was good, it would be left with 0x03D0DEAD. I just wrote the result and exited - the Jaguar BIOS would check if it passed or not! The code itself jumped around a bit, over the hash blocks and into the second area, but it looked good. The CD entry point wasted a few bytes jumping into the GPU entry point, but I was satisfied.
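The trick is easier to see in a few lines of C++. This is a sketch under assumptions of my own: 32-bit words, unsigned wraparound arithmetic, and hypothetical function names. It is a model of the idea, not the GPU code that went on the cart.

```cpp
#include <cstdint>
#include <vector>

constexpr std::uint32_t kMagic = 0x03D0DEAD;  // value the BIOS looks for

// Patching-tool side: sum every word, with the magic value folded in.
std::uint32_t make_header_checksum(const std::vector<std::uint32_t>& rom) {
    std::uint32_t sum = kMagic;
    for (std::uint32_t word : rom) {
        sum += word;  // unsigned overflow wraps, which is what we want
    }
    return sum;
}

// Boot side: subtract every word from the stored value. No compare and no
// separate store of the magic value; whatever is left over gets written
// out and the BIOS judges it.
std::uint32_t boot_result(std::uint32_t stored,
                          const std::vector<std::uint32_t>& rom) {
    for (std::uint32_t word : rom) {
        stored -= word;
    }
    return stored;  // equals kMagic when the ROM matches the checksum
}
```

A good cart leaves exactly 0x03D0DEAD behind; a single flipped bit leaves something else, and the BIOS red-screens without the boot code ever testing anything.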
 
Unfortunately, it crashed. And then I ran into the second problem with Atari Jaguar programming... there are no decent debug tools. Especially to debug GPU code that needs to be decrypted before it can even be examined and that the BIOS wipes out of paranoia after success or failure.
 
I dug out an old emulator that I'd used for debugging back in the day, and after a little time poking around I'd found some of the hooks I'd put in to aid with debugging. Fortunately, it included a disassembler, so I hacked in a GPU run-trace that executed when my encrypted code started to execute.
 
When my code started, I found to my surprise that the relative jumps were going all over the place, but certainly not where I intended. After some cursing, head scratching, and pacing, I finally decided to RTFM. Which informed me that relative jumps have a range of -15/+16 words. My jumps were far larger. The MD5 block itself was 20 words long. And a non-relative jump requires an address in a register, meaning that somehow I'd need 6 bytes for a long plus the 2 for the jump. Some counting confirmed that was true for all but one of my jumps. Because of all the crap I had to squeeze around, that I didn't even use, I was out of space. Time for a change of approach.
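A quick way to sanity-check jumps like these before burning anything is a range test. The sketch below uses the figures quoted above (a reach of -15 to +16 words, 6 bytes to load a long, 2 bytes for the jump itself); the helper names are hypothetical.

```cpp
// Reach of a relative jump, in 16-bit words, as quoted from the manual above.
constexpr int kMinRelWords = -15;
constexpr int kMaxRelWords = 16;

// True if a relative jump can span the given distance in words.
constexpr bool fits_relative_jump(int offset_words) {
    return offset_words >= kMinRelWords && offset_words <= kMaxRelWords;
}

// Bytes needed: 2 for a relative jump, otherwise 6 to load the target
// address into a register plus 2 for the register jump.
constexpr int jump_cost_bytes(int offset_words) {
    return fits_relative_jump(offset_words) ? 2 : 6 + 2;
}

static_assert(!fits_relative_jump(20), "cannot hop over the 20-word MD5 block");
static_assert(jump_cost_bytes(20) == 8, "long jump costs 8 bytes");
static_assert(jump_cost_bytes(10) == 2, "short hop stays at 2 bytes");
```

With only 20-30 bytes to play with, every jump that falls outside the window eats a quarter or more of the budget.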
 
I went hunting through my archives and, on the AtariHQ CD, I found source for a PC version of the encryption tool. It took a little more digging to find the private and public keys (and two versions of the public key... I had to just test to see which one was right). But after a little fiddling I had a working version of the tool.
 
With the ability to control the encryption now under my command, I was able to add a mode that did NOT patch the code before encryption. No MD5 hash, no patching the start and stop addresses. Now all I had to worry about were the little fixups that the CD BIOS did. I rewrote the code to not skip over the MD5 hash, meaning the first block was entirely free to use (65 whole bytes! yes!), and had lots of room for the CD entry to load up a register and jump to the same entry point.
 
(Sorry I'm not showing the code bits here... I didn't save the intermediaries. We'll talk about proper use of source control another day...)
 
So I fired it up, and made the emulator disassemble all of GPU RAM when it started, so I could verify that all looked good. I had a bit of trouble with the decryption... although only 65 bytes of each block are /used/, the actual code works on multiples of 32 bits, meaning I needed to preserve 132 bytes of the encrypted data for it to actually work. Anyway, with the dump I was able to compare to the source code and prove that the decryption was working.
 
And it failed. I tried tracing the entire run, but it was a 2 megabyte cartridge, and the debugger output was slowing things down immensely. Finally I tweaked it up to just output the last 10 checksum steps. From that I could see that the result was way off.
 
I did a little inverse math and calculated what the Jaguar thought the checksum should be, and injected that into the header. It booted! Excellent. So now I knew that the checksum code worked.
 
I went back to my patching tool, and poked around trying to figure out why it was failing to generate the same checksum. I thought at first it was off by one (reading one too many or one too few words). Turned out to be off by 8192! How? Well.. there's an 8k overall header on the cart, and so my checksum code just skips over that (it's the decrypted area or unused, so technically already proven). However, when I /calculated/ the checksum, I forgot to skip it. ;) Fixing that fixed the checksum.
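One way to keep the two sides from drifting apart like this is to share the skip distance between them. A minimal sketch, assuming a byte-wise sum for simplicity (the real routine's word size is not shown here) and hypothetical names:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Size of the overall cart header that the boot code skips (8k, per the text).
constexpr std::size_t kHeaderBytes = 8192;

// Sum of everything after the header. Both the patching tool and any
// PC-side model of the boot code should call this same function, so the
// skip can never be applied on one side and forgotten on the other.
std::uint32_t payload_sum(const std::vector<std::uint8_t>& cart) {
    std::uint32_t sum = 0;
    for (std::size_t i = kHeaderBytes; i < cart.size(); ++i) {
        sum += cart[i];
    }
    return sum;
}
```

Whatever sits in the first 8192 bytes has no effect on the result, which matches what the boot code actually checks.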
 
And now the cart booted! Great! I started packaging it up.. then reluctantly decided I better test the CD entry point. "What could go wrong?" I asked. "It's just a jump."
 
I had previously hacked some really limited support for the CD BIOS into my emulator for this very reason - for proving the Skunkboard. But the DSP code in the emulator was a lot different than the GPU: the author had attempted pipeline emulation and it had less debug help. This took a long time to just get into a state where I could prove it was even RUNNING, let alone what it was doing.
 
Ultimately, though, after back and forth testing with older carts and proven systems, I was able to prove it, and, surprise bloody surprise, it didn't work.
 
This caught me off guard. The Skunkboard boot was still working! So again, disassembled all memory when it started up and had a look-see. What did I see but a huge block of 0xff bytes right in the middle of my code, suspiciously aligned with the MD5 sum that I wasn't using anymore.
 
Yep, that's right, the CD BIOS scribbled over that memory to obfuscate the MD5 sum before jumping to the later entry point. My new entry point which just jumps back to the main code. Never had a problem on the Skunkboard code because both entry points just ran in their own space. %$@$#@.
 
I still had a lot of room in the CD entry area, but after counting bytes, there was not enough to duplicate the whole function. The GPU version of the code used more than 65 bytes, so spilled into the CD area. So what to do...?
 
Ultimately, I noticed that the checksum routine itself was intact; it was just the post-checksum code that got overwritten. So, I got clever and implemented the checksum code as a subroutine. The DSP version could call it and then handle its own post-checksum code, and the GPU version could do the same on its side. It didn't matter if the GPU-specific code was overwritten when the DSP was running, since it'd never be used. Subroutines are a bit messy on the Jaguar RISC: instead of anything traditional, you just store the PC in a register. But I was able to use that later for the device independent code (which, again, probably ended up unnecessary, but at this point the headache was starting... ;) )
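As a rough illustration of that "store the PC in a register" call, here is a small Python sketch of where the #14 in the final listing comes from. The instruction sizes are inferred from the byte counts in the listing's comments (2 bytes per instruction, 6 for MOVEI), and it assumes MOVE PC,Rn captures the address of the MOVE instruction itself. The names are hypothetical.

```python
# Call sequence used in the listing, with assumed instruction sizes in bytes.
CALL_SEQUENCE = [
    ("move pc,r6", 2),        # r6 = address of this instruction (assumption)
    ("addq #14,r6", 2),       # bump r6 to the instruction after the delay slot
    ("movei #SHARED,r0", 6),  # 2-byte opcode + 32-bit immediate
    ("jump (r0)", 2),         # go to the shared checksum routine
    ("nop", 2),               # delay slot
]

def return_offset(sequence=CALL_SEQUENCE):
    """Bytes from the MOVE PC instruction to the first instruction after the call."""
    return sum(size for _, size in sequence)

def return_address(move_pc_address):
    """Address that the subroutine's final 'jump (r6)' lands on."""
    return move_pc_address + return_offset()
```

In this model return_offset() comes out to 14, which is the constant both the GPU and DSP call sites add to r6.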
 
While I was tracing the DSP version to understand the failures, I also realized that I needed to take into account the different address of the DSP RAM versus GPU RAM (they don't use local memory addresses, but system global addresses), and THEN I discovered that they don't even load to the same relative offset within RAM... meaning the code needed to be fully position-independent (except for the one jump.) (You can see the offset in the "SHARED+" line at the CD entry point.)
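The address arithmetic can be checked with a short Python sketch. The RAM base addresses here are my assumption from the commonly documented Jaguar memory map (GPU RAM at $F03000, DSP RAM at $F1B000), not something stated in this post. The .org, the +$180B8 offset, the $00FFF000 mask and the $EEC offset are taken from the final listing. Function names are hypothetical.

```python
GPU_RAM = 0xF03000      # assumed GPU local RAM base (system address)
DSP_RAM = 0xF1B000      # assumed DSP local RAM base (system address)
ORG = 0xF035AC          # .org from the listing
SHARED = ORG + 4        # SHARED follows the 2-byte JR and 2-byte NOP
DSP_OFFSET = 0x180B8    # the "SHARED+" constant at the CD entry point

def split_offset():
    """Split the DSP offset into (RAM base difference, extra load offset)."""
    ram_delta = DSP_RAM - GPU_RAM
    return ram_delta, DSP_OFFSET - ram_delta

def dsp_address(gpu_address):
    """Where code assembled at gpu_address ends up when the CD unit loads it."""
    return gpu_address + DSP_OFFSET

def control_register(pc):
    """Model of the device-independent tail: mask PC, then subtract $EEC."""
    ram_base = pc & 0x00FFF000        # the checksum result is written here
    return ram_base, ram_base - 0x0EEC
```

Under those assumptions, split_offset() gives ($18000, $B8): the difference between the two RAM bases plus the different relative load offset. control_register() lands on $F02114 for a GPU-side PC and $F1A114 for a DSP-side PC, which, as far as I know, are the documented G_CTRL and D_CTRL addresses.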
 
I present to you the final version of the code, which works on both GPU and DSP boots:
 
; Jaguar cart encrypted boot.
; We need to deal with two entry points, one for Console and one for CD.
; This uses my custom version of the encryption tool that doesn't overwrite
; huge blocks, so the only patches we need to watch out for are the CD's.
; this frees up a lot of space for code.
; However, the CD overwrites a lot of data, particularly the MD5 hash,
; meaning you can't use that space after all if you want to work with the CD.
; This code works, but I don't necessarily have ALL the black-out areas marked.
; (Note, too, the CD unit runs this code on the DSP, not the GPU.)
.gpu
 .org $00F035AC
; entry point for console
; before we unlock the system, do a quick checksum of the cart
; unlike the official code, we take the data for the checksum
; from unencrypted memory after the boot vector:
;
; 0x400 - boot cart width (original)
; 0x404 - boot cart address (original)
; 0x408 - flags (original)
; 0x410 - start address to sum
; 0x414 - number of 32-bit words to sum
; 0x418 - 32-bit checksum + 0x03D0DEAD (this is important! that's the wakeup value!)
;
; By adding the key to the checksum, and subtracting as we go through
; the cart, we can just write the result to GPU RAM and not spend any code
; on comparing it - the console will do that for us. That really helped this fit.
; Be careful.. SMAC does NOT warn when a JR is out of range. Only got 15 words!
; from here we have 32 bytes until the DSP program blocks out
ENTRY:
 JR gpumode           ; skip over
 nop

SHARED:
 movei #$800410,r14   ; location of data
 load (r14),r15       ; get start
 load (r14+1),r2      ; get count - offset is in longs
 load (r14+2),r8      ; get desired checksum + key
chklp:
 load (r15),r4        ; get long
 subq #1,r2           ; count down
 addqt #4,r15         ; next address
 jr NZ,chklp          ; loop if not done on r2
 sub r4,r8            ; (delay) subtract value from checksum, result should be 0x3D0DEAD at end
 jump (r6)            ; back to caller
 nop
; 30/32
 
; have to break it up, cause the CD unit wipes the hash memory... GPU is okay though
gpumode:
 move pc,r6           ; set up for return
 addq #14,r6
 movei #SHARED,r0
 jump (r0)
 nop

 MOVEI #$00FFF000,R1  ; AND mask for address
 AND R1,R6            ; mask out the relevant bits of PC

 MOVEQ #0,R3          ; Clear R3 for code below

 MOVEI #$00000EEC,R2  ; Offset to chip control register
 STORE R8,(R6)        ; write the code (checksum result, hopefully 3d0dead)
 SUB R2,R6            ; Get control register (G_CTRL or D_CTRL)
GAMEOVR:
 JR GAMEOVR           ; wait for it to take effect
 STORE R3,(R6)        ; stop the GPU/DSP
; 68
 
; .org $f035f4 <-- doesn't work, we'll just pad then
 dc.w $5475,$7273
 
; JagCD entry point
; should be at f035f4 (72/$48)
; we should have 50 bytes here
dspmode:
 ; There is a CD relocation at $4A that we can't touch, the MOVEI covers it
 MOVEI #$12345678,R9  ; this movei is hacked by the CD boot

 MOVE PC,R6           ; prepare for subroutine
 ADDQ #14,R6
 MOVEI #SHARED+$180B8,R0  ; prepare for long jump (DSP offset included)
 JUMP (R0)            ; go do it
 NOP                  ; delay slot

 MOVEI #$00FFF000,R1  ; AND mask for address
 AND R1,R6            ; mask out the relevant bits of PC

 MOVEQ #0,R3          ; Clear R3 for code below

 MOVEI #$00000EEC,R2  ; Offset to chip control register
 STORE R8,(R6)        ; write the code (checksum result, hopefully 3d0dead)
 SUB R2,R6            ; Get control register (G_CTRL or D_CTRL)
DGAMEOVR:
 JR DGAMEOVR          ; wait for it to take effect
 STORE R3,(R6)        ; stop the GPU/DSP
; 44/50

 END
 
Anyway... it's nice to have it finally done. I'll post a comment below when I get it up on GitHub - I'm late for a meeting right now. ;)