Monday, December 19, 2016

C++ "MetaProgramming" and Why C++ Should Die

I saw a pretty awesome accomplishment on Twitter today - this fellow wrote a little raytracer that does all its calculations at compile time.
 
 
Pretty impressive, even if it takes a while. It's all done with templates and "metaprogramming", which is a fancy term used to excuse the complexity of programming both the computer AND the compiler.
 
No, I don't like it. And it's a great example of why C++ is done and needs to die.
 
I've loved C++ for a long time. I've encouraged friends learning it at university, without realizing the true pain they were experiencing. You see, I'd been blissfully ignorant of how far things had gone. For the most part I ignored C++11 and C++14, until a recent project forced me into the deep end, and I got the O'Reilly book out and started reading.
 
I was pretty horrified, in general, but we'll focus on this particular aspect.
 
So the article above is about this fellow learning these new concepts with an ambitious and fairly impressive task, inspired more or less by this example:
 
Take this example:
template<int base, int power>
struct Pow
{
    const static int result = Pow<base, power-1>::result * base;
};
 
template<int base>
struct Pow<base, 0>
{
    const static int result = 1;
};
 
int main(int argc, char *argv[])
{
    return Pow<5,2>::result;
}
 
If we look at the assembly produced, we just see the constant 25 being written to a register.
 
 
 
mov eax, 25
 
 
Anyway, so what happens there is that main() invokes a template (Pow<5,2>), which generates a recursive chain of structures, each computing the next lower power, until the power reaches zero and the specialized template ends the chain. The compiler runs through all this, and the final result is that single assembly instruction loading the single constant "25" (5^2 = 25).
 
Fans of this style of programming point at the amazing efficiency of the resulting code as a major win. It's so much faster, they will tell you, than running the code the old-fashioned way. But I call bullshit. Because in "the old way" we wouldn't have done that anyway. Not if performance mattered. Do you know what we'd do? Hell, this goes all the way back to "C", not even one plus!
 
const static int result = 25;
 
"Oh! Oh! Oh!" cries the peanut gallery. "But the computer didn't calculate that for you!!"
 
Of course it did. We did it offline. Or we did it in a separate program. Or we used a calculator. Or we did it at startup and cached the value. Or in the worst case, maybe we used a code generator. (Deliberately ignoring the fact that this example is very simple and didn't need code at all).
 
But! Isn't using a code generator for the most complex cases exactly what we did here? It's just built into the language now, isn't it?
 
Well, yes and no.
 
Yes, you essentially used a code generator to calculate the problem and reduce the code to the important single constant. But this is about the most complicated, difficult to debug way of doing it that I could have imagined.
 
First of all, you just littered your namespace with three different Pow structures. You only needed one, but the compiler needed all three, and they exist. Generating all the intermediate structures and then folding them down is more expensive for the compiler than just about any other technique would have been, which means your compile time is increased (substantially, in fact, depending on how many Pow's you need and how deep they have to go!).

And suppose you typoed in the base,0 specialization? Then your error message is going to reflect the entire instantiation chain. In this case, it's a short chain of just three entities, and the error is a single line per entity, since the template is very simple.
 
$ gcc test.cpp -otest
test.cpp:10: error: `into' does not name a type
test.cpp: In instantiation of `Pow<5, 1>':
test.cpp:4:   instantiated from `Pow<5, 2>'
test.cpp:15:   instantiated from here
test.cpp:4: error: `result' is not a member of `Pow<5, 0>'
 
But real life templates tend not to be so simple. And because of the nature of the templates, you can trigger errors simply by specifying the wrong type of argument to a parameter (for instance, forgetting to std::move can break some templates). The result can be pages of template chain errors, making troubleshooting difficult. And indeed, that is what our experimenter found:
 
This was the first time I had tinkered with metaprogramming so it was pretty hard at first, basically once one template evaluation fails you get a huge chain of thousands of failures.
 
The entire direction of the language's development seems to have shifted towards programming the compiler to generate constants for you at compile time. Precomputing values is a good thing, and we've often done it in the past, but it's not ALWAYS the right answer, and today it's being taken to ridiculous extremes. Some of the things I've seen mean that the compile-time ray tracer isn't even that outrageous to me - I've seen things attempted that feel on the same level of complexity. And I don't believe that we should be doing that.
 
Why not?
 
Well, this impacts you in several ways:
 
-compile time is longer. How many of those complicated template chains result in the single constant that the example above shows? You've built the code hundreds of times and never changed that value, have you? Make it a damned constant and save the time.
-typos in the code are MUCH harder to understand. If you've done std or boost template programming, you already know what I mean. If you haven't yet, you will. If you're a god who never makes mistakes, go back to cartoon land. This costs time - a typo as simple as a missing modifier goes from a 10-second fix to a minute or more, just to determine what line the error actually occurred on. I know people who switch to a different, non-production compiler for testing, just because its error messages are less verbose (meaning an entire compile phase is wasted). This time adds up substantially.
-learning time is longer. If you're using your own complex template chains (in addition to std, boost, or other common ones), then you have a larger and deeper codebase for a new developer to have to come to terms with -- and the two issues above are not going to help with that. Since most developers on most projects are thrown in with little more than an incomplete wiki and a promise to get around to guidance, you'd think that simple, easy to follow code would have some value.
 
I'm reminded of an old quote by DadHacker (http://www.dadhacker.com/blog/):
 
The future of computing is its own past, mashed-up and remixed by young'uns who have yet to fear the dark corners, the places where us old farts went in with similar bushy-tailed attitudes and came out with ashen-faced, eyes barn-door wide and with fifty new words for "pucker." Heed us. The stove is hot if you touch it. The stove is not only hot, it will incinerate your soul. At some point you will want to make pancakes or wash dishes for a living rather than run another build or merge another check-in or fix another bug...
-Dadhacker
 

Wednesday, November 23, 2016

Rock Band Guitar Overdrive Update

I'd been having some trouble with Overdrive on my very abused Rock Band guitars -- to the point where one of them (an original RB1 unit) all but quit working altogether.
 
I took it apart to see what could be done, and was surprised to see that the tilt sensor actually used little metal balls in a can (this is why the guitars rattle when you shake them). It used two of the sensors wired in series, probably to better filter false positives caused by vibration.
 
Sensors used were similar to this: https://www.adafruit.com/product/173
 
 
Some testing suggested that one of the sensors was barely responding at all anymore, so to get it going, I shorted one of the sensors out, so that only the other one was needed to trigger. This only sort of worked and got us through the evening.
 
I ordered a set of mercury tilt switches to replace them. I got a set of 10 little ones from Amazon for $6, so I could install two in each of my guitars. https://www.amazon.com/gp/product/B00M1PNBTE/ref=oh_aui_detailpage_o01_s01?ie=UTF8&psc=1
 
 
Since this is the internet, obligatory warning. Mercury is a toxic metal that can be absorbed. If you break one of these, it will be hard to safely clean up. Don't bother if you don't know why that matters.
 
I had two guitars to update - a Rock Band 1 and a Rock Band 2. There are obvious external differences and a number of internal ones, but I'm only interested in the tilt switches here. Rock Band 2 upgraded the tilt sensor: it still uses the ball-bearing type, but it used two larger sensors (heavier balls), and it wired them in parallel instead of series (so that EITHER switch could trigger it). It then went one further and added a port to the side of the guitar so you could plug in an external foot switch for overdrive - this is also wired in parallel.
 
Testing showed that the ball-based tilt sensors worked, but the connection was iffy. It was bouncy and imperfect. The mercury tilt switch, by comparison, works by immersing two contacts in conductive liquid metal. The connection, compared to the ball switches, was pretty much perfect and very low resistance with no bounce.
 
So for the Rock Band 1 guitar, it was a straight remove-and-replace. With the switches being so much better I wanted to keep the series connection so that just shaking or bouncing the guitar was less likely to generate an accidental overdrive. I then bent the leads to get the approximate angle I wanted the switches to trip at.
 
For Rock Band 2's guitar, the switches were wired in parallel. Because I liked the idea of the series connection providing resistance against false triggers, I tied two of the leads together and wired that up, after insulating the PCB to prevent shorts. This gave me a series connection on that guitar as well.
 

When I went to install them into the guitar, I hit another small snag... the normal orientation of the guitar meant that the switches lay flat instead of tilting, which made them trigger too easily.
 
Fortunately, the boards mount by means of a slot and are held in with screws with a very wide head. Friction meant I could just lay the board flat on the mount and screw it into place like that -- this worked fine.
 
 
And there we go! Hooked it all up and it seems to be working just fine! I probably should have replaced the reed switches while I was in there, but we'll do that next time. ;)
 
 

Sunday, November 6, 2016

The Programmer is not an end user

As I continue to modernize my skillset into C++11 and C++14, as well as pick up side toys like Unity, I'm more and more noticing a really disturbing trend. It took me a while to figure out what it was that was bugging me so much, and tonight I finally realized - the programmer is being treated like an end-user.

Phrases like "you don't need to know about that" and "don't worry about that" are backed up with massively complex templates that hide the actual behavior of an object. I've been going through the O'Reilly book on C++11/14, and the number of times it warns that the same line of code can have drastically different effects due to context (because of the complex templates and code features backing them) has exceeded my capacity to remember. Even the simple code editor is getting into the game with concepts like code folding, whose sole job is to hide code not currently being worked on from the programmer doing the work.

Why? Is it really that distracting? (Actually the conspiracy theorist in me suspects it was first done so that really long files could be managed in really poorly written editors... but that's a different rant!)

The programmer is the first and often the ONLY person who can view the code and tell you what it's going to do. He or she is your first AND last line of defense. Why in the name of all things Boolean would you hide details from that person and leave them in the state of "I don't know"?

It was code collapsing that triggered me to write tonight. I've known about it forever; I even know people who use it. I chose not to, because I prefer to understand what I'm working on. But only tonight did I realize it was probably the simplest and most insidious example of this deliberate attempt to "dumb down" the act of writing code, and why it is bad.

So why is that? Even if you are not working on a piece of code, the act of reading through it as you skim past can catch bugs. This has happened to me many times - especially in group projects. You see it, it catches your eye, you go "oh my god!" and you fix it, BEFORE the customer finds it.

Or, you collapse all the code except the one little function you've chosen to put your blinders onto, and the bug goes unnoticed until it takes out the internet because nobody ever bothered to look at it, despite hundreds of eyes passing over that function. Yeah, tell me that's never happened. ;)

People: LOOK AT THE CODE. That's your JOB as a programmer, for pete's sake. Keep things simple enough to actually understand, and TEST that code. Don't say "oh, that's Test Group's problem". No, Test Group's job is black box testing - to make sure the ultimate product as a whole passes a reasonable set of tests and does what is REQUIRED. YOUR job, as programmer, is to test every code path you write to ensure it does what you INTENDED. The two are not always the same, that's why you HAVE a different group.

Rant rant rant...

Saturday, September 17, 2016

Atari Jaguar Programming Causes Brain Damage - Confirmation

Well, not necessarily. But after a few hours last night I had a crazy headache that I ended up getting out of bed to take advil for. ;)
 
Anyway, I took a break from work and from my pending TI project, both of which are breaking my self-esteem at alarming rates, to wrap up a support project for my Atari Jaguar cartridge boards. When I first tested them, I attempted to burn a slightly modified version of Tempest (just some text string edits) -- only to find it didn't boot.
 
I realized after thinking it through that Jaguar carts are tested against an MD5 sum (to avoid modification and bad contacts). So the MD5 hash would need to be updated too. That's not unusual, many game systems had a checksum or such to confirm the game would work. The problem on the Jaguar is that the hash is buried in the proprietarily encrypted portion of the boot header - so I'd need to re-encrypt it too.
 
That's not the end of the world... the tools were discovered years ago and are out there. To date I'd used the Atari ST encryption tool to create a "fast boot" header for the Skunkboard... and learned a bit there. One interesting thing about that project was the discovery that the Jaguar CD subverted the boot process - our first pass didn't even work plugged into a Jag CD.
 
Atari Jaguar cartridges start with an encrypted boot header, broken into 65-byte blocks. Each block is encrypted with a full 520-bit key (I kind of wonder if that didn't violate export restrictions back in the mid-90s? I could Google, but I won't...). It takes about half a second to decrypt one block, and a normal cartridge has ten of them. The code is decrypted into the GPU, where it is then executed. The code runs an MD5 hash on the cartridge, compares it to the one that was embedded, and if all looks good, it writes the magic value 0x03D0DEAD to the first GPU RAM address and exits. On exit, the BIOS checks for the magic value, and boots the cart if it sees it, or red-screens if it doesn't.
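
Reduced to rough C++, the handshake looks something like this (names and structure are mine, purely to illustrate the flow; the real code compares the full 128-bit MD5):

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t UNLOCK = 0x03D0DEAD;  // magic value the BIOS looks for

// Sketch of the decrypted cart code: hash the cart, compare against the
// hash embedded by the encryption tool, post the magic value on a match.
uint32_t cart_boot_code(uint32_t computed_md5_word, uint32_t embedded_md5_word)
{
    // (one 32-bit word stands in for the full MD5 comparison here)
    return (computed_md5_word == embedded_md5_word) ? UNLOCK : 0;
}

// Sketch of what the BIOS does after the decrypted code exits:
// boot the cart on the magic value, red-screen otherwise.
const char* bios_decision(uint32_t first_gpu_ram_word)
{
    return (first_gpu_ram_word == UNLOCK) ? "boot" : "red screen";
}
```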
 
There is a small complication in altering this code in that the Encryption tool writes several values to fixed addresses before it encrypts the boot - specifically it stores the MD5 hash, some state information, and the first and last address of the cart. So our Skunkboard boot needed to be tolerant of that (we just left the areas unused).
 
00F035AC: MOVEI   $00F03566,R00    (9800) ; address to pass ROM check
00F035B2: MOVEI   $03D0DEAD,R04    (9804) ; magic value to unlock 68k
00F035B8: JUMP    (R00)            (D000) ; go do it!
00F035BA: NOP                      (E400) ; delay slot
 
That was literally it - we just wrote the magic value and jumped back to the startup code to handle the return. I got the code encrypted by hex-editing the Atari ST program and re-running it.
 
The CD unit works a little differently. Intentionally or just because they could (it's not clear to me why), they leave the GPU busy on a little VLM demo (courtesy of Yak!), and decrypt these blocks into the DSP instead. The DSP is a nearly-identical processor to the GPU, so okay, that's cute. But then the CD BIOS makes several hard-coded fix-ups to absolute addresses in the decrypted code (without checking if it's the code that it expects). Then it jumps past the first part of the decrypted code into a later entry point to do the MD5 hash.
 
So when we updated the Skunkboard boot for the CD unit, first we had to add a second block (because the jump point is past the first block), meaning we went from half a second to a full second boot. Then we had to document and avoid the manual patch areas. Finally, we had to be able to run on both the GPU and the DSP, despite them having different address bases. But, our code was extremely simple, so it wasn't hard to make it fit. We borrowed some Atari code to handle the device independence (although, since we duplicated the work, it ended up not being necessary), and it was fine.
 
 MOVEI #$00FFF000,R1  ; AND mask for address
 MOVEI #$00000EEC,R2  ; Offset to chip control register
 MOVEI #$03D0DEAD,R4  ; magic value for proceeding
 MOVE PC,R0           ; get the PC to determine DSP or GPU
 AND R1,R0            ; Mask out the relevant bits
 STORE R4,(R0)        ; write the code
 SUB R2,R0            ; Get control register (G_CTRL or D_CTRL)
 MOVEQ #0,R3          ; Clear R3 for code below
GAMEOVR:
 JR GAMEOVR           ; wait for it to take effect
 STORE R3,(R0)        ; stop the GPU
 
; Need an offset of $48 - this data is overwritten by the encrypt tool
; with the MD5 sum.
 NOP
 NOP
 MOVEI #$0,R0
 MOVEI #$0,R0
 MOVEI #$0,R0
 MOVEI #$0,R0
 MOVEI #$0,R0
 MOVEI #$0,R0
 
; JagCD entry point (same for now)
Main:
 ; There is a relocation at $4A that we can't touch
 MOVEI #$0,R0         ; dummy value
 ; real boot starts here
 MOVEI #$00FFF000,R1  ; AND mask for address
 MOVEI #$0,R0         ; This movei is hacked by the encryption tool
 MOVEI #$0,R0         ; This movei is hacked by the encryption tool
 MOVEI #$00000EEC,R2  ; Offset to chip control register
 MOVEI #$03D0DEAD,R4  ; magic value for proceeding
 MOVE PC,R0           ; get the PC to determine DSP or GPU
 AND R1,R0            ; Mask out the relevant bits
 STORE R4,(R0)        ; write the code
 SUB R2,R0            ; Get control register (G_CTRL or D_CTRL)
 MOVEQ #0,R3          ; Clear R3 for code below
GAMEOVR2:
 JR GAMEOVR2          ; wait for it to take effect
 STORE R3,(R0)        ; stop the DSP
 
Despite the extra code for device independence (which, as I noted, was unnecessary in the end), it's still pretty much the same. I actually released a 'makefastboot' tool which would prepend this header to any cart to make it boot in 1 second instead of 5.
 
So, we come back to today. I wanted to update the above tool to not only add the fast boot to my new carts, but to add a simple checksum that could be externally updated, disabled, etc. I figured "how much code can a checksum take? Should fit easily." I updated the patching tool to calculate and write the checksum, as well as let me set the cartridge width and speed parameters all in one.
 
Well... it turns out that when you have only 20-30 bytes available, and a true RISC instruction set (not one of those fully populated instruction sets that people call RISC today because it didn't have 3D acceleration 30 years ago), a checksum function gets a little tight!
 
But, after some tight maneuvering and some lessons from the local contortionist, I got the code to fit. The winning realization was that I did not need to store the 0x03D0DEAD value in memory manually, OR test whether the checksum matched. I just added that magic value to the checksum in the header itself. That way, all the loop had to do was subtract the cart data from the checksum, and if everything was good, it would be left with exactly 0x03D0DEAD. I just wrote the result and exited - the Jaguar BIOS would check whether it passed or not! The code itself jumped around a bit, over the hash blocks and into the second area, but it looked good. The CD entry point wasted a few bytes jumping into the GPU entry point, but I was satisfied.
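
In rough C++ terms, the trick looks like this (the function names are mine, not the boot code's):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint32_t MAGIC = 0x03D0DEAD;  // the BIOS "unlock" value

// Tool side: sum the cart data, add the magic value, store the result
// in the cart header.
uint32_t make_header_checksum(const std::vector<uint32_t>& cart)
{
    uint32_t sum = MAGIC;
    for (uint32_t word : cart)
        sum += word;                    // unsigned wraparound is fine
    return sum;
}

// Boot side: subtract every cart word from the header value. On an
// unmodified cart the remainder is exactly MAGIC, which gets written
// straight to GPU RAM - the BIOS does the comparison for free.
uint32_t boot_checksum(uint32_t header_sum, const std::vector<uint32_t>& cart)
{
    for (uint32_t word : cart)
        header_sum -= word;
    return header_sum;
}
```

If any word of the cart changes, the remainder isn't MAGIC, and the BIOS red-screens without the boot code spending a single instruction on a compare.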
 
Unfortunately, it crashed. And then I ran into the second problem with Atari Jaguar programming... there are no decent debug tools. Especially to debug GPU code that needs to be decrypted before it can even be examined and that the BIOS wipes out of paranoia after success or failure.
 
I dug out an old emulator that I'd used for debugging back in the day, and after a little time poking around I'd found some of the hooks I'd put in to aid with debugging. Fortunately, it included a disassembler, so I hacked in a GPU run-trace that executed when my encrypted code started to execute.
 
When my code started, I found to my surprise that the relative jumps were going all over the place, but certainly not where I intended. After some cursing, head scratching, and pacing, I finally decided to RTFM. Which informed me that relative jumps have a range of -15/+16 words. My jumps were far larger - the MD5 block itself was 20 words long. And a non-relative jump requires an address in a register, meaning that somehow I'd need 6 bytes for a long load plus 2 for the jump. Some counting confirmed that was true for all but one of my jumps. Because of all the crap I had to squeeze around, which I didn't even use, I was out of space. Time for a change of approach.
 
I went hunting through my archives and, on the AtariHQ CD, I found source for a PC version of the encryption tool. It took a little more digging to find the private and public keys (and two versions of the public key... I had to just test to see which one was right). But after a little fiddling I had a working version of the tool.
 
With the ability to control the encryption now under my command, I was able to add a mode that did NOT patch the code before encryption. No MD5 hash, no patching the start and stop addresses. Now all I had to worry about were the little fixups that the CD BIOS did. I rewrote the code to not skip over the MD5 hash, meaning the first block was entirely free to use (65 whole bytes! yes!), and had lots of room for the CD entry to load up a register and jump to the same entry point.
 
(Sorry I'm not showing the code bits here... I didn't save the intermediaries. We'll talk about proper use of source control another day...)
 
So I fired it up, and made the emulator disassemble all of GPU RAM when it started, so I could verify that all looked good. I had a bit of trouble with the decryption... although only 65 bytes of each block are /used/, the actual code works on multiples of 32-bits, meaning I needed to preserve 132 bytes of the encrypted data for it to actually work. Anyway, with the dump I was able to compare to the source code and prove that the decryption was working.
 
And it failed. I tried tracing the entire run, but it was a 2 megabyte cartridge, and the debugger output was slowing things down immensely. Finally I tweaked it up to just output the last 10 checksum steps. From that I could see that the result was way off.
 
I did a little inverse math and calculated what the Jaguar thought the checksum should be, and injected that into the header. It booted! Excellent. So now I knew that the checksum code worked.
 
I went back to my patching tool, and poked around trying to figure out why it was failing to generate the same checksum. I thought at first it was off by one (reading one too many or one too few words). It turned out to be off by 8192! How? Well... there's an 8K overall header on the cart, and my checksum code just skips over it (it's either the decrypted area or unused, so technically already proven). However, when I /calculated/ the checksum in the tool, I forgot to skip it. ;) Fixing that fixed the checksum.
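
The tool-side fix, sketched in C++ (my names, not the actual tool's):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// The cart begins with an 8K header that the boot-side checksum skips
// (it's the decrypted area or unused, already proven). The bug was that
// the tool summed from offset 0 - both sides must skip the header.
constexpr std::size_t HEADER_WORDS = 8192 / 4;   // 8K header, as 32-bit words

uint32_t tool_checksum(const std::vector<uint32_t>& rom)
{
    uint32_t sum = 0;
    for (std::size_t i = HEADER_WORDS; i < rom.size(); ++i)
        sum += rom[i];                           // skip the header!
    return sum;
}
```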
 
And now the cart booted! Great! I started packaging it up... then reluctantly decided I had better test the CD entry point. "What could go wrong?" I asked. "It's just a jump."
 
I had previously hacked some really limited support for the CD BIOS into my emulator for this very reason - for proving the Skunkboard. But the DSP code in the emulator was a lot different than the GPU, the author had attempted pipeline emulation and it had less debug help. This took a long time to just get into a state where I could prove it was even RUNNING, let alone what it was doing.
 
Ultimately, though, after back-and-forth testing with older carts and proven systems, I was able to prove it was running - and, surprise bloody surprise, it didn't work.
 
This caught me off guard. The Skunkboard boot was still working! So again, disassembled all memory when it started up and had a look-see. What did I see but a huge block of 0xff bytes right in the middle of my code, suspiciously aligned with the MD5 sum that I wasn't using anymore.
 
Yep, that's right, the CD BIOS scribbled over that memory to obfuscate the MD5 sum before jumping to the later entry point. My new entry point which just jumps back to the main code. Never had a problem on the Skunkboard code because both entry points just ran in their own space. %$@$#@.
 
I still had a lot of room in the CD entry area, but after counting bytes, there was not enough to duplicate the whole function. The GPU version of the code used more than 65 bytes, so spilled into the CD area. So what to do...?
 
Ultimately, I noticed that the checksum routine itself was intact; it was just the post-checksum code that got overwritten. So, I got clever and implemented the checksum code as a subroutine. The DSP version could call it and then handle its own post-checksum code, and the GPU version could do the same on its side. It didn't matter if the GPU-specific code was overwritten when the DSP was running, since it'd never be used. Subroutines are a bit messy on the Jaguar RISC - instead of anything traditional, you just store the PC in a register. But I was able to use that later for the device-independent code (which, again, probably ended up unnecessary, but at this point the headache was starting... ;) )
 
While I was tracing the DSP version to understand the failures, I also realized that I needed to take into account the different address of the DSP RAM versus GPU RAM (they don't use local memory addresses, but system global addresses), and THEN I discovered that they don't even load to the same relative offset within RAM... meaning the code needed to be fully position-independent (except for the one jump.) (You can see the offset in the "SHARED+" line at the CD entry point.)
 
I present to you the final version of the code, which works on both GPU and DSP boots:
 
; Jaguar cart encrypted boot.
; We need to deal with two entry points, one for Console and one for CD.
; This uses my custom version of the encryption tool that doesn't overwrite
; huge blocks, so the only patches we need to watch out for are the CD's.
; this frees up a lot of space for code.
; However, the CD overwrites a lot of data, particularly the MD5 hash,
; meaning you can't use that space after all if you want to work with the CD.
; This code works, but I don't necessarily have ALL the black-out areas marked.
; (Note, too, the CD unit runs this code on the DSP, not the GPU.)
.gpu
 .org $00F035AC
; entry point for console
; before we unlock the system, do a quick checksum of the cart
; unlike the official code, we take the data for the checksum
; from unencrypted memory after the boot vector:
;
; 0x400 - boot cart width (original)
; 0x404 - boot cart address (original)
; 0x408 - flags (original)
; 0x410 - start address to sum
; 0x414 - number of 32-bit words to sum
; 0x418 - 32-bit checksum + 0x03D0DEAD (this is important! that's the wakeup value!)
;
; By adding the key to the checksum, and subtracting as we go through
; the cart, we can just write the result to GPU RAM and not spend any code
; on comparing it - the console will do that for us. That really helped this fit.
; Be careful.. SMAC does NOT warn when a JR is out of range. Only got 15 words!
; from here we have 32 bytes until the DSP program blocks out
ENTRY:
 JR gpumode           ; skip over
 nop

SHARED:
 movei #$800410,r14   ; location of data
 load (r14),r15       ; get start
 load (r14+1),r2      ; get count - offset is in longs
 load (r14+2),r8      ; get desired checksum + key
chklp:
 load (r15),r4        ; get long
 subq #1,r2           ; count down
 addqt #4,r15         ; next address
 jr NZ,chklp          ; loop if not done on r2
 sub r4,r8            ; (delay) subtract value from checksum, result should be 0x3D0DEAD at end
 jump (r6)            ; back to caller
 nop
; 30/32
 
; have to break it up, cause the CD unit wipes the hash memory... GPU is okay though
gpumode:
 move pc,r6           ; set up for return
 addq #14,r6
 movei #SHARED,r0
 jump (r0)
 nop

 MOVEI #$00FFF000,R1  ; AND mask for address
 AND R1,R6            ; mask out the relevant bits of PC

 MOVEQ #0,R3          ; Clear R3 for code below

 MOVEI #$00000EEC,R2  ; Offset to chip control register
 STORE R8,(R6)        ; write the code (checksum result, hopefully 3d0dead)
 SUB R2,R6            ; Get control register (G_CTRL or D_CTRL)
GAMEOVR:
 JR GAMEOVR           ; wait for it to take effect
 STORE R3,(R6)        ; stop the GPU/DSP
; 68
 
; .org $f035f4 <-- doesn't work, we'll just pad then
 dc.w $5475,$7273
 
; JagCD entry point
; should be at f035f4 (72/$48)
; we should have 50 bytes here
dspmode:
 ; There is a CD relocation at $4A that we can't touch, the MOVEI covers it
 MOVEI #$12345678,R9  ; this movei is hacked by the CD boot

 MOVE PC,R6           ; prepare for subroutine
 ADDQ #14,R6
 MOVEI #SHARED+$180B8,R0  ; prepare for long jump (DSP offset included)
 JUMP (R0)            ; go do it
 NOP                  ; delay slot

 MOVEI #$00FFF000,R1  ; AND mask for address
 AND R1,R6            ; mask out the relevant bits of PC

 MOVEQ #0,R3          ; Clear R3 for code below

 MOVEI #$00000EEC,R2  ; Offset to chip control register
 STORE R8,(R6)        ; write the code (checksum result, hopefully 3d0dead)
 SUB R2,R6            ; Get control register (G_CTRL or D_CTRL)
DGAMEOVR:
 JR DGAMEOVR          ; wait for it to take effect
 STORE R3,(R6)        ; stop the GPU/DSP
; 44/50

 END
 
Anyway... it's nice to have it finally done. I'll post a comment below when I get it up on Github, I'm late for a meeting right now. ;)
 

Saturday, September 3, 2016

Carts Resolved

As I posted on my Twitter... the issue was that the Chinese handheld clone I use (a 'PocketGame') was not compatible with Ecco Jr or Tides of Time, leading me to believe I'd made an error.
 
I measured all the voltages for all three games, and they were correct.
 
Then I desoldered the banking chips and manually jumpered the games, again, the same result.
 
I fixed my hack of Gens to support the new scheme - the code all worked fine (on the plus side I fixed some sprite corruption I'd seen - the VDP DMA didn't take banking into account.)
 
I assumed bad burn and burnt new EPROMs. This time I tried the first one in my test cart, and when Ecco Jr again failed, I got very suspicious. I went and got the original PCB for Ecco Jr, which I had thought defective (and was the catalyst for me to do all this...) It didn't work in the handheld.
 
So, I went and got my real Genesis. Then I needed to find a screen that took S-Video, since my Genesis is modded (I /like/ S-Video, damn it!) Turns out I only own one, but that was enough. And Ecco Jr worked fine there. Tried my test cart, worked.
 
Since I had new EPROMs burned, I repaired the first cart AND finished a second. I then put Ecco Jr in a new shell (I had destroyed the original label by covering it with my 3-way label, so that's disappointing. Hopefully I can download a nice copy of the label and reprint it). Both carts are working fine... and I am done for the moment.
 
If anyone wants a Thunder Force or an Ecco, I have three TF and one Ecco spare... they're $30 plus shipping to cover my costs. I can also make Jaguar repros since I have some spare PCBs there, but those will cost a bit more (because Jaguar) and you need to provide your own shells.
 
 
 

Friday, September 2, 2016

Carts, carts, carts...

Might have gone a little overboard here, but maybe that's good. I tried some new things and they worked out...
 
First off I went back and made a label for the Thunder Force III Multicart I made a while back. This was my labor of love to the series, full of patches to unlock everything I ever wanted from it. ;) It was my first cartridge PCB from scratch, and it worked okay. I still have a few to get rid of though...
 
 
After that, a guy online followed my guide and created a Shinobi cart. He did a great job on the menu, and gave me a copy to help finish it. He wanted to use 4MB EPROMs instead of 2MB, so I modified the board for him. The board still takes two, but the whole multicart now fits in just one chip, so it's a bit cheaper to build, too.
 
 
 
But, after building that test cart, I saw a few things that still were not quite perfect, like the top left IC still bumping the posts inside the cart... so I tweaked that up. And while tweaking it up, I got to thinking that the one other series I wanted to do was Ecco the Dolphin. The only issue with Ecco... two of the games are 1MB, like my cart supports, but one of them is 2MB.
 
After some thought, I think I came up with a design that puts the second game on the second chip, so it can have the larger range, while keeping the first two on the first chip. Because I'm dumb, I decided to use 2MB chips. ;) (But I used my last one, so one is a 4MB chip doubled up anyway.) I went ahead and modified the PCB, but I didn't order any at the moment. I just used jumpers for the build...
 
 
 
That's still burning, which is why I'm not in bed yet! Almost. ;) But the new layout lets me jumper for 2MB or 4MB chips (although eBay suggests there's no reason to get 2MB anymore)... it also lets me jumper the second chip for 1MB or 2MB banks, so my Ecco cart works more easily. ;)
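To make the scheme concrete, here's a hypothetical model of the mapping in Python. The chip sizes and game placement follow the description above, but the function and its bank numbering are purely illustrative; they are not the actual jumper wiring:

```python
# Hypothetical model of the Ecco cart mapping described above:
# chip 0 (2MB) holds two 1MB games in 1MB banks, while chip 1 (2MB)
# holds the single 2MB game as one large bank.

def map_address(bank, addr):
    """Map a (bank, cartridge address) pair to (chip, chip offset).

    Banks 0 and 1 select the two 1MB games on chip 0;
    bank 2 selects the 2MB game on chip 1.
    """
    MB = 1024 * 1024
    if bank in (0, 1):
        return (0, bank * MB + (addr % MB))   # 1MB banks on chip 0
    elif bank == 2:
        return (1, addr % (2 * MB))           # one 2MB bank on chip 1
    raise ValueError("no such bank")
```

Banks 0 and 1 pick out the two halves of the first chip, while bank 2 hands the full 2MB range to the second chip, which is the 1MB-or-2MB jumper choice mentioned above.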
 
Ecco was a software challenge too, though. Since I needed to fit two 1MB games in the first 2MB chip, I had to figure out the menu software. Turns out Ecco 1 has about 34k of unused space at the end of the cart. A little compression wasn't quite enough to get my 44k menu down, but stripping out a lot of the unused BASIC runtime was. ;)
 
Finally... earlier this week I put together a Jaguar cartridge PCB. It was the first time I did a custom board shape, and the first task for my new calipers. ;) I made a few small mistakes, but nothing too critical, and it seems to work fine. Games normally run 32 bits wide on the Jaguar, but changing the header lets most run at 16 bits (as we learned on the Skunkboard). For my test cartridge I forgot to increase the configured bus speed to make up for the narrower width, but that didn't seem to impact it anyway. I have some sockets coming to make testing easier.
 
Anyway, I really disliked the original layout, and since I needed to fix it anyway, I redrew the traces a little curvier, because curvy PCBs are rare and fun. I also added the missing save EEPROM, since it turns out those are not too hard to get after all. This cart also jumpers for 2MB or 4MB, but since that's the size of a single Jaguar game, I never intended to multicart on this one. Just cheap single carts. ;)
 
 
Parts are funny... people sell empty Jag shells for $3 to $4 each. But for the Genesis, just grab entire boxes of sports titles for 25 cents a pop (as a bonus, a lot of sports games have 32k RAM chips I can put into my TIs. ;) ).
 
Well... picture layout isn't very good here, but at least it's easy. Looks like the burn is done, let me go finish that cart and see what happens.
 
 

Saturday, August 13, 2016

Tonight's project - Voice

After messing around all day with finicky CAN hardware, it was nice to settle back and finish off a project I've had too little time for.
 
Well, I say finish, but there's still lots I /want/ to do with it. The important thing is that this is all I /need/ to do with it. ;)
 
Some time ago over in the ColecoVision forums at AtariAge I learned the user artrag had created a voice converter for the MSX that played back voice samples at 60Hz (so, digitized voice without needing to freeze the game). It sounded really good, and knowing the MSX audio hardware was similar to the ColecoVision's, I reached out to him to see if we could port it.
 
He was nice enough to adapt the code, a MATLAB script, for our sound chip. We lose two channels in the process, which is too bad because it makes an audible difference. But most voice clips sound decent, especially if they are clear with no background noise. It's certainly understandable.
 
So tonight I finally finished the ColecoVision playback code, ran the Coleco and TI code through some tests, wrote up a VGM converter (to work with my VGM compressor, of course), and wrote up some basic documentation on how to use it.
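For anyone curious what the playback side boils down to: on the SN76489-family sound chip used in the ColecoVision and TI, each 60Hz frame simply writes new frequency and volume values to the chip's registers. Here's a minimal sketch of that register encoding; the 3.58MHz clock and the latch/data byte format are the chip's standard ones, while the function itself is just illustrative:

```python
# Sketch: encode one tone update for an SN76489-style PSG, as a
# 60Hz playback loop would write it. Assumes the standard
# 3.579545 MHz input clock used by the ColecoVision and TI.

PSG_CLOCK = 3579545

def tone_bytes(channel, freq_hz, attenuation):
    """Return the PSG command bytes for one channel update.

    channel: 0-2, attenuation: 0 (loudest) to 15 (silent).
    """
    n = PSG_CLOCK // (32 * freq_hz)                # 10-bit frequency divider
    latch = 0x80 | (channel << 5) | (n & 0x0F)     # latch byte + low 4 bits
    data = (n >> 4) & 0x3F                         # data byte = high 6 bits
    vol = 0x80 | (channel << 5) | 0x10 | (attenuation & 0x0F)
    return bytes([latch, data, vol])
```

A 440Hz tone on channel 0 comes out as the three bytes 8E 0F 90; the 60Hz player just streams one such update per active channel per frame.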
 
That's all up at my web page, though I'm still waiting on the OK to post it at AA. ;) http://harmlesslion.com/software/artvoice
 
Of course, there's also the YouTube video, giving a nice (unintentional) cross section of good and bad samples.
 
The MATLAB requirement is the hardest part... it slows things down a bit, bloats the install a lot, and requires a 370MB runtime (the exact version of the runtime, too, which bit me). I started porting it to C, but there are a few functions I need to get past (in addition to needing time). But I think it will be worth it.
 
Such a tool might also finally be the stepping stone we need to get some modern software for converting to the Speech Synthesizer (maybe... not sure exactly how the coded parameters work on that yet. Brute force might still be easier. ;) ).
 
The one improvement this tool could use is noise detection, so I hope to play with that sometime too. Right now it doesn't try. 'T' sounds are okay, but longer hisses like 'S', or even sound effects like booms, turn into random frequencies instead of playing as noise (and maybe being able to grab actual tones on top of it). But this is non-trivial; I tried a number of tricks in my own converter months ago. My converter got close (you could hear high pitched voices like Pinkie's), but I never solved the noise floor detection and eventually decided maybe it just wasn't going to work with three voices. (Testing had suggested more voices worked better.) That's why I was pleased that this tool actually worked. :)
 
Anyway, I hope to use this for something... if not, it's something I always thought /should/ be possible! So it's good to see!
 

Friday, August 12, 2016

32k TI Carts

The folks over at AtariAge are going wild over 32k cartridges, thanks to the "FlashROM99", which is a RAM-based cartridge that loads ROMs up to 32k from SD card. One of the members attempted to convert my old Super Space Acer game (http://harmlesslion.com/sofware/space), but failed, as the utility they used couldn't handle several of the program's requirements, like the loader it needed and the data files it used.

I decided to go ahead and take a stab at it myself... I found that there was a lot involved:

-The main program was 22.4k
-The sprite data was 3.75k
-The end program was 5k
-The title picture was 12k
-And the runtime library was 1.75k
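Quick arithmetic check on those pieces:

```python
# Total the pieces listed above, in kilobytes
sizes = [22.4, 3.75, 5, 12, 1.75]  # main, sprites, end, title, runtime
total = sum(sizes)  # just under 45k
```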

This was not going to fit in 32k, obviously, so the first task was compression. Some time ago I extracted the compression code from my VGM compressor (http://harmlesslion.com/software/vgm) in order to see if it worked standalone (and it did, though nowhere near as well as zip. However, it's much faster). I did a quick test, and found it could compress everything down from 45k to 27k. That seemed good enough.

After packing the files, I laid out my intention for the cartridge:

Every bank has a header through to >6040

Bank0 - >6000
    0040 boot, unpacker (1k)
    0440 SSE.pack to >A000, >13FA bytes
    14C0 SSD.pack to VDP >0800, >0F00 bytes

Bank1 - >6002
    0040 ssa.pack to >A000, >1FFA bytes
    14C0 acer_c.pack to VDP >2000, >1800 bytes
    1840  demq.pack to >2000, >066C bytes

Bank2 - >6004
    0040  ssb.pack to >BFFA, >1FFA bytes

Bank3 - >6006
    0040 ssc.pack to >DFF4, >1BFA bytes
    1330 acer_p.pack to VDP >0000, >1800 bytes

With the layout in place, I was able to create a quick script that would copy the files and pad them as appropriate for the above structure. Then I wrote a quick set of functions to unpack each file into the place it belonged.
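The pad-and-place step is simple enough that it can be sketched in a few lines. This is a hypothetical reconstruction, not my actual script; the 8K bank size and the >0040 header boundary come from the layout above, while the function and its names are mine:

```python
# Sketch of the pad-and-place step: drop each packed file at its
# offset within an 8K cartridge bank and fill the rest with 0xFF
# (the usual erased-EPROM filler).

BANK_SIZE = 0x2000   # 8K per cartridge bank

def build_bank(header, files):
    """header: bytes placed at offset 0; files: list of (offset, bytes)."""
    bank = bytearray([0xFF] * BANK_SIZE)
    bank[0:len(header)] = header
    for offset, data in files:
        if offset + len(data) > BANK_SIZE:
            raise ValueError("file overflows the bank")
        bank[offset:offset + len(data)] = data
    return bytes(bank)
```

Bank 0 above would then be assembled as something like build_bank(header, [(0x0040, boot), (0x0440, sse_pack), (0x14C0, ssd_pack)]), and similarly for the other three banks.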

The main catch, however, was dealing with the dynamically loaded files. I realized that all the loads ran through the runtime, a function called DSRLNK. I wrote a little replacement function that checked the calling address and then simply unpacked the appropriate data directly into memory, then returned. Since all but one of the files load at startup, it was pretty easy.

Getting the end loader was slightly trickier, but I just had to enable cheats, then enable overdrive on the emulator, and then get to the end of the game. ;)

 AORG >22B2

* on entry, WP is >209A
* copy return address to our workspace
  MOV R14,@>8300      caller's return address lands in R0 of the new workspace
  LWPI >8300

* now figure out who to call - there are 4 options
* R14    File    Vector
*-----------------------
* A830 - SSD     2702
* DA0E - SSE     2704
* A656 - ACER_C  2706
* A638 - ACER_P  2708

  LI R1,>2702         first vector address
  LI R2,TABLE         table of caller addresses
LP
  MOV *R2+,R3         fetch the next table entry
  JEQ DONE            zero terminator - no match, just return
  C R3,R0             does it match the caller's address?
  JEQ MATCH
  INCT R1             next vector
  JMP LP

MATCH
  MOV *R1,R0          fetch the unpack routine from the vector
  BL *R0              and call it
* back to caller, remember to increment R14
DONE
  LWPI >209A          restore the caller's workspace
  INCT R14            step past the data word after the call
  RTWP
TABLE
  DATA >A830,>DA0E,>A656,>A638,>0000
 
In order for the unpacker to be able to run while switching banks on the cartridge, it needs to load to RAM. So the cartridge header (which is on all banks to ensure any startup state is okay) starts by copying the unpacker to RAM, then branching to it. The unpacker then unpacks the main program and the runtime, and jumps to THAT. The main program then requests the title screen and the sprites, and these requests are redirected to the appropriate routines in the unpack code.

Works pretty well, overall! And now it's time for bed. ;)

http://atariage.com/forums/topic/253095-flashrom-99-image-repository-8102016/?p=3570209