Monday, October 12, 2020

VGMComp2 - Looking Back

Many years ago, I undertook a project to come up with a simple compression format for music files on the TI-99/4A. My goals were simple, and somewhat selfish. There was a music format called VGM that supported the TI's sound chip (an SN76489-family part), and music from platforms that used it, like the Sega Master System, was easily obtainable. However, the files recorded every write to the sound chip along with timing information, and tended to be very large.

I built a system that split the data out by channel and moved all timing into separate streams, so this four-channel sound chip now had 12 streams of data: tone, volume, and timing for each channel. With all the streams looking the same, I implemented a combination of RLE and string compression and got them down to a reasonable size. There were a number of hacks for special cases I noticed, but ultimately it worked well enough to release. It was, in fact, used in a number of games and even a demo for the TI, so it was a success.
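As a rough illustration of the stream idea, here is a minimal C sketch: an enumeration of the twelve streams (tone, volume and timing for each channel) and a plain run-length encoder and decoder for a single stream. The stream names, the (count, value) byte layout and the 255-entry run cap are assumptions made for this sketch; the real format also uses string compression and special cases that are not shown here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stream layout: one tone, one volume and one timing
   stream per channel of the four-channel SN chip (3 tone + 1 noise). */
enum {
    STREAM_TONE0, STREAM_TONE1, STREAM_TONE2, STREAM_NOISE,
    STREAM_VOL0,  STREAM_VOL1,  STREAM_VOL2,  STREAM_VOL3,
    STREAM_TIME0, STREAM_TIME1, STREAM_TIME2, STREAM_TIME3,
    NUM_STREAMS /* = 12 */
};

/* Encode one stream as (count, value) pairs, capping runs at 255.
   Returns the number of bytes written; out must hold up to 2 * n bytes. */
size_t rle_encode(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t o = 0;
    size_t i = 0;
    while (i < n) {
        uint8_t value = in[i];
        size_t run = 1;
        while (i + run < n && in[i + run] == value && run < 255) {
            run++;
        }
        out[o++] = (uint8_t)run;
        out[o++] = value;
        i += run;
    }
    return o;
}

/* Expand (count, value) pairs back into the original stream.
   Returns the number of bytes written to out. */
size_t rle_decode(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t o = 0;
    for (size_t i = 0; i + 1 < n; i += 2) {
        for (uint8_t k = 0; k < in[i]; k++) {
            out[o++] = in[i + 1];
        }
    }
    return o;
}
```

Encoding {5, 5, 5, 7, 7, 9} produces {3, 5, 2, 7, 1, 9}. Plain RLE like this doubles the size of data with no repeats at all, which is one reason to pair it with a string (back-reference) pass.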

But it always bothered me. Why did I need the hacks? Why did it use so much CPU time? Could I do better? I spent a fair bit of time, on and off, coming up with ways to improve it. And finally, I convinced myself that I could. The new scheme was similar, but reduced the four time streams to just one, and swapped out one of the lesser-used compression flags for a different idea. My thinking was that even if all else was equal, going from 12 streams down to 9 would buy me back 25% of the CPU time.

But it didn't. In fact, the new playback code barely performed as well as the old. Even after recoding it in assembly and optimizing it heavily, it was still only about 10% better in CPU usage than the old one. It took a lot of debugging to understand why, and what I finally realized was that the old format was simply better at determining when NO work was needed: it only had to check the four time streams. The new format needed to check the time stream and the four volume channels. This meant that the best case (no work at all) was slightly faster on the old player than on the new one. But the new one was markedly better in the worst case (all channels need work), just because the actual work per channel was somewhat simpler.
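To make the best-case comparison concrete, here is a minimal C model of the per-frame "is there anything to do?" check under each layout. It assumes the old format keeps one hold counter per channel's time stream, and that in the new format each volume stream carries its own hold counter next to the single shared time stream; the struct and function names are hypothetical, not the vgmcomp2 player's. It only counts the tests paid every frame and does not model the per-channel work, where the new format wins.

```c
#include <stdint.h>

/* Hypothetical per-frame state, not the real vgmcomp2 layout. */
typedef struct {
    uint8_t time_left[4];  /* old format: one timing stream per channel */
} OldPlayer;

typedef struct {
    uint8_t time_left;     /* new format: one shared timing stream */
    uint8_t vol_left[4];   /* assumed hold counter per volume stream */
} NewPlayer;

/* Tick one frame. Returns how many counters were tested (the fixed cost
   paid even when nothing changes) and sets bits 0-3 of *work for channels
   whose timing stream expired. Assumes the decoder reloads a counter
   before it can wrap past zero. */
int old_tick(OldPlayer *p, uint8_t *work)
{
    int tests = 0;
    *work = 0;
    for (int ch = 0; ch < 4; ch++) {
        tests++;
        if (--p->time_left[ch] == 0) {
            *work |= (uint8_t)(1u << ch);
        }
    }
    return tests;
}

/* Same idea for the new layout: bit 4 of *work flags the shared timing
   event, bits 0-3 flag volume streams that need new data. */
int new_tick(NewPlayer *p, uint8_t *work)
{
    int tests = 1;
    *work = 0;
    if (--p->time_left == 0) {
        *work |= 0x10;
    }
    for (int ch = 0; ch < 4; ch++) {
        tests++;
        if (--p->vol_left[ch] == 0) {
            *work |= (uint8_t)(1u << ch);
        }
    }
    return tests;
}
```

With nothing due, the old layout tests four counters and the new one tests five, which is the best-case gap described above.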

Compression itself didn't really give me the wins I hoped for either. After creating specific test cases and walking through each decompression case (debugging each one along the way), compression was better, but not amazingly so. The best cases were, true, about 25% smaller than with the old compressor, but the worst cases were pretty much on par, and even that only with the most rigorous searches.

What I finally had to admit to myself, in both cases, was that the years of hacks and tricks and outright robberies in the original compressor had created something that was pretty hard to beat. But it was also impossible to maintain, rather locked in to the features it could support, and, most importantly, I did beat it. Maybe not by much, but 10% on a slow computer is not a bad win.

And that, really, was something else I had to admit to myself. The TI is a slow computer. Even back in the day it was not terribly speedy. I sometimes forget, working on my 3GHz computer, that the TI's 3MHz clock is a thousand times slower than my modern PC's. And that's ignoring all the speedups that modern computers enjoy. (It's kind of a shame how much of that power modern OSes steal, but I guess that's a different rant.) Anyway, the point is that even writing all 8 registers on the sound chip every frame takes almost 1% of the system's CPU. And that's just writing the same value to all of them. That I can decompress and play back complex music using an average of 10-20% of the CPU is maybe not as awful as I felt when I first realized it.

There's, of course, another advantage to this new version. One goal was to also support the second sound chip used in the ColecoVision Phoenix: the AY-8910. The AY was borrowed from the MSX to make porting MSX games simpler, and it became a standard of sorts through OpCode's Coleco SGM add-on, so supporting it, at least in a casual manner, seemed worthwhile. This goal expanded when a member of the TI community announced that he'd be resurrecting the SID Blaster, a SID add-on card for the TI-99/4A. So, I made the toolchain support both of these chips, although I cheated. A lot.

In the case of the AY, it wasn't so bad. I just ignored the envelope generator and treated it like another SN with a limited noise channel but a better frequency range. The SID was trickier. I applied the same abuse: I ignored the envelope generator and treated it like another SN, but with only three channels. Unfortunately, the SID required some trickery, because the envelope generator was necessary to set the volume. Fortunately for me, the trickery appeared to work. ;)
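To make "treat it like another SN" concrete, here is a minimal C sketch of converting SN-style tone periods and attenuation levels for the other two chips. It uses the standard tone formulas (SN: f = clock / (32 * n); AY: f = clock / (16 * p); SID: f = Fn * clock / 2^24), but the AY and SID clock values, the rounding and every function name are assumptions for illustration, not taken from the vgmcomp2 tools. The SID volume mapping assumes one possible approach (zero attack and decay, gate held on, level placed in the sustain nibble); a real driver likely needs extra handling, such as retriggering the gate, that this sketch leaves out.

```c
#include <stdint.h>

/* Assumed clocks in Hz. SN_CLOCK is the usual 3.58 MHz colorburst clock;
   AY_CLOCK (half of it) and SID_CLOCK are illustrative guesses. */
#define SN_CLOCK  3579545UL
#define AY_CLOCK  1789772UL
#define SID_CLOCK 1000000UL

/* SN: f = SN_CLOCK / (32 * n), n in 1..1023.
   AY: f = AY_CLOCK / (16 * p), p in 1..4095.
   Solving for p gives p = n * 2 * AY_CLOCK / SN_CLOCK (rounded). */
uint16_t sn_to_ay_period(uint16_t n)
{
    uint64_t p = ((uint64_t)n * 2u * AY_CLOCK + SN_CLOCK / 2) / SN_CLOCK;
    if (p < 1) p = 1;
    if (p > 4095) p = 4095;
    return (uint16_t)p;
}

/* SID: f = fn * SID_CLOCK / 2^24, so fn = SN_CLOCK * 2^19 / (n * SID_CLOCK).
   Very short SN periods exceed the 16-bit register and are clamped. */
uint16_t sn_to_sid_freq(uint16_t n)
{
    if (n == 0) n = 1; /* avoid dividing by zero; treat as shortest period */
    uint64_t num = (uint64_t)SN_CLOCK << 19;
    uint64_t den = (uint64_t)n * SID_CLOCK;
    uint64_t fn = (num + den / 2) / den;
    return fn > 0xFFFFu ? 0xFFFFu : (uint16_t)fn;
}

/* SN volume is attenuation: 0 = loudest, 15 = off, 2 dB per step.
   Naive AY mapping: just invert it (AY steps are not exactly 2 dB). */
uint8_t sn_to_ay_volume(uint8_t atten)
{
    return (uint8_t)(15u - (atten & 0x0Fu));
}

/* SID sustain levels are linear, so use 15 * 10^(-atten/10) rounded
   half up, with attenuation 15 forced to silence. */
static const uint8_t sid_sustain[16] = {
    15, 12, 9, 8, 6, 5, 4, 3, 2, 2, 2, 1, 1, 1, 1, 0
};

/* Build a sustain/release byte: sustain in the high nibble, release 0. */
uint8_t sn_to_sid_sr(uint8_t atten)
{
    return (uint8_t)(sid_sustain[atten & 0x0Fu] << 4);
}
```

With these assumed clocks the AY period comes out equal to the SN period, and the AY's 12-bit period register simply reaches lower notes than the SN's 10-bit one.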

I have to admit that I'm not convinced that using both chips together will be acceptable, performance-wise. 20% doesn't sound bad, but that's on average. If both chips hit a full load on the same frame, it could be more than double that. On the other hand, if you can get away with running the tunes at 30Hz and alternating between the sound chips on each frame, that would be fine. That's likely what I'd do.
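A minimal sketch of the alternating approach, assuming a 60Hz frame interrupt and tunes converted to play at 30Hz: even frames run one chip's player and odd frames run the other's, so the two worst cases can never land on the same frame. The Scheduler type and scheduler_step function are hypothetical names for illustration.

```c
#include <stdint.h>

/* Hypothetical scheduler state for two chips sharing a 60Hz interrupt. */
typedef struct {
    uint32_t frame;     /* 60Hz frame counter */
    uint32_t ticks[2];  /* how many updates each chip's player has run */
} Scheduler;

/* Call once per frame. Returns the index (0 or 1) of the chip whose
   player should run this frame; each chip is serviced every other
   frame, which gives each tune an effective 30Hz update rate. */
int scheduler_step(Scheduler *s)
{
    int chip = (int)(s->frame & 1u);
    s->ticks[chip]++;
    s->frame++;
    return chip;
}
```

Over one second (60 frames), each chip's player runs 30 times, and only one player's cost is ever paid in any single frame.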

Anyway, there was yet one more goal, and that was a robust set of tools to surround the new players. In the end, I created nearly 50 separate tools. And, silly me, many of them look Windows-specific (but they are all just console apps and will port trivially, someday). But we have player libraries for the ColecoVision and the TI, a reference player for the PC, a dozen sample applications, 10 audio conversion tools (including from complex sources such as MOD and YM2612), and over 20 simple tools for shaping and manipulating the intermediate sound data. I have no doubt it's very intimidating, but short of tracking the data yourself (which, frankly, is a better route than converting), I believe there's no better toolset for getting a tune playing on this hardware.

Of course, if you can track it yourself, you can still use this toolset to get from tracker to hardware. ;)

I do intend to use this going forward, of course. The first user will probably be Super Space Acer, as that's near the top of my list (Classic99 is ahead of it). Though that game is nearly done, it will benefit from the improvements, and I need to finish it and port it around. With luck, once people have a chance to figure out the new process, they'll use it as well. I'll have to do some videos.

Anyway, the toolset is up on GitHub, and will eventually be on my website too, once I get that updated.

https://github.com/tursilion/vgmcomp2

(BTW: I very, very, very rarely log into the GitHub website. Using the ticket system and sending me notes there is all well and good, but generally I just push my project and move on. That's why I use Git in the first place, because SIMPLE. My point is: expect turnaround times to be really slow if that's how you reach out to me. I'm not ignoring you. I just haven't seen it yet. I say this because, while logging in to get the URL, I noticed some stuff waiting for me. ;) )