Saturday, August 13, 2016

Tonight's project - Voice

After messing around all day with finicky CAN hardware, it was nice to settle back and finish off a project I've had too little time for.
Well, I say finish, but there's still lots I /want/ to do with it. The important thing is that this is all I /need/ to do with it. ;)
Some time ago, over in the ColecoVision forums at AtariAge, I learned that the user artrag had created a voice converter for the MSX that played back voice samples at 60Hz (so, digitized voice without needing to freeze the game). It sounded really good, and knowing the MSX audio hardware was similar, I reached out to him to see if we could port it.
He was nice enough to adapt the code, a MATLAB script, for our sound chip. We lose two channels in the process, which is too bad because it makes an audible difference. But most voice clips sound decent, especially if they are clear with no background noise. It's certainly understandable.
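For the curious, here's a rough sketch of what "adapting for our sound chip" boils down to at the register level. This is /not/ artrag's MATLAB code, just an illustration, and it assumes the chip is the SN76489-family part in the ColecoVision and TI-99/4A, clocked at 3.579545 MHz, with a 10-bit tone divider and 4-bit attenuation in 2 dB steps. The function names and the (frequency, amplitude) input format are made up for the example.

```python
import math

# Assumption: SN76489-family chip clocked at the NTSC colorburst frequency.
CLOCK_HZ = 3_579_545


def tone_divider(freq_hz):
    """Return the 10-bit divider N for a tone channel, where f = clock / (32 * N)."""
    n = round(CLOCK_HZ / (32 * freq_hz))
    # Clamp to the range a 10-bit register can hold (0 is avoided on purpose).
    return max(1, min(1023, n))


def attenuation(amplitude):
    """Map a linear amplitude (0.0..1.0) to 4-bit attenuation: 2 dB per step, 15 = silent."""
    if amplitude <= 0:
        return 15
    db_down = -20 * math.log10(min(amplitude, 1.0))
    return min(15, round(db_down / 2))


def frame_registers(partials):
    """Hypothetical helper: turn one 1/60 s frame of (freq_hz, amplitude) pairs
    into (divider, attenuation) pairs, one per tone channel (three at most)."""
    return [(tone_divider(f), attenuation(a)) for f, a in partials[:3]]


# One frame's worth of made-up data: three partials of a voiced sound.
example_frame = frame_registers([(220.0, 1.0), (440.0, 0.5), (880.0, 0.25)])
```

One thing that falls out of the formula: with a 10-bit divider the lowest tone is about 109 Hz, so anything deeper than that gets clamped.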
So tonight I finally finished the ColecoVision playback code, ran the Coleco and TI code through some tests, wrote up a VGM converter (to work with my VGM compressor, of course), and wrote up some basic documentation on how to use it.
That's all up at my web page, though I'm still waiting on the OK to post it at AA. ;)
Of course, there's also the YouTube video, giving a nice (unintentional) cross section of good and bad samples.
The MATLAB requirement is the hardest part... it slows the tool down a bit, bloats it a lot, and requires a 370MB runtime (and the exact version of the runtime, too, which bit me). I started porting it to C, but there are a few functions I need to get past (in addition to needing time). But I think it will be worth it.
Such a tool might also finally be the stepping stone we need to get some modern software for converting to the Speech Synthesizer (maybe... not sure exactly how the coded parameters work on that yet. Brute force might still be easier. ;) ).
The one improvement this tool could use is noise detection, so I hope to play with that sometime too. Right now it doesn't try. 'T' sounds are okay, but longer hisses like 'S', or even sound effects like booms, turn into random frequencies instead of playing as noise (ideally while still grabbing any actual tones on top of it). But this is non-trivial; I tried a number of tricks in my own converter months ago. My converter got close, and you could hear high-pitched voices like Pinkie's, but I never solved the noise floor detection and eventually decided maybe it just wasn't going to work with three voices (testing had suggested that more voices worked better). That's why I was pleased that this tool actually worked. :)
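To give an idea of what a first stab at noise detection might look like, here's a minimal sketch using zero-crossing rate, a common textbook heuristic: hissy frames flip sign far more often than voiced ones. To be clear, this isn't what either converter does, and the 8000 Hz sample rate, the 0.3 threshold, and the function names are all assumptions picked for illustration. A real version would need tuning, and it does nothing for the harder problem of pulling tones out from under the noise.

```python
import math
import random


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / max(1, len(frame) - 1)


def looks_like_noise(frame, threshold=0.3):
    """Guess whether a frame is noise-like. The threshold is an untuned assumption."""
    return zero_crossing_rate(frame) > threshold


SAMPLE_RATE = 8000              # assumed input sample rate
FRAME_LEN = SAMPLE_RATE // 60   # samples in one 60Hz frame

# Synthetic test frames: a 200 Hz sine stands in for a voiced sound,
# and uniform random samples stand in for an 'S' hiss.
rng = random.Random(1)
voiced = [math.sin(2 * math.pi * 200 * i / SAMPLE_RATE) for i in range(FRAME_LEN)]
hiss = [rng.uniform(-1.0, 1.0) for _ in range(FRAME_LEN)]

voiced_is_noise = looks_like_noise(voiced)
hiss_is_noise = looks_like_noise(hiss)
```

A frame flagged this way could be sent to the noise generator instead of the tone channels, though real speech is messier than these synthetic frames.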
Anyway, I hope to use this for something... if not, it's something I always thought /should/ be possible! So it's good to see!
