Sunday, November 4, 2018

One step forward, two steps back

I spent some time on the video player tonight, finally going in to make the timing adjustments that I've been putting off for so long. After hours of cycle counting I finally got it going, and instead of 10% slow, it was running about 3% slow. Unfortunately the main copy loop is as tight as it's going to get, so I had to decide whether that was acceptable or not.

Finally, as an experiment, I did the math. The video was converted for a frame rate of 8.9132 fps, based on the theoretical loop times. Of course, since doing that, I learned that the sound chip halts the TI's CPU when writes happen, and writes happen pretty often. The difference only brought it down to 8.6458 fps, so I decided to see how that looked.

It took a bit of tweaking to update my toolchains, but I got the old Spaceballs video converted again, and then I needed to find a working console with an original video chip, because I hadn't ported the F18A playback code yet. That also took some doing...

But finally I was able to run it up. Both in Classic99 and on real hardware, it sounded almost perfect and the runtime was close enough that I couldn't differentiate from the source video with a manual stopwatch. Close enough for government work!

It was also interesting to compare to the old video - it looked a lot better. The converter updates over the last few years made a difference. I guess I'll go post a YouTube capture...

Of course, the audio is running at 13khz, and one might say that it doesn't need to run that high a frequency - why not turn down the audio and get back some video performance? I think that the eyes are much more forgiving than the ears, and the higher frequency playback helps downplay some of the many things against one playing back digital audio on the TI. It sounds pretty good, even on hardware, although it's fairly soft. People WILL have to turn up their volume a bit to hear it.

The difference of 1/4 of a frame per second seems to be worth it... it was great to hear the audio at the correct pitch. But it means I'll have to re-encode the Dragon's Lair video again, and re-sequence all the frames again, because the indexes will change. At least I got that extra daylight savings time hour!


Saturday, October 27, 2018

Dragon's Lair for the TI-99/4A

Talking to a friend of mine who is not into tech or games about this project made me consider why it is worth doing. After all, it's mostly grunt work at this point, and by modern standards, there's not a lot of gameplay there. So why?

Well, it comes down largely to "what might have been", and an almost-nostagia. I'm not sure what it would be called - wish fulfillment, maybe? To see that something widely considered impossible was actually possible after all. To see the machine do something, 40 years later, that it never did before.

Dragon's Lair was released in the summer of 1983 to arcades, and it was unlike anything ever seen before. Most arcade games were still 8-bit systems with custom hardware for every game, low resolution displays with limited colors, small, 4-color sprites, and beeps and boops were still very much the norm. If you were lucky, you got some parallax scrolling (well, if you were lucky enough to scroll).

Then in strides Dirk the Daring, full screen, fully-animated by Don Bluth's new studio (himself fresh out of Disney Animation!), challenging you to help him through beautifully drawn (and often creepy) dungeons and traps to save the Princess Daphne (who is teased to be more than a little attractive) from an enormous and clever dragon. It has also been widely credited with being the first game to cost 50 cents a play rather than 25 cents. (It was the first one I saw.)

The play style was brand new too. The video would play out, and at an appropriate point, you'd be cued to make an input. You had a four way joystick and a sword button, and at any given point only one input would be correct. Get it right, and Dirk would continue. Get it wrong - or respond too slowly - and Dirk would suffer a horrible, sometimes amusing, fully animated death.

Or you could play Nintendo's Mario Bros (which was a very good game, don't get me wrong).

             

In fact, let's see what else came out in '83, and these are also amazingly good games.

Vector games were state of the art and the best 3D available, and nothing beat the crisp lines, excellent license, and straight-up thrills of Atari's Star Wars:

Image result for star wars arcade vector

Big sprites were also starting to happen, and Nintendo's Punch-Out shone in that respect, even if the characters were heavily scaled up to get there (using two monitors to enhance the experience even more - and my screenshot is pretty heavily scaled too):

Image result for punch out arcade

But, the 3D vector games were sparse (and hidden-line removal was still years off), and the bitmap games were limited by memory to how many frames of animation they could manage (and bigger graphics took more memory, but even Punch Out with its large characters and multiple opponents was less than 300K!)

Again, Dragon's Lair blew all that away. Nearly every frame was unique - very little was repeated. There was a good 20 minutes of video there (I can't find the exact count, but that's based on my captures), and at a quality level that no computer of the day could replicate. How was it possible? Laserdisc.

The Laserdisc was a video format released for movies in 1978, competing largely with VHS video tape (released in 1976). It looked very much like a pizza-sized DVD. Like VHS, it recorded NTSC composite video frames for playback (apologies for ignoring PAL and other formats). However, the laser pickup allowed a much higher bandwidth, roughly doubling the resolution of the stored image. As a result, Laserdisc was the best quality home picture you could get, and it enjoyed fairly good support.

However, it was much more expensive than the tape machines (BetaMax was in there too), and couldn't record, leading to its eventual consideration as a failed product. (Though I don't agree it was, you don't have to be number one to succeed. LaserDisc had a long, solid run with many vendors involved.)

Anyway, back to 1983, and Cinematronics, the company who had used LaserDisc to create an interactive storybook named The Secrets of the Lost Woods, decided that they could use animation to better sell their idea. Reportedly inspired by The Secret of NIMH, they hired Don Bluth's studio to do the animation. The computer would simply tell the LaserDisc player which frames to play, and since the disc could hold a good 45 minutes of video, the memory cost was minimal. (Of course, the program was still stored in ROM chips - loading code from an analog LaserDisc wasn't something anyone wanted to try. At least not until Sony did it in 1984. But Dragon's Lair's code was miniscule - only 32k.)

Dragon's Lair was certainly the first popular LaserDisc game, but it actually wasn't the first! Sega's Astron Belt had released just a year previously. This was a somewhat confusing game that overlaid a 3rd person space shooter overtop of clips from various sci-fi movies. Every so often you'd shoot the right area and the entire screen would play back an explosion, or you'd be in the wrong place and die. I played it a few times and liked it, but never really understood it.

Image result for astron belt

(It is perhaps interesting to notice how sharp the computer graphics are overtop of the LaserDisc's composite video playing back B-grade sci-fi - but this WAS high quality in 1982.)

Anyway, the concept of porting Dragon's Lair to the home computers of the time was laughable. The average machine had 32k of memory, 4-16 color graphics, took minutes to load software, and could barely draw a frame of video. However, nobody denied the massive impact of Dragon's Lair - even if later LaserDisc games proved to be far less successful. So people created games inspired by Dragon's Lair, and we ended up with games like this on the Coleco Adam and Commodore 64:

Image result for dragon's lair adamImage result for dragon's lair c64

They actually look pretty good, but in fairness these titles looked better on the box than in motion. And yet the base appeal of Dragon's Lair was so strong you wanted to play them ANYWAY. The ultimate of that direction was probably the NES version, which wanted smooth animation above all else, including gameplay:

Image result for dragon's lair nes

Coleco, in fairness, actually talked about the REAL Dragon's Lair in the home. The computer requirements of the game were actually pretty small - ANY machine of the day could have handled it. All that would be needed is an interface to a LaserDisc player with a remote control input (and many had one!) But, in the end this didn't happen, since it was considered unlikely that enough households would have a LaserDisc player AND a ColecoVision AND buy the LaserDisc adapter AND want to hook it to their LaserDisc player all for just one game with no expectation of any more. It was probably the right answer, as awesome as it would have been.

We moved quickly to the 16-bit era just a couple of years later, and suddenly motion video wasn't quite as laughable as it used to be. The 16 bit machines had fast disk drives, 512-1024K of memory, and more than enough power to update the entire screen quickly. The Atari ST and Amiga (and even PC, though I never saw that one) notably got diskette-based releases. Sure, lots was redrawn to save memory, more than half the game was missing, and you had to swap disks a lot, but you got the Dragon's Lair experience at home, and in just 4-5MB.

Image result for dragon's lair amiga


When CD hit gaming, things changed a lot. Suddenly computers and video game systems had access to 640MB of memory - much more than ever before. The PC, CD-I and Sega CD got reasonably complete ports of the direct LaserDisc video game (although the Sega CD version suffers from the Genesis' limited 64-color screen, it's not much worse than the VGA):

Image result for dragon's lair VGA CDImage result for dragon's lair CD

This was finally real, and the next advance was the release of DVD, which finally made the full quality video playable -- in some releases even on a standard DVD player, as DVD (with care given to the mastering) can match the quality of the original LaserDisc, or even exceed it. By this point, though, computers and video game systems were all more than capable of playing back the original quality video, digitized and compressed into the favorite format of the day. One advantage of playing it from digital video was that there was no need for black screens between scenes, which on the original arcade was caused by the LaserDisc seeking for the next video clip. A HD-remastered version was released for the HD era, and Dirk had never looked so sharp.

Image result for dragon's lair hd

The TI-99/4, my target for this project, was released in 1979. It was updated in 1981 to the version that makes this possible (the 4A), and again in 1983 to piss off Atari by locking them out. It was then cancelled in 1984 because it had no good games. In the base 4A unit, it contains a 16-bit missile control system TMS9900NL running at 3MHz, 256 bytes of RAM, 26K of ROM, a 4 channel tone generator, and a video chip with a 256x192 resolution in 15 colors and its own 16K of video RAM.

Technology has advanced significantly since 1979, and very large memories are now possible. In addition, computers are markedly more capable. My current PC is a relatively obsolete 16-core 64-bit Xeon running at 2.6GHz natively, numerous TB of disk (which essentially takes the place of the TI's ROM), 128GB of RAM, a more-or-less unlimited sound card, and a graphics card that can push 3840x2160 in 32-bit color several times over and has its own 2GB of video RAM

So it is of course my pleasure to put the new machine to work producing output for the older one - in this case, I spent years tuning a brute-force image converter that outputs acceptable images for the TI from nearly any source image (although due to playback performance, the TI is only actually playing at 1/2 its own resolution):



I also found a single-chip 128MB flash which is more than enough to hold the roughly 65MB of output frames. They have to be uncompressed and directly accessible because the TI is simply not fast enough to handle streaming or decompression, it needs to just copy bytes to the video RAM and sound chip as fast as it can. Even at half resolution, as fast as it can only works out to about 9 fps. But the important thing is -- it works out. This means Dragon's Lair is possible.

As so we come back around. Wouldn't it be awesome to see a 'wouldn't it have been awesome if' come true? Dragon's Lair, in a cartridge for the TI-99/4A, just like might have been possible in 1984 (I'm saying 6 months after the arcade came out).

I wanted to do it right, so I have a license from Digital Leisure to produce and release the title. Work has made it difficult to put the pieces in place, and I am running out of time, but it's really close to being real now. Close enough that I'm procrastinating by writing this instead of working on the code. ;) But it's looking really likely.

Which really leaves the final question, and the one that I secretly started this article to talk about. Was it really feasible in 1984? Well, let's look at the challenges I've faced dealing with it today.

First the video. Could we even digitize it? ComputerEyes (Digital Vision) was on the market for the Apple 2 in 1984, although the quality was far below what we wanted. I seem to remember a magazine article of the day that discussed a very cheap video capture circuit - provided you had a stable video signal. LaserDisc could do that (since each video frame had its own track, pauses were crystal clear). It's hard to find much information today about the history of digitizing video, but the concepts were understood well enough and consumer products were beginning to emerge - let's say it would have been tricky but possible. I can say this much - to capture in color would have taken a fair bit of time - even years later capturing actual moving video was still very, very expensive. But capturing stills - there are lots of ways to trade cost for speed. But let's say you could capture a frame and save it off in an average of 3 seconds (which I think is really generous). The playback rate of the TI video resulted in 10,855 frames of the original game. So, capture would take about 9 hours, which is pretty good.

Also pretty unlikely, IMO, since I did a still-frame capture of a LaserDisc in 2000 and it took about 4 hours per 45 minutes on a machine of that era. But let's remain generous - you could definitely have captured the frames.

And by the way, we didn't really have any true color graphics formats of note back then, so we're probably storing as uncompressed TIFF, at about 144k per frame, or 1.5GB of data in the end. That would have cost about $420,000 in 1984 dollars (about $900,000 today) and been impossible to attach anyway. Or maybe we were smart and decided GIF, the 256 color format released by Compuserve in 1987, was going to be the way to go and invented it before we started. Average frame compresses down to about 37k and still has enough data for the conversion. So hard disk space is reduced to 400MB, or only about $112,000.

The final option, and probably the only feasible one, is to convert each image as it's captured. The resulting frames are exactly 6k in size, and while you could compress at this point let's assume not. In that case, the resulting data will take 65MB, for about $18,000. You might even be able to attach it. the PC/AT itself was about $4000, for comparison.

(Storage prices: http://www.mkomo.com/cost-per-gigabyte)

Conversion is a pretty big deal - getting the RGB image dithered into the TI's limitations is a challenge even on my modern machine, which needs 3-5 seconds per frame. But let's start by eliminating all the options and sticking to the simpler dither, and say that in the ideal case my modern machine needs 1s per frame. It's a single-thread conversion, so the cores don't matter, but we're still running at 2.6GHz (over 3GHz, when it's busy). In 1984, PCs were clocking around 6MHz and running the brand-new 80286 processor. It's comparing Apco's to Orange's given the massive internal changes in CPU design, but let's assume clocks to clocks and say my machine is 433 times faster than that machine, meaning each frame would need 433 seconds to be converted. (And the programming wizard who pulls that off will get my hat. The truth is that a single core on this machine is probably a good thousand times faster at a particular task than that AT.)

433 seconds, times 10,855 frames, means that working full-tilt, the AT should complete the conversion in about 55 days. Better put a UPS on that bad boy.

So we've got the development system at about $22,000 (plus whatever hackery we need to get it capturing). You'd also need to buy the arcade game to get the disc and the LaserDisc player, that was about $4000 (although a replacement player and disc was about $1000 - let's go with that number). Now let's talk about the cartridge. (Box and manual has not changed much over time, and I don't care to investigate that right this moment).

I'm using a 128MB flash chip, but back in 1984 every kilobyte counted. Let's restrict the cartridge to the 65MB we actually need (although in truth, almost anyone building such a thing back then would probably discard the roughly 20% of video that makes up the mirrored scenes in order to save memory, or even build hardware flipping support to save that money. As we're about to see, megabytes were expensive. But I'll keep the 65MB number).

It's hard to get actual prices for ROM, but RAM is here: https://jcmit.net/memoryprice.htm. Here we see at the start of 1984 that RAM was about $1300/MB. Numerous hunts for mask ROM (the cheapest, if in quantity) gave no actual numbers, but a number of price quotes suggest a range of 1/2 to 1/5th the cost. Let's go with 1/4 the cost and then we can say about $325/MB. 65MB, then, would cost $21,000 (in quantity). We'd better make sure the manual is really nicely printed.

Of course, with the $21,000 worth of ROM chips you also have the small problem of space, and the slightly larger problem of power. It's going to be pretty big, so it's probably going to plug into the wall of its own accord and sit in a box behind your console while you plug a ribbon cable into the cartridge port. It costs more than your car, so it might as well.

I don't know if I'm allowed to talk about the price I paid for license, probably not, but I can tell you right now that in 1984 Dragon's Lair was the hot property. You're not going to walk away with a cheap license here. I wonder what the 8-bit ports paid? There's at least one quote that suggests Coleco paid as much as two million dollars for it. In 1984. Even Dr Evil couldn't afford that.

So, yes, technically, very technically. You'd need to win the Lottery to afford the license, you'll need about $25,000 in 1984 dollars to assemble the development gear, and two months just to prepare the data (if nothing goes wrong). Your final product will cost over $21,000 to assemble (each) and most households will probably brown out when they plug it in. But.. yes, it could have been done.







Monday, October 22, 2018

Use your tools

Why is it, when software is more accessible than ever, and you have the most powerful software creation tools in the history of mankind available to you, for free, you still release software that builds with warnings?

Thursday, October 11, 2018

I'm Doing it Wrong: Learning New Skills

We all need to learn at some point in our lives. If we're computer people, we pretty much need to learn all our lives.

Some people read, some take classes, increasingly over the years, I hit my head against a problem over and over again until the problem or my head breaks. I call this "being stupid". But it's made my head pretty hard.

Most recently, I was finally doing my first VHDL, for the Dragon's Lair cartridge that I've previously mentioned. I was pretty pleased that my understanding of the basics got the basic bank switching latch working pretty easily. And then someone on Twitter asked whether it would support GROM too, so that the cartridge would work on a 2.2 console.

I didn't see why, after all, nobody still willingly uses a 2.2 console, which was notorious in the TI-99 community as being TI's effort to prevent AtariSoft cartridges from working with their machines. Literally, they changed the machine to lockout AtariSoft. Since AtariSoft made some of the best games, and this move made them drop the TI-99 entirely, people were not impressed.

But this person noted that they grew up on a 2.2 console, and were always annoyed that none of the good games worked on their machine. Thus, seeing this new licensed title work on a v2.2 would just feel good. Given that nostalgia is the only sensible reason to still be messing with these computers 40 years after they were released, I had to agree.

I chose a really small CPLD - in fact the smallest I could get away with, so it was a tight fit. Eventually I did get reading the GROM bus, and an internal 8-bit address counter which would provide 256 bytes of GROM -- more than enough to bootstrap the ROM part of the cartridge. However, every time I put the address increment into the design, it would work in simulation but lock up the TI. Sometimes immediately, sometimes at random, but it didn't work.

As I noted - I am brand new to VHDL. I actually HAVE read a book cover to cover (a couple of times - and the smart version of me would give you the title as it was really nice, but this is the dumb version of me. Ask in a comment and I'll look it up later). But practice is different than reality. Further, searching Google for VHDL questions was even worse than searching Google for C questions. There were fewer hits and the answers were typically even less useful (though still usually of the 'why do you want to do that?' vein...)

But I was stuck. Now that I was this close, I knew it was possible and I couldn't release the idea. I WOULD have GROM support. I was prepared to go as low as 16 bytes if I needed to (I was pretty sure less than that wouldn't be enough for the header and startup code).

When you get stuck, it's good to look at things from a different point of view. I thought to myself, "I have GROM working if I don't increment. And I've always told anyone foolish enough to listen that the GPL interpreter doesn't rely on auto-increment anyway, it sets the address for every byte. So maybe I don't NEED to increment for a simple boot..."

So I looked into it. The code that builds the selection menu ("PRESS 1 FOR TI BASIC, 2 FOR DRAGONS LAIR") used the GPL 'MOVE' command, which absolutely does set the address for every single byte. But the code that scans for programs to display actually was coded in the assembly language ROM side of the system.

The cartridge header has two pointers that matter in this case. The first points to the list of programs on the cartridge, the second points to a particular program's boot address. When the ROM code builds the list of programs, it did use auto-increment on those 2-byte pointers.

After thinking about it for a while... I realized that even in 256 bytes, I had tons of room to spare. So I created pointers with repeated bytes:

CARTRIDGE BASE: 0x8000 (not a pointer, just where we are starting)
PROGRAM LIST: 0x8181 (since both bytes had to be the same, this was the earliest I could start)
BOOT ADDRESS: 0x9191 (due to only 256 bytes being decoded, this is 16 bytes later, leaving room for the program name)

So due to the address register size, the real values accessed would be 0x8081 and 0x8091, but the system didn't need to know that!

Then I wrote a quick little GPL program that cleared the screen, loaded the real cartridge ROM vector, and jumped to it.

I was thrilled when I rebooted the TI and my menu option appeared perfectly! Then I selected it, and the screen filled with a strange character and the system locked up.

It didn't take much digging to see what was wrong. The first command in GPL for clearing the screen was "ALL 32", which means to fill the screen with character 32, which is of course, the space. The opcode for ALL is 7. Running in the emulator showed that the screen was filled with character 7, and a dig through the source code to the GPL interpreter (one nice thing about 40 years later is how much has been documented!), and I learned that my assertion about the interpreter not using increment was ALMOST right. Increment WAS used for fetching the arguments to instructions.

Well bugger... I spent an evening trying to work around this. I looked for exploits in the interpreter, I looked for clever opcode abuse that might let me get assembly control, I looked for single byte opcodes that might be useful and I even looked for pairs of bytes that might be interpreted as useful opcodes. Ultimately, I had to conclude that this approach was a dead end, and the only way was to make the CPLD work correctly.

At this point I began to hammer on the code. Change, upload, test, observe. Change, upload, test, observe - for hours at a time. I attached my logic analyzer to various combinations, I ran simulations (which always worked, of course), all to squeeze out just a single bit of new information for my head.

I envision this stage much like grinding in a dungeon crawler game. I keep going up against the boss, and he keeps sending me back to the village inn to recover, but each time I get a tiny bit of XP. I come back a little smarter, and keep trying.

At one point - I got it! Almost. The increment was working but the address was off by one. Confident that I knew what I was doing, I decided to try a different architecture to solve that. The new architecture failed but in my over-confidence, I had failed to backup the almost working code. Sure enough, I had forgotten what I did. Stupid.

Another week of hammering on it led to no results, and so finally I asked for help. Another stupid, and I was worried about showing off my HDL, so I should have asked earlier. Anyway, my friend didn't spot anything immediately wrong, but we ran through some examples on the whiteboard. I gained double-digit XP for that session.

Then I looked at a similar project (but on a larger device). This one did things slightly differently from me, and I had shied away from the approach because it took more logic than what I had been trying (and didn't fit). But I decided to declare a loss, and just port his project.

So I did, stripping out all the pieces that didn't apply to me. This compiled and even fit, but when I tried it - it locked up the console just the same. (I still need to test if that person's full project works on my troubled console, if not, I have some XP I can share with him now...) Anyway, because it didn't "just work", I went back to mine.

By this point I had been starting to learn how to optimize a little bit, and I reworked my code to use a similar increment concept to what this other project did. This was costing me in gates, so I disabled the ROM side to allow me space to increment (removing 14 latched bits saves some amount of logic!) And after a few iterations, GROM was working! Fully! I was so thrilled, but the boss was not defeated yet!

I added the ROM side back in, and sure enough, I was over. But only by ONE macrocell. As we all know, every program can be optimized by one instruction (and has one bug, meaning every program can be optimized to a single instruction which doesn't work). Surely every circuit can be optimized by one component! (No, but bear with me). I noticed the toolchain complaining about the complexity of my delay mechanism...

I'm proud of this, cause this one part earned me the last XP I needed to beat the boss and get a fully working CPLD.

I had a simple mechanism to control my gating of the GROM to the TI bus, because it didn't seem to like it when I was too quick on OR off the bus. I had two signals controlled by the GROM clock... when the TI asked for GROM, first one went high, then on the next cycle the other one did (and gated the memory). When the TI released the bus, they released in the reverse order. This seemed to keep GROM stable. (My previous GROM work used an 8MHz AVR. For all its speed, it's still slow compared to hardware.)

Anyway, I did this using a sequence of IF statements, one for up and one for down. I realized that the tool might be creating a fairly complex circuit to support that exact behaviour, and wondered if I could simplify it. I drew a truth table and realized that sure enough, there was a combitorial relationship. I replaced the two if...then...else blocks with two simple AND/OR statements (still in the clock process). This freed up the one macrocell that I needed.

So that's it.. (Hopefully - my other stupid is declaring success before testing it on other consoles!) I can finally move on to the software. Ask for help early, don't be dumb like me. Even if the help itself can't be provided, you gain knowledge by both asking smart questions and listening to the answers. Eventually that's how you get to the end.

I posted a video of it running here: https://www.youtube.com/watch?v=W4yCPdeB4nc

Sunday, August 26, 2018

You're Doing it Wrong: Debug

One of the last things that gets any attention in any large software project is the debug system - which is amazing because the vast majority of time spent developing a large software project is debugging. You'd think people would want to make it easier. You'd think that EVERYONE would want to make it easier. From the bean counters who are paying ridiculous sums of money for every minute of time, down to the individual contributor who knows that if they actually reported the true number of minutes that went into the product, the company would go bankrupt.

So what does a debug system do, and why does it get so little attention?

A debug system provides the tools and information necessary to identify the causes of undesired behavior. That's it, really, and what that means may vary some from project to project, but in most cases it boils down to some kind of log file that records the state of the software as it runs. Most commonly this is a human readable file (but it doesn't have to be), and is used after the fact to determine how some undesired event occurred.

Why is it so neglected? Well, people just seem to assume that they are perfect, therefore everyone else is perfect too. Perfect people never make mistakes - they never write incorrect code, and they certainly never input bad values or click things in the wrong order. Naturally, the systems people work with are also perfect, and never fail or get viruses or suffer bad RAM or otherwise pass garbage into a program. So, since nothing can every go wrong, there's no need to check for errors or report events that occur. Obviously, people who believe this should not lead software teams.

In general, debug systems are also not on people's radar when they start a new project. They are focused on the awesome new function they want to provide, and are not thinking much about how they will troubleshoot that function when it ships.

In the case where the undesired event is an outright crash, there's usually some form of post-mortem analysis. In Linux, for instance, applications generally write 'core' dumps. Windows applications can do the same thing (in the form of 'minidumps' -- and developers who need to support Windows software would be well advised to enable these.) Both of these are binary files that capture the state of the software at the moment of failure, allowing you to load them into a debugger and examine just about any part of the system that might interest you - a moment frozen in time.

But although core dumps are a great first line of defense, they are often not enough. For instance, you can see that your pointer is NULL, and you can see that you crashed trying to deference it, but you can't figure out WHY it became NULL. This is because you only have a snapshot of the exact instant that the NULL was dereferenced. Some people would just add a NULL check right there and claim success - those people should not lead software teams.

A text log is most application's next line of defense. There's a pretty common evolution of text logging in large applications. The first evolution is just a simple dump of lines of text into a log. To avoid stressing the system or filling the disk, log is kept to a minimum or even programmatically disabled except in special cases. As log lines are added, eventually someone gets a line of text in a spinloop, which fills the disk, so filtering for duplicate lines is added. Then the increasingly complex system needs more and more debug, some of which becomes very frequent and is needed but is not spinning, so we add throttling. Then the amount of debug available starts to become unwieldy, and so it is reorganized into classifications (such as severity levels, and possibly subsystems). Then the fact that debug needs to be manually turned on and recompiled starts to be a problem, so debug is made programmatically controllable. Now people are so used to having this great information that when a problem occurs at 2am, they want the log information. But it was off, as the system can't cope with logging everything all the time. So the writer is optimized and logs are on. Finally, although this step sometimes comes much earlier, someone gets tired of the disk getting filled up, and log rotation is added. At this point, you pretty much have a basic text logging system.

So to recap, you might as well start with:
-classifications. Classify your message severity (DEBUG, INFORMATION, WARNING, ERROR, for instance) and possibly by subsystem, if appropriate.
-dynamic. Make it possible to turn logging on and off by classification. Recommend that users leave it on if possible.
-consistent. All systems should emit debug - but see below for what.
-throttling. Should be able to detect if the same debug is being emitted more frequently than is actually useful to a human reader. If you wouldn't read every single line, the computer probably shouldn't write every single line. "x lines per second" from a single statement usually works well enough, but your needs may vary. Make sure the log includes mention of throttled statements so the reader knows why it's not showing, and make sure that mention itself doesn't spam the log!
-rotation. The system should be able to track and control the amount of disk space consumed by logs, while making a reasonable amount of data available for troubleshooting. How much depends on the system and the needs.
-performant. Debug systems need to execute quickly enough to not impact the primary purpose of the system. By designing everything up front, you often get this one out of simple necessity. The simplest first pass is usually a ring buffer that debug statements are copied into, with a separate, lower priority thread writing the ring buffer out to disk. For bonus points, have your crash handler dump the ring buffer too. ;)
-information. I didn't mention this above, but your log functions should almost always be a macro so you can automatically include the filename and line number in them. ie:
#define log(LEVEL,args) writeToLog(__FILE__,__LINE__,LEVEL,args)

So what goes into a text log? Basically, anything you can't determine by any other means. If you were troubleshooting a video game, for instance, and you wondered 'what state was the boss in when he suddenly flew up into the sky?', you probably need to log changes to the boss' state.

Which leads me to the point - you want to keep your information as compact as possible. Logging changes is a pretty good tactic, since the reader can answer any state questions by reading backwards to the last state change. Likewise, if you have an if...then...else statement with two sides, in many cases you need only log one side (the LESS common one), since the reader can determine if the log doesn't appear, the other path was taken. (However, if there's related information you always need, emitting it before or after the IF statement with the data needed to determine which path was taken is also good.)

Good debug lets you read a function beside the logfile, and know exactly what that function did and why. Good debug also does not emit information that you can infer from other information, unless it's so much effort to make that inference that having the computer tell you actually saves time.

As a for instance, consider this:

log(DBG, "Pointer value is %p", ptr);
if (ptr == NULL) {
  log(ERR, "Pointer was NULL, not responding.");
  res=0;
} else {
  log(DBG, "Pointer is valid, doing something useful.");
  res=ptr->doSomethingUseful();
  log(DBG, "Got %d from doSomethingUseful()");
}
log(DBG,"Final result is %d", res);

You can certainly tell EXACTLY what this block of code did.

Pointer value is 0x00000000
Pointer was NULL, not responding.
Final result in 0

...or...

Pointer value is 0x12345678
Pointer is valid, doing something useful.
Got 42 from doSomethingUseful()
Final result is 42.

It's explicit, but your log will fill pretty fast if you're that verbose. Consider this case, which gives you all the same information:

if (ptr == NULL) {
  res=0;
} else {
  res=ptr->doSomethingUseful();
}
log(DBG,"Pointer was %p, Final result is %d", res);

Pointer was 0x0000000, Final result is 0

...or...

Pointer was 0x12345678, Final result is 42

Mind you, this does lose the ERR level message in the NULL pointer case, which if it was important, it would be just fine to keep. You can generally consider messages at different levels to be in different classes.

Which brings me to the concept of what goes into a message. In general, you should assign a certain level (such as INFO) for which that level and above is intended for your customers. Customer facing debug messages should attempt to communicate using human language (not numbers). Furthermore, high level customer messages, such as a WARNING or ERROR, must tell the customer what to do about it. If there is nothing they can do, it doesn't justify being a WARNING or an ERROR.

For instance, this is useless:

if (ptr == NULL) {
  log(ERR, "NULL ptr... about to crash.");
}

You've told the end user, and there is literally nothing they can do about it. Just make it a DBG level for the poor sap who needs to reverse engineer the crash, and maybe spew out some other relevant information if you have it. But don't tell the user. However, this might be helpful.

if (ptr == NULL) {
  log(ERR, "A failure has been detected. Please safely shut down and restart the system. Contact support to collect troubleshooting information.")
}

Human communication. Directions on what to do about it. Maybe the software should initiate its own shutdown, if that makes sense. Either way, far more useful to the person who suddenly gets the big red message.

That's it for now.