Wednesday, April 5, 2017

Sleeps Don't Scale

We've all done it - something races something else, we don't care if we lose the race, so we just throw a sleep in there and call it done for the day.

Unfortunately, as a system grows, all those buried sleeps start adding up against you. A large complex system simply cannot sustain sleeps, and there are three main reasons why not.

First: a sleep is usually a workaround for a race condition. As your system grows more complicated, the timing of that race will change - usually the window grows larger. In most cases, though, the duration of the sleep itself stays the same. A 10ms sleep that worked fine when you had only 5 threads is suddenly right on the edge when you have 20. It may break every time once you have a hundred (eek! But you know it happens). Good luck finding the sleep that is suddenly causing your issues - you forgot about it months ago, because everything was working fine.
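If that sounds abstract, here's a minimal sketch of the pattern (the names and timings are purely illustrative): the sleep only "works" as long as the worker happens to finish its setup inside the 10ms window.

    #include <chrono>
    #include <thread>

    bool ready = false;  // unsynchronized - this is the race being papered over

    void worker() {
        // ... setup that "always" takes well under 10ms ... until it doesn't
        ready = true;
    }

    int main() {
        std::thread t(worker);
        std::this_thread::sleep_for(std::chrono::milliseconds(10));  // hope!
        // proceed as if 'ready' is true - fine at 5 threads, dicey at 20
        t.join();
    }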

Second, and this is strongly related to the first point: the sleeps will destabilize your code base. Little races you didn't even know about will surface, because some function you had forgotten about and assumed was quick now sits around for a while, giving your new code a chance to run - and then pre-empting it right in the middle. You know, during that stretch where you sort of assumed your main loop was idle.

Finally, they kill performance. As your system scales up, all those sleeps add up. Take a simple web service: it receives a message on a socket, passes it to a waiting worker, and gets the response back. Now imagine you had some dumb little race condition and threw in a 10ms sleep. When you're handling one request a second, who cares about a 10ms sleep? Nobody! What happens when your service gets popular and needs to process 1000 requests a second? It's impossible - you need at least 10 seconds of sleep time to get through that one second's worth of traffic.

So what do you do? First off, you don't let them in there in the first place.

Oh sure, there are cases where you can let it slide. A standalone task running on its own thread, with no interdependencies with the rest of your mainline code? Sure, let it sleep.

Sleep might also be helpful for releasing your timeslice when you're done working, but a timer or signal would be better, since you get more predictable wakeup times. For instance, if you want to process once every 10 milliseconds, you could do your task, sleep for 10ms, then wake and repeat. That will work. But if your work takes 1ms, the cycle time of that loop actually becomes 11ms. With a timer, you can include that work time in your 10ms cycle with minimal extra effort.
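Here's a minimal sketch of the difference, assuming a placeholder do_work() that takes about 1ms: sleeping a fixed 10ms after the work stretches the real period to roughly 11ms, while sleeping until an absolute deadline folds the work time into the cycle.

    #include <chrono>
    #include <thread>

    void do_work() { /* ~1ms of real processing would go here */ }

    void run_every_10ms() {
        using namespace std::chrono;
        auto next = steady_clock::now();
        for (;;) {
            do_work();
            // Drifting version: sleep_for(milliseconds(10)) here would make
            // the real period ~11ms. Instead, aim at an absolute deadline:
            next += milliseconds(10);
            std::this_thread::sleep_until(next);
        }
    }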

But in your mainline code? Anything that is processing on behalf of another system? Really bad idea. If you are sleeping just because you need to wait for something else to be done, you are far better off finding out what that other thing is and synchronizing with it than blindly sleeping (see the sketch below). For long sequential tasks, a state machine approach might be better: each cycle you check whether it's time to work - if yes, do the work; if no, don't. That lets you interleave the operation of multiple clients through one state machine, rather than blocking on the completion of each individual task.
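As a minimal sketch of what "synchronize with it" can look like, here's a condition variable standing in for a guessed sleep - the names done, finish(), and wait_for_it() are just illustrative:

    #include <condition_variable>
    #include <mutex>

    std::mutex m;
    std::condition_variable cv;
    bool done = false;

    // Producer side: announce completion instead of making consumers guess.
    void finish() {
        { std::lock_guard<std::mutex> lk(m); done = true; }
        cv.notify_all();
    }

    // Consumer side: wakes exactly when the other thing is done - no sleep,
    // no guessed duration that stops being right as the system grows.
    void wait_for_it() {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [] { return done; });
    }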

And I'm bored. No punchline today. ;)

Monday, February 13, 2017

Object Oriented Programming - How We Got Here

While reading "The Mythical Man Month" (which I didn't know was about programming), I was struck by the number of valid points that this book, written to help manage software products in 1975 -- which are still ignored or blatently flaunted today.
 
I have a bigger project where I hope to distill the most important points I see from that. Later.
 
But the book I picked up is a later edition with a 1986 addendum titled "No Silver Bullet". And in this article is a section entitled "Object-Oriented Programming - Will a Brass Bullet Do?" And in that section is a single paragraph that enlightened me entirely as to how we got where we are with Object Oriented Programming.
 
I'd like to comment on that paragraph.
 
Now, this was written in the infancy of the very concept of Object Oriented Programming, musing about why the concept had not yet caught on as well as the author thought it should. And so he attempts to describe some of the goals of Object Oriented Programming. And it's in this one paragraph summarizing "one view" that I can see the germs of thought that have today mutated into rampaging plagues across most of software development. Seriously, I used to wonder how we got here.
 
One view of object-oriented programming is that it is a discipline that enforces modularity and clean interfaces. A second view emphasizes encapsulation, the fact that one cannot see, much less design, the inner structure of the pieces. Another view emphasizes inheritance, with its concomitant hierarchical structure of classes, with virtual functions. Yet another view emphasizes strong abstract data-typing, with its assurance that a particular data-type will be manipulated only by operations proper to it.
 
Every one of those features is actually pretty good in its original intent - that is, when it is used where it makes sense. The problem is that in many programs these guidelines have been mutated into absolute laws. You absolutely may not access data inside another class. You don't need to see how a class was written, let alone have the right to modify it. Everything is inherited from something else - whether it makes any sense or not (the number of times I've seen a basic data type with an inheritance chain six or more classes deep is no longer amusing to me, but rather depressing). And I've literally worked on a project where I was not allowed to store public data in a central database because the database, which already existed in the software, didn't support strong data typing. That was the reason.
 
The point of Object Oriented Programming was to make it faster and easier to develop pieces of software and bring those pieces together.
 
Modularity exists so that a component can be developed and tested in isolation. It makes no sense whatsoever to call a class modular if you still need other classes to make it work. That's not modular anymore, and you should probably consider whether those classes ought to be merged into one object rather than left as an incestuous mess. And for what it's worth, bool is not a modular class. Don't wrap bool.
 
Encapsulation is a tricky one to grasp. It's stated so plainly - one cannot see the inner structure of the pieces. But good encapsulation requires two things: a good design, and enough runtime on that design to prove it actually is good. If you enforce encapsulation to the point of "nobody looks at the code and therefore nobody can change the code" from day one, all that will happen is that you end up with workarounds for missing, obtuse, or broken functionality. Worse, you'll probably try to code for every conceivable use case - most of which aren't what people actually want - in hopes that no changes will ever be needed. The project becomes more complicated and less stable. I've seen people enforce this rule to the point of hiding the internals of their own objects from themselves. Encapsulation is for stable code, not development code. And you don't need to encapsulate bool. Don't wrap bool.
 
Inheritance is one of the most powerful features of Object-Oriented Programming and, frankly, one of the few features I actually really like. But you inherit where it makes sense. In most cases your inheritance chain should be no more complicated than the example in most textbooks -- a base class extended to one level. In rare cases you may need two levels for certain objects (but certainly not all of them), and in equally rare cases multiple inheritance may make sense (again, certainly not for all of them). Good planning goes a long way here. Going nuts with inheritance leads to complicated, incestuous code that is difficult to debug, difficult to modify (without breaking something else), difficult to implement, and difficult to document. It also performs poorly in many cases, and even where it doesn't, it's harder to predict what the code will do. You don't need to start with basic classes like a wrapper around bool and inherit from there. Don't wrap bool.
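For reference, a minimal sketch of that textbook shape - one base class, one level of extension, a virtual function (the names here are purely illustrative):

    #include <cstdio>

    struct Sensor {                    // the base class
        virtual ~Sensor() = default;
        virtual double read() = 0;
    };

    struct ThermoSensor : Sensor {     // one level down - and that's it
        double read() override { return 21.5; }  // placeholder reading
    };

    void log(Sensor& s) { std::printf("%.1f\n", s.read()); }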
 
Strong Abstract Data-Typing was meant to get away from the admittedly sloppy practice of casting objects in C and hoping you got it right. This feature alone is a good reason to port C code to C++, even if nothing else changes (you'll be surprised where you screwed up but it worked anyway ;) ). But it doesn't mean you need to wrap every type of data in a custom object just so the data-typing will protect your function calls. (In fact, passing different kinds of data around is often a better job for classes sharing a common base class and using inheritance...) Simply put, if you have several true-or-false items, you don't need to wrap bool in different classes to make sure you pass the right kind of bool to the right function. A bool is a bool. Don't wrap bool.
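As a minimal sketch of the kind of thing such a port flushes out (Packet is an illustrative name): C happily converts void* to any object pointer implicitly, while C++ makes you stop and say what you actually meant.

    #include <cstdlib>

    struct Packet { int len; char data[64]; };

    int main() {
        // Packet* p = std::malloc(sizeof(Packet));  // legal C, but a compile
        // error in C++ - one of those casts you get to re-examine on the port.
        Packet* p = static_cast<Packet*>(std::malloc(sizeof(Packet)));
        std::free(p);
    }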
 
That's all I really wanted to say. I learned a bit about when modern programming missed the left turn in Albuquerque. It was roughly thirty-one years ago. We have GPS now, let's figure out where North is and start getting back on track.