Good Practices Reduce Bugs Better than Good Brains

Mostly we think that avoiding bugs is the result of a good brain. If there’s a bug, some programmer missed something. That’s certainly one source of bugs. But in my experience many, if not most bugs can also be traced to how the code is written. The format and structure of the code can often make it hard for human eyes, even trained eyes, to see a problem right in front of us.

The latest iOS security bug is the talk of the town at the moment, and it’s a doozy. It left people extremely vulnerable to attacks, and the code was based on open source code. What makes it more poignant is that it appears to be a very small error in the code, one that is at the same time easy to understand and can be difficult to catch. It’s a good example of how the lack of simple coding standards can cause big problems because it’s easier for the problem to “hide” within the structure of the code.

Below I’ve used this as an example to show how a couple of simple coding standards can help prevent this kind of problem.

An Extraordinary Kind of Stupid is a great article about the bug. The fault is placed on some programmer who missed the fact that there were two identical lines:

OSStatus err; 

if ((err = ReadyHash(&SSLHashSHA1, &hashCtx)) != 0)
    
   goto fail;
            
if ((err = SSLHashSHA1.update(&hashCtx, &clientRandom)) != 0)
 
   goto fail;
            
if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
   
   goto fail;
            
if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
     
   goto fail;
                
   goto fail;
            
if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
      
   goto fail;
fail:
            
   return err;

The result of that duplicate “goto fail” is that we jump to the code after the “fail” label before we make the last error check. All the articles I’ve seen about this problem essentially say “the programmer should have noticed the identical line, which is always executed because it’s actually outside of the if statement”.

True. But this is like saying that a driver should have avoided an accident even after the traffic signs were wrong and the side mirror was broken. Yes, you might have avoided that accident if you had paid special attention. Maybe. But the environmental difficulties made it very hard to notice the oncoming accident. The environment made significant contributions to the accident, not just your lying eyes.

Take a look at the code again. First, there’s a goto statement. I don’t think I’ve used a goto since high school. Using gotos has been considered a poor programming practice for years because it abruptly alters the logic of the code and can lead to “spaghetti code”. It leads to errors like the one above, and It’s easy to jump to the wrong label. On rare occasions a goto can get you out of a jam you can’t avoid, but it makes the code harder to follow and understand. If I were reviewing this code I’d ask for some good justification for using goto. In the above code, it’s not necessary at all.

Second, the variable and goto label are poorly named. Why do software developers use such short names? Compilers don’t care how long the names are, but people do. “err” is not a clear explanation of what the value is. An error of what?

“err” is also an incorrect name for that variable because it might not contain an error status. It’s the return value of the authentication, which could report that the authentication is successful. Even if it does reflect an unsuccessful authentication, that’s not necessarily an error. It could just be a typo on the password. In the context of authentication, it’s not a failure to when you don’t authenticate. That actually reflects that the code is doing its job.

By the same token, you get to the “fail” label no matter what, even if none of the failure conditions are met. It’s confusing – the implication from the label name is that if you reach the label (which you always do) you have failed in some way. In fact, if you reach the label you’re either successful or unsuccessful with authentication, and neither of those is a failure. A failure is when something fails in some unexpected or unusual way, such as when you can’t save a file because you’ve run out of disk space.

It’s hard to figure out what’s going on with such terse and incorrect variable and label names. So “err” should be called something like sslReturnValue, and “fail” should be called something like osStatusComplete.

There should also be brackets for each if statement. This is one of the big coding failures here and it’s easy to correct. A single line after an if statement is part of that if statement. But if you have more than 1 line you need brackets. So put them there even if you only have 1 line when you first write the code. It’s quite possible you’ll have to add more lines later. And it just makes the code easier to read. This is especially poignant because the failure to see something “simple” in the code is what caused this error. It’s important to remember that it’s not just machines reading this text, it’s humans too. And the humans are more important in finding and fixing errors. Code should be easy to read for just that reason.

Let me repeat: code should be easy to read. It makes it easier to find and fix errors. It makes it easier to see the logic.

Sometimes software engineers like to write “impressive” code that’s dense and difficult to read. The code gets a lot done on a single line, or the spacing of the code is dense with little or no whitespace to call out important routines. That doesn’t mean you’re a smart engineer or that the code will run any faster. It means you write code that’s hard to maintain.

Finally, it’s always dicey to perform assignments within conditionals. It’s a shortcut but it makes it easier to conflate the assignment and the conditional. I hesitate to say “don’t do it”, but avoiding it makes for easier reading and better code comprehension.

So here’s one possible re-write of the code above:

OSStatus authenticationStatus;

authenticationStatus = ReadyHash(&SSLHashSHA1, &hashCtx);

// fall through to the end of the “if” statement if we can't authenticate, otherwise continue

if (0 == authenticationStatus) { 
 authenticationStatus = SSLHashSHA1.update(&hashCtx, &clientRandom);

  if (0 == authenticationStatus) {
    authenticationStatus = SSLHashSHA1.update(&hashCtx, &serverRandom);

    if (0 == authenticationStatus) {
      authenticationStatus = SSLHashSHA1.update(&hashCtx, &signedParams);


       if (0 == authenticationStatus) {
         authenticationStatus = SSLHashSHA1.final(&hashCtx, &hashOut);  
       }
     }
   }
}

return authenticationStatus; // will be 0 if all checks are OK

So we have nested if statements instead of goto statements. The result is the same as the original example because we always stop as soon as we get a non-zero status code, and we return the status no matter what it is. We also see that we’re returning a status, not an error. We’ll never accidentally assign something in a conditional. There’s no goto statement, so we couldn’t replicate the error in the original code if we tried. It’s more obvious what’s going on just by looking at the structure, which makes for quicker assessment of the code.

The original code was poorly written and is deceptively difficult to comprehend. I would say it’s mostly due to a lack of coding standards, which can be considered lazy software engineering. The boring but important types of practices illustrated above result in big savings in the long term – such as a lack of security vulnerabilities.

Advertisements
This entry was posted in Uncategorized. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s