Case Study

On today's computers, every program is obviously not reliable, and the reliability of many desktop programs is tied to the cost of their failure. The costs incurred give an indication of how critical the software is. Safety-related programs are especially critical as their failure may result in the loss of lives, severe injuries, or monetary losses. Examples of such programs can be found in the nuclear, space, medical, and transportation fields. These programs are generally embedded in some broader application or device that can be neither easily updated nor fixed in critical situations. Therefore, special care must be taken when specifying, designing, and building these programs.

One example of catastrophic failure due to software errors occurred during the first flight of the Ariane V launcher (see http://www.esa.int/esapub/bulletin/bullet89/dalma89.htm). The launcher exploded after approximately 42 seconds in flight at an altitude of 4000 meters above the launch pad, due to a complete loss of guidance and altitude information. This data was provided by two onboard Inertial Reference Systems (IRSs). After detailed analysis, it was determined these IRSs were designed and verified for the former Ariane IV launcher but not for the new Ariane V model. IRSs provide data concerning booster thrust, represented by 16 bits. In Ariane V, this value was copied to 64 bits real. The coding was sufficient for the old launcher but not for Ariane V, where 16 bits were insufficient to represent a higher thrust. A data overflow occurred and the incorrect 16bit integer was moved into the 64-bit real without any check, causing an operand fault. The guidance software interpreted this as a failure of the main ISR, switched to the redundant ISR, which had the same trouble. The error diagnostics for the main system interpreted this as new flight data requiring a trajectory correction. The almost random corrections caused the launcher to disintegrate due to an angle of attack exceeding the 20° limit. At this limit, the boosters separated from the main vehicle and the launcher self-destructed.

The Epinal radiotherapy accident series is another partially unexplained example (see http://www.johnstonsarchive.net/nuclear/radevents/2004FRA1.html and http://tf1.lci.fr/ infos/sciences/sante/0,,3538589,00-accident-radiotherapie-300-nouveaux-cas-epinal-.html). A severe accident caused by the misuse of a radiotherapy planning software for the treatment of cancer affected 24 patients at the French Epinal Hospital between May 2004 and August 2005. Radiotherapy doses were 20 percent too high. One patient died in 2006 and 13 were injured. In 2006, it was noted that serious dysfunctions affected patients treated for prostate cancer between 2001 and 2006. Four hundred patients received doses that overexposed them by 8 percent. Software problems and inadequately trained personnel were determined to be at fault. Other systematic dysfunctions were also reported in September 2007 concerning the period 1989 to 2000, where approximately 300 patients out of a total of 5000 patients received a 7 percent overexposure. At the time of writing, investigations on the precise causes of these accidents are continuing.

This chapter is about the reliability of C code. Trusting code requires that the code be reliable; in other words, that it does what it's supposed to do. Unreliable code might crash or perform a completely unintended operation. Therefore, ensuring that every piece of source code performs correctly and as its author initially specified is essential. The word correct is paramount here as will be seen in "C Code Static Analysis."

Linux is mostly written in C as well as in assembly languages for certain well-defined parts that must be efficient or that are dependent on the hardware. This latter layer of code is sometimes referred as the Hardware dependent Software (HdS), and it manipulates very low-level data (registers, interrupts, etc.). Most applications that run on top of the Linux kernel are written in several languages including sh, csh, bash, C, C++, Ada, Perl, and Caml. But historically, C is considered the major programming language for system software, as well as for many embedded applications, and has, therefore, been the subject of much research to improve its reliability.

This chapter will give the reader an in-depth view into the techniques involved in analyzing the correctness of C programs. This is a very active field and formal methods have been used to check the correctness of programs for many programming languages, using sometimes different approaches.

Let's start with a few useful concepts.

Reliability Software reliability means that the probability of that software failing is at an acceptable level. A complete absence of failure is not required, but failures should occur at a level and a frequency that are acceptable with regard to the software's criticalness. Using a simplified view, only programs shorter than one page can be made perfectly reliable. Failure is measured pragmatically and occurs when the software crashes (with no or incomplete results) or produces incorrect results (temporarily or permanently).

Correctness Software is considered correct when it behaves as intended in its specifications. Correctness binds an implementation to its (earlier) specifications. Typically, correctness is interpreted and verified formally, i.e., with mathematical means and rigor.

A program can, therefore, be correct but unreliable, for instance, when it results from an incorrect specification. A program can also be reliable but incorrect, for instance, if it meets its specification approximately but works well enough. You are always aiming for reliability, and correctness is a means to this end.

To avoid pitfalls, such as the destruction of Ariane V, what can you do?

A first answer might be to use reliable tools, design programs carefully, and test thoroughly. This might, however, not be enough, since, for instance, you can't test every possible case exhaustively. In the real world, unexpected incidents occur and lead to unpredicted and sometimes erroneous results. Can you do better?

Was this article helpful?

0 0
The Ultimate Computer Repair Guide

The Ultimate Computer Repair Guide

Read how to maintain and repair any desktop and laptop computer. This Ebook has articles with photos and videos that show detailed step by step pc repair and maintenance procedures. There are many links to online videos that explain how you can build, maintain, speed up, clean, and repair your computer yourself. Put the money that you were going to pay the PC Tech in your own pocket.

Get My Free Ebook


Post a comment