## Analyzing C Code Using Hoare Logics

Let's first discuss the principles of Hoare Logics and the VDM method. Hoare Logics consists of describing at the specification level the intended behavior of every C function in terms of a pre- and a post-condition. These are state predicates, expressed in first order or higher order logics (such as VDM-SL), that formalize precisely the state of the variables before and after the function is executed. The predicates relate variables (local and global ones) by means of a relation. Through that relation, variables are bound by an explicit function (such as y=f(x) with some function f) or some implicit relation or function (such as R(x,y) where R is the relation). The former is a particular case of the latter.

As an example, consider the following bubble sort function:

//Sort in^increasing order the integers of table t

The pre- and post-conditions, respectively, express that MAX_INDEX is some positive integer constant and that the table is sorted in increasing order afterward. This writes as follows:

Post-bubblesort: ( Vi e ( 0 rMAX_INDEii-1 ], t(i) < t(i+l) )

These predicates use the following different objects that are part of some logics:

• Variables and constants These belong to some model sets, generally mathematical sets. For instance, integer variables are modeled by mathematical integers. Other primitive sets are Booleans, characters, tokens, and rational numbers. Real numbers are not necessary as a computer does not really manipulate real numbers but instead a subset of real numbers that have finite precision, i.e., rational numbers. In VDM-SL, more complex model sets can also be built, using (partial or complete) functions, subsets, supersets, product sets, and so on.

• Expressions C operators are modeled using mathematical operators. For instance, the addition operator for C integers is modeled using the mathematical addition operator for integers. A whole bunch of operators on the various model sets allow you to build complex expressions. Universal and existential quantifiers are allowed.

• Higher-order objects In several logics, such as the one used by VDM-SL, higher-order terms of any degree can be written. In VDM-SL, power sets and functions build such higher-degree objects. Functions can have other functions as arguments, such as in the following example:

Nat filter : (nat -> bool) * seq of nat -> seq of nat Nat_filter(p,ns) == [ns(i) | i in set inds ns & p(ns(i))]

Nat_filter takes a sequence of natural numbers and a predicate and returns the subsequence that satisfies this predicate. Then nat_filter(lambda x: nat & x mod 2 = 0, [1,2,3,4,5]) returns [2,4].

In fact, this function is not particular to natural numbers and can be made polymorphic, as follows:

Filter[@elem] : (@elem -> bool) * seq of @elem -> seq of @elem Filter(p,l) == [l(i) | i in set inds l & p(l(i))]

This function applies to any type @elem.

The pre- and post-conditions are, therefore, predicates on the models of the state variables of the function. Every variable is interpreted by a mathematical object and classical assignment is no longer used, as it is only meaningful in C.

In the bubble sort example, the precondition must be given by hand and should be a predicate stating the constraints on the input variables and global variables before the function starts executing. Preferably, the preconditions should be the weakest possible in the sense of logical implication, maximizing the size of the input domain. This restrains the domain of the input variable and global variable values to some subset on which the function is defined. This means that the function does compute a correct result for every value in this set. Remember that predicates are equivalent to sets. For the function Bubblesort, the variable t is modeled as a mapping (of the same name):

t : nat -> nat which is a partial function only defined on the subset [0,MAX_INDEX]. The symbol -> is the constructor of partial functions, whereas the +> symbol is the constructor of total functions.

The post-condition must be written as well and should be a predicate on the output variables and global variables of the function, once the function has terminated. In other terms, it describes the result of the function's computation as a predicate. The postcondition should preferably be the strongest one possible in the sense of logical implication. For instance, Post-bubblesort states that the table t is completely sorted. A post-condition only characterizes the result but does not necessarily describe how it is obtained. This is the role of the C code itself.

Another predicate must be introduced, namely invariants. Invariants are associated to loops only (this applies to any kind of loop) and characterize the behavior of loops during execution. Let L be a loop statement, then Inv_L is a predicate on the function's local and global variables that holds at every iteration of the loop L. Generally, Inv_L is placed at the very beginning of the loop. Inv_L must be satisfied when entering the loop at every iteration and at the end (i.e., afterward).

For instance, the invariants of the outermost and innermost loop of the bubble sort example can be written, respectively,

Inv_I ! ( Vi E EMAX_:raDEX-x,MAX_INDEX-lJ, t(i)=nth-highest(t,i)) ¿nv_2 | I Vi € [MAX_INDEX-x,MAX_INDEX-1], t(i)=nth-highest(t,i))

where nth-highest(t,i) denotes the nth highest element of t.

The first invariant states that the table is partially ordered from index MAX_INDEX-x up to MAX_INDEX. This is due to the fact that every iteration brings the next biggest element to the left. The second invariant is a conjunction of two parts. The first part is the same as the first invariant. The second part expresses that at iteration y, the table t[0,y] has its highest element on the right; this reflects the principle of the bubble sort. Of course, this example is quite short and the invariant looks very similar to the postcondition, but this is not always the case as loops can be embedded in the code is many ways and places.

Invariants must generally be written by hand too.