## Verification = TCB/PB Reduction

What does it mean if somebody claims that their software is formally verified?

## 1. Verification as we learn in high school

In high-school maths, we are introduced to the idea of a proof — the word “proof” is etymologically related to the word “probe”. In a proof, we reason from first principles (aka axioms), why a certain assertion (aka theorem) holds true. Said differently, we verify the theorem, and the proof represents the verification process.

The axioms and the proof need a language in which they are represented — at a minimum, this language involves a syntax and a set of proof rules governing that syntax. This mathematical rule-based verification is also called formal verification, to distinguish it from the colloquial use of the word verification which may just involve human inspection.

The theorem being verified is often broken up into two parts: (1) the implementation, and (2) a specification that is supposed to hold for that implementation. Here is an example of an implementation, a specification, the axioms that we assume, and the proof system that can be used to complete the proof.

###### Choosing an Abstraction Level for the Proof

The proof author can choose an abstraction level for the axioms and the proof rules, to trade-off between proof simplicity and the level of rigour. For example, the author may choose to consider the commutativity of natural number addition as an axiom, and perhaps prove it separately elsewhere. Alternatively, the author may choose to restrict herself to lower-level axioms only, to potentially obtain more succinct and elegant proofs.

Similarly, the author may choose to assume the validity of mathematical induction over natural numbers (proof rule) even though that may be derivable from more general proof rules such as Noetherian induction.

From a pure theoretical standpoint, it should not matter which (abstraction level of) axioms and proof rules you pick. However, from a computer systems point of view, this does have an effect on your confidence in the verification effort as we discuss below (TL;DR higher abstraction levels slightly increase the size of specifications and proof rules, thus potentially increasing the probability of a bug escaping the verification process).

## 2. Verification of software systems

size_t strlen(char const* str) {
char* ptr;   ulong *longword_ptr;
ulong longword, himagic, lomagic;
for (ptr = str; ((ulong)ptr & 7) != 0; ++ptr)
if (*ptr == ‘\0’) return ptr - str;
longword_ptr = (ulong*)ptr;
himagic = 0x8080808080808080L;
lomagic = 0x0101010101010101L;
for (;;) {
longword = *longword_ptr++;
if ((longword - lomagic) & ~longword & himagic) {
char *cp = (char*)(longword_ptr - 1);
if (cp[0] == 0) return cp - str;
if (cp[1] == 0) return cp - str + 1;
if (cp[2] == 0) return cp - str + 2;
if (cp[3] == 0) return cp - str + 3;
if (cp[4] == 0) return cp - str + 4;
if (cp[5] == 0) return cp - str + 5;
if (cp[6] == 0) return cp - str + 6;
if (cp[7] == 0) return cp - str + 7;
}   }   }

In software, we are usually interested in proving properties of a computer program. The program will have its own syntax, depending on the programming language (PL) in which it has been developed. The syntax of the computer program can be different from the syntax that is used for the specification, the axioms and the proof. In the text below, we show four different ways to write the specification for the C library’s strlen() function; for each specification method, we also provide the required axioms and proof system.

(i) Specification in First Order Logic

$\forall_{i=0}^{\mathrm{\tt strlen(str)-1}} \mathrm{\tt str[i]} \neq \mathrm{\tt 0} \land \mathrm{\tt str[strlen(str)]=0}$

(ii) Specification in Quantifier-Free Logic With the a Custom Axiomatic Operator $\mathrm{\tt FirstOccurrenceOf"}$

$\mathrm{\tt strlen(str) = FirstOccurrenceOf(Heap, str, 0, \lambda x.(*x==0))}$

The proof rules for the $\mathrm{\tt FirstOccurrenceOf()}$ operator are shown as inference rules below. If you are not an expert in reading this notation, don’t bother — basically these are a set of rules that try and resolve the value of the $\mathrm{\tt FirstOccurrenceOf()}$ operator depending on the context in which it occurs. The mechanical checker can use these inference rules as input to decide the validity of a proof step.

In these inference rules, Ctx represents the context under which this evaluation is performed, Heap represents the memory array, addrb and addre represent the bitvectors representing addresses used to index into the heap. L represents a (lambda) function that takes a byte and returns either true or false. Intuitively, the $\mathrm{\tt FirstOccurrenceOf}(Heap, addr_b, addr_e, L)$ operator returns the first index (relative to addrb) of the byte (starting at addrb and before addre) which evaluates L to true. The sentinel value of -1 is used if none of the bytes between addrb and addre evaluate L to true. For example, the first inference rule (base case) states that if addrb and addre are equal (i.e., there are no bytes in the range), then $\mathrm{\tt FirstOccurrenceOf}(Heap, addr_b, addr_e, L)$ evaluates to the sentinel value -1 (irrespective of Heap and L). The $\oplus$ operator represents bitvector addition with wrap-around semantics.

Notice that the addre argument in our theorem relating strlen to $\mathrm{\tt FirstOccurrenceOf}$ is zero, indicating that we are interested in finding the first null character between str and the maximum address value addrmax in the memory address space (recall that addrmax$\oplus$1=0).

(iii) Specification through Equivalence to a Program in the Same Syntax

$\mathrm{\tt strlen(str) = strlenNaive(str)}$

size_t strlenNaive(char const* s) {
char const* p = s;
for (p = s; *p; p++);
return p - s;
}

In this method of specification, both strlenNaive and strlen are specified in the same syntax (the syntax of the C programming language). However, the specification program (strlenNaive) is significantly simpler and smaller than the implementation (strlen).

(iv) Specification through Equivalence to a Program in a Different Syntax

$Related(\mathrm{\tt str}, \mathrm{\tt strCoq}) \implies \mathrm{\tt strlen(str) = strlenCoq(strCoq)}$

Inductive string : Set :=
| EmptyString : string
| String : ascii -> string -> string

Fixpoint strlenCoq (strCoq : string) : nat :=
match s with
| EmptyString => 0
| String c s' => S (length s')
end.

Here, a different syntax (Coq) is used for the specification program. Using a higher-level Coq syntax has the benefit that a higher level syntax does not allow several unsafe behaviours, such as memory errors, that are possible in the C syntax (discussed in more detail in Adam Chlipala’s blog post). On the other hand, proofs with respect to the Coq specification now require the prover to relate the data structures between two different representations — e.g., str in C is a character array that needs to be related to strCoq that is a user-specified abstract data type.

## 3. Correctness of a Proof

A proof, or a verification process, is incorrect if it concludes that a theorem is provably valid even when it was not.

Irrespective of the method of specification and proof, the verification process can be incorrect for one of the following reasons:

1. Incoherent Proof Steps: If any step in the proof does not follow from the existing theorems/axioms/proof rules, then we have an incorrect proof. The common solution to guard against this possibility is to have a mechanical proof checker, codified as a computer program, that can ascertain that all the steps in your proof are valid.

However, if the proof checker has bugs or oversights, then you may admit an incorrect proof.

2. Incorrect Axioms: If the axioms were incorrect, then this can corrupt the entire proof effort. In the context of verification of computer systems, the semantics of a programming language form the set of axioms for example. Further, the specification itself is a part of the axioms, e.g., if the simpler strlenNaive function has a bug, then we would admit the same bug in the strlen implementation as well.

It is important for the axioms to be either small (so that they are manually verifiable) or rigorously tested. For example, the strlenNaive function is small enough to develop confidence through manual inspection alone. On the other hand, the C language semantics are much larger and hard to read manually to develop confidence. Fortunately, C language semantics involve a single codebase that is used for verification and testing across tens of thousands of codebases, and so such wide testing and production use provides confidence in its correctness. For example, it is possible to test the logical encodings of the C semantics by checking that the outputs of the logical encodings are identical to the outputs produced through C compilations on a physical computer.

3. Incorrect Proof Rules: If any of the proof rules is unsound (buggy), then the proof steps can become unsound, which in turn results in an incorrect proof.

In our example, most proof rules are based on standard quantifier-free or first-order logic for bitvectors and arrays. These proof rules have undergone several code reviews, extensive testing, and are in production use by thousands of users — and so we can be reasonably confident of their correctness.

There is a slightly higher likelihood for the proof rules to be unsound for custom proof rules added for custom axiomatic operators, such as $\mathrm{\tt FirstOccurrenceOf}$ in our example. However, even so, these custom rules are typically much simpler than the implementations that they verify (because they do not need to be optimized) and further they have wider use and testing (e.g., $\mathrm{\tt FirstOccurrenceOf}$ could be used for several search-based algorithms, and is not limited to strlen).

## 4. So what does Formal Verification buy us?

We have already discussed that the proof can be incorrect. If you have read this interesting blog post on the Piano Test by John Regehr, you now know that you do not want to be sitting under that piano.

Then, what does it mean for a software program to be formally verified, if it does not guarantee the absence of bugs?

The value of a formal verification effort is that it reduces your Trusted Computing Base/Probability of a Bug, or TCB/PB. In our strlen example, if the implementation has been formally verified (using any of the four specification methods), then we know that for there to be a bug in the implementation, there must be a bug either in the specification (which is much smaller than the implementation) or there must be a bug in the logical encoding of the C programming language semantics (which have been heavily tested) or there must be a bug in the proof rules (also heavily tested) or there must be a bug in the proof checker (also heavily tested). The formal verification process adds value if the chances of bugs in any of these subsystems is lower than the chance of a bug in your original implementation.

It is worth emphasizing that a “reduction” in the TCB is not necessarily a quantifiable reduction, e.g., it often cannot be measured by SLOC (source lines of code) alone. For some components of the proof, such as the specification, it makes sense to quantify the TCB reduction through SLOC. But for other components, such as axioms, proof rules, and proof checkers, we are not relying on SLOC reduction but on the fact that these are infrastructural components that have been heavily tested and are often in heavy production use. Thus TCB reduction is not enough to capture the value of the verification effort, and so we also need to quantify the reduction in the bug probability (PB).

Further, even if any of these infrastructural verification components have a corner-case bug, it is very unlikely that the bug manifests itself in a way that it causes an incorrect proof to go through. For example, it is much more likely for my strlen implementation to have a memory error, than the possibility of a memory error in the proof checker that causes the verification of strlen to incorrectly succeed. At worst, a corner-case bug in the verification tools would cause the verification process to end abruptly, not cause a wrong proof to go through (with a very high probability).

## 5. Applications of Formal Verification

In practice, formal verification is very useful because it is able to catch several corner-case bugs. For mission-critical applications, such as those found in defence, aerospace, and automobile sectors, the formal verification of software reduces the probability of errors by orders of magnitude and prevents catastrophes (e.g., Boeing).

The speed at which implementations evolve is significantly faster than the pace at which the verification infrastructure needs to be modified/adapted — and so the presence of a verification harness provides the much needed confidence-of-correctness even in the presence of continuous code modifications. Such verification infrastructure allows organizations to save testing/QA costs.

Security properties are usually better suited for verification than testing. Researchers have found serious bugs in rigorously tested compilers and libraries — most such corner-case bugs are very difficult to catch through testing and much easier to catch through verification tools. If you look through our bugs-found page for examples of bugs found by our verification tool, you will find that the most common bugs involve corner cases like pointer aliasing, buffer/integer overflows, overflows during type-conversions, etc. Verification tools have unique advantages over traditional testing methods in identifying such hidden errors and security loopholes.

Machine-readable proofs serve as executable documents and help an engineering team in the development and maintenance of their software over time.

It is my humble prediction that it is only a matter of time before verified software becomes the norm.