CompCert and C semantics
========================

Where are we in the class?
  Starting to read research papers on state-of-the-art verification.
  Today's paper: correctness of C programs.
  Thursday's paper: Vale, correctness of high-performance assembly code.
  Hard read, lots of material.
    Don't worry if you don't understand all of the details.

Background
  CompCert: C compiler with a proof of correctness in Coq.
  Significant verification project.
    Xavier Leroy has been working on it for almost 15 years.
  One of the big examples of success in machine-checked correctness proofs.
  Commercial users (embedded / aircraft / space systems).
  Important building block for some other verification efforts.
    Write (and prove) C code, then generate provably correct assembly.
    E.g., CertiKOS verified microkernel.
  Many papers on aspects of CompCert's development.
    http://compcert.inria.fr/publi.html
  Significant progress since the paper was written.
    Almost complete support for various C features.
    Many architectures: PPC, ARM, x86, RISC-V.
    More verified parts (parsing, executable output, ...).
  Source code (and specs and proofs) available.
    https://github.com/AbsInt/CompCert

How significant are compiler bugs?
  Many bugs in GCC and Clang/LLVM that emit wrong code.
    http://www.cs.utah.edu/~regehr/papers/pldi11-preprint.pdf
  Particularly bad for several reasons:
    Hard to track down and reason about.
    Upgrading the compiler might introduce bugs
      (but may also uncover real source-code bugs).
    Subverts any source-level analysis.

Undefined behavior [STACK paper].
  Goal: allow compilers to produce efficient code.
  Example: integer divide by zero.
    Underlying goal: emit a single integer-divide assembly instruction.
    What should the C language say about divide by zero to allow this?
      Require a zero result?  Doesn't work for architectures that return non-zero.
      Allow an arbitrary resulting value?  Doesn't work if the architecture traps on div-by-0.
      Allow arbitrary outcomes, including crashes?  That is undefined behavior (UB).
  Other examples of UB:
    Dereferencing a null pointer.
    Accessing memory outside allocated bounds.
      (Can arbitrarily corrupt state.)
    Computing out-of-bounds pointers.
      (Depends on pointer representation with segments.)
    Signed integer overflow.
      (Depends on how signed integers are represented.)

  Modern languages have UB too.
    Golang: unsynchronized access to a slice variable is undefined.
      A slice variable consists of a pointer + length (also cap, but not relevant here).
      Accessing a slice: read the length, check it, then access the pointer.
      Writing to a slice variable: write the pointer, then write the length.
      Can easily access memory out-of-bounds, leading to arbitrary corruption.
    What would be the cost of making concurrent slice access well-defined?
      Extra synchronization operations on each slice-variable access.
      Significant performance overhead for concurrent code.

  C compilers take advantage of UB in surprising ways.
    Figure 1 from the STACK paper:
      if (buf + len >= end) ...
      if (buf + len < buf) ...
    What is this intending to catch?
    Why does the compiler drop the second check?
      Reasoning involves potential UB when computing (buf + len).
      Case 1: buf + len does not go out of bounds.  No overflow, so the
        second check is always false.
      Case 2: buf + len overflows.  That is UB, so the compiler may assume
        it never happens.
      Either way, the compiler is allowed to drop the second check.

What does compiler correctness mean?
  Behavioral equivalence: (S ⇓ B) <=> (C ⇓ B)    -- (1) from the paper
    Why too strong?
      Non-determinism in S that's missing in C.
      Undefined ("going wrong") behavior in S not preserved in C.
    Example 1:
      int foo() { printf("foo"); return 1; }
      int bar() { printf("bar"); return 2; }
      printf("%d %d", foo(), bar());
      >> foo bar 1 2
      >> bar foo 1 2
    Example 2:
      int i;
      printf("Hello %d\n", i);
      >> Hello 5
      >> Hello 6
      >> ...
    Example 3 (from the paper's description):
      int i = 0;
      int j = 5/i;
      printf("Hello world!\n");
      >> ??

  Safe behavior subset: S safe => forall B, (C ⇓ B) => (S ⇓ B)    -- (2) from the paper
    Why is this better?
      Intuitively, similar to the trace inclusion that Butler described:
        anything that compiled code (C) does must be possible in the source code (S).
      Works out well for example 1.
      Works out well for example 2.
      Works out in a boring way for example 3.
        But that's perhaps the best we can do for that program:
        C's undefined behavior says the program could do anything.

  Deterministic semantics: forall B \notin Wrong, (S ⇓ B) => (C ⇓ B)    -- (3) from the paper
    Why is this better?
      Easier to prove by induction / case analysis on S.
        C = f(S), loosely speaking, where f() is the compiler.
        "simpl" works well; the inverse of f is difficult to reason about in a proof.
      But works only for deterministic source and target languages...
        Top-level C is not deterministic.
        Some hardware instruction sets are not deterministic.
          Deal with C non-determinism at the top level.
          Don't emit any hardware instructions that aren't deterministic.
          What about floating point?  Not sure what the answer is.
      Still works for example 2.
        More precisely, "same amount of non-determinism in S and C"?

  Why is behavior deterministic?  What about external inputs?
    CompCert's model assumes the program executes in a specific "external world".
    The same external input will appear for both the source and the compiled code.
    External inputs are pre-determined, even if they aren't known to the program yet.
    Effectively, CompCert's statements are all "forall ExternalWorld, ...".

What does it mean to verify a compiler (i.e., what spec should the compiler satisfy)?
  Verified compiler: correct compilation if the compiler returned OK (definition 6).
    Overall, CompCert takes this form.
    Is it OK for CompCert to crash?
  Verified validator: correct compilation if a separate validator returned OK (definition 7).
    Why is this desirable?
      Probably too difficult to do for a complete C compiler.
      Works well for some internal passes inside CompCert.
        E.g., register-allocation coloring algorithm: easy to check the result.
  Proof-carrying code: a validator can check that the resulting code satisfies some spec.
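The register-allocation case gives a feel for why validation can be much simpler than verification: the coloring itself can be computed by arbitrary untrusted code, and only a small checker needs to be proved correct. A minimal sketch in C, with an invented interference-matrix encoding (this is illustrative, not CompCert's actual data structures):

```c
#include <stdbool.h>
#include <stddef.h>

/* check_coloring: re-verify an untrusted register allocator's answer.
 * interference is an nvars-by-nvars matrix (flattened, row-major) where
 * interference[i*nvars + j] is true when variables i and j are live at
 * the same time; color[i] is the register the allocator picked for i.
 * Returns true iff no two interfering variables share a register. */
bool check_coloring(size_t nvars, const bool *interference, const int *color)
{
    for (size_t i = 0; i < nvars; i++)
        for (size_t j = i + 1; j < nvars; j++)
            if (interference[i * nvars + j] && color[i] == color[j])
                return false;  /* conflict: reject the allocation */
    return true;
}
```

The checker is a few lines of obviously-terminating code, while the allocator (graph coloring with spilling heuristics) is far more intricate; proving the checker and running it on every compilation gives the same end-to-end guarantee.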
    Typically used where the spec is relatively simple (e.g., memory safety).
    Used in software fault isolation (SFI) systems like Google's Native Client.

Limitations of the spec: what is CompCert's spec good for / what is it missing?
  Safety properties:
    Code will not produce results disallowed by the C source.
  Liveness properties:
    The compiler might refuse to compile code.
    Can compiled code refuse to do anything?  Spin forever?
      A large delay seems to be allowed.
      An infinite loop falls under "divergence"; not allowed unless the source loops.
  Security?
    Is a private key erased from memory if the C source has a memset()?
      The C spec doesn't promise this directly; it is safe to optimize away the memset.
      C11 provides memset_s, which cannot be optimized away.
      But what does this mean?  What if the compiler made a temporary copy of the
        data on some stack?
    Does uninitialized data not contain something sensitive?
      Requires that the program has no non-determinism that is resolved in some
        "surprising" way during the early C -> Clight pass.
  Others:
    Linking.
    Concurrency.

How to model C semantics; e.g., the memory model?
  Much more abstract than what hardware provides (flat addresses):
    Memory consists of disjoint blocks.
    A block has a certain number of bytes.
    A pointer refers to a block and an offset within that block.
    Block "identifiers" are not reused (e.g., after a block is freed).
  Is it OK that this model cannot represent pointer arithmetic crossing into another block?
  CompCert does, of course, eventually flatten the memory model in its compilation pipeline.

How to verify a C program that is then compiled using CompCert?
  Easier in Coq:
    - turn the C program into Clight
    - reason about the program using CompCert's semantics for Clight
    - compile using CompCert
    No ambiguity about C parsers or Clight semantics!
  Outside of Coq (e.g., using a model checker or an SMT-based analyzer):
    Need to make sure the tool's C semantics match CompCert's parser + Clight.
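The block/offset memory model can be sketched concretely. This is an illustrative toy with invented names and representation; CompCert's real Coq definitions additionally track block bounds, stored contents, and permissions:

```c
#include <stdbool.h>
#include <stdint.h>

/* A CompCert-style abstract pointer: a (block, offset) pair rather
 * than a flat machine address. */
struct ptr {
    uint32_t block;  /* which allocation this pointer is into */
    uint32_t ofs;    /* byte offset within that block */
};

static uint32_t next_block = 1;  /* block ids are never reused */

/* alloc: return a pointer to offset 0 of a fresh block. */
struct ptr alloc(void)
{
    struct ptr p = { next_block++, 0 };
    return p;
}

/* Pointer arithmetic stays within the same block: there is no way to
 * walk from one block into another, which is exactly the pointer
 * arithmetic the model cannot represent. */
struct ptr ptr_add(struct ptr p, uint32_t delta)
{
    struct ptr q = { p.block, p.ofs + delta };
    return q;
}

/* Two pointers are equal only if both block and offset match. */
bool ptr_eq(struct ptr a, struct ptr b)
{
    return a.block == b.block && a.ofs == b.ofs;
}
```

Because fresh blocks always get new ids, pointers into distinct allocations can never alias, even after one is freed; that disjointness is what later compilation passes must preserve as they flatten the model down to machine addresses.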
Limitations of the implementation
  Some less-frequently-used aspects of C were not originally supported.
    CompCert 3.1 supports many of them now (long long; goto; varargs).
    Some are handled during the unverified C-to-Clight parsing/translation
      (e.g., struct parameters and return values).
  What is not verified in CompCert?
    C parsing / input.
      But now done in a cleaner way than in the original CACM paper:
        a verified parser produced from a grammar for the C language.
    Assembly output.

How to structure the proof?
  Layers: Clight, ...
    Many passes: Figure 1.
    Explicit semantics for each little language.
    Quite a bit of effort to define the languages (10% of the source code).
    But probably saves on the proof effort (which is 76% of the source code).
  Verification vs. validation:
    Register-assignment coloring is implemented in OCaml.
    Easier to write (and prove) a checker in Coq.
  Diagram: section 4.3.
    Similar to Butler's commutative diagrams.
    Execution here goes down the page.
    The "abstraction relation" (tilde) holds across.
    The trace consists of external function calls, rather than the entire
      program's execution.
    Instead of reasoning about a particular implementation and spec, we reason
      about ANY program in S1 and ANY corresponding compiled program in S2.
    The tilde "abstraction relation" says S2's code is a compilation of S1's, etc.

How effective is formal verification?
  Csmith: a tool for finding compiler bugs.
    http://www.cs.utah.edu/~regehr/papers/pldi11-preprint.pdf
  Found some interesting bugs in CompCert's unverified parts:
    Misinterpreted integer semantics when parsing C into Clight.
    Generated invalid assembly (missing constraint on immediate constants).
  But no bugs in the verified "middle" portion, unlike every other compiler tested.
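For a flavor of the integer-semantics subtleties an unverified C front end has to model correctly, here are two standard C conversion rules that are easy to get wrong. (These illustrate the general hazard, not the specific bug Csmith found; the function names are invented for this sketch.)

```c
/* Integer promotion: the unsigned char operand is promoted to int
 * before ~ is applied, so the result is the int value -256, not a
 * small unsigned number. */
int complement_of_uchar_ff(void)
{
    unsigned char c = 0xff;   /* 255 */
    return ~c;                /* ~(int)255 == -256 */
}

/* Usual arithmetic conversions: when int meets unsigned int, the int
 * operand is converted to unsigned, so -1 becomes UINT_MAX and the
 * comparison -1 < 1u is false (returns 0). */
int minus_one_less_than_one_unsigned(void)
{
    return -1 < 1u;
}
```

A translator from C to Clight that modeled either rule naively (e.g., keeping the `unsigned char` type through the `~`, or comparing the signed value directly) would silently change program behavior, which is exactly the class of front-end bug that fuzzing caught.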