CompCert and C semantics
========================

Where are we in the class?
  Starting to read research papers on state-of-the-art verification.
  Today's paper: correctness of C programs.
  Thursday's paper: Vale, correctness of high-performance assembly code.
  Hard read, lots of material.
    Don't worry if you don't understand all of the details.

Background
  CompCert: C compiler with a proof of correctness in Coq.
  Significant verification project.
    Xavier Leroy has been working on it for almost 15 years.
  One of the big examples of success in machine-checked correctness proofs.
  Commercial users (embedded / aircraft / space systems).
  Important building block for some other verification efforts.
    Write (and prove) C code, then generate provably correct assembly.
    E.g., CertiKOS verified microkernel.
  Many papers on aspects of CompCert's development.
    http://compcert.inria.fr/publi.html
  Significant progress since the paper was written.
    Almost complete support for various C features.
    Many architectures: PPC, ARM, x86, RISC-V.
    More verified parts (parsing, executable output, ...).
  Source code (and specs and proofs) available.
    https://github.com/AbsInt/CompCert

How significant are compiler bugs?
  Many bugs in GCC and Clang/LLVM that emit wrong code.
    http://www.cs.utah.edu/~regehr/papers/pldi11-preprint.pdf
  Particularly bad for several reasons:
    Hard to track down and reason about.
    Upgrading the compiler might introduce bugs
      (but may also uncover real source-code bugs).
    Subverts any source-level analysis.

Undefined behavior [STACK paper].
  Goal: allow compilers to produce efficient code.
  Example: integer divide by zero.
    Underlying goal: emit a single integer-divide assembly instruction.
    What should the C language say about divide by zero to allow this?
      Require a zero result?  Doesn't work for architectures that return non-zero.
      Allow an arbitrary resulting value?  Doesn't work if the architecture traps on div-by-0.
      Allow arbitrary outcomes, including crashes?  That is undefined behavior (UB).
  Other examples of UB:
    Dereferencing a null pointer.
    Accessing memory outside allocated bounds.
      (Can arbitrarily corrupt state.)
    Computing out-of-bounds pointers.
      (Depends on pointer representation with segments.)
    Signed integer overflow.
      (Depends on how signed integers are represented.)

  Modern languages have UB too.
    Golang: unsynchronized access to a slice variable is undefined.
      A slice variable consists of a pointer + length (also cap, but not relevant here).
      Accessing a slice: read the length, check it, then access the pointer.
      Writing to a slice variable: write the pointer, then write the length.
      Can easily access memory out-of-bounds, leading to arbitrary corruption.
    What would be the cost of making concurrent slice access well-defined?
      Extra synchronization operations on each slice-variable access.
      Significant performance overhead for concurrent code.

  C compilers take advantage of UB in surprising ways.
    Figure 1 from the STACK paper:
      if (buf + len >= end) ...
      if (buf + len < buf) ...
    What is this intending to catch?
    Why does the compiler drop the second check?
      Reasoning involves potential UB when computing (buf + len).
      Case 1: buf + len does not go out of bounds.  No overflow, so the
        second check is always false.
      Case 2: buf + len overflows.  That is UB, so the compiler may assume
        it never happens.
      Either way, the compiler is allowed to drop the second check.

What does compiler correctness mean?
  Behavioral equivalence: (S ⇓ B) <=> (C ⇓ B)    -- (1) from the paper
    Why too strong?
      Non-determinism in S that's missing in C.
      Undefined ("going wrong") behavior in S not preserved in C.
    Example 1:
      int foo() { printf("foo"); return 1; }
      int bar() { printf("bar"); return 2; }
      printf("%d %d", foo(), bar());
      >> foo bar 1 2
      >> bar foo 1 2
    Example 2:
      int i;
      printf("Hello %d\n", i);
      >> Hello 5
      >> Hello 6
      >> ...
    Example 3 (from the paper's description):
      int i = 0;
      int j = 5/i;
      printf("Hello world!\n");
      >> ??

  Safe behavior subset: S safe => forall B, (C ⇓ B) => (S ⇓ B)    -- (2) from the paper
    Why is this better?
      Intuitively, similar to the trace inclusion that Butler described:
        anything that compiled code (C) does must be possible in the source code (S).
      Works out well for example 1.
      Works out well for example 2.
      Works out in a boring way for example 3.
        But that's perhaps the best we can do for that program:
        C's undefined behavior says the program could do anything.

  Deterministic semantics: forall B \notin Wrong, (S ⇓ B) => (C ⇓ B)    -- (3) from the paper
    Why is this better?
      Easier to prove by induction / case analysis on S.
        C = f(S), loosely speaking, where f() is the compiler.
        "simpl" works well; the inverse of f is difficult to reason about in a proof.
      But works only for deterministic source and target languages...
        Top-level C is not deterministic.
        Some hardware instruction sets are not deterministic.
          Deal with C non-determinism at the top level.
          Don't emit any hardware instructions that aren't deterministic.
          What about floating point?  Not sure what the answer is.
      Still works for example 2.
        More precisely, "same amount of non-determinism in S and C"?

  Why is behavior deterministic?  What about external inputs?
    CompCert's model assumes the program executes in a specific "external world".
    The same external input will appear for both the source and the compiled code.
    External inputs are pre-determined, even if they aren't known to the program yet.
    Effectively, CompCert's statements are all "forall ExternalWorld, ...".

What does it mean to verify a compiler (i.e., what spec should the compiler satisfy)?
  Verified compiler: correct compilation if the compiler returned OK (definition 6).
    Overall, CompCert takes this form.
    Is it OK for CompCert to crash?
  Verified validator: correct compilation if a separate validator returned OK (definition 7).
    Why is this desirable?
      Probably too difficult to do for a complete C compiler.
      Works well for some internal passes inside CompCert.
        E.g., register-allocation coloring algorithm: easy to check the result.
  Proof-carrying code: a validator can check that the resulting code satisfies some spec.
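The register-allocation case gives a feel for why validation can be much simpler than verification: the coloring itself can be computed by arbitrary untrusted code, and only a small checker needs to be proved correct. A minimal sketch in C, with an invented interference-matrix encoding (this is illustrative, not CompCert's actual data structures):

```c
#include <stdbool.h>
#include <stddef.h>

/* check_coloring: re-verify an untrusted register allocator's answer.
 * interference is an nvars-by-nvars matrix (flattened, row-major) where
 * interference[i*nvars + j] is true when variables i and j are live at
 * the same time; color[i] is the register the allocator picked for i.
 * Returns true iff no two interfering variables share a register. */
bool check_coloring(size_t nvars, const bool *interference, const int *color)
{
    for (size_t i = 0; i < nvars; i++)
        for (size_t j = i + 1; j < nvars; j++)
            if (interference[i * nvars + j] && color[i] == color[j])
                return false;  /* conflict: reject the allocation */
    return true;
}
```

The checker is a few lines of obviously-terminating code, while the allocator (graph coloring with spilling heuristics) is far more intricate; proving the checker and running it on every compilation gives the same end-to-end guarantee.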
    Typically used where the spec is relatively simple (e.g., memory safety).
    Used in software fault isolation (SFI) systems like Google's Native Client.

Limitations of the spec: what is CompCert's spec good for / what is it missing?
  Safety properties:
    Code will not produce results disallowed by the C source.
  Liveness properties:
    The compiler might refuse to compile code.
    Can compiled code refuse to do anything?  Spin forever?
      A large delay seems to be allowed.
      An infinite loop falls under "divergence"; not allowed unless the source loops.
  Security?
    Is a private key erased from memory if the C source has a memset()?
      The C spec doesn't promise this directly; it is safe to optimize away the memset.
      C11 provides memset_s, which cannot be optimized away.
      But what does this mean?  What if the compiler made a temporary copy of the
        data on some stack?
    Does uninitialized data not contain something sensitive?
      Requires that the program has no non-determinism that is resolved in some
        "surprising" way during the early C -> Clight pass.
  Others:
    Linking.
    Concurrency.

How to model C semantics; e.g., the memory model?
  Much more abstract than what hardware provides (flat addresses):
    Memory consists of disjoint blocks.
    A block has a certain number of bytes.
    A pointer refers to a block and an offset within that block.
    Block "identifiers" are not reused (e.g., after a block is freed).
  Is it OK that this model cannot represent pointer arithmetic crossing into another block?
  CompCert does, of course, eventually flatten the memory model in its compilation pipeline.

How to verify a C program that is then compiled using CompCert?
  Easier in Coq:
    - turn the C program into Clight
    - reason about the program using CompCert's semantics for Clight
    - compile using CompCert
    No ambiguity about C parsers or Clight semantics!
  Outside of Coq (e.g., using a model checker or an SMT-based analyzer):
    Need to make sure the tool's C semantics match CompCert's parser + Clight.
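The block/offset memory model can be sketched concretely. This is an illustrative toy with invented names and representation; CompCert's real Coq definitions additionally track block bounds, stored contents, and permissions:

```c
#include <stdbool.h>
#include <stdint.h>

/* A CompCert-style abstract pointer: a (block, offset) pair rather
 * than a flat machine address. */
struct ptr {
    uint32_t block;  /* which allocation this pointer is into */
    uint32_t ofs;    /* byte offset within that block */
};

static uint32_t next_block = 1;  /* block ids are never reused */

/* alloc: return a pointer to offset 0 of a fresh block. */
struct ptr alloc(void)
{
    struct ptr p = { next_block++, 0 };
    return p;
}

/* Pointer arithmetic stays within the same block: there is no way to
 * walk from one block into another, which is exactly the pointer
 * arithmetic the model cannot represent. */
struct ptr ptr_add(struct ptr p, uint32_t delta)
{
    struct ptr q = { p.block, p.ofs + delta };
    return q;
}

/* Two pointers are equal only if both block and offset match. */
bool ptr_eq(struct ptr a, struct ptr b)
{
    return a.block == b.block && a.ofs == b.ofs;
}
```

Because fresh blocks always get new ids, pointers into distinct allocations can never alias, even after one is freed; that disjointness is what later compilation passes must preserve as they flatten the model down to machine addresses.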
Limitations of the implementation
  Some less-frequently-used aspects of C were not originally supported.
    CompCert 3.1 supports many of them now (long long; goto; varargs).
    Some are handled during the unverified C-to-Clight parsing/translation
      (e.g., struct parameters and return values).
  What is not verified in CompCert?
    C parsing / input.
      But now done in a cleaner way than in the original CACM paper:
        a verified parser produced from a grammar for the C language.
    Assembly output.

How to structure the proof?
  Layers: Clight, ...
    Many passes: Figure 1.
    Explicit semantics for each little language.
    Quite a bit of effort to define the languages (10% of the source code).
    But probably saves on the proof effort (which is 76% of the source code).
  Verification vs. validation:
    Register-assignment coloring is implemented in OCaml.
    Easier to write (and prove) a checker in Coq.
  Diagram: section 4.3.
    Similar to Butler's commutative diagrams.
    Execution here goes down the page.
    The "abstraction relation" (tilde) holds across.
    The trace consists of external function calls, rather than the entire
      program's execution.
    Instead of reasoning about a particular implementation and spec, we reason
      about ANY program in S1 and ANY corresponding compiled program in S2.
    The tilde "abstraction relation" says S2's code is a compilation of S1's, etc.

How effective is formal verification?
  Csmith: a tool for finding compiler bugs.
    http://www.cs.utah.edu/~regehr/papers/pldi11-preprint.pdf
  Found some interesting bugs in CompCert's unverified parts:
    Misinterpreted integer semantics when parsing C into Clight.
    Generated invalid assembly (missing constraint on immediate constants).
  But no bugs in the verified "middle" portion, unlike every other compiler tested.
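For a flavor of the integer-semantics subtleties an unverified C front end has to model correctly, here are two standard C conversion rules that are easy to get wrong. (These illustrate the general hazard, not the specific bug Csmith found; the function names are invented for this sketch.)

```c
/* Integer promotion: the unsigned char operand is promoted to int
 * before ~ is applied, so the result is the int value -256, not a
 * small unsigned number. */
int complement_of_uchar_ff(void)
{
    unsigned char c = 0xff;   /* 255 */
    return ~c;                /* ~(int)255 == -256 */
}

/* Usual arithmetic conversions: when int meets unsigned int, the int
 * operand is converted to unsigned, so -1 becomes UINT_MAX and the
 * comparison -1 < 1u is false (returns 0). */
int minus_one_less_than_one_unsigned(void)
{
    return -1 < 1u;
}
```

A translator from C to Clight that modeled either rule naively (e.g., keeping the `unsigned char` type through the `~`, or comparing the signed value directly) would silently change program behavior, which is exactly the class of front-end bug that fuzzing caught.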