L22: Study of correctness of verified systems

Paper: "An empirical study on the correctness of formally verified
distributed systems" by Fonseca et al. (EuroSys 2017)

Why this paper?
  Bugs in verified systems
    incl. systems we covered earlier (e.g., IronFleet)
    bugs in the interface between verified and unverified parts
    bugs invalidate guarantees, crash systems
  Testing of specs and fuzzing of interfaces

How is it possible that there are bugs in verified systems?
  System doesn't run in isolation (see Fig 1)
  System relies on unverified parts:
    unverified OS (e.g., for sending packets)
    unverified use of tools (e.g., the verifier)
    unverified specs
      spec may promise too little or too much (e.g., at-most-once or not?)
  System makes wrong assumptions about unverified parts
  Bugs in those parts can result in correctness violations

Do the verified pieces have bugs?
  No ... at least as far as the authors were able to discover
  Could there have been bugs? Maybe, we just don't know
    Z3 is complex
    Coq relies on an unproven core
    easy to admit lemmas without proofs ...

How did the authors find the bugs?
  Studied code and design for unverified assumptions
  Fuzzed the shim layer (e.g., packets, disk)
  Cross-checked across systems

What if you verified all components in Figure 1?
  Verify application, tools, OS
  You would find mismatches between the system and the OS, etc.
  Incredible amount of work!

IronFleet (topic of previous lecture)
  Implementation of a replicated-state-machine library
  Verified counter on top of the library
  Complex tool chain involving Dafny, Boogie, and Z3
    easy to misuse the tools
  Bugs:
    none in the shim layer!
    a few tool misuses
    a specification ambiguity

IronFleet: tool bugs
  I2: bug in parsing Dafny's output
    Dafny can print two messages per invocation:
      a diagnostic message
      "prover crashed"
    the second message was ignored
    so the tool chain built an unverified binary
  I4: Z3 crashing
    z3.exe built for a 64-bit architecture
    but libraries were for a 32-bit architecture
  I3: killing Z3
    SIGINT terminates Z3, but Dafny reports no error

IronFleet: specification ambiguity
  Background: exactly-once semantics
    a client put for a K/V server is executed once
  Challenge in providing exactly-once semantics:
    a client may not hear from a server
    it re-sends the same request to another server
    servers see duplicates
    a typical implementation filters duplicates (sketch 1 below)
  IronFleet
    spec didn't guarantee exactly-once semantics
    but, the implementation did provide it!
    an impl. without duplicate detection would also meet the spec
      probably incorrect
      but, on top of such a spec it would be difficult to prove a
        k/v application correct
    good to verify an application to detect problems in the spec

Verdi: replicated-state-machine server
  Coq, translated into OCaml
  OCaml part of the TCB
  Some unverified OCaml shims
    TCP for RPC

Verdi: shim layer bugs
  V1: recv() can return an incomplete message (sketch 2 below)
    server crashes in that case
  V2: incorrect marshaling (sketch 3 below)
    uses newline and space to separate commands from arguments
    client can inject commands because newline and space aren't escaped
      a single command becomes two commands
      some commands crash the server
      incorrect results to the client
  V3: large file-system writes to add log entries (sketch 4 below)
    large writes are not atomic
    incomplete log entries
    (is our lab infrastructure correct?)
  V4: incorrect snapshot update (sketch 4 below)
    old snapshot deleted before the new one is finished
    can cause data loss
  V5: failure to read snapshot (sketch 5 below)
    shim assumed that if open() failed, the snapshot didn't exist
    but the snapshot may exist
      open() can fail because the process is out of file descriptors
    can cause data loss
  V8: stack overflow (sketch 6 below)
    recursive call to findGtIndex() on a big list
    crashes the leader
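The sketches below are minimal Go illustrations of the bug patterns above,
not code from IronFleet, Verdi, or the paper; all identifiers in them are
hypothetical.

Sketch 1: duplicate filtering for exactly-once semantics, assuming clients
tag every request with a (client ID, sequence number) pair and reuse the
same pair when retrying.

  package main

  import "fmt"

  type Request struct {
      ClientID int64
      Seq      int64
      Key, Val string
  }

  type Reply struct{ Old string }

  type KVServer struct {
      data    map[string]string
      lastSeq map[int64]int64 // highest seq applied per client
      lastRep map[int64]Reply // cached reply for that seq
  }

  func NewKVServer() *KVServer {
      return &KVServer{
          data:    map[string]string{},
          lastSeq: map[int64]int64{},
          lastRep: map[int64]Reply{},
      }
  }

  // Put executes each (ClientID, Seq) at most once; a retransmission
  // gets the cached reply instead of being applied a second time.
  func (s *KVServer) Put(req Request) Reply {
      if seq, ok := s.lastSeq[req.ClientID]; ok && req.Seq <= seq {
          return s.lastRep[req.ClientID] // duplicate: don't re-execute
      }
      old := s.data[req.Key]
      s.data[req.Key] = req.Val
      s.lastSeq[req.ClientID] = req.Seq
      s.lastRep[req.ClientID] = Reply{Old: old}
      return Reply{Old: old}
  }

  func main() {
      s := NewKVServer()
      r := Request{ClientID: 1, Seq: 1, Key: "x", Val: "a"}
      s.Put(r)
      s.Put(r)                               // retransmission: filtered
      fmt.Println(s.data["x"], s.lastSeq[1]) // a 1
  }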
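Sketch 2: the V1 pattern. A single Read()/recv() may return fewer bytes
than a full message, so the receiver must loop; io.ReadFull does that. The
4-byte length prefix is an assumed wire format, not Verdi's.

  package main

  import (
      "bytes"
      "encoding/binary"
      "fmt"
      "io"
  )

  // readMessage reads a 4-byte big-endian length, then exactly that
  // many payload bytes, looping over short reads.
  func readMessage(r io.Reader) ([]byte, error) {
      var hdr [4]byte
      if _, err := io.ReadFull(r, hdr[:]); err != nil {
          return nil, err
      }
      n := binary.BigEndian.Uint32(hdr[:])
      buf := make([]byte, n)
      if _, err := io.ReadFull(r, buf); err != nil {
          return nil, err // a short read is an error, not a crash
      }
      return buf, nil
  }

  func main() {
      // Simulate a stream carrying one 5-byte message.
      var stream bytes.Buffer
      payload := []byte("hello")
      binary.Write(&stream, binary.BigEndian, uint32(len(payload)))
      stream.Write(payload)
      msg, err := readMessage(&stream)
      fmt.Println(string(msg), err) // hello <nil>
  }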
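Sketch 3: the V2 pattern. Length-prefixing each field, instead of
separating fields with spaces/newlines, means an argument may contain any
byte without being parsed as an extra command. The encoding is made up for
illustration.

  package main

  import (
      "bytes"
      "encoding/binary"
      "fmt"
  )

  // marshal writes each field as a 4-byte length followed by its bytes.
  func marshal(fields ...[]byte) []byte {
      var buf bytes.Buffer
      for _, f := range fields {
          binary.Write(&buf, binary.BigEndian, uint32(len(f)))
          buf.Write(f)
      }
      return buf.Bytes()
  }

  // unmarshal recovers the fields, rejecting truncated input.
  func unmarshal(data []byte) ([][]byte, error) {
      var fields [][]byte
      for len(data) > 0 {
          if len(data) < 4 {
              return nil, fmt.Errorf("truncated header")
          }
          n := binary.BigEndian.Uint32(data[:4])
          data = data[4:]
          if uint32(len(data)) < n {
              return nil, fmt.Errorf("truncated field")
          }
          fields = append(fields, data[:n])
          data = data[n:]
      }
      return fields, nil
  }

  func main() {
      // A value containing newline and space round-trips as ONE field.
      enc := marshal([]byte("PUT"), []byte("key"), []byte("evil\nGET key"))
      fields, _ := unmarshal(enc)
      for _, f := range fields {
          fmt.Printf("%q\n", f)
      }
  }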
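Sketch 4: the V3/V4 pattern. Replace a snapshot crash-safely by writing a
temporary file, fsyncing it, and renaming it over the old one; rename() is
atomic within a file system, so a crash leaves either the complete old
snapshot or the complete new one. Paths and names are hypothetical.

  package main

  import (
      "fmt"
      "os"
      "path/filepath"
  )

  func saveSnapshot(path string, data []byte) error {
      tmp := path + ".tmp"
      f, err := os.Create(tmp)
      if err != nil {
          return err
      }
      if _, err := f.Write(data); err != nil {
          f.Close()
          return err
      }
      if err := f.Sync(); err != nil { // force bytes to stable storage
          f.Close()
          return err
      }
      if err := f.Close(); err != nil {
          return err
      }
      // Atomically replace the old snapshot; never delete it first.
      // (A fully crash-safe version would also fsync the directory.)
      return os.Rename(tmp, path)
  }

  func main() {
      dir, _ := os.MkdirTemp("", "snap")
      defer os.RemoveAll(dir)
      path := filepath.Join(dir, "snapshot")
      fmt.Println(saveSnapshot(path, []byte("state v1")))
      fmt.Println(saveSnapshot(path, []byte("state v2")))
  }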
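Sketch 5: the V5 pattern. Only a "not exist" error means there is no
snapshot; any other failure (e.g., out of file descriptors) must not be
treated as an empty state, or committed data could be silently discarded.
loadSnapshot is a hypothetical stand-in for the shim's loader.

  package main

  import (
      "errors"
      "fmt"
      "io/fs"
      "os"
  )

  func loadSnapshot(path string) ([]byte, error) {
      data, err := os.ReadFile(path)
      if errors.Is(err, fs.ErrNotExist) {
          return nil, nil // genuinely no snapshot: start empty
      }
      if err != nil {
          // EMFILE, EACCES, I/O error, ...: the snapshot may exist,
          // so refuse to proceed rather than start from empty state.
          return nil, fmt.Errorf("cannot tell if snapshot exists: %w", err)
      }
      return data, nil
  }

  func main() {
      data, err := loadSnapshot("no-such-snapshot")
      fmt.Println(data == nil, err) // true <nil>: "no snapshot yet"
  }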
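Sketch 6: the V8 pattern. One recursive call per list element overflows the
stack on a long log; a loop uses constant stack. This findGtIndex is a Go
stand-in guessing at the Verdi function's behavior (return the entries with
index greater than a given one), for illustration only.

  package main

  import "fmt"

  type Entry struct {
      Index int
      Cmd   string
  }

  // findGtIndex returns all entries with Index > i, iteratively.
  func findGtIndex(log []Entry, i int) []Entry {
      var out []Entry
      for _, e := range log {
          if e.Index > i {
              out = append(out, e)
          }
      }
      return out
  }

  func main() {
      log := make([]Entry, 1000000) // a large log is fine with a loop
      for k := range log {
          log[k] = Entry{Index: k + 1}
      }
      fmt.Println(len(findGtIndex(log, 999990))) // 10
  }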
What to learn from this paper?
  Don't do verification? No.
  No bugs in verified components (as far as we know)
    while unverified systems have protocol bugs
    verification improves the situation
  But, verification relies on assumptions
    need to check the assumptions
    fuzz the assumptions (sketch 7 below)
    use the spec to verify applications
  Unverified systems probably also have bugs in their shim layers
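Sketch 7: the paper's "fuzz the shim layer" idea in miniature. Mutate valid
marshaled messages with random bit flips and truncations and check that the
decoder returns an error instead of panicking or mis-parsing. parseMessage
is a hypothetical toy decoder, not the paper's fuzzer.

  package main

  import (
      "fmt"
      "math/rand"
  )

  // parseMessage: toy decoder where the first byte is the command length.
  func parseMessage(b []byte) (string, error) {
      if len(b) == 0 || int(b[0]) > len(b)-1 {
          return "", fmt.Errorf("malformed")
      }
      return string(b[1 : 1+int(b[0])]), nil
  }

  func main() {
      rng := rand.New(rand.NewSource(1))
      seed := []byte{3, 'P', 'U', 'T'} // a valid message
      for i := 0; i < 10000; i++ {
          m := append([]byte(nil), seed...)
          m[rng.Intn(len(m))] ^= byte(1 << rng.Intn(8)) // flip one bit
          if rng.Intn(4) == 0 {
              m = m[:rng.Intn(len(m)+1)] // sometimes truncate
          }
          func() {
              defer func() { // a panic here would be a shim bug
                  if r := recover(); r != nil {
                      fmt.Println("decoder panicked on", m)
                  }
              }()
              parseMessage(m)
          }()
      }
      fmt.Println("fuzzing done")
  }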