Study of correctness of verified systems
========================================

Paper: "An empirical study on the correctness of formally verified
distributed systems", Fonseca et al., EuroSys 2017

Why this paper?
  Bugs in verified systems
    incl. systems we covered earlier (e.g., IronFleet)
  Bugs in the interface between verified and unverified code
  Bugs invalidate guarantees, crash systems
  Testing of specs and fuzzing of interfaces

How is it possible that there are bugs in verified systems?
  The system doesn't run in isolation (see Fig 1)
  The system relies on unverified parts:
    unverified OS (e.g., for sending packets)
    unverified use of tools (e.g., the verifier)
    unverified specs
      a spec may promise too little or too much (e.g., at-most-once or not?)
  The system makes wrong assumptions about the unverified parts
  Bugs in those parts can result in correctness violations

Do the verified pieces have bugs?
  No ... at least as far as the authors were able to discover
  Could there have been bugs?  Maybe, we just don't know
    Z3 is complex
    Coq relies on an unproven core
      [ Ref: https://github.com/coq/coq/blob/master/dev/doc/critical-bugs ]
    Easy to admit lemmas without proofs
    ...

How did the authors find the bugs?
  Not by looking for bugs in the verifier or the OS itself
  Studied code and design for unverified assumptions
  Fuzzed the shim layer (e.g., packets, disk)
  Cross-checked behavior across systems
  Serious effort: the paper claims over 8 months of looking for bugs

Not many other papers look at this problem
  The only other example cited is an analysis of compiler bugs
  Similar findings for CompCert:
    no bugs in the verified core
    bugs in unverified parts

What if you verified all components in Figure 1?
  Verify the application, the tools, and the OS
  You would find mismatches between the system and the OS, etc.
  Incredible amount of work!

IronFleet (topic of a previous lecture)
  Implementation of a replicated-state-machine library
  Verified counter on top of the library
  Complex tool chain involving Dafny, Boogie, and Z3
    easy to misuse the tools
  Bugs:
    none in the shim layer!
    a few tool misuses
    specification ambiguity

IronFleet: tool bugs
  I2: bug in parsing Dafny's output
    Dafny produced two messages per invocation:
      a diagnostic message
      a "prover crashed" message
    the second message was ignored
    so the tool built the binary even though it was unverified
  I4: Z3 crashing
    z3.exe built for a 64-bit architecture
    binary libraries built for a 32-bit architecture
  I3: killing Z3
    SIGINT terminates Z3, but Dafny reports no error

What to do about tool bugs?
  Test that verification fails when a bug is injected.
  Make verification a more integral part of the build.
    (see the exit-status sketch at the end of this section)

IronFleet: specification ambiguity
  Background: exactly-once semantics
    a client put to a K/V server is executed exactly once
    challenge in providing exactly-once semantics:
      a client may not hear back from a server
      it re-sends the same request, perhaps to another server
      servers see duplicates
    typical implementations filter duplicates
      (see the duplicate-filtering sketch at the end of this section)
  IronFleet's spec didn't guarantee exactly-once semantics
    but the implementation did provide it!
    an impl. without duplicate detection also meets the spec
    probably incorrect (in the sense of being unexpected)
      it would be difficult to prove a K/V application on top of such a spec
    good to verify an application on top, to detect problems in the spec

Spec issues in other papers we've read
  NetCore (previous lecture): theorem doesn't cover output packets
  FSCQ: spec for fdatasync didn't match what some users expected
  FSCQ: spec didn't mandate specific error codes

What to do about spec bugs?
  Prove properties about the spec.
  Prove applications on top of the spec.
  Test the spec (e.g., against other implementations).  SibylFS.
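Aside: a minimal OCaml sketch of making verification an integral part of the
build, per the advice above.  Run the verifier as a subprocess, scan every
output line (I2 came from ignoring the second of two messages), and treat
death by signal as failure (I3: SIGINT-killed Z3 was reported as no error).
The command string and the "Verification succeeded" marker are hypothetical
stand-ins, not Dafny's actual interface:

    let verifier_succeeded (cmd : string) : bool =
      let ic = Unix.open_process_in cmd in
      let saw_success = ref false in
      (try
         while true do
           let line = input_line ic in
           (* Scan every line: I2 arose from reading only the first of two
              messages and missing a later "prover crashed" diagnostic. *)
           if line = "Verification succeeded" then saw_success := true
         done
       with End_of_file -> ());
      match Unix.close_process_in ic with
      | Unix.WEXITED 0 -> !saw_success              (* clean exit AND success *)
      | Unix.WEXITED _ -> false                     (* verifier reported failure *)
      | Unix.WSIGNALED _ | Unix.WSTOPPED _ -> false (* killed, e.g., SIGINT (I3) *)

The build should produce a binary only when this returns true; anything
ambiguous counts as "not verified".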
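Aside: a minimal sketch of the duplicate filtering described in the
exactly-once discussion above.  The server remembers, per client, the highest
sequence number it executed along with the cached reply, and replays the
reply on retransmission instead of re-executing.  This is the standard
technique, not IronFleet's code; client_id, seq, and apply are hypothetical
names.  An implementation that omits this table still met IronFleet's spec,
which is exactly the ambiguity:

    module IntMap = Map.Make (Int)

    type 'r dedup = { mutable last : (int * 'r) IntMap.t }
      (* client_id -> (highest seq executed, cached reply) *)

    (* Execute [apply] at most once per (client_id, seq); a retransmission
       of the latest request gets the cached reply, not a second execution. *)
    let handle st ~client_id ~seq ~(apply : unit -> 'r) : 'r =
      match IntMap.find_opt client_id st.last with
      | Some (s, reply) when seq = s -> reply                (* duplicate: replay *)
      | Some (s, _) when seq < s -> failwith "stale request" (* client moved on *)
      | _ ->
        let reply = apply () in
        st.last <- IntMap.add client_id (seq, reply) st.last;
        reply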
Verdi: replicated-state-machine server
  written in Coq, extracted to OCaml
  the OCaml part is in the TCB
  some unverified OCaml shims
    TCP for RPC

Verdi: shim layer bugs
  V1: recv() can return an incomplete message
    the server crashes in that case
    (see the framing sketch at the end of these notes)
  V2: incorrect marshaling
    uses newline and space to separate commands from arguments
    a client can inject commands because newline and space are not escaped
      a single command becomes two commands
      some commands crash the server
      incorrect results returned to the client
  V3: large file-system writes to append log entries
    large writes are not atomic
    a crash can leave incomplete log entries
    (is our lab infrastructure correct?)
  V4: incorrect snapshot update
    the old snapshot is deleted before the new one is finished
    can cause data loss
    (see the snapshot sketch at the end of these notes)
  V5: failure to read a snapshot
    the shim assumed that if open() fails, the snapshot doesn't exist
    but the snapshot may exist:
      open can fail because the process is out of file descriptors
    can cause data loss
  V8: stack overflow
    recursive call to findGtIndex() on a big list
    crashes the leader
    (see the tail-recursion sketch at the end of these notes)

Chapar: shim layer bugs
  the network implementation used UDP
  C1: assumed the network was reliable and did not duplicate packets
    a retransmitted state update overwrote later updates
    the protocol was proven correct assuming no duplicates
  C2: assumed the network could not lose packets
    a lost packet caused an update to never arrive
    the protocol required waiting for this update
    the system deadlocks after a single lost packet
  C3: odd OCaml network library semantics
    the shim's model assumed the network can send large packets
    the OCaml library returned an error if a write exceeded the packet size
      but left the previously written parts of the packet in place
      later writes would append to this partial packet

Shim bugs also showed up in FSCQ's Haskell code
  subtle issues in the interaction with FUSE
  invoking the wrong extracted function

What to do about shim layer bugs?
  Expand the verification boundary.
  Cut the shim layer at a narrow interface with simpler assumptions.
  Extend the spec to cover resource use and liveness.
  The shim layer should use well-designed, well-documented libraries.
  Test the shim layer (similar to spec testing, a la SibylFS).
    Test resource exhaustion.

What to learn from this paper?
  Don't do verification?  No.
    no bugs in the verified components (as far as we know),
    while unverified systems have protocol bugs
    verification improves the situation
  But verification relies on assumptions
    need to check the assumptions:
      fuzz the assumptions
      use the spec to verify an application on top
  Unverified systems probably also have bugs in their shim layers
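Framing sketch (V1, V2): one way to fix the recv() bug is to loop until the
requested number of bytes has arrived, and to frame each message with an
explicit length prefix instead of newline/space separators, so client data
can never be parsed as commands.  The 4-byte big-endian prefix and the
helper names are our choice for illustration, not Verdi's actual format:

    (* [really_read] loops until exactly [len] bytes arrive: Unix.read (like
       recv()) may return fewer bytes than requested, the root cause of V1. *)
    let really_read fd buf off len =
      let n = ref 0 in
      while !n < len do
        let r = Unix.read fd buf (off + !n) (len - !n) in
        if r = 0 then failwith "connection closed mid-message";
        n := !n + r
      done

    (* One message = 4-byte big-endian length prefix, then the body.
       Length-prefixed framing also removes the V2 injection channel:
       message bytes are never confused with separators. *)
    let read_message fd =
      let hdr = Bytes.create 4 in
      really_read fd hdr 0 4;
      let len = Int32.to_int (Bytes.get_int32_be hdr 0) in
      let body = Bytes.create len in
      really_read fd body 0 len;
      Bytes.to_string body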
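Snapshot sketch (V3, V4, V5): a minimal sketch of the write-to-temp-then-
rename discipline that avoids deleting the old snapshot before the new one
is durable, plus an open() wrapper that treats only ENOENT as "no snapshot".
Function and path names are illustrative, not Verdi's actual shim:

    let write_snapshot ~path (data : string) =
      let tmp = path ^ ".tmp" in
      let fd = Unix.openfile tmp [ Unix.O_WRONLY; Unix.O_CREAT; Unix.O_TRUNC ] 0o644 in
      let buf = Bytes.of_string data in
      let len = Bytes.length buf in
      let n = ref 0 in
      while !n < len do
        (* Large writes may be partial (cf. V3): loop until done. *)
        n := !n + Unix.write fd buf !n (len - !n)
      done;
      Unix.fsync fd;       (* make the new snapshot durable before the rename *)
      Unix.close fd;
      Unix.rename tmp path (* atomic: crash leaves old or new, never neither (V4) *)

    let read_snapshot ~path =
      match Unix.openfile path [ Unix.O_RDONLY ] 0 with
      | fd -> Some fd
      | exception Unix.Unix_error (Unix.ENOENT, _, _) -> None
      (* Any other error, e.g., EMFILE (out of file descriptors, bug V5),
         propagates instead of being misread as "no snapshot exists". *)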
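Tail-recursion sketch (V8): OCaml does not grow the stack, so a
non-tail-recursive traversal of a large log overflows it.  The version below
is an illustrative reconstruction of a findGtIndex-style function, not the
code Coq actually extracted; the entry type is hypothetical:

    type entry = { index : int; data : string }  (* hypothetical log entry *)

    (* Not tail-recursive: the [e :: ...] runs after the recursive call
       returns, so every element consumes a stack frame; on a big list
       this overflows the stack and crashes the leader (V8). *)
    let rec find_gt_index entries i =
      match entries with
      | [] -> []
      | e :: rest ->
        if e.index > i then e :: find_gt_index rest i
        else find_gt_index rest i

    (* Tail-recursive version: accumulate, then reverse; constant stack. *)
    let find_gt_index_tr entries i =
      let rec go acc = function
        | [] -> List.rev acc
        | e :: rest -> go (if e.index > i then e :: acc else acc) rest
      in
      go [] entries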