Study of correctness of verified systems
========================================

Paper: "An empirical study on the correctness of formally verified
distributed systems", Fonseca et al., EuroSys 2017

Why this paper?
  Bugs in verified systems
    incl. systems we covered earlier (e.g., IronFleet)
  Bugs in the interface between verified and unverified code
  Bugs invalidate guarantees, crash systems
  Testing of specs and fuzzing of interfaces

How is it possible that there are bugs in verified systems?
  The system doesn't run in isolation (see Fig 1)
  The system relies on unverified parts:
    unverified OS (e.g., for sending packets)
    unverified use of tools (e.g., the verifier)
    unverified specs
      a spec may promise too little or too much (e.g., at-most-once or not?)
  The system makes wrong assumptions about the unverified parts
  Bugs in those parts can result in correctness violations

Do the verified pieces have bugs?
  No ... at least as far as the authors were able to discover
  Could there have been bugs?  Maybe, we just don't know
    Z3 is complex
    Coq relies on an unproven core
      [ Ref: https://github.com/coq/coq/blob/master/dev/doc/critical-bugs ]
    Easy to admit lemmas without proofs
    ...

How did the authors find the bugs?
  Not by looking for bugs in the verifier or the OS itself
  Studied code and design for unverified assumptions
  Fuzzed the shim layer (e.g., packets, disk)
  Cross-checked behavior across systems
  Serious effort: the paper claims over 8 months of looking for bugs

Not many other papers look at this problem
  The only other example cited is an analysis of compiler bugs
  Similar findings for CompCert:
    no bugs in the verified core
    bugs in unverified parts

What if you verified all components in Figure 1?
  Verify the application, the tools, and the OS
  You would find mismatches between the system and the OS, etc.
  Incredible amount of work!

IronFleet (topic of a previous lecture)
  Implementation of a replicated-state-machine library
  Verified counter on top of the library
  Complex tool chain involving Dafny, Boogie, and Z3
    easy to misuse the tools
  Bugs:
    none in the shim layer!
    a few tool misuses
    specification ambiguity

IronFleet: tool bugs
  I2: bug in parsing Dafny's output
    Dafny produced two messages per invocation:
      a diagnostic message
      a "prover crashed" message
    the second message was ignored
    so the tool built the binary even though it was unverified
  I4: Z3 crashing
    z3.exe built for a 64-bit architecture
    binary libraries built for a 32-bit architecture
  I3: killing Z3
    SIGINT terminates Z3, but Dafny reports no error

What to do about tool bugs?
  Test that verification fails when a bug is injected.
  Make verification a more integral part of the build.
    (see the exit-status sketch at the end of this section)

IronFleet: specification ambiguity
  Background: exactly-once semantics
    a client put to a K/V server is executed exactly once
    challenge in providing exactly-once semantics:
      a client may not hear back from a server
      it re-sends the same request, perhaps to another server
      servers see duplicates
    typical implementations filter duplicates
      (see the duplicate-filtering sketch at the end of this section)
  IronFleet's spec didn't guarantee exactly-once semantics
    but the implementation did provide it!
    an impl. without duplicate detection also meets the spec
    probably incorrect (in the sense of being unexpected)
      it would be difficult to prove a K/V application on top of such a spec
    good to verify an application on top, to detect problems in the spec

Spec issues in other papers we've read
  NetCore (previous lecture): theorem doesn't cover output packets
  FSCQ: spec for fdatasync didn't match what some users expected
  FSCQ: spec didn't mandate specific error codes

What to do about spec bugs?
  Prove properties about the spec.
  Prove applications on top of the spec.
  Test the spec (e.g., against other implementations).  SibylFS.
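Aside: a minimal OCaml sketch of making verification an integral part of the
build, per the advice above.  Run the verifier as a subprocess, scan every
output line (I2 came from ignoring the second of two messages), and treat
death by signal as failure (I3: SIGINT-killed Z3 was reported as no error).
The command string and the "Verification succeeded" marker are hypothetical
stand-ins, not Dafny's actual interface:

    let verifier_succeeded (cmd : string) : bool =
      let ic = Unix.open_process_in cmd in
      let saw_success = ref false in
      (try
         while true do
           let line = input_line ic in
           (* Scan every line: I2 arose from reading only the first of two
              messages and missing a later "prover crashed" diagnostic. *)
           if line = "Verification succeeded" then saw_success := true
         done
       with End_of_file -> ());
      match Unix.close_process_in ic with
      | Unix.WEXITED 0 -> !saw_success              (* clean exit AND success *)
      | Unix.WEXITED _ -> false                     (* verifier reported failure *)
      | Unix.WSIGNALED _ | Unix.WSTOPPED _ -> false (* killed, e.g., SIGINT (I3) *)

The build should produce a binary only when this returns true; anything
ambiguous counts as "not verified".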
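Aside: a minimal sketch of the duplicate filtering described in the
exactly-once discussion above.  The server remembers, per client, the highest
sequence number it executed along with the cached reply, and replays the
reply on retransmission instead of re-executing.  This is the standard
technique, not IronFleet's code; client_id, seq, and apply are hypothetical
names.  An implementation that omits this table still met IronFleet's spec,
which is exactly the ambiguity:

    module IntMap = Map.Make (Int)

    type 'r dedup = { mutable last : (int * 'r) IntMap.t }
      (* client_id -> (highest seq executed, cached reply) *)

    (* Execute [apply] at most once per (client_id, seq); a retransmission
       of the latest request gets the cached reply, not a second execution. *)
    let handle st ~client_id ~seq ~(apply : unit -> 'r) : 'r =
      match IntMap.find_opt client_id st.last with
      | Some (s, reply) when seq = s -> reply                (* duplicate: replay *)
      | Some (s, _) when seq < s -> failwith "stale request" (* client moved on *)
      | _ ->
        let reply = apply () in
        st.last <- IntMap.add client_id (seq, reply) st.last;
        reply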
Verdi: replicated-state-machine server
  written in Coq, extracted to OCaml
  the OCaml part is in the TCB
  some unverified OCaml shims
    TCP for RPC

Verdi: shim layer bugs
  V1: recv() can return an incomplete message
    the server crashes in that case
    (see the framing sketch at the end of these notes)
  V2: incorrect marshaling
    uses newline and space to separate commands from arguments
    a client can inject commands because newline and space are not escaped
      a single command becomes two commands
      some commands crash the server
      incorrect results returned to the client
  V3: large file-system writes to append log entries
    large writes are not atomic
    a crash can leave incomplete log entries
    (is our lab infrastructure correct?)
  V4: incorrect snapshot update
    the old snapshot is deleted before the new one is finished
    can cause data loss
    (see the snapshot sketch at the end of these notes)
  V5: failure to read a snapshot
    the shim assumed that if open() fails, the snapshot doesn't exist
    but the snapshot may exist:
      open can fail because the process is out of file descriptors
    can cause data loss
  V8: stack overflow
    recursive call to findGtIndex() on a big list
    crashes the leader
    (see the tail-recursion sketch at the end of these notes)

Chapar: shim layer bugs
  the network implementation used UDP
  C1: assumed the network was reliable and did not duplicate packets
    a retransmitted state update overwrote later updates
    the protocol was proven correct assuming no duplicates
  C2: assumed the network could not lose packets
    a lost packet caused an update to never arrive
    the protocol required waiting for this update
    the system deadlocks after a single lost packet
  C3: odd OCaml network library semantics
    the shim's model assumed the network can send large packets
    the OCaml library returned an error if a write exceeded the packet size
      but left the previously written parts of the packet in place
      later writes would append to this partial packet

Shim bugs also showed up in FSCQ's Haskell code
  subtle issues in the interaction with FUSE
  invoking the wrong extracted function

What to do about shim layer bugs?
  Expand the verification boundary.
  Cut the shim layer at a narrow interface with simpler assumptions.
  Extend the spec to cover resource use and liveness.
  The shim layer should use well-designed, well-documented libraries.
  Test the shim layer (similar to spec testing, a la SibylFS).
    Test resource exhaustion.

What to learn from this paper?
  Don't do verification?  No.
    no bugs in the verified components (as far as we know),
    while unverified systems have protocol bugs
    verification improves the situation
  But verification relies on assumptions
    need to check the assumptions:
      fuzz the assumptions
      use the spec to verify an application on top
  Unverified systems probably also have bugs in their shim layers
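Framing sketch (V1, V2): one way to fix the recv() bug is to loop until the
requested number of bytes has arrived, and to frame each message with an
explicit length prefix instead of newline/space separators, so client data
can never be parsed as commands.  The 4-byte big-endian prefix and the
helper names are our choice for illustration, not Verdi's actual format:

    (* [really_read] loops until exactly [len] bytes arrive: Unix.read (like
       recv()) may return fewer bytes than requested, the root cause of V1. *)
    let really_read fd buf off len =
      let n = ref 0 in
      while !n < len do
        let r = Unix.read fd buf (off + !n) (len - !n) in
        if r = 0 then failwith "connection closed mid-message";
        n := !n + r
      done

    (* One message = 4-byte big-endian length prefix, then the body.
       Length-prefixed framing also removes the V2 injection channel:
       message bytes are never confused with separators. *)
    let read_message fd =
      let hdr = Bytes.create 4 in
      really_read fd hdr 0 4;
      let len = Int32.to_int (Bytes.get_int32_be hdr 0) in
      let body = Bytes.create len in
      really_read fd body 0 len;
      Bytes.to_string body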
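Snapshot sketch (V3, V4, V5): a minimal sketch of the write-to-temp-then-
rename discipline that avoids deleting the old snapshot before the new one
is durable, plus an open() wrapper that treats only ENOENT as "no snapshot".
Function and path names are illustrative, not Verdi's actual shim:

    let write_snapshot ~path (data : string) =
      let tmp = path ^ ".tmp" in
      let fd = Unix.openfile tmp [ Unix.O_WRONLY; Unix.O_CREAT; Unix.O_TRUNC ] 0o644 in
      let buf = Bytes.of_string data in
      let len = Bytes.length buf in
      let n = ref 0 in
      while !n < len do
        (* Large writes may be partial (cf. V3): loop until done. *)
        n := !n + Unix.write fd buf !n (len - !n)
      done;
      Unix.fsync fd;       (* make the new snapshot durable before the rename *)
      Unix.close fd;
      Unix.rename tmp path (* atomic: crash leaves old or new, never neither (V4) *)

    let read_snapshot ~path =
      match Unix.openfile path [ Unix.O_RDONLY ] 0 with
      | fd -> Some fd
      | exception Unix.Unix_error (Unix.ENOENT, _, _) -> None
      (* Any other error, e.g., EMFILE (out of file descriptors, bug V5),
         propagates instead of being misread as "no snapshot exists". *)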
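Tail-recursion sketch (V8): OCaml does not grow the stack, so a
non-tail-recursive traversal of a large log overflows it.  The version below
is an illustrative reconstruction of a findGtIndex-style function, not the
code Coq actually extracted; the entry type is hypothetical:

    type entry = { index : int; data : string }  (* hypothetical log entry *)

    (* Not tail-recursive: the [e :: ...] runs after the recursive call
       returns, so every element consumes a stack frame; on a big list
       this overflows the stack and crashes the leader (V8). *)
    let rec find_gt_index entries i =
      match entries with
      | [] -> []
      | e :: rest ->
        if e.index > i then e :: find_gt_index rest i
        else find_gt_index rest i

    (* Tail-recursive version: accumulate, then reverse; constant stack. *)
    let find_gt_index_tr entries i =
      let rec go acc = function
        | [] -> List.rev acc
        | e :: rest -> go (if e.index > i then e :: acc else acc) rest
      in
      go [] entries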