L22: Study of correctness of verified systems

Paper: "An empirical study on the correctness of formally verified
distributed systems" by Fonseca et al. (EuroSys 2017)

Why this paper?
  Bugs in verified systems
    incl. systems we covered earlier (e.g., IronFleet)
    bugs in the interface between verified and unverified parts
    bugs invalidate guarantees, crash systems
  Testing of specs and fuzzing of interfaces

How is it possible that there are bugs in verified systems?
  System doesn't run in isolation (see Fig 1)
  System relies on unverified parts:
    unverified OS (e.g., for sending packets)
    unverified use of tools (e.g., the verifier)
    unverified specs
      spec may promise too little or too much (e.g., at-most-once or not?)
  System makes wrong assumptions about unverified parts
  Bugs in those parts can result in correctness violations

Do the verified pieces have bugs?
  No ... at least as far as the authors were able to discover
  Could there have been bugs? Maybe, we just don't know
    Z3 is complex
    Coq relies on an unproven core
    easy to admit lemmas without proofs ...

How did the authors find the bugs?
  Studied code and design for unverified assumptions
  Fuzzed the shim layer (e.g., packets, disk)
  Cross-checked across systems

What if you verified all components in Figure 1?
  Verify application, tools, OS
  You would find mismatches between the system and the OS, etc.
  Incredible amount of work!

IronFleet (topic of previous lecture)
  Implementation of a replicated-state-machine library
  Verified counter on top of the library
  Complex tool chain involving Dafny, Boogie, and Z3
    easy to misuse the tools
  Bugs:
    none in the shim layer!
    a few tool misuses
    a specification ambiguity

IronFleet: tool bugs
  I2: bug in parsing Dafny's output
    Dafny can print two messages per invocation:
      a diagnostic message
      "prover crashed"
    the second message was ignored
    so the tool chain built an unverified binary
  I4: Z3 crashing
    z3.exe built for a 64-bit architecture
    but libraries were for a 32-bit architecture
  I3: killing Z3
    SIGINT terminates Z3, but Dafny reports no error

IronFleet: specification ambiguity
  Background: exactly-once semantics
    a client put for a K/V server is executed once
  Challenge in providing exactly-once semantics:
    a client may not hear from a server
    it re-sends the same request to another server
    servers see duplicates
    a typical implementation filters duplicates (sketch 1 below)
  IronFleet
    spec didn't guarantee exactly-once semantics
    but, the implementation did provide it!
    an impl. without duplicate detection would also meet the spec
      probably incorrect
      but, on top of such a spec it would be difficult to prove a
        k/v application correct
    good to verify an application to detect problems in the spec

Verdi: replicated-state-machine server
  Coq, translated into OCaml
  OCaml part of the TCB
  Some unverified OCaml shims
    TCP for RPC

Verdi: shim layer bugs
  V1: recv() can return an incomplete message (sketch 2 below)
    server crashes in that case
  V2: incorrect marshaling (sketch 3 below)
    uses newline and space to separate commands from arguments
    client can inject commands because newline and space aren't escaped
      a single command becomes two commands
      some commands crash the server
      incorrect results to the client
  V3: large file-system writes to add log entries (sketch 4 below)
    large writes are not atomic
    incomplete log entries
    (is our lab infrastructure correct?)
  V4: incorrect snapshot update (sketch 4 below)
    old snapshot deleted before the new one is finished
    can cause data loss
  V5: failure to read snapshot (sketch 5 below)
    shim assumed that if open() failed, the snapshot didn't exist
    but the snapshot may exist
      open() can fail because the process is out of file descriptors
    can cause data loss
  V8: stack overflow (sketch 6 below)
    recursive call to findGtIndex() on a big list
    crashes the leader
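The sketches below are minimal Go illustrations of the bug patterns above,
not code from IronFleet, Verdi, or the paper; all identifiers in them are
hypothetical.

Sketch 1: duplicate filtering for exactly-once semantics, assuming clients
tag every request with a (client ID, sequence number) pair and reuse the
same pair when retrying.

  package main

  import "fmt"

  type Request struct {
      ClientID int64
      Seq      int64
      Key, Val string
  }

  type Reply struct{ Old string }

  type KVServer struct {
      data    map[string]string
      lastSeq map[int64]int64 // highest seq applied per client
      lastRep map[int64]Reply // cached reply for that seq
  }

  func NewKVServer() *KVServer {
      return &KVServer{
          data:    map[string]string{},
          lastSeq: map[int64]int64{},
          lastRep: map[int64]Reply{},
      }
  }

  // Put executes each (ClientID, Seq) at most once; a retransmission
  // gets the cached reply instead of being applied a second time.
  func (s *KVServer) Put(req Request) Reply {
      if seq, ok := s.lastSeq[req.ClientID]; ok && req.Seq <= seq {
          return s.lastRep[req.ClientID] // duplicate: don't re-execute
      }
      old := s.data[req.Key]
      s.data[req.Key] = req.Val
      s.lastSeq[req.ClientID] = req.Seq
      s.lastRep[req.ClientID] = Reply{Old: old}
      return Reply{Old: old}
  }

  func main() {
      s := NewKVServer()
      r := Request{ClientID: 1, Seq: 1, Key: "x", Val: "a"}
      s.Put(r)
      s.Put(r)                               // retransmission: filtered
      fmt.Println(s.data["x"], s.lastSeq[1]) // a 1
  }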
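Sketch 2: the V1 pattern. A single Read()/recv() may return fewer bytes
than a full message, so the receiver must loop; io.ReadFull does that. The
4-byte length prefix is an assumed wire format, not Verdi's.

  package main

  import (
      "bytes"
      "encoding/binary"
      "fmt"
      "io"
  )

  // readMessage reads a 4-byte big-endian length, then exactly that
  // many payload bytes, looping over short reads.
  func readMessage(r io.Reader) ([]byte, error) {
      var hdr [4]byte
      if _, err := io.ReadFull(r, hdr[:]); err != nil {
          return nil, err
      }
      n := binary.BigEndian.Uint32(hdr[:])
      buf := make([]byte, n)
      if _, err := io.ReadFull(r, buf); err != nil {
          return nil, err // a short read is an error, not a crash
      }
      return buf, nil
  }

  func main() {
      // Simulate a stream carrying one 5-byte message.
      var stream bytes.Buffer
      payload := []byte("hello")
      binary.Write(&stream, binary.BigEndian, uint32(len(payload)))
      stream.Write(payload)
      msg, err := readMessage(&stream)
      fmt.Println(string(msg), err) // hello <nil>
  }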
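Sketch 3: the V2 pattern. Length-prefixing each field, instead of
separating fields with spaces/newlines, means an argument may contain any
byte without being parsed as an extra command. The encoding is made up for
illustration.

  package main

  import (
      "bytes"
      "encoding/binary"
      "fmt"
  )

  // marshal writes each field as a 4-byte length followed by its bytes.
  func marshal(fields ...[]byte) []byte {
      var buf bytes.Buffer
      for _, f := range fields {
          binary.Write(&buf, binary.BigEndian, uint32(len(f)))
          buf.Write(f)
      }
      return buf.Bytes()
  }

  // unmarshal recovers the fields, rejecting truncated input.
  func unmarshal(data []byte) ([][]byte, error) {
      var fields [][]byte
      for len(data) > 0 {
          if len(data) < 4 {
              return nil, fmt.Errorf("truncated header")
          }
          n := binary.BigEndian.Uint32(data[:4])
          data = data[4:]
          if uint32(len(data)) < n {
              return nil, fmt.Errorf("truncated field")
          }
          fields = append(fields, data[:n])
          data = data[n:]
      }
      return fields, nil
  }

  func main() {
      // A value containing newline and space round-trips as ONE field.
      enc := marshal([]byte("PUT"), []byte("key"), []byte("evil\nGET key"))
      fields, _ := unmarshal(enc)
      for _, f := range fields {
          fmt.Printf("%q\n", f)
      }
  }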
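Sketch 4: the V3/V4 pattern. Replace a snapshot crash-safely by writing a
temporary file, fsyncing it, and renaming it over the old one; rename() is
atomic within a file system, so a crash leaves either the complete old
snapshot or the complete new one. Paths and names are hypothetical.

  package main

  import (
      "fmt"
      "os"
      "path/filepath"
  )

  func saveSnapshot(path string, data []byte) error {
      tmp := path + ".tmp"
      f, err := os.Create(tmp)
      if err != nil {
          return err
      }
      if _, err := f.Write(data); err != nil {
          f.Close()
          return err
      }
      if err := f.Sync(); err != nil { // force bytes to stable storage
          f.Close()
          return err
      }
      if err := f.Close(); err != nil {
          return err
      }
      // Atomically replace the old snapshot; never delete it first.
      // (A fully crash-safe version would also fsync the directory.)
      return os.Rename(tmp, path)
  }

  func main() {
      dir, _ := os.MkdirTemp("", "snap")
      defer os.RemoveAll(dir)
      path := filepath.Join(dir, "snapshot")
      fmt.Println(saveSnapshot(path, []byte("state v1")))
      fmt.Println(saveSnapshot(path, []byte("state v2")))
  }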
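Sketch 5: the V5 pattern. Only a "not exist" error means there is no
snapshot; any other failure (e.g., out of file descriptors) must not be
treated as an empty state, or committed data could be silently discarded.
loadSnapshot is a hypothetical stand-in for the shim's loader.

  package main

  import (
      "errors"
      "fmt"
      "io/fs"
      "os"
  )

  func loadSnapshot(path string) ([]byte, error) {
      data, err := os.ReadFile(path)
      if errors.Is(err, fs.ErrNotExist) {
          return nil, nil // genuinely no snapshot: start empty
      }
      if err != nil {
          // EMFILE, EACCES, I/O error, ...: the snapshot may exist,
          // so refuse to proceed rather than start from empty state.
          return nil, fmt.Errorf("cannot tell if snapshot exists: %w", err)
      }
      return data, nil
  }

  func main() {
      data, err := loadSnapshot("no-such-snapshot")
      fmt.Println(data == nil, err) // true <nil>: "no snapshot yet"
  }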
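Sketch 6: the V8 pattern. One recursive call per list element overflows the
stack on a long log; a loop uses constant stack. This findGtIndex is a Go
stand-in guessing at the Verdi function's behavior (return the entries with
index greater than a given one), for illustration only.

  package main

  import "fmt"

  type Entry struct {
      Index int
      Cmd   string
  }

  // findGtIndex returns all entries with Index > i, iteratively.
  func findGtIndex(log []Entry, i int) []Entry {
      var out []Entry
      for _, e := range log {
          if e.Index > i {
              out = append(out, e)
          }
      }
      return out
  }

  func main() {
      log := make([]Entry, 1000000) // a large log is fine with a loop
      for k := range log {
          log[k] = Entry{Index: k + 1}
      }
      fmt.Println(len(findGtIndex(log, 999990))) // 10
  }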
What to learn from this paper?
  Don't do verification? No.
  No bugs in verified components (as far as we know)
    while unverified systems have protocol bugs
    verification improves the situation
  But, verification relies on assumptions
    need to check the assumptions
    fuzz the assumptions (sketch 7 below)
    use the spec to verify applications
  Unverified systems probably also have bugs in their shim layers
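Sketch 7: the paper's "fuzz the shim layer" idea in miniature. Mutate valid
marshaled messages with random bit flips and truncations and check that the
decoder returns an error instead of panicking or mis-parsing. parseMessage
is a hypothetical toy decoder, not the paper's fuzzer.

  package main

  import (
      "fmt"
      "math/rand"
  )

  // parseMessage: toy decoder where the first byte is the command length.
  func parseMessage(b []byte) (string, error) {
      if len(b) == 0 || int(b[0]) > len(b)-1 {
          return "", fmt.Errorf("malformed")
      }
      return string(b[1 : 1+int(b[0])]), nil
  }

  func main() {
      rng := rand.New(rand.NewSource(1))
      seed := []byte{3, 'P', 'U', 'T'} // a valid message
      for i := 0; i < 10000; i++ {
          m := append([]byte(nil), seed...)
          m[rng.Intn(len(m))] ^= byte(1 << rng.Intn(8)) // flip one bit
          if rng.Intn(4) == 0 {
              m = m[:rng.Intn(len(m)+1)] // sometimes truncate
          }
          func() {
              defer func() { // a panic here would be a shim bug
                  if r := recover(); r != nil {
                      fmt.Println("decoder panicked on", m)
                  }
              }()
              parseMessage(m)
          }()
      }
      fmt.Println("fuzzing done")
  }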