L11: FSCQ
Paper: "Using Crash Hoare Logic for Certifying the FSCQ File System", Chen et al. (SOSP '15)
Source code: https://github.com/mit-pdos/fscq

Why are we reading this paper?
  Example of systems verification
  File systems are critical infrastructure
  File systems have bugs
  Reasoning about crashes
    CHL will show up in labs 3 and 4
  File system proven in Coq
    A lot of the class infrastructure was inspired by this paper
  Revisit some big database ideas
    Logging (will show up in lab 4)

Goal: crash safety
  File system runs, power fails, computer reboots
  User can use the file system again, as if the crash didn't happen
    Storage device powers up correctly
    File system's internal data structures are correct
    User data is present (except recent files that haven't been saved yet)

Demo: crash FSCQ

Risk: crash exposes subtle bugs
  Appending to a file requires three disk writes:
    1. write data
    2. write inode
    3. write free block map
  A crash exposes intermediate states
    e.g., 1 and 2 happened, but not 3
    what to do in this case? is the block free or not?
  Changing the order doesn't help
    3, 1, 2: what if we crash after 3?

Standard solution: logging
  Clean, principled solution used by many file systems
  Never update the file system in place
  First write to the log, commit, then apply the log to the file system
  If crash before commit, don't apply
  Not easy to implement (in particular if you want performance)
    e.g., bugs in ext4
  Contribution of the paper: verifying crash safety of a file system that uses logging

Verifying logging
  What does correct mean?
    How to model the disk?
    How to model crashes?
    How to model reboot?
    How to prove correctness?
  Without failures: must prove a core subset of POSIX correct
    open, close, read, write, create, mkdir, etc.
    requires reasoning about directories, inodes, etc.
  With failures: additionally reason about a crash at any point

System overview
  Similar to the labs, but FUSE instead of nbd
  Spec, program, and proof written in Coq
  Extract to Haskell
  Link with the FUSE library and run as a server on Linux
  Good: runs standard Linux programs
  Bad: large TCB (Linux, Haskell, etc.)

Correctness/Specification
  1. System calls are atomic
     all writes happen or none
  2. System calls are synchronous (in this paper)
     effects are on disk after the system call returns
     high-performance file systems are asynchronous
  3. System calls run one by one, sequentially
     no concurrency
  2-3 are simplifications to help the proofs
    a follow-on paper removes restriction 2
    3 is an open problem

Modeling the disk
  Simplified compared to real disks
  Operations: read, write, sync
    the labs have only synchronous read and write
  Specifications for operations
    pre- and post-conditions
  Example: synchronous write
    {a -> v0} disk_write(a, v) {a -> v}
    ->: separation-logic points-to
      only address a changed, and nothing else
      we don't use it in the labs; lab specs talk about the whole state
      -> allows for more succinct specs, e.g., when updating several locations

Asynchronous disks
  { a -> 0 } write(a, 10), write(a, 11) { a -> ? }
  What can be on disk if we crash during this sequence?
    0, 10, or 11
  What will a read return?
    { a -> 0 } write(a, 10), r = read(a) { r = ?? }
    r = 10 (i.e., the last value written)
  Idea: model each address with a value set
    { a -> 0 } write(a, 10), write(a, 11) { a -> <11, {0, 10, 11}> }
      a subsequent read returns 11
    { a -> 0 } write(a, 10), write(a, 11), sync() { a -> <11, {}> }
      sync flushes the set
    after a crash, the address can hold any value in the value set
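The value-set model is concrete enough to simulate. Below is a minimal sketch, not from the paper or the labs; the class name AsyncDisk and its methods are invented for this illustration. It mirrors the <last value, value set> notation above: writes grow the set, sync empties it, and a crash nondeterministically keeps one surviving value.

import random

class AsyncDisk:
    """Toy asynchronous disk: each address holds <last value written,
    set of values that might also survive a crash>."""

    def __init__(self):
        # address -> (last_value, possible_crash_values); fresh synced
        # address holding 0 is modeled as <0, {}>
        self.state = {}

    def write(self, a, v):
        last, vs = self.state.get(a, (0, set()))
        # the old value may still be on disk and the new one may not be
        # durable yet, so both join the set
        self.state[a] = (v, vs | {last, v})

    def read(self, a):
        last, _ = self.state.get(a, (0, set()))
        return last                      # reads see the last value written

    def sync(self):
        # sync flushes: only the last value can survive a crash
        for a, (last, _) in self.state.items():
            self.state[a] = (last, set())

    def crash(self):
        # on a crash, each address keeps some value from its set (or the
        # last value if the set is empty, i.e., the address was synced)
        for a, (last, vs) in self.state.items():
            v = random.choice(sorted(vs | {last}))
            self.state[a] = (v, set())

# { a -> 0 } write(a, 10), write(a, 11) { a -> <11, {0, 10, 11}> }
d = AsyncDisk()
d.write(0, 10)
d.write(0, 11)
assert d.state[0] == (11, {0, 10, 11})
assert d.read(0) == 11          # a subsequent read returns 11
d.crash()
assert d.read(0) in {0, 10, 11} # any value from the set can survive a crash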
Describing disk crashes
  Idea: add a crash condition to the pre- and postcondition
  Example crash condition:
    { a -> <0, {}> } write(a, 10) { a -> <10, {0, 10}> }
    or
    { a -> <0, {}> } write(a, 10) { a -> <0, {}> }
  Models an atomic sector write
    common assumption in file systems
    the SSDs from last paper seem to provide this property

Atomicity
  atomic_two_write spec
    topic of lab 3
    spec in figure 4, using logging
  *one* implementation (a toy Python sketch appears at the end of these notes):
    append (a1, v1) and (a2, v2) to the log
    write the commit block
    apply the log: write a1, a2
  Correctness of the implementation
    no crash: a1 and a2 are updated
    crash: the log is intact

Rep functions and logical address spaces
  For example, log_rep in figure 4
    a predicate describing the disk in terms of ->
    see figure 5
  log_rep takes a logical address space as an argument
    e.g., start_state
    allows abstracting away details (e.g., value sets)

Proving procedures
  Sequences of disk reads, writes, and syncs
    computation in Gallina, as in the labs
  Example:
    recovery():  // simplified!
      r = read(c)
      if r then:
        write(a1, v1)
        write(a2, v2)
        write(c, 0)
        sync()
  Post-condition (no crashes)
    chain pre- and post-conditions, as in the lab
  Crash condition
    OR of all crash conditions of the operations in the procedure

Reboot and recovery
  What is the post condition of recovery()?
    the log has been applied
  What is the crash condition of recovery()?
    the log is intact
  What is the pre condition of recovery()?
    the log is intact
  Recovery is idempotent
    crash condition => pre condition
    -> it is ok to crash during recovery and run recovery again
  Example: specification of rename (fig 13)
    puts normal and recovery execution together

Performance
  OK compared to synchronous file systems
  Bad compared to asynchronous file systems
    How to specify asynchronous file systems?
    See the follow-on paper in SOSP '17

# Notes
# Ran out of time; didn't cover "Proving procedures" and the rename spec
# Be more concrete about log_intact and the various conditions
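To make the atomic_two_write and recovery discussion above concrete, here is a minimal Python sketch of the log-then-commit protocol. The on-disk layout, the names COMMIT, LOG, DATA, and apply_log, and the use of a plain list as a synchronous disk are all invented for this illustration; FSCQ's real log and its Coq proofs look nothing like this.

# Toy write-ahead log over a flat, synchronous disk (a Python list standing
# in for sector writes); sync is omitted because the list is "synchronous".
COMMIT = 0              # commit block: 0 = log empty, 1 = log committed
LOG = [1, 2, 3, 4]      # log region holds (a1, v1, a2, v2)
DATA = 5                # data region starts here

def atomic_two_write(disk, a1, v1, a2, v2):
    # 1. append both writes to the log
    disk[LOG[0]], disk[LOG[1]] = a1, v1
    disk[LOG[2]], disk[LOG[3]] = a2, v2
    # 2. write the commit block -- this is the commit point
    disk[COMMIT] = 1
    # 3. apply the log to the file system proper
    apply_log(disk)

def apply_log(disk):
    if disk[COMMIT] == 1:
        disk[disk[LOG[0]]] = disk[LOG[1]]   # write a1
        disk[disk[LOG[2]]] = disk[LOG[3]]   # write a2
        disk[COMMIT] = 0                    # truncate the log

def recovery(disk):
    # Idempotent: applying the log is safe to repeat, so crashing during
    # recovery and re-running recovery still yields the old or the new state.
    apply_log(disk)

# No crash: both writes land.  Crash before the commit block is written:
# recovery is a no-op and neither write is visible.  Crash after the commit
# block: recovery finishes both writes.
disk = [0] * 16
atomic_two_write(disk, DATA + 0, 10, DATA + 1, 20)
recovery(disk)
assert disk[DATA + 0] == 10 and disk[DATA + 1] == 20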