L11: FSCQ
Paper: "Using Crash Hoare Logic for Certifying the FSCQ File System", Chen et al. (SOSP '15)
Source code: https://github.com/mit-pdos/fscq

Why are we reading this paper?
  Example of systems verification
  File systems are critical infrastructure
  File systems have bugs
  Reasoning about crashes
    CHL will show up in labs 3 and 4
  File system proven in Coq
    A lot of the class infrastructure was inspired by this paper
  Revisit some big database ideas
    Logging (will show up in lab 4)

Goal: crash safety
  File system runs, power fails, computer reboots
  User can use the file system again, as if the crash didn't happen
    Storage device powers up correctly
    File system's internal data structures are correct
    User data is present (except recent files that haven't been saved yet)

Demo: crash FSCQ

Risk: crash exposes subtle bugs
  Appending to a file requires three disk writes:
    1. write data
    2. write inode
    3. write free block map
  A crash exposes intermediate states
    e.g., 1 and 2 happened, but not 3
    what to do in this case? is the block free or not?
  Changing the order doesn't help
    3, 1, 2: what if we crash after 3?

Standard solution: logging
  Clean, principled solution used by many file systems
  Never update the file system in place
  First write to the log, commit, then apply the log to the file system
  If crash before commit, don't apply
  Not easy to implement (in particular if you want performance)
    e.g., bugs in ext4
  Contribution of the paper: verifying crash safety of a file system that uses logging

Verifying logging
  What does correct mean?
    How to model the disk?
    How to model crashes?
    How to model reboot?
    How to prove correctness?
  Without failures: must prove a core subset of POSIX correct
    open, close, read, write, create, mkdir, etc.
    requires reasoning about directories, inodes, etc.
  With failures: additionally reason about a crash at any point

System overview
  Similar to the labs, but FUSE instead of nbd
  Spec, program, and proof written in Coq
  Extract to Haskell
  Link with the FUSE library and run as a server on Linux
  Good: runs standard Linux programs
  Bad: large TCB (Linux, Haskell, etc.)

Correctness/Specification
  1. System calls are atomic
     all writes happen or none
  2. System calls are synchronous (in this paper)
     effects are on disk after the system call returns
     high-performance file systems are asynchronous
  3. System calls run one by one, sequentially
     no concurrency
  2-3 are simplifications to help the proofs
    a follow-on paper removes restriction 2
    3 is an open problem

Modeling the disk
  Simplified compared to real disks
  Operations: read, write, sync
    the labs have only synchronous read and write
  Specifications for operations
    pre- and post-conditions
  Example: synchronous write
    {a -> v0} disk_write(a, v) {a -> v}
    ->: separation-logic points-to
      only address a changed, and nothing else
      we don't use it in the labs; lab specs talk about the whole state
      -> allows for more succinct specs, e.g., when updating several locations

Asynchronous disks
  { a -> 0 } write(a, 10), write(a, 11) { a -> ? }
  What can be on disk if we crash during this sequence?
    0, 10, or 11
  What will a read return?
    { a -> 0 } write(a, 10), r = read(a) { r = ?? }
    r = 10 (i.e., the last value written)
  Idea: model each address with a value set
    { a -> 0 } write(a, 10), write(a, 11) { a -> <11, {0, 10, 11}> }
      a subsequent read returns 11
    { a -> 0 } write(a, 10), write(a, 11), sync() { a -> <11, {}> }
      sync flushes the set
    after a crash, the address can hold any value in the value set
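The value-set model is concrete enough to simulate. Below is a minimal sketch, not from the paper or the labs; the class name AsyncDisk and its methods are invented for this illustration. It mirrors the <last value, value set> notation above: writes grow the set, sync empties it, and a crash nondeterministically keeps one surviving value.

import random

class AsyncDisk:
    """Toy asynchronous disk: each address holds <last value written,
    set of values that might also survive a crash>."""

    def __init__(self):
        # address -> (last_value, possible_crash_values); fresh synced
        # address holding 0 is modeled as <0, {}>
        self.state = {}

    def write(self, a, v):
        last, vs = self.state.get(a, (0, set()))
        # the old value may still be on disk and the new one may not be
        # durable yet, so both join the set
        self.state[a] = (v, vs | {last, v})

    def read(self, a):
        last, _ = self.state.get(a, (0, set()))
        return last                      # reads see the last value written

    def sync(self):
        # sync flushes: only the last value can survive a crash
        for a, (last, _) in self.state.items():
            self.state[a] = (last, set())

    def crash(self):
        # on a crash, each address keeps some value from its set (or the
        # last value if the set is empty, i.e., the address was synced)
        for a, (last, vs) in self.state.items():
            v = random.choice(sorted(vs | {last}))
            self.state[a] = (v, set())

# { a -> 0 } write(a, 10), write(a, 11) { a -> <11, {0, 10, 11}> }
d = AsyncDisk()
d.write(0, 10)
d.write(0, 11)
assert d.state[0] == (11, {0, 10, 11})
assert d.read(0) == 11          # a subsequent read returns 11
d.crash()
assert d.read(0) in {0, 10, 11} # any value from the set can survive a crash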
Describing disk crashes
  Idea: add a crash condition to the pre- and postcondition
  Example crash condition:
    { a -> <0, {}> } write(a, 10) { a -> <10, {0, 10}> }
    or
    { a -> <0, {}> } write(a, 10) { a -> <0, {}> }
  Models an atomic sector write
    common assumption in file systems
    the SSDs from last paper seem to provide this property

Atomicity
  atomic_two_write spec
    topic of lab 3
    spec in figure 4, using logging
  *one* implementation (a toy Python sketch appears at the end of these notes):
    append (a1, v1) and (a2, v2) to the log
    write the commit block
    apply the log: write a1, a2
  Correctness of the implementation
    no crash: a1 and a2 are updated
    crash: the log is intact

Rep functions and logical address spaces
  For example, log_rep in figure 4
    a predicate describing the disk in terms of ->
    see figure 5
  log_rep takes a logical address space as an argument
    e.g., start_state
    allows abstracting away details (e.g., value sets)

Proving procedures
  Sequences of disk reads, writes, and syncs
    computation in Gallina, as in the labs
  Example:
    recovery():  // simplified!
      r = read(c)
      if r then:
        write(a1, v1)
        write(a2, v2)
        write(c, 0)
        sync()
  Post-condition (no crashes)
    chain pre- and post-conditions, as in the lab
  Crash condition
    OR of all crash conditions of the operations in the procedure

Reboot and recovery
  What is the post condition of recovery()?
    the log has been applied
  What is the crash condition of recovery()?
    the log is intact
  What is the pre condition of recovery()?
    the log is intact
  Recovery is idempotent
    crash condition => pre condition
    -> it is ok to crash during recovery and run recovery again
  Example: specification of rename (fig 13)
    puts normal and recovery execution together

Performance
  OK compared to synchronous file systems
  Bad compared to asynchronous file systems
    How to specify asynchronous file systems?
    See the follow-on paper in SOSP '17

# Notes
# Ran out of time; didn't cover "Proving procedures" and the rename spec
# Be more concrete about log_intact and the various conditions
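To make the atomic_two_write and recovery discussion above concrete, here is a minimal Python sketch of the log-then-commit protocol. The on-disk layout, the names COMMIT, LOG, DATA, and apply_log, and the use of a plain list as a synchronous disk are all invented for this illustration; FSCQ's real log and its Coq proofs look nothing like this.

# Toy write-ahead log over a flat, synchronous disk (a Python list standing
# in for sector writes); sync is omitted because the list is "synchronous".
COMMIT = 0              # commit block: 0 = log empty, 1 = log committed
LOG = [1, 2, 3, 4]      # log region holds (a1, v1, a2, v2)
DATA = 5                # data region starts here

def atomic_two_write(disk, a1, v1, a2, v2):
    # 1. append both writes to the log
    disk[LOG[0]], disk[LOG[1]] = a1, v1
    disk[LOG[2]], disk[LOG[3]] = a2, v2
    # 2. write the commit block -- this is the commit point
    disk[COMMIT] = 1
    # 3. apply the log to the file system proper
    apply_log(disk)

def apply_log(disk):
    if disk[COMMIT] == 1:
        disk[disk[LOG[0]]] = disk[LOG[1]]   # write a1
        disk[disk[LOG[2]]] = disk[LOG[3]]   # write a2
        disk[COMMIT] = 0                    # truncate the log

def recovery(disk):
    # Idempotent: applying the log is safe to repeat, so crashing during
    # recovery and re-running recovery still yields the old or the new state.
    apply_log(disk)

# No crash: both writes land.  Crash before the commit block is written:
# recovery is a no-op and neither write is visible.  Crash after the commit
# block: recovery finishes both writes.
disk = [0] * 16
atomic_two_write(disk, DATA + 0, 10, DATA + 1, 20)
recovery(disk)
assert disk[DATA + 0] == 10 and disk[DATA + 1] == 20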