L12: Yggdrasil Paper: "Push-Button Verification of File Systems via Crash Refinement" by Sigurbjarnarson et al. (OSDI 16) Source code: https://github.com/locore/yggdrasil.git Why this paper? Same topic as FSCQ paper But, developer doesn't write proofs Write specs in a way that SMT solver can solve them SMT = satisfiability Modulo Theories SMT solvers not as powerful as Coq Surpring that one can verify a file system in this way Key ideas: Crash refinement Cleverly designed abstractions Some limitations Data structures must have a limited (small) size No recursion in implementation E.g.. Cannot reason about file system trees a la FSCQ Bounded loops Crash refinement Code disk states are a subset of spec's disks states Spec is transactional disk: possible states are old or new Code is write-ahead log implementation need to show that all possible states of C are a subset of S that is either correspond to old or correspond to new need to model that C may run a recovery procedure after crash How to compute all possible disk of C and S? Represent all possible disks as a Z3 expression Yggdrasil provides a "symbolic" async disk using Z3 on: data is in disk cache sync: data is on persistent storage a write f(s, x, b) = s[a -> if on /\ sync then v else s(a)] where x = (a, v) and b = (on, sync) adds constraint: previous sync and on a flush() adds constraints on the symbolic values if sync, then on is true too Crash states sequence: write a; fsyn; crash disk state: sync = 0 (fsync didn't complete) -> on is 0 or on is 1 (disk might flush on its own) sync = 1 (fsync did complete) -> on is 1 sequence: write a,v0; fsync; write a, v1 disk state: sync = 1 and on=1 for previous writes sequence: write a, v0; write a, v1 disk state: sync = 1 for previous writes Verifying Run a test case on the symbolic code disk Creates a Z3 expression for all possible code disks (e1) Run same test case on symbolic spec disk Produces another Z3 expression (e2) Ask Z3 to find counter example by asking NOT(e1 => e2) If counter example, then bug with explanation If not, then e1 => e2 Example: test_atomic() [test_waldisk.py] one part of the proof that waldisk refines transactional disk runs with one block write we hope that is atomic! runs with two block write test the WAL implementation after a crash, the current blocks are all new or all old python2 test_waldisk.py Z3 couldn't find counter example This is part of the theorem 2 in the paper (Show Z3 expression being solved, uncomment line 184 in test.py) Z3 SMT solver powerful constraint solver arithmetic: x+2 = y array theory: a[2] = 10 uninterpreted functions: f(x) = x quantifiers propositional logic ... bitvectors small demo (see http://ericpony.github.io/z3py-tutorial/guide-examples.htmi) z3 outcome: proof, counter example, or timeout authors don't ask Z3 to prove, but instead as finding a counter example no counter example => proof authors use z3 cleverly to avoid timeouts Write-ahead logging implementation Roughly implementation as in FSCQ lecture See [yydrasil/waldisk.py] Draw picture log disk data disk (e.g., inodes) data disk (e.g., file blocks) These disks can be "symbolic" or real Symbolic one used for verification Real disk used for running the file system Stores actual values [diskimp.pyx] Cpython translates it into a C program Modify implementation Comment out line 110 Run again python2 test_waldisk.py Produces a counter example Asynchronous symbolic disk Created in [test_wal.py] and passed into waldisk.py [yggdrasil/ufarray.py] the disk is a Z3 array of blocks (a Z3 bitvector) an update to a block is modeled as uninterpreted function Operations: read, write, flush, ... [yggdrasil/diskspec.py, Class AsyncDisk] Same spec has FSCQ, but "specified" in python, symbolicly maintains cache write: adds to cache update the interpretive function describing the disk read: reads from cache flush: adds constraint sync -> on crash: throws away cache Symbolic execution Previous example involves only writes Challenge: b = read(0, 0) if b == 0: xxx else: yyy Solution: Solver forks itself into two solvers One solver: b = 0 and execute xxx branch Other solve: b != 0 and execute yyy branch Unroll loops See __recover in [waldisk.py] Design ideas to make Z3 not timeout Use strict layering Proof crash refinement per pair of layers Two-level representation of disk Use interpretive functions for reading/writing blocks Use bitvectors for blocks Alternative plan: array of blocks Use multiple disks one disk for log one disk for metadata one disk for data blocks one disk for free bit block map ... Each inode has its own virtual disk Making things finite All data structures have a fixed upperbound Log disk has 10 blocks Directories have 512 entries Make search of dir entry untrusted but validate (a la CompCert) Make spec finite Abstract of file system is not a tree Simple top-level spec No pathnames Doesn't model delayed commit and recovery May make it difficult to prove certain applications on top of Yggdrasil Summary Clever use of Z3 to prove crash refinement for a file system No proof effort! Limited by what Z3 can express