L12: Yggdrasil
Paper: "Push-Button Verification of File Systems via Crash Refinement" by
Sigurbjarnarson et al. (OSDI 16)
Source code: https://github.com/locore/yggdrasil.git
Why this paper?
Same topic as FSCQ paper
But, developer doesn't write proofs
Write specs in a way that SMT solver can solve them
SMT = satisfiability Modulo Theories
SMT solvers not as powerful as Coq
Surpring that one can verify a file system in this way
Key ideas:
Crash refinement
Cleverly designed abstractions
Some limitations
Data structures must have a limited (small) size
No recursion in implementation
E.g.. Cannot reason about file system trees a la FSCQ
Bounded loops
Crash refinement
Code disk states are a subset of spec's disks states
Spec is transactional disk: possible states are old or new
Code is write-ahead log implementation
need to show that all possible states of C are a subset of S
that is either correspond to old or correspond to new
need to model that C may run a recovery procedure after crash
How to compute all possible disk of C and S?
Represent all possible disks as a Z3 expression
Yggdrasil provides a "symbolic" async disk using Z3
on: data is in disk cache
sync: data is on persistent storage
a write
f(s, x, b) = s[a -> if on /\ sync then v else s(a)]
where x = (a, v) and b = (on, sync)
adds constraint: previous sync and on
a flush() adds constraints on the symbolic values
if sync, then on is true too
Crash states
sequence: write a; fsyn; crash
disk state:
sync = 0 (fsync didn't complete)
-> on is 0 or on is 1 (disk might flush on its own)
sync = 1 (fsync did complete)
-> on is 1
sequence: write a,v0; fsync; write a, v1
disk state:
sync = 1 and on=1 for previous writes
sequence: write a, v0; write a, v1
disk state:
sync = 1 for previous writes
Verifying
Run a test case on the symbolic code disk
Creates a Z3 expression for all possible code disks (e1)
Run same test case on symbolic spec disk
Produces another Z3 expression (e2)
Ask Z3 to find counter example by asking NOT(e1 => e2)
If counter example, then bug with explanation
If not, then e1 => e2
Example: test_atomic() [test_waldisk.py]
one part of the proof that waldisk refines transactional disk
runs with one block write
we hope that is atomic!
runs with two block write
test the WAL implementation
after a crash, the current blocks are all new or all old
python2 test_waldisk.py
Z3 couldn't find counter example
This is part of the theorem 2 in the paper
(Show Z3 expression being solved, uncomment line 184 in test.py)
Z3 SMT solver
powerful constraint solver
arithmetic: x+2 = y
array theory: a[2] = 10
uninterpreted functions: f(x) = x
quantifiers
propositional logic
...
bitvectors
small demo (see http://ericpony.github.io/z3py-tutorial/guide-examples.htmi)
z3 outcome: proof, counter example, or timeout
authors don't ask Z3 to prove, but instead as finding a counter example
no counter example => proof
authors use z3 cleverly to avoid timeouts
Write-ahead logging implementation
Roughly implementation as in FSCQ lecture
See [yydrasil/waldisk.py]
Draw picture
log disk
data disk (e.g., inodes)
data disk (e.g., file blocks)
These disks can be "symbolic" or real
Symbolic one used for verification
Real disk used for running the file system
Stores actual values [diskimp.pyx]
Cpython translates it into a C program
Modify implementation
Comment out line 110
Run again python2 test_waldisk.py
Produces a counter example
Asynchronous symbolic disk
Created in [test_wal.py] and passed into waldisk.py
[yggdrasil/ufarray.py]
the disk is a Z3 array of blocks (a Z3 bitvector)
an update to a block is modeled as uninterpreted function
Operations: read, write, flush, ...
[yggdrasil/diskspec.py, Class AsyncDisk]
Same spec has FSCQ, but "specified" in python, symbolicly
maintains cache
write:
adds to cache
update the interpretive function describing the disk
read: reads from cache
flush: adds constraint sync -> on
crash: throws away cache
Symbolic execution
Previous example involves only writes
Challenge:
b = read(0, 0)
if b == 0: xxx
else: yyy
Solution: Solver forks itself into two solvers
One solver: b = 0 and execute xxx branch
Other solve: b != 0 and execute yyy branch
Unroll loops
See __recover in [waldisk.py]
Design ideas to make Z3 not timeout
Use strict layering
Proof crash refinement per pair of layers
Two-level representation of disk
Use interpretive functions for reading/writing blocks
Use bitvectors for blocks
Alternative plan: array of blocks
Use multiple disks
one disk for log
one disk for metadata
one disk for data blocks
one disk for free bit block map
...
Each inode has its own virtual disk
Making things finite
All data structures have a fixed upperbound
Log disk has 10 blocks
Directories have 512 entries
Make search of dir entry untrusted
but validate (a la CompCert)
Make spec finite
Abstract of file system is not a tree
Simple top-level spec
No pathnames
Doesn't model delayed commit and recovery
May make it difficult to prove certain applications on top of Yggdrasil
Summary
Clever use of Z3 to prove crash refinement for a file system
No proof effort!
Limited by what Z3 can express