L1: POCS intro * POCS: Principles of Computer Systems Discuss key ideas in computer systems Isolation, locking, logging, consistency, Paxos Ideas you have seen in 6.033, 6.828, 6.824, ... But in a principled manner With specifications and proofs * Motivation: avoid bugs Systems are subtle and complex Concurrency Distributed Security Fault tolerance Performance Complexity leads to bugs Benign bugs But also serious ones: users may loose their data systems leaks confidential data * Focus in POCS labs on fault tolerance but papers will touch on other challenges two kinds of failures crash (e.g., power failures) defects/faults (e.g., dead disk, dead sector) assumption: fail-stop after failure, computer systems stops immediately on crash, computer reboots and one can run repair * Example systems that must handle failures Single-node systems File systems Databases Key/value stores Distributed systems Distributed KVs ... Embedded systems Flash-translation layer (FTL) in SSDs * A key challenge: crash-safety of persistent state Both in single-node and distributed systems System runs updating persistent state Crash happens Must recover to a correct state perhaps ok to loose some user data but internal system structure must survive * Why is achieving crash safety hard? Persistent state becomes inconsistent E.g., failure during a two-step operation (e.g., updating free list and file) -> Block is marked allocated, but isn't in use -> Block is part of a file, but not marked allocated Storage runs concurrent with program Programs sends several disk request to disk, disk works on them After disk failure, which requests have been processed? Storage devices are slower than main memory Many complex optimizations to avoid disks writes Complexity leads to more opportunities for problems * Real-world examples Linux file system study [http://pages.cs.wisc.edu/~ll/papers/fsstudy.pdf] FSCQ slides [https://pdos.csail.mit.edu/papers/fscq:sosp15-slides.pdf] SSD study [https://www.usenix.org/system/files/conference/fast13/fast13-final80.pdf] Bugs can have bad consequences * Explanation of comments in slide [bugs.pdf] * Standard logging (slow but correct) data + metadata through log flush log, flush commit hdr * Log-bypass (first code comment) metadata through log, data direct flush data, flush log, flush commit hdr [can combine flush of data+log] * checksums (second code comment) put checksum(log) in hdr flush log+hdr together incompatible with log-bypass! * How to find/avoid these bugs? Testing Works well! But doesn't demonstrate absence of bugs Model checking Check all symbolic states Hard to capture key properties without state explosion Verification Prove implementation has no bugs All approaches are active areas of research 6.826 focuses on verification But we will read papers about the other approaches too * Verification Three parts to verification Specification: a precise description of the desired behavior of a system Implementation: an implementation of a system Proof: mathematical proof showing implementation meets specification Old area, but has become hot again Verification technology has become good enough to tackle complex systems * Real-world successes of verification CompCert Amazon's use of specs Maybe more in the future * Specifications Properties of a good spec: Describe the behavior of the system Tell client and implementer what each need to know Precise enough to be able to prove theorems based on spec Leave out unnecessary requirements that may confuse clients make proofs intractable This is hard! Many ways to describe behavior of a system Easy to miss subtle cases Bugs in the specs are problematic Specifications on their own have much value See Amazon paper next week The old POCS notes [http://bwlampson.site/48-POCScourse/48-POCS2006.pdf] * Implementations/Code Implementations are not much code in POCS POCS focuses on the core of a system with subtle issues sometimes complex ones too * Proofs By hand or machine-checked Old POCS was pen&paper proofs New POCS focuses on machine-checked Write spec, code, and proof in proof assistant (Coq) Coq checks proof Doing a proof in Coq requires developing a proof argument Often done on paper Old POCS notes influenced much recent research in verified systems * Class structure Labs to teach you to write specs and prove implementations correct Readings to teach you about the state of the art verification Many recent papers, fast developing area of research Grading: labs and class participation No quizzes Staff === Coq overview proof assistant expressive: functional programs abstract mathematical definitions theorems proofs scripting language for proofs assistant: checks proofs rather than generates all proofs can write scripts to automate proof generation can generate running code coq -> haskell -> executable binary Demo Bank.v bank definition nat balances transfer extraction "Separate Extraction" look at Top.hs look at Main.hs ghc Main ./Main 5 works 20 is nonsense fix and prove no more bugs transfer' total theorem: total remains the same cases omega toy example; real programs are more sophisticated in many dimensions mutable state (variables) I/O (disk, network, user) crashes concurrency assembly code .. but each dimension adds complexity difficult to verify existing programs (say, Linux kernel) specs are complicated and messy, or not well-defined at all language is not a great fit for verification complex semantics of what C statements mean might run differently depending on which C compiler or flags are used ... code not structured for verification: too much complexity all at once mutable shared memory, IO, crashes, concurrency, assembly, ... verification effort is typically larger than effort to build new impl so, makes sense to rewrite impl to simplify verification labs will explore how to deal with some of these complexities mutable state I/O crashes concurrency (limited form) no assembly code .. but carefully introducing each form of complexity final lab: disk mirroring (RAID 0) demo [ run everything as root ] ./bin/replicate-nbd init ./bin/replicate-nbd start & modprobe nbd nbd-client localhost /dev/nbd0 mkfs -t ext4 /dev/nbd0 mount /dev/nbd0 /mnt/x cd /mnt/x echo hello world > test.txt sync strings disk0.img strings disk1.img rm disk0.img echo another test > foo.txt sync strings disk1.img cat test.txt umount /mnt/x nbd-client -d /dev/nbd0