CHESS
=====

Paper: "Finding and Reproducing Heisenbugs in Concurrent Programs" by Musuvathi et al. (OSDI 2008)

Why read this paper?
  Concurrency is a good area for verification
    Subtle bugs
    Testing is difficult (e.g., incomplete)
  Core challenge: reasoning about many orderings
  Insight
    Few threads and preemption points are necessary to find bugs
  Cool tool
    Works on real systems
    Finds bugs
    Reproduces interleavings

Problem: Heisenbugs
  Result of a race condition in a parallel program
  Bugs that show up only in some runs of a program
    E.g., a test case fails once in a few months
  Difficult to find
    Testing for them may change behavior and not trigger the bug
  Difficult to reproduce

Example: bug in the Concurrency and Coordination Runtime (CCR)
  Test didn't fail for many months, then failed
  CHESS found it in 20 secs
    6737 interleavings
  Bug: CCR ports
    ports: https://msdn.microsoft.com/en-us/library/bb648755.aspx
    spec: atomically process tasks associated with a port
    race in a two-phase commit protocol
      Thread A: registers, but is preempted before grabbing the tasks
      Thread B: cancels the tasks, but is preempted after the first task
      Thread A: wakes up and also processes the tasks
  CHESS can reproduce it

CHESS plan: try different executions
  1. Control all sources of non-determinism
  2. Run with different non-deterministic choices, check for bug
  3. Search heuristics to guide choice of next set of choices to try

Naive approach:
  Run test
  Scheduler forces all interleavings of the threads' instructions
    run first thread for one instruction
    then 2nd thread for one instruction
    then 3rd thread for one instruction
    back out 2nd thread, run 3rd thread for one instruction
    etc.
  To back out
    Record program state of a thread, so that you can roll it back
    If program re-executes a system call
      Replay recorded results from first invocation
  If test fails
    Race happened
    Report bug for the recorded interleaving
  What's a bug?
    Crash: null pointer, exception, etc.
    Deadlock/livelock: test never finishes.
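The naive approach above can be sketched as a depth-first enumeration of every instruction-level interleaving, re-running the test from the initial state for each schedule (a stand-in for rolling threads back). This is a minimal illustrative sketch, not CHESS itself; all names are made up. Even two threads of two instructions each already give 6 interleavings, which hints at why this blows up for whole systems.

```python
# Hypothetical sketch of the naive approach: enumerate every interleaving
# of the threads' instructions, re-executing the test for each schedule.

def interleavings(threads):
    """Yield every instruction-level interleaving as a list of thread ids."""
    def dfs(remaining, prefix):
        done = True
        for tid, instrs in enumerate(remaining):
            if instrs:
                done = False
                rest = list(remaining)
                rest[tid] = instrs[1:]
                yield from dfs(rest, prefix + [tid])
        if done:
            yield prefix
    yield from dfs([list(t) for t in threads], [])

def run_schedule(threads, schedule):
    """Re-execute the test under one schedule; 'backing out' is modeled by
    restarting from a fresh initial state instead of rolling back."""
    state = {"x": 0}
    cursors = [0] * len(threads)
    for tid in schedule:
        threads[tid][cursors[tid]](state)
        cursors[tid] += 1
    return state

# A racy increment: the read and the write are two separate "instructions".
def make_incrementer(tmp):
    def read(state): state[tmp] = state["x"]
    def write(state): state["x"] = state[tmp] + 1
    return [read, write]

threads = [make_incrementer("t0"), make_incrementer("t1")]
schedules = list(interleavings(threads))
buggy = [s for s in schedules if run_schedule(threads, s)["x"] != 2]
print(len(schedules))   # 6 interleavings for 2 threads x 2 instructions
print(len(buggy))       # 4 of them lose an update: the race
```

A failing schedule (e.g., both threads read before either writes) is exactly the recorded interleaving that would be reported as the bug.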
    Specialized checkers: memory leak checker, use-after-free.

This approach is too expensive
  OK for small programs
  Not good for whole systems

Observation: interleave at *preemption points*
  preemption point: point where a thread may be rescheduled
    Thread synchronizes with another thread
    Thread releases the processor
    Thread acquires a lock
    ...
  Run a thread atomically between preemption points

Approach: interleave only at preemption points
  Take a test for a program P
  Link with a wrapper library
    Intercepts all preemption points
      When a thread creates another thread
      When a thread acquires a lock
      When a thread performs a blocking operation
      ...
      Code that is likely involved in a data race
    Calls scheduler at preemption points

Basic operation of the scheduler
  Replay phase
    Replay scheduling decisions up to some preemption point
    Select another thread to run, and go to record phase
  Record phase
    Run a thread until it yields, then the scheduler runs another thread
      Scheduler prefers *not* to preempt a thread
    But record, at each preemption point, the enabled threads
  Search phase
    If test completes, determine a new schedule
    Go to replay phase

Designing the scheduler
  Don't want the scheduler to know the details of each synchronization primitive
  Don't want to reason about threads in detail (e.g., execution speeds)
  Idea: happens-before graph
    nodes = instructions
    edges = enforce a partial order on instructions
    each node is annotated with (task, sync variable, operation)
      task = thread, callback, etc.
      sync variable = resource (lock, semaphore, etc.) used by the task
      operation = isWrite and isRelease
    if isWrite is set for node n:
      add edge from preceding nodes with the same resource to n
      add edge from n to all subsequent nodes with the same resource
    if isRelease is set for node n:
      add waiting tasks to the set of enabled tasks
    If the program is data-race free, edges order accesses
      For example, if threads share a variable protected by a lock
  Suppose we have primitives for a concurrent queue (without an upper size bound)
    What would a queue push look like?  Sync var, isWrite, isRelease.
    What would a queue pop look like?  Sync var, isWrite, not isRelease.
    What would a queue length look like?  Sync var, not isWrite, not isRelease.

Implementation
  Wrapper
    Dynamically-linked library
    Binary rewriting
      For hardware-level instructions
    Win32, .NET, Singularity
  How does a wrapper determine that a call will block?
    Use the "try" version of the call
  Conservative
    set isWrite to True
    set isRelease to True
  Why the focus on test cases instead of whole programs?
    Important concurrent code already likely has test cases.
    Need to repeatably run the same test with different thread schedules.
    Want to get to running interesting code quickly.

Challenges
  Benign races
    How to capture them?  Enforce single-threaded execution!
      No concurrent access, no data race
      Slower, but run CHESS instances in parallel
    May miss Heisenbugs
      Solution: run a data-race detector on the first few runs
        probably something like Eraser
      add preemption points at data races
  Handling all preemption points
    Many of them (Win32 has many sync primitives)
    At different levels of abstraction (e.g., Win32 and HW)
      Careful to pick the right level of abstraction
      Locks vs atomic instructions
    Difficult but not too bad
      Tests help them
  Imperfect replay
    Relies on replay to construct state at a preemption point
      Replay is light-weight
    Misses sources of non-determinism
      Lazy initialization
      Interference from environment
      Non-deterministic calls
  Reduce interleavings
    Avoid unfair schedules
    Ignore preemptions in libraries
    Special-case accesses to volatile variables
    Scheduler recognizes early-visited states
      Same happens-before graph
    Run with few threads

Results
  Found many known bugs
  Two unknown bugs

What else do developers do in practice?
  Run stress tests with many threads, in hopes of triggering races
    Helpful but still does not catch many bugs.
  Write specialized tools that stress critical concurrent code
    [ https://www.kernel.org/doc/Documentation/locking/locktorture.txt ]
    [ https://www.kernel.org/doc/Documentation/RCU/torture.txt ]
    Shows importance of looking for bugs in concurrent code.
    Huge effort for application code.
  Run tools that check for locking rules:
    [ http://cseweb.ucsd.edu/~savage/papers/Tocs97.pdf ]
    [ https://golang.org/doc/articles/race_detector.html ]
    [ https://clang.llvm.org/docs/ThreadSanitizer.html ]
    [ https://github.com/google/ktsan ]
    Focused heavily on locks.
    These tools check specific executions, but do not enumerate executions.
      Would not catch bugs that require a specific thread interleaving.
      CCR and PLINQ and Singularity bugs.

Can CHESS find all Heisenbugs?
  No, driven by a specific set of tests
  No, imperfect replay
  Difficulty handling benign races
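The scheduler's replay/record/search phases described earlier can be sketched with Python generators standing in for threads: each `yield` is a preemption point, running a generator between yields models running a thread atomically between preemption points, and re-executing a recorded decision prefix models the replay phase. This is a hypothetical sketch, not the real CHESS implementation; all names are made up.

```python
# Sketch of the CHESS scheduler loop: replay a recorded prefix of decisions,
# then record (preferring NOT to preempt), and search by backtracking to the
# deepest preemption point with an untried enabled thread.

def execute(make_threads, replayed):
    """Replay phase, then record phase.  Returns the full decision list,
    the enabled-thread set noted at each preemption point, and final state."""
    state = {}
    threads = {tid: g(state) for tid, g in enumerate(make_threads)}
    for t in threads.values():
        next(t)                            # advance to first preemption point
    decisions, enabled_sets = [], []
    current = 0
    while threads:
        enabled = sorted(threads)
        enabled_sets.append(enabled)
        step = len(decisions)
        if step < len(replayed):
            current = replayed[step]       # replay phase: follow the record
        elif current not in threads:
            current = enabled[0]           # record phase: switch only if forced
        decisions.append(current)
        try:
            next(threads[current])         # run atomically to next preemption point
        except StopIteration:
            del threads[current]
    return decisions, enabled_sets, state

def search(make_threads, is_buggy):
    """Search phase: after each complete run, derive a new schedule prefix."""
    prefix = []
    while True:
        decisions, enabled_sets, state = execute(make_threads, prefix)
        if is_buggy(state):
            return decisions               # a reproducible buggy interleaving
        for i in range(len(decisions) - 1, -1, -1):
            alts = [t for t in enabled_sets[i] if t > decisions[i]]
            if alts:
                prefix = decisions[:i] + [alts[0]]
                break
        else:
            return None                    # schedule space exhausted

def worker(state):                         # a racy increment with 2 preemption points
    yield
    tmp = state.get("x", 0)                # read
    yield                                  # e.g., a lock acquire would go here
    state["x"] = tmp + 1                   # write

schedule = search([worker, worker], lambda st: st.get("x") != 2)
print(schedule)   # -> [0, 1, 1, 0]: both threads read before either writes
```

Returning the decision list is what makes the Heisenbug reproducible: feeding it back as the replay prefix deterministically re-triggers the failure.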
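The happens-before edge rules and the queue annotations above can be made concrete. In this sketch (illustrative names, not the actual CHESS code), the "edge from n to all subsequent nodes with the same resource" half of the isWrite rule is applied lazily when each later node arrives, and isRelease (which would move waiting tasks into the enabled set) is recorded but not acted on.

```python
# Sketch of happens-before graph construction from (task, sync var, operation)
# annotations, per the isWrite rule above.

class HBGraph:
    def __init__(self):
        self.nodes = []        # (task, syncvar, is_write, is_release)
        self.edges = []        # (from_node, to_node) index pairs
        self.by_var = {}       # syncvar -> indices of earlier nodes on it

    def record(self, task, syncvar, is_write, is_release):
        n = len(self.nodes)
        self.nodes.append((task, syncvar, is_write, is_release))
        earlier = self.by_var.setdefault(syncvar, [])
        if is_write:
            # edge from every preceding node on the same resource to n
            self.edges += [(m, n) for m in earlier]
        else:
            # n only reads: order it after the preceding writes on the resource
            self.edges += [(m, n) for m in earlier if self.nodes[m][2]]
        earlier.append(n)
        return n

# The concurrent-queue annotations from the text:
g = HBGraph()
push   = g.record("T1", "q", is_write=True,  is_release=True)   # push: write + release
pop    = g.record("T2", "q", is_write=True,  is_release=False)  # pop: write, no release
length = g.record("T2", "q", is_write=False, is_release=False)  # length: read only
print(g.edges)   # -> [(0, 1), (0, 2), (1, 2)]
```

If the program is data-race free, these edges totally order the conflicting queue operations, which is why the scheduler never needs to know what a "queue" is.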
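The "try" trick for detecting whether a call will block can be illustrated with Python's locks (CHESS wraps Win32/.NET primitives, but the idea is the same; the wrapper and callback names here are made up): the wrapper calls the non-blocking variant, and on failure marks the thread disabled for the scheduler instead of blocking inside the wrapper.

```python
import threading

# Illustration of using the "try" version of a call to detect blocking.
def wrapped_acquire(lock, notify_scheduler_blocked):
    if lock.acquire(blocking=False):     # try-acquire: returns immediately
        return True                      # would not have blocked; we hold it
    notify_scheduler_blocked()           # thread not enabled; scheduler runs another
    return False

lock = threading.Lock()
print(wrapped_acquire(lock, lambda: None))  # True: lock was free
print(wrapped_acquire(lock, lambda: None))  # False: acquire would have blocked
```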