CHESS
=====

Paper: "Finding and Reproducing Heisenbugs in Concurrent Programs" by Musuvathi et al. (OSDI 2008)

Why read this paper?
  Concurrency is a good area for verification
    Subtle bugs
    Testing is difficult (e.g., incomplete)
  Core challenge: reasoning about many orderings
  Insight
    Few threads and preemption points are necessary to find bugs
  Cool tool
    Works on real systems
    Finds bugs
    Reproduces interleavings

Problem: Heisenbugs
  Result of a race condition in a parallel program
  Bugs that show up only in some runs of a program
    E.g., a test case fails once in a few months
  Difficult to find
    Testing for them may change behavior and not trigger the bug
  Difficult to reproduce

Example: bug in the Concurrency and Coordination Runtime (CCR)
  Test didn't fail for many months, then failed
  CHESS found it in 20 secs
    6737 interleavings
  Bug: CCR ports
    ports: https://msdn.microsoft.com/en-us/library/bb648755.aspx
    spec: atomically process tasks associated with a port
    race in a two-phase commit protocol
      Thread A: registers, but is preempted before grabbing the tasks
      Thread B: cancels the tasks, but is preempted after the first task
      Thread A: wakes up and also processes the tasks
  CHESS can reproduce it

CHESS plan: try different executions
  1. Control all sources of non-determinism
  2. Run with different non-deterministic choices, check for bug
  3. Search heuristics to guide choice of next set of choices to try

Naive approach:
  Run test
  Scheduler forces all interleavings of the threads' instructions
    run first thread for one instruction
    then 2nd thread for one instruction
    then 3rd thread for one instruction
    back out 2nd thread, run 3rd thread for one instruction
    etc.
  To back out
    Record program state of a thread, so that you can roll it back
    If program re-executes a system call
      Replay recorded results from first invocation
  If test fails
    Race happened
    Report bug for the recorded interleaving
  What's a bug?
    Crash: null pointer, exception, etc.
    Deadlock/livelock: test never finishes.
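The naive approach above can be sketched as a depth-first enumeration of every instruction-level interleaving, re-running the test from the initial state for each schedule (a stand-in for rolling threads back). This is a minimal illustrative sketch, not CHESS itself; all names are made up. Even two threads of two instructions each already give 6 interleavings, which hints at why this blows up for whole systems.

```python
# Hypothetical sketch of the naive approach: enumerate every interleaving
# of the threads' instructions, re-executing the test for each schedule.

def interleavings(threads):
    """Yield every instruction-level interleaving as a list of thread ids."""
    def dfs(remaining, prefix):
        done = True
        for tid, instrs in enumerate(remaining):
            if instrs:
                done = False
                rest = list(remaining)
                rest[tid] = instrs[1:]
                yield from dfs(rest, prefix + [tid])
        if done:
            yield prefix
    yield from dfs([list(t) for t in threads], [])

def run_schedule(threads, schedule):
    """Re-execute the test under one schedule; 'backing out' is modeled by
    restarting from a fresh initial state instead of rolling back."""
    state = {"x": 0}
    cursors = [0] * len(threads)
    for tid in schedule:
        threads[tid][cursors[tid]](state)
        cursors[tid] += 1
    return state

# A racy increment: the read and the write are two separate "instructions".
def make_incrementer(tmp):
    def read(state): state[tmp] = state["x"]
    def write(state): state["x"] = state[tmp] + 1
    return [read, write]

threads = [make_incrementer("t0"), make_incrementer("t1")]
schedules = list(interleavings(threads))
buggy = [s for s in schedules if run_schedule(threads, s)["x"] != 2]
print(len(schedules))   # 6 interleavings for 2 threads x 2 instructions
print(len(buggy))       # 4 of them lose an update: the race
```

A failing schedule (e.g., both threads read before either writes) is exactly the recorded interleaving that would be reported as the bug.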
    Specialized checkers: memory leak checker, use-after-free.

This approach is too expensive
  OK for small programs
  Not good for whole systems

Observation: interleave at *preemption points*
  preemption point: point where a thread may be rescheduled
    Thread synchronizes with another thread
    Thread releases the processor
    Thread acquires a lock
    ...
  Run a thread atomically between preemption points

Approach: interleave only at preemption points
  Take a test for a program P
  Link with a wrapper library
    Intercepts all preemption points
      When a thread creates another thread
      When a thread acquires a lock
      When a thread performs a blocking operation
      ...
      Code that is likely involved in a data race
    Calls scheduler at preemption points

Basic operation of the scheduler
  Replay phase
    Replay scheduling decisions up to some preemption point
    Select another thread to run, and go to record phase
  Record phase
    Run a thread until it yields, then the scheduler runs another thread
      Scheduler prefers *not* to preempt a thread
    But record, at each preemption point, the enabled threads
  Search phase
    If test completes, determine a new schedule
    Go to replay phase

Designing the scheduler
  Don't want the scheduler to know the details of each synchronization primitive
  Don't want to reason about threads in detail (e.g., execution speeds)
  Idea: happens-before graph
    nodes = instructions
    edges = enforce a partial order on instructions
    each node is annotated with (task, sync variable, operation)
      task = thread, callback, etc.
      sync variable = resource (lock, semaphore, etc.) used by the task
      operation = isWrite and isRelease
    if isWrite is set for node n:
      add edge from preceding nodes with the same resource to n
      add edge from n to all subsequent nodes with the same resource
    if isRelease is set for node n:
      add waiting tasks to the set of enabled tasks
    If the program is data-race free, edges order accesses
      For example, if threads share a variable protected by a lock
  Suppose we have primitives for a concurrent queue (without an upper size bound)
    What would a queue push look like?  Sync var, isWrite, isRelease.
    What would a queue pop look like?  Sync var, isWrite, not isRelease.
    What would a queue length look like?  Sync var, not isWrite, not isRelease.

Implementation
  Wrapper
    Dynamically-linked library
    Binary rewriting
      For hardware-level instructions
    Win32, .NET, Singularity
  How does a wrapper determine that a call will block?
    Use the "try" version of the call
  Conservative
    set isWrite to True
    set isRelease to True
  Why the focus on test cases instead of whole programs?
    Important concurrent code already likely has test cases.
    Need to repeatably run the same test with different thread schedules.
    Want to get to running interesting code quickly.

Challenges
  Benign races
    How to capture them?  Enforce single-threaded execution!
      No concurrent access, no data race
      Slower, but run CHESS instances in parallel
    May miss Heisenbugs
      Solution: run a data-race detector on the first few runs
        probably something like Eraser
      add preemption points at data races
  Handling all preemption points
    Many of them (Win32 has many sync primitives)
    At different levels of abstraction (e.g., Win32 and HW)
      Careful to pick the right level of abstraction
      Locks vs atomic instructions
    Difficult but not too bad
      Tests help them
  Imperfect replay
    Relies on replay to construct state at a preemption point
      Replay is light-weight
    Misses sources of non-determinism
      Lazy initialization
      Interference from environment
      Non-deterministic calls
  Reduce interleavings
    Avoid unfair schedules
    Ignore preemptions in libraries
    Special-case accesses to volatile variables
    Scheduler recognizes early-visited states
      Same happens-before graph
    Run with few threads

Results
  Found many known bugs
  Two unknown bugs

What else do developers do in practice?
  Run stress tests with many threads, in hopes of triggering races
    Helpful but still does not catch many bugs.
  Write specialized tools that stress critical concurrent code
    [ https://www.kernel.org/doc/Documentation/locking/locktorture.txt ]
    [ https://www.kernel.org/doc/Documentation/RCU/torture.txt ]
    Shows importance of looking for bugs in concurrent code.
    Huge effort for application code.
  Run tools that check for locking rules:
    [ http://cseweb.ucsd.edu/~savage/papers/Tocs97.pdf ]
    [ https://golang.org/doc/articles/race_detector.html ]
    [ https://clang.llvm.org/docs/ThreadSanitizer.html ]
    [ https://github.com/google/ktsan ]
    Focused heavily on locks.
    These tools check specific executions, but do not enumerate executions.
      Would not catch bugs that require a specific thread interleaving.
      CCR and PLINQ and Singularity bugs.

Can CHESS find all Heisenbugs?
  No, driven by a specific set of tests
  No, imperfect replay
  Difficulty handling benign races
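The scheduler's replay/record/search phases described earlier can be sketched with Python generators standing in for threads: each `yield` is a preemption point, running a generator between yields models running a thread atomically between preemption points, and re-executing a recorded decision prefix models the replay phase. This is a hypothetical sketch, not the real CHESS implementation; all names are made up.

```python
# Sketch of the CHESS scheduler loop: replay a recorded prefix of decisions,
# then record (preferring NOT to preempt), and search by backtracking to the
# deepest preemption point with an untried enabled thread.

def execute(make_threads, replayed):
    """Replay phase, then record phase.  Returns the full decision list,
    the enabled-thread set noted at each preemption point, and final state."""
    state = {}
    threads = {tid: g(state) for tid, g in enumerate(make_threads)}
    for t in threads.values():
        next(t)                            # advance to first preemption point
    decisions, enabled_sets = [], []
    current = 0
    while threads:
        enabled = sorted(threads)
        enabled_sets.append(enabled)
        step = len(decisions)
        if step < len(replayed):
            current = replayed[step]       # replay phase: follow the record
        elif current not in threads:
            current = enabled[0]           # record phase: switch only if forced
        decisions.append(current)
        try:
            next(threads[current])         # run atomically to next preemption point
        except StopIteration:
            del threads[current]
    return decisions, enabled_sets, state

def search(make_threads, is_buggy):
    """Search phase: after each complete run, derive a new schedule prefix."""
    prefix = []
    while True:
        decisions, enabled_sets, state = execute(make_threads, prefix)
        if is_buggy(state):
            return decisions               # a reproducible buggy interleaving
        for i in range(len(decisions) - 1, -1, -1):
            alts = [t for t in enabled_sets[i] if t > decisions[i]]
            if alts:
                prefix = decisions[:i] + [alts[0]]
                break
        else:
            return None                    # schedule space exhausted

def worker(state):                         # a racy increment with 2 preemption points
    yield
    tmp = state.get("x", 0)                # read
    yield                                  # e.g., a lock acquire would go here
    state["x"] = tmp + 1                   # write

schedule = search([worker, worker], lambda st: st.get("x") != 2)
print(schedule)   # -> [0, 1, 1, 0]: both threads read before either writes
```

Returning the decision list is what makes the Heisenbug reproducible: feeding it back as the replay prefix deterministically re-triggers the failure.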
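The happens-before edge rules and the queue annotations above can be made concrete. In this sketch (illustrative names, not the actual CHESS code), the "edge from n to all subsequent nodes with the same resource" half of the isWrite rule is applied lazily when each later node arrives, and isRelease (which would move waiting tasks into the enabled set) is recorded but not acted on.

```python
# Sketch of happens-before graph construction from (task, sync var, operation)
# annotations, per the isWrite rule above.

class HBGraph:
    def __init__(self):
        self.nodes = []        # (task, syncvar, is_write, is_release)
        self.edges = []        # (from_node, to_node) index pairs
        self.by_var = {}       # syncvar -> indices of earlier nodes on it

    def record(self, task, syncvar, is_write, is_release):
        n = len(self.nodes)
        self.nodes.append((task, syncvar, is_write, is_release))
        earlier = self.by_var.setdefault(syncvar, [])
        if is_write:
            # edge from every preceding node on the same resource to n
            self.edges += [(m, n) for m in earlier]
        else:
            # n only reads: order it after the preceding writes on the resource
            self.edges += [(m, n) for m in earlier if self.nodes[m][2]]
        earlier.append(n)
        return n

# The concurrent-queue annotations from the text:
g = HBGraph()
push   = g.record("T1", "q", is_write=True,  is_release=True)   # push: write + release
pop    = g.record("T2", "q", is_write=True,  is_release=False)  # pop: write, no release
length = g.record("T2", "q", is_write=False, is_release=False)  # length: read only
print(g.edges)   # -> [(0, 1), (0, 2), (1, 2)]
```

If the program is data-race free, these edges totally order the conflicting queue operations, which is why the scheduler never needs to know what a "queue" is.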
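The "try" trick for detecting whether a call will block can be illustrated with Python's locks (CHESS wraps Win32/.NET primitives, but the idea is the same; the wrapper and callback names here are made up): the wrapper calls the non-blocking variant, and on failure marks the thread disabled for the scheduler instead of blocking inside the wrapper.

```python
import threading

# Illustration of using the "try" version of a call to detect blocking.
def wrapped_acquire(lock, notify_scheduler_blocked):
    if lock.acquire(blocking=False):     # try-acquire: returns immediately
        return True                      # would not have blocked; we hold it
    notify_scheduler_blocked()           # thread not enabled; scheduler runs another
    return False

lock = threading.Lock()
print(wrapped_acquire(lock, lambda: None))  # True: lock was free
print(wrapped_acquire(lock, lambda: None))  # False: acquire would have blocked
```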