L16: CHESS

Paper: "Finding and Reproducing Heisenbugs in Concurrent Programs" by Musuvathi et al. (OSDI 2008)

Why read this paper?
  Concurrency is a good area for verification
    Subtle bugs
    Testing is difficult (e.g., incomplete)
  Core challenge: reasoning about many orderings
  Insight: few threads and few preemption points are necessary to find bugs
  Cool tool
    Works on real systems
    Finds bugs
    Reproduces the buggy interleaving

Problem: Heisenbugs
  Result of a race condition in a parallel program
  Bugs that show up only in some runs of a program
    E.g., a test case fails once in a few months
  Difficult to find
    Testing for it may change behavior and not trigger the bug
  Difficult to reproduce

Example: bug in the Concurrency and Coordination Runtime (CCR)
  Test didn't fail for many months, then failed
  CHESS found it in 20 secs, after 6737 interleavings
  Bug: CCR ports
    ports: https://msdn.microsoft.com/en-us/library/bb648755.aspx
    spec: atomically process tasks associated with a port
    race in a two-phase commit protocol:
      Thread A: registers it, but is preempted before grabbing tasks
      Thread B: cancels tasks, but is preempted after the first task
      Thread A: wakes up and also processes the tasks
  CHESS can reproduce it

Naive approach:
  Run test
  Scheduler forces all interleavings of the threads' instructions
    run first thread for one instruction
    then 2nd thread for one instruction
    then 3rd thread for one instruction
    back out 2nd thread, run 3rd thread for one instruction
    etc.
  To back out:
    Record the program state of a thread, so that you can roll it back
    If the program re-executes a system call
      Replay the recorded results from the first invocation
  If the test fails
    A race happened
    Report a bug for the recorded interleaving
  This approach is too expensive
    OK for small programs
    Not good for whole systems

Observation: interleave at *preemption points*
  preemption point: point where a thread may be rescheduled
    Thread synchronizes with another thread
    Thread releases the processor
    Thread acquires locks
    ...
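The CCR port race described above can be reproduced deterministically once a scheduler controls the interleaving, which is exactly what CHESS does. A minimal sketch (hypothetical names, not CCR's actual API) using events to force the bad schedule:

```python
import threading

# Sketch of the CCR-style race: Thread A registers on a port but is
# preempted before grabbing the queued tasks; Thread B cancels and drains
# the tasks itself; Thread A then wakes up and processes them again.
# Events stand in for the scheduler forcing this exact interleaving.
tasks = ["t1", "t2"]
processed = []            # each task should appear here exactly once
lock = threading.Lock()   # protects `processed`

a_registered = threading.Event()  # A registered, hasn't grabbed tasks yet
b_cancelled = threading.Event()   # B finished its cancellation

def thread_a():
    a_registered.set()          # phase 1: register interest in the port
    # --- preemption point: A loses the CPU here ---
    b_cancelled.wait()
    with lock:                  # phase 2: process the port's tasks...
        processed.extend(tasks) # ...which B already consumed: double-process
def thread_b():
    a_registered.wait()
    with lock:                  # cancel the port, draining its tasks
        processed.extend(tasks)
    b_cancelled.set()

ta = threading.Thread(target=thread_a)
tb = threading.Thread(target=thread_b)
ta.start(); tb.start()
ta.join(); tb.join()
print(processed)  # ['t1', 't2', 't1', 't2'] -- tasks processed twice
```

Under the normal OS scheduler this interleaving almost never occurs, which is why the test passed for months; forcing the preemption makes the bug fire every run.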
  Run a thread atomically between preemption points

Approach: interleave only at preemption points
  Take a test for a program P
  Link with a wrapper library
    Intercepts all preemption points
      When a thread creates another thread
      When a thread acquires a lock
      When a thread performs a blocking operation
      ...
      Code that is likely involved in a data race
    Calls the scheduler at preemption points
  Basic operation of the scheduler
    Replay phase
      Replay scheduling decisions up to some preemption point
      Select another thread to run, and go to the record phase
    Record phase
      Run a thread until it yields, then the scheduler runs another thread
        Scheduler prefers *not* to preempt a thread
      But record, at each preemption point, the enabled threads
    Search phase
      If the test completes, determine a new schedule
      Go to the replay phase

Design of the scheduler
  Don't want the scheduler to know the details of each synchronization primitive
  Don't want to reason about threads in detail (e.g., execution speeds)
  Idea: happens-before graph
    nodes = instructions
    edges = enforce a partial order on instructions
    each node is annotated with (task, sync variable, operation)
      task = thread, callback, etc.
      sync variable = resource (lock, semaphore, etc.) used by the task
      operation = isWrite and isRelease
    if isWrite for node n:
      add edges from preceding nodes with the same resource to n
      add edges from n to all subsequent nodes with the same resource
    if isRelease is set by a node n:
      add waiting tasks to the set of enabled tasks
    If the program is data-race free, edges order accesses
      For example, if threads share a variable protected by a lock

Implementation
  Wrapper
    Dynamically-linked library
    Binary rewriting
      For hardware-level instructions
    Win32, .NET, Singularity
  How does a wrapper determine that a call will block?
    Use the "try" version of the call
  Conservative:
    set isWrite to True
    set isRelease to True

Challenges
  Benign races
    How to capture them?
    Enforce single-threaded execution!
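The single-threaded execution idea can be sketched as a cooperative scheduler: each thread owns a semaphore and runs only while it holds the single virtual CPU, so no two threads ever execute user code concurrently. This is an illustrative sketch with assumed names, not CHESS's actual implementation:

```python
import threading

log = []  # order in which thread steps actually execute

class SerialScheduler:
    """Hands a single virtual CPU between threads at preemption points."""
    def __init__(self, tids):
        self.sems = {t: threading.Semaphore(0) for t in tids}

    def start(self, first):      # give the virtual CPU to the first thread
        self.sems[first].release()

    def wait_for_cpu(self, me):  # block until this thread is scheduled
        self.sems[me].acquire()

    def preempt(self, me, nxt):  # preemption point: switch me -> nxt
        self.sems[nxt].release()
        self.sems[me].acquire()

    def exit(self, nxt):         # hand the CPU off before terminating
        self.sems[nxt].release()

sched = SerialScheduler(["A", "B"])

def worker(me, other):
    sched.wait_for_cpu(me)
    log.append(me + "1")
    sched.preempt(me, other)     # scheduler-chosen context switch
    log.append(me + "2")
    sched.exit(other)

ta = threading.Thread(target=worker, args=("A", "B"))
tb = threading.Thread(target=worker, args=("B", "A"))
ta.start(); tb.start()
sched.start("A")
ta.join(); tb.join()
print(log)  # deterministic: ['A1', 'B1', 'A2', 'B2']
```

Because only one thread holds the virtual CPU at a time, data races can never fire, and by varying which thread `preempt` hands the CPU to, the scheduler can systematically enumerate interleavings.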
      No concurrent access, no data race
      Slower, but run CHESS instances in parallel
    May miss Heisenbugs
      Solution: run a data-race detector on the first few runs
        add preemption points at the data races
  Handling all preemption points
    Many of them (Win32 has many sync primitives)
    At different levels of abstraction (e.g., Win32 and HW)
    Careful to pick the right level of abstraction
      Locks vs. atomic instructions
    Difficult, but not too bad
      Tests help them
  Imperfect replay
    Relies on replay to construct the state at a preemption point
    Replay is light-weight
    Misses sources of non-determinism
      Lazy initialization
      Interference from the environment
      Non-deterministic calls
  Reduce interleavings
    Avoid unfair schedules
    Ignore preemptions in libraries
    Special-case accesses to volatile variables
    Scheduler recognizes already-visited states
      Same happens-before graph
    Run with few threads

Results
  Found many known bugs
  Two unknown bugs

Can CHESS find all Heisenbugs?
  No, driven by a specific set of tests
  No, imperfect replay
  Difficulty handling benign races
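The happens-before edge rules from the scheduler-design section can be sketched directly: each executed sync operation is a node annotated with (task, sync variable, isWrite, isRelease), and a write on a variable gets edges from every earlier access and to every later access of that variable. The data shapes below are assumptions for illustration, not the paper's code:

```python
from collections import namedtuple

# One executed synchronization operation in the trace.
Node = namedtuple("Node", "idx task var is_write is_release")

def happens_before_edges(trace):
    """Apply the isWrite rule: order every write on a sync variable
    against all other accesses to the same variable."""
    edges = set()
    for n in trace:
        if n.is_write:
            for m in trace:
                if m.var == n.var and m.idx < n.idx:
                    edges.add((m.idx, n.idx))  # earlier access -> write
                if m.var == n.var and m.idx > n.idx:
                    edges.add((n.idx, m.idx))  # write -> later access
    return edges

# Lock acquire/release both count as writes on the lock variable, so two
# critical sections on the same lock end up totally ordered:
trace = [
    Node(0, "T1", "lock", True, False),  # T1 acquires lock
    Node(1, "T1", "lock", True, True),   # T1 releases lock
    Node(2, "T2", "lock", True, False),  # T2 acquires lock
    Node(3, "T2", "lock", True, True),   # T2 releases lock
]
print(sorted(happens_before_edges(trace)))
```

This is why the scheduler never needs to know what a "lock" is: it only sees writes and releases on sync variables, and the edges alone order the accesses (for data-race-free programs).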