TSVD
====

Paper: "Efficient Scalable Thread-Safety-Violation Detection" by Li et al. (SOSP 2019)
  https://github.com/microsoft/tsvd

Why read this paper?
  Concurrency is a good area for verification.
    Subtle bugs.
    Testing is difficult (e.g., incomplete).
  Core challenge: reasoning about many orderings.
  Insight: delaying operations on shared objects helps find racing accesses.
  Cool tool:
    Works on real systems.
    Finds bugs.
    No false positives, by their definition.
    Gives stack traces of the threads performing the conflicting operations.

Problem: non-deterministic bugs.
  The result of a race condition in a parallel program.
  Bugs that show up only in some runs of a program.
    E.g., a test case fails once in a few months.
  Difficult to find.
    Testing for the bug may change behavior and not trigger it.
  Difficult to reproduce.
    Something went wrong; how to debug it?

Hard to find concurrency bugs with static analysis.
  Often not a local property: depends on callers holding locks, ordering, etc.
  Requires sophisticated inter-procedural analysis.
  Difficult to infer specs for intermediate procedures the way INFER did.

How to find concurrency bugs with runtime testing?
  Define what constitutes a bug:
    Returning a wrong result, crashing, violating some intermediate invariant, ...
    Crash: null pointer, exception, etc.
    Deadlock/livelock: test never finishes, times out.
    Intermediate specialized checkers: memory-leak checker, use-after-free.
    Similar flavor to any other testing strategy we've seen.
  Explore different schedules.
    This is the source of non-determinism for concurrency.

=== Strawman exploration plan 1: run many times.

Each run might have a different schedule.
  E.g., run a stress test with many threads, run it many times, etc.
  E.g., this is done for critical concurrent code in Linux:
    [ https://www.kernel.org/doc/Documentation/locking/locktorture.txt ]
    [ https://www.kernel.org/doc/Documentation/RCU/torture.txt ]
    Shows the importance of looking for bugs in concurrent code.
    Huge effort for application code.
Downsides, as described above:
  Might be hard to get a test to fail in interesting ways.
  Hard to debug even if a test does fail.

=== Strawman exploration plan 2: exhaustive.

Run the test, but with a special scheduler.
  The scheduler forces all interleavings of the threads' instructions:
    run the 1st thread for one instruction,
    then the 2nd thread for one instruction,
    then the 3rd thread for one instruction,
    back out the 2nd thread's step and run the 3rd thread for one instruction instead,
    etc.
  Different ways you might implement backout:
    Snapshot the state of the process.
    Or restart from the beginning: runs should be deterministic now that we control scheduling.
Good property: repeatable.
  If we hit a bug, we have a schedule that triggers it!
Good property: checks all bugs.
Bad property: too slow for serious programs.
  Lots of interleavings to check.
  Many similar interleavings.
There's been lots of work on this schedule-exploration plan:
  CHESS from Microsoft
  PCT (ASPLOS 2010)
  DPOR (Dynamic Partial Order Reduction)
  BPOR (Bounded Partial Order Reduction)
  CB-DPOR
  ...
Some insights from that line of work:
  Find "preemption points" where interleaving is interesting.
    E.g., thread synchronization, accesses to shared variables, etc.
  A few thread switches are usually enough to find bugs.
  Optimizations for eliminating equivalent schedules.

=== Another approach: check for intermediate invariants that imply concurrency bugs.

Above, we were looking for an "end-to-end" bug that demonstrates a problem.
Instead, we could formulate invariants that should be true of good code.
Classical example: must hold a lock while accessing shared memory.
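To make the classical invariant concrete, here is a minimal Go sketch (Go only as a
stand-in; the paper's setting is C#, and the function names here are made up for the
example).  Two goroutines update a shared counter: without a lock, the accesses race
and the detectors described next (e.g., "go test -race") flag them; holding a
sync.Mutex around every access satisfies the invariant.

  package main

  import (
      "fmt"
      "sync"
  )

  // unsafeCount violates the invariant: two goroutines update the shared
  // counter with no lock held.  The race detector flags this even on runs
  // where the final value happens to look right.
  func unsafeCount() int {
      n := 0
      var wg sync.WaitGroup
      for i := 0; i < 2; i++ {
          wg.Add(1)
          go func() {
              defer wg.Done()
              for j := 0; j < 1000; j++ {
                  n++ // unsynchronized read-modify-write of shared memory
              }
          }()
      }
      wg.Wait()
      return n
  }

  // safeCount holds a lock around every access, satisfying the
  // "some lock is consistently held" invariant.
  func safeCount() int {
      n := 0
      var mu sync.Mutex
      var wg sync.WaitGroup
      for i := 0; i < 2; i++ {
          wg.Add(1)
          go func() {
              defer wg.Done()
              for j := 0; j < 1000; j++ {
                  mu.Lock()
                  n++
                  mu.Unlock()
              }
          }()
      }
      wg.Wait()
      return n
  }

  func main() {
      // unsafeCount may print less than 2000; safeCount always prints 2000.
      fmt.Println(unsafeCount(), safeCount())
  }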
Many tools along these lines.
  Some focus on locks.
    Invariant: at least one lock must be consistently held when accessing any given shared-memory location.
    [ http://cseweb.ucsd.edu/~savage/papers/Tocs97.pdf ]
  Others are more general "happens-before" trackers.
    Invariant: there must be a strict "happens-before" order between two threads' accesses to the same shared-memory location.
    E.g., "go test -race":
    [ https://golang.org/doc/articles/race_detector.html ]
    [ https://clang.llvm.org/docs/ThreadSanitizer.html ]
    [ https://github.com/google/ktsan ]
    Widely used in practice.
Good property: need far fewer runs (maybe just one).
  Why? Intuition: it is easier to observe an invariant violation than to hit a precise race.
  No need to actually hit the race; it suffices to violate the invariant.
Bad property: false positives.
  The invariant might be too strong.
  The tool might not detect some reason why the access is actually safe.
The TSVD paper says precise HB tracking is too hard/costly to get right.

=== Breakout Q: what is the TSVD plan, in contrast to the above approaches?

What defines a bug?
How does it explore schedules to find the bug?
Does it miss bugs?
Does it report "bugs" that are not real issues?

=== TSVD approach:

Check an intermediate invariant.
  All about shared data structures in C#: maps, lists, queues, ...
Tweak the schedule by inserting delays, hoping to trigger an invariant violation.
  Insert a delay just before an invocation of a method on a shared data structure.
    Same insight as schedule exploration: this is where the schedule matters!
    Delay-injection idea from DataCollider (OSDI 2010), overlapping authors.
  Wait to see if another thread comes along and violates the invariant.
What's the intermediate invariant on data structures?
  Read/write sets.
    OK to have many read operations running concurrently.
    A write operation cannot run concurrently with any other write or read.
    [ https://github.com/microsoft/TSVD/blob/master/TestApps/DataRace/Configurations/runtime-config.cfg ]
  Effective because violations of this invariant can almost always cause a crash/panic/exception.
Interesting contrast with data-race checkers.
  Data-race checkers look for unsynchronized reads/writes to a shared memory location.
  Similar to TSVD's read/write sets on a data structure.
  But that turns out not to work nearly as well: many races on memory locations are benign.
  In contrast, racing reads/writes on a data structure cause data-structure corruption.
Optimizations for deciding when to inject delays (a rough sketch follows this list).
  Near misses.
    Watch for threads that had conflicting accesses within 100 msec of each other.
    A global buffer tracks all recent accesses.
    If we find such a pair of accesses, delay the next such access by 100 msec.
  Concurrent-phase detection.
    Seems targeted at tests that do sequential setup, then parallel execution.
    Don't want to slow down the sequential phase.
  Inferring happens-before.
    Problem: concurrent accesses might be perfectly well serialized using locks, etc.
    No use trying to delay one thread, because the other thread won't race.
    Clever trick: empirically observe what happens as a result of a delay.
      Inject a delay in one thread.
      If another thread gets delayed by ~the same amount, assume the two are serialized.
      Don't bother trying to get these two threads to race.
    (A rough sketch of this heuristic is at the end of these notes.)
  Decay.
    If TSVD doesn't see a race quickly, stop trying to make it happen.
    The probability of injecting a delay starts at 1 and decays over time.
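A loose Go sketch of this delay-injection loop, under stated assumptions: the tracker
type, the window and delay sizes, the decay factor, and the beforeOp hook are all
invented for illustration, and TSVD itself instruments compiled C# binaries rather
than using a hand-written wrapper like this.  The sketch records recent accesses,
injects a delay (with decaying probability) when it sees a near miss, and reports a
violation if a conflicting access from another thread shows up during the delay.

  package main

  import (
      "fmt"
      "math/rand"
      "sync"
      "time"
  )

  const (
      nearMissWindow = 100 * time.Millisecond // "conflicting accesses within 100 msec"
      injectedDelay  = 100 * time.Millisecond
  )

  type access struct {
      threadID int
      isWrite  bool
      when     time.Time
  }

  // tracker stands in for TSVD's global buffer of recent accesses to one
  // tracked object, plus the decaying delay probability.
  type tracker struct {
      mu        sync.Mutex
      recent    []access // a real implementation would prune old entries
      delayProb float64  // starts at 1, decays as delays fail to expose a race
  }

  func newTracker() *tracker { return &tracker{delayProb: 1.0} }

  // conflictSince reports whether another thread recorded a conflicting access
  // (at least one of the pair is a write) after cutoff.  Caller holds t.mu.
  func (t *tracker) conflictSince(cutoff time.Time, threadID int, isWrite bool) bool {
      for _, a := range t.recent {
          if a.when.After(cutoff) && a.threadID != threadID && (a.isWrite || isWrite) {
              return true
          }
      }
      return false
  }

  // beforeOp is what the instrumentation would call just before a read or
  // write on the tracked shared data structure.
  func (t *tracker) beforeOp(threadID int, isWrite bool) {
      start := time.Now()

      t.mu.Lock()
      // Near miss: another thread made a conflicting access within the window.
      nearMiss := t.conflictSince(start.Add(-nearMissWindow), threadID, isWrite)
      t.recent = append(t.recent, access{threadID, isWrite, start})
      prob := t.delayProb
      t.delayProb *= 0.95 // decay: gradually give up if no violation shows up
      t.mu.Unlock()

      if !nearMiss || rand.Float64() >= prob {
          return
      }

      // Delay this call.  If another thread makes a conflicting call while we
      // sleep, the two calls could have run concurrently: report a violation
      // (TSVD would record both stack traces here).
      time.Sleep(injectedDelay)

      t.mu.Lock()
      violation := t.conflictSince(start, threadID, isWrite)
      t.mu.Unlock()
      if violation {
          fmt.Printf("thread %d: thread-safety violation on tracked object\n", threadID)
      }
  }

  func main() {
      tr := newTracker()
      m := map[int]int{} // not thread-safe; the bug the delays try to expose

      var wg sync.WaitGroup
      for id := 1; id <= 2; id++ {
          wg.Add(1)
          go func(id int) {
              defer wg.Done()
              for i := 0; i < 20; i++ {
                  tr.beforeOp(id, true) // instrumented "write" call site
                  m[i] = id             // racy write; a run may also crash with
                  //                       Go's own "concurrent map writes" error
              }
          }(id)
      }
      wg.Wait()
      fmt.Println("done")
  }

Note how the conflict test encodes the read/write-set invariant: two accesses conflict
only if they come from different threads and at least one of them is a write.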
Results.
  Effective: finds bugs in C# code, and developers think the bugs are real.
  Easy to apply to C# code.
    Instruments compiled code; no source changes needed.
  Low overhead: Table 2.
    33% vs. 2-4x for other approaches.
  Misses some bugs that other tools find: Figure 8.
    Mostly the heuristics going wrong:
      near-miss window too short, HB inference wrong, injected delay too short.
  Misses some bugs by design.
    Bugs that produce wrong results rather than a race on a data-structure read/write.
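To make the happens-before inference heuristic concrete (and to show how it can go
wrong, as in the "HB inference wrong" misses above), here is a toy Go sketch; the
probe delay, the threshold, and the measurement code are invented for illustration.
Delaying worker 1 inside its critical section also stalls worker 2, because both
serialize on the same lock; observing a stall of about the injected amount, the tool
infers happens-before and stops trying to race this pair.

  package main

  import (
      "fmt"
      "sync"
      "time"
  )

  const probeDelay = 200 * time.Millisecond // the injected probe delay

  func main() {
      var mu sync.Mutex       // serializes both workers' map operations
      shared := map[int]int{} // the tracked shared data structure

      var timesMu sync.Mutex
      var worker2Times []time.Time // completion times of worker 2's operations

      var wg sync.WaitGroup
      worker := func(id int, inject bool) {
          defer wg.Done()
          for i := 0; i < 20; i++ {
              mu.Lock()
              if inject && i == 5 {
                  time.Sleep(probeDelay) // delay injected while holding the lock
              }
              shared[i] = id // the instrumented data-structure call
              mu.Unlock()

              if id == 2 {
                  timesMu.Lock()
                  worker2Times = append(worker2Times, time.Now())
                  timesMu.Unlock()
              }
              time.Sleep(5 * time.Millisecond) // normal per-iteration cadence
          }
      }

      wg.Add(2)
      go worker(1, true)  // inject the delay into worker 1...
      go worker(2, false) // ...and watch what happens to worker 2
      wg.Wait()

      // If worker 2's largest gap between operations is about probeDelay, the
      // delay propagated through the shared lock: the two are serialized, so
      // there is no point trying to race this pair.  Without the shared lock,
      // the gap would stay near 5 msec and we would keep trying.
      var maxGap time.Duration
      for i := 1; i < len(worker2Times); i++ {
          if gap := worker2Times[i].Sub(worker2Times[i-1]); gap > maxGap {
              maxGap = gap
          }
      }
      if maxGap > probeDelay/2 {
          fmt.Printf("inferred happens-before; skip this pair (max gap %v)\n", maxGap)
      } else {
          fmt.Printf("no propagation observed; keep trying to race (max gap %v)\n", maxGap)
      }
  }

If the stall came from somewhere else (say, the OS scheduler) or the threshold is
mis-tuned, the heuristic can wrongly conclude the pair is serialized and skip a real
race, which is one source of the misses in Figure 8.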