Specification and abstraction: Hoare logic
==========================================

Goal: reason about possible executions.
  Today: specialized reasoning for sequential programs.
    Hoare logic.
    Used in labs.
  Next Tuesday: general abstraction and simulation.
    Similar to Butler's lecture about the Amazon paper.
    Always works: concurrency, crashes, distributed systems, ...
    More principled story.
    Requires more infrastructure to do formal proofs.
  Two different ways of thinking about correctness.
    Coming from different contexts (programs vs. systems).
    Coming from different research communities.
    Significant similarities.
    Still an active area of research.

Part 1: Hoare logic.
--------------------

Hoare logic view of the world:
  System state.
  Procedures (code) runs on this state and return values.
  Specifications for procedures: pre- and post-conditions.
    Often preconditions are just "True".
    Preconditions can be used to restrict specs to subset of cases.
    Preconditions can be used to capture invariants.
    Postcondition could allow multiple resulting states.
      Non-determinism.
  All that matters about a procedure is the state where you end up
    and the procedure's return value.

Contrast with state-machine view:
  Intermediate states aren't important (no concurrent code).
  Transitions aren't important (nothing is externally visible).

Why are pre-/post-conditions a useful spec?
  Theorem directly talks about execution of procedure (code).
  As long as we start from a state satisfying precondition,
    theorem guarantees the ending state will satisfy postcondition.
  If we don't care about how we got there, theorem tells us all we need.

Running example: StatDB (lab 1).
  Logically, want to keep track of values and report statistics.

  # add(x) adds element x.
  def add(x):
    total = total + x
    count = count + 1

  # avg() returns the average so far.
  def avg():
    return total / count

Specifications for these procedures
  add(x):
    PRE(state):      True
    POST(state', r): state'.total = state.total+x /\
                     state'.count = state.count+1
  avg():
    PRE(state):      state.count > 0
    POST(state', r): state' = state /\
                     r = state.total / state.count

How to prove that code meets a spec?
  Hoare logic proofs: sequencing rule.
  Break down proof into smaller pieces.

Sequencing rule.
  Suppose we want to prove a spec for a procedure with code (x; y).
    Precondition PRE, postcondition POST.
  We need a spec for the smaller pieces: x and y.
    Suppose those specs are PREx/POSTx and PREy/POSTy.
  Sequencing rule says: PRE/POST holds for (x; y) if:
    PRE implies PREx
    POSTx implies PREy
    POSTy implies POST

Applying sequencing to the StatDB example:
  What are the primitives involved in the code?
    tmp := READ(total)
    WRITE(total, tmp+x)
    tmp := READ(count)
    WRITE(count, tmp+1)
  [ Note: difference between variables that are part of the state,
    and the temporary variable that's local to the procedure. ]
  What are the specs for these primitives?  All have PRE: True.
    READ(total)'s POST: state'=state /\ r=state.total
    READ(count)'s POST: state'=state /\ r=state.count
    WRITE(total, v)'s POST: state'.total=v /\ state'.count=state.count
    WRITE(count, v)'s POST: state'.count=v /\ state'.total=state.total
  Proving add(x) starting from state0:
    True
      tmp := READ(total)
    state1=state0 /\ tmp=state0.total
      WRITE(total, tmp+x)
    state2.total=state0.total+x /\ state2.count=state1.count
      tmp := READ(count)
    state3=state2 /\ tmp=state2.count
      WRITE(count, tmp+1)
    state4.count=state2.count+1 /\ state4.total=state3.total
  Plugging in through equality we get:
    state4.count=state0.count+1 /\ state4.total=state0.total+x

Preconditions useful for expressing invariants.
  E.g., StatDB invariant: count==0 implies total==0
  Could be useful if we wanted to correctly handle dividing by zero in avg()
    avg():
      if total == 0:
        return 0
      return total / count
  Three steps for using invariants in Hoare logic:
    Ensure invariant is established by initial state (e.g., all zero variables).
    Assume invariant holds on entry to procedure (precondition).
    Ensure invariant still holds at the end of procedure (postcondition).

Sequential rule provides modularity.
  First start by writing + proving specifications for lowest-level primitives.
    Variable reads and writes.
    In lab, we assume these as axioms.
  Then write + prove specifications for next-level procedures.
    Use primitive specs as building blocks.
  Then write + prove specifications for even larger procedures.
    Can use intermediate specs in these proofs now.
    Don't need to descend all the way down to the primitives.

Part 2: Abstraction.
--------------------

How can we write specifications that represent state differently?
  E.g., would like to write StatDB spec in terms of a logical history.
  add(x) adds x to history.
  avg() returns average value based on history.

Abstraction.
  Define new kind of state for the spec.
  Write specs in terms of high-level states.
  Define an abstraction relation that connects low-level and high-level states.

For example, StatDB:
  State = list nat
  add(x): POST: state' = state ++ [x]
  avg(): POST: state' = state /\ r = sum(state) / len(state)

Abstraction relation.
  Connect code- and spec-level state.
  E.g.: AR(state, stateH) := state.count=len(stateH) /\ state.total=sum(stateH).
  Non-deterministic: could be a many-to-one relation, and that's OK.
    Many spec states for one code state: extra state in the spec.
      E.g., many possible StatDB histories that lead to a given total+count.
    Many code states for one spec state: extra details in code that don't matter.
      Not in the StatDB example, but could happen elsewhere: e.g., caches.

How to prove specifications with abstraction relations?
  Start with some high-level state (guaranteed by precondition).
  Observe that there must exist some lower-level state that satisfies AR.
  Use Hoare logic to figure out what the resulting low-level state looks like.
  Prove AR for resulting high-level state.

Simulation diagram similar to last week's lecture:

    s ======> s'
    ^         ^^
    | AR      ||
    V         VV
    c ------> c'

  Horizontal arrows are allowed transitions according to the spec.
    Expressed as pre-/post-conditions.
  Vertical arrows are the abstraction relations between code and spec.
  Single lines are given in the theorem:
    Suppose there is a code state [c], which transitions to state [c'].
    Suppose there is a spec state [s] related to [c] by AR.
      (This is part of the precondition for the high-level spec.)
  Double lines are what you have to prove:
    Show there exists a valid transition from [s] to [s'] according to
      spec pre-/post-condition.
    Show that this same [s'] is related to [c'] using the AR.
      (This is part of the postcondition for the high-level spec.)

Can add abstractions to our existing Hoare logic machinery.
  Express relation between low- and high-level states in pre/post condition.
    add(x):
      PRE(state):
        \exists stateH,
          AR(state, stateH)
      POST(state', r):
        \exists stateH',
          AR(state', stateH') /\
          stateH' = stateH ++ [x]

Hoare logic allows multiple specs for the same procedure.
  Could have different sets of pre/post conditions.
  Could have specs at different levels of abstraction.
  E.g., we had two specs for StatDB's add() and avg().

Hoare logic doesn't require abstraction relation for intermediate states.
  Suppose we called add(2) starting from an initial state with all zeroes.
  After adding total, we end up with (total=2, count=0).
  Impossible state, does not correspond to any abstract state!
  Would be a problem in general-case state machine view.
  Not an issue with Hoare logic: no way to observe intermediate state.
    No concurrency, no crashes, ...

Layering: multiple levels of abstraction.
  Abstraction composes vertically.
  Can build multiple abstraction layers.
  AR just needs to relate adjacent layers.

In our lab infrastructure, we assume there's a bottom-most [world] State.
  Think of the world as (logically) low-level transitors, physics, etc.
  Variables is an abstraction on top of the low-level [world].
  Never need to look below variables.

AR doesn't matter for correctness.
  Could we run into problems if we had a buggy AR?
  No: only observable results are return values, not affected by AR.
  Any AR for which the proofs work out is acceptable.

Lab infrastructure code:
  VariablesAPI.
    easy to miss detail: inited_any
  StatDBAPI.
    inited
  Spec.Proc.
    Crash handling largely irrelevant for now; will come back to it.
  Spec.Abstraction.