Specification and abstraction: Hoare logic
==========================================
Where are we in the class?
This week: specifications and abstractions.
Next week: start on research papers, which will diverge from labs.
Goal: reason about possible executions.
Today: specialized reasoning for sequential programs.
Hoare logic.
Used in labs.
Thursday: general abstraction and simulation.
Similar to Butler's lecture about the Amazon paper.
Always works: concurrency, crashes, distributed systems, ...
More principled story.
Requires more infrastructure to do formal proofs.
Two different ways of thinking about correctness.
Coming from different contexts (programs vs. systems).
Coming from different research communities.
Significant similarities.
Still an active area of research.
Part 1: Hoare logic.
--------------------
Hoare logic view of the world:
System state.
Procedures (code) runs on this state and return values.
Specifications for procedures: pre- and post-conditions.
Often preconditions are just "True".
Preconditions can be used to restrict specs to subset of cases.
Preconditions can be used to capture invariants.
Postcondition could allow multiple resulting states.
Non-determinism.
All that matters about a procedure is the state where you end up
and the procedure's return value.
Contrast with state-machine view:
Intermediate states aren't important (no concurrent code).
Transitions aren't important (nothing is externally visible).
Why are pre-/post-conditions a useful spec?
Theorem directly talks about execution of procedure (code).
As long as we start from a state satisfying precondition,
theorem guarantees the ending state will satisfy postcondition.
If we don't care about how we got there, theorem tells us all we need.
Running example: StatDB (lab 1).
Logically, want to keep track of values and report statistics.
# add(x) adds element x.
def add(x):
total = total + x
count = count + 1
# avg() returns the average so far.
def avg():
return total / count
Specifications for these procedures
add(x):
PRE(state): True
POST(state', r): state'.total = state.total+x /\
state'.count = state.count+1
avg():
PRE(state): state.count > 0
POST(state', r): state' = state /\
r = state.total / state.count
How to prove that code meets a spec?
Hoare logic proofs: sequencing rule.
Break down proof into smaller pieces.
Sequencing rule.
Suppose we want to prove a spec for a procedure with code (x; y).
Precondition PRE, postcondition POST.
We need a spec for the smaller pieces: x and y.
Suppose those specs are PREx/POSTx and PREy/POSTy.
Sequencing rule says: PRE/POST holds for (x; y) if:
PRE implies PREx
POSTx implies PREy
POSTy implies POST
Applying sequencing to the StatDB example:
What are the primitives involved in the code?
tmp := READ(total)
WRITE(total, tmp+x)
tmp := READ(count)
WRITE(count, tmp+1)
[ Note: difference between variables that are part of the state,
and the temporary variable that's local to the procedure. ]
What are the specs for these primitives? All have PRE: True.
READ(total)'s POST: state'=state /\ r=state.total
READ(count)'s POST: state'=state /\ r=state.count
WRITE(total, v)'s POST: state'.total=v /\ state'.count=state.count
WRITE(count, v)'s POST: state'.count=v /\ state'.total=state.total
Proving add(x) starting from state0:
True
tmp := READ(total)
state1=state0 /\ tmp=state0.total
WRITE(total, tmp+x)
state2.total=state0.total+x /\ state2.count=state1.count
tmp := READ(count)
state3=state2 /\ tmp=state2.count
WRITE(count, tmp+1)
state4.count=state2.count+1 /\ state4.total=state3.total
Plugging in through equality we get:
state4.count=state0.count+1 /\ state4.total=state0.total+x
Preconditions useful for expressing invariants.
E.g., StatDB invariant: count==0 implies total==0
Could be useful if we wanted to correctly handle dividing by zero in avg()
avg():
if total == 0:
return 0
return total / count
Three steps for using invariants in Hoare logic:
Ensure invariant is established by initial state (e.g., all zero variables).
Assume invariant holds on entry to procedure (precondition).
Ensure invariant still holds at the end of procedure (postcondition).
Sequential rule provides modularity.
First start by writing + proving specifications for lowest-level primitives.
Variable reads and writes.
In lab, we assume these as axioms.
Then write + prove specifications for next-level procedures.
Use primitive specs as building blocks.
Then write + prove specifications for even larger procedures.
Can use intermediate specs in these proofs now.
Don't need to descend all the way down to the primitives.
Part 2: Abstraction.
--------------------
How can we write specifications that represent state differently?
E.g., would like to write StatDB spec in terms of a logical history.
add(x) adds x to history.
avg() returns average value based on history.
Abstraction.
Define new kind of state for the spec.
Write specs in terms of high-level states.
Define an abstraction relation that connects low-level and high-level states.
For example, StatDB:
State = list nat
add(x): POST: state' = state ++ [x]
avg(): POST: state' = state /\ r = sum(state) / len(state)
Can add abstractions to our existing Hoare logic machinery.
Express relation between low- and high-level states in pre/post condition.
add(x): POST(state', r):
\exists stateH',
AR(state', stateH') /\
stateH' = state ++ [x]
Could be many possible histories that lead to a given code state.
E.g., AR: state.count=len(stateH) /\ state.total=sum(stateH).
How to prove specifications with abstraction relations?
Start with some high-level state (guaranteed by precondition).
Observe that there must exist some lower-level state that satisfies AR.
Use Hoare logic to figure out what the resulting low-level state looks like.
Prove AR for resulting high-level state.
Simulation diagram similar to last week's lecture:
s ======> s'
^ ^^
| AR ||
V VV
c ------> c'
Horizontal arrows are allowed transitions according to the spec.
Expressed as pre-/post-conditions.
Vertical arrows are the abstraction relations between code and spec.
Single lines are given in the theorem:
Suppose there is a code state [c], which transitions to state [c'].
Suppose there is a spec state [s] related to [c] by AR.
(This is part of the precondition for the high-level spec.)
Double lines are what you have to prove:
Show there exists a valid transition from [s] to [s'] according to
spec pre-/post-condition.
Show that this same [s'] is related to [c'] using the AR.
(This is part of the postcondition for the high-level spec.)
Hoare logic allows multiple specs for the same procedure.
Could have different sets of pre/post conditions.
Could have specs at different levels of abstraction.
E.g., we had two specs for StatDB's add() and avg().
Hoare logic doesn't require abstraction relation for intermediate states.
Suppose we called add(2) starting from an initial state with all zeroes.
After adding total, we end up with (total=2, count=0).
Impossible state, does not correspond to any abstract state!
Would be a problem in general-case state machine view.
Not an issue with Hoare logic: no way to observe intermediate state.
No concurrency, no crashes, ...
Layering: multiple levels of abstraction.
Abstraction composes vertically.
Can build multiple abstraction layers.
AR just needs to relate adjacent layers.
In our lab infrastructure, we assume there's a bottom-most [world] State.
Think of the world as (logically) low-level transitors, physics, etc.
Variables is an abstraction on top of the low-level [world].
Never need to look below variables.
AR doesn't matter for correctness.
Could we run into problems if we had a buggy AR?
No: only observable results are return values, not affected by AR.
Any AR for which the proofs work out is acceptable.
Lab infrastructure code:
VariablesAPI.
easy to miss detail: inited_any
StatDBAPI.
inited
Spec.Proc.
Crash handling largely irrelevant for now; will come back to it.
Spec.Abstraction.