L09: Separation Logic
====================
Why this paper?
- SL has been enormously successful in reasoning about pointers, concurrency,
and in automated verification.
- The paper explains why you should care; the lecture will be more technical.
Outline for this lecture
1. Motivation: why is SL exciting?
2. SL as a way of describing heaps
3. Intuition for SL proofs
4. SL in Coq
Motivation
----------
SL has worked for these problems, as described in the paper:
1. Reasoning about pointer-manipulating programs, especially data structure
implementations. For example, linked lists, binary search trees, graph
search. Can relate abstract representations (lists, trees, graphs, etc) to
heap representation.
2. Reasoning about concurrency by extending the logic to "non-standard models."
This was achieved by introducing the idea of "fictional separation," where
separation is no longer between non-aliasing portions of the heap but
between the compatible views/ownership of different threads.
3. Automated reasoning about programs with pointers. The paper mainly refers to
Infer, a tool used at Facebook (and which originated with Abductor, a tool
O'Hearn developed in academia). Infer aims to discover pre/postconditions for
C++ code that guarantee memory safety. It uses SL to handle heap reasoning.
The locality of SL helps Infer scale: it analyzes each procedure
independently and then composes or stitches these analyses together, which
both parallelizes the checking and allows checking a change to one procedure
without re-doing the analysis.
O'Hearn is an expert on the original separation logic and Infer (having helped
invent both). I work on concurrency using separation logic, so my emphasis
will be slightly different.
SL for describing heaps
-----------------------
Mathematical syntax of separation logic
- P, Q are arbitrary predicates
- h |= P for "P holds in (heaplet) h"
- P |- Q for entailment: whenever h |= P, h |= Q
- h |= x ↦ y is true when h just maps x to y
- h |= P * Q means P and separately Q (predicates are true in disjoint sub-heaps
of h)
- h |= P -* Q means if we extend heap with h' ## h where h' |= P, then
h ∪ h' |= Q (think: (P -* Q) * P |- Q)
- h |= emp says h is empty
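To make these definitions concrete, here is a minimal executable model, a sketch only: heaplets are Python dicts, predicates are functions from heaplets to booleans, and P ∗ Q is checked by brute force over every split of the heaplet into two disjoint halves. All names here (`points_to`, `sep`, etc.) are illustrative, not from any SL tool.

```python
# A heaplet is a finite partial map from addresses to values, modeled as
# a Python dict; a predicate is a function from heaplets to bool.

def points_to(x, y):
    # h |= x ↦ y: h maps exactly x to y and nothing else
    return lambda h: h == {x: y}

def emp(h):
    # h |= emp: the heaplet is empty
    return h == {}

def sep(P, Q):
    # h |= P ∗ Q: h splits into two *disjoint* heaplets, one satisfying P
    # and one satisfying Q (brute-force over all 2^n splits)
    def holds(h):
        items = list(h.items())
        for mask in range(2 ** len(items)):
            h1 = {k: v for i, (k, v) in enumerate(items) if mask >> i & 1}
            h2 = {k: v for i, (k, v) in enumerate(items) if not mask >> i & 1}
            if P(h1) and Q(h2):
                return True
        return False
    return holds

h = {"x": 0, "y": 0}
print(sep(points_to("x", 0), points_to("y", 0))(h))  # True
print(sep(points_to("x", 0), points_to("x", 0))(h))  # False: x can't be in both halves
print(sep(points_to("x", 0), emp)({"x": 0}))         # True: emp takes the empty half
```

The second query is the key point: x ↦ 0 ∗ x ↦ 0 is unsatisfiable, because no split puts the x cell on both sides.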
Picture semantics of separation logic
- x ↦ y will be box (x) pointing to value (y), often itself a pointer value
- since we focus on the pointer structure, we often omit the data in our
data structures in these diagrams; in reality, x is a C `node *` which
points to a struct with a `data` field and a pointer (y) in the `next` field.
y also has type `node *`.
- P * Q means we draw the two picture diagrams separately, which implicitly
means all of the boxes are distinct. We could partition the diagram into two
halves and they would cover distinct memory addresses.
**Examples**
list(x) for an acyclic list starting at x
list(x) := if x == null
then emp
else ∃ d, z. x ↦ [data: d; next: z] * list(z)
(often abbreviated to x ↦ z, ignoring data)
tree(t) is a tree rooted at t
I will represent trees without null pointers, instead using Some and None
(encoded with the sum-type injections Inl and Inr, for "in-left" and
"in-right"). A tree(t) is either a pointer to None, or a pointer to Some(l, r)
where l and r are themselves trees. Note that tree(t) implies t is a non-null
pointer.
tree(t) := (t ↦ None ∨ ∃ l r. t ↦ Some(l, r) ∗ tree(l) ∗ tree(r))
(again, no data in the tree, just an acyclic structure of pointers)
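The tree predicate can be checked executably in the same dict-model of heaplets (a sketch, with made-up names): a leaf cell (Inl () / None) is modeled as Python `None`, an interior cell as a pair (l, r). The trick is that the walk must use every cell of the heaplet exactly once, which is precisely what the ∗ between tree(l) and tree(r) demands.

```python
def is_tree(t, h):
    # Checks h |= tree(t): the tree rooted at t must cover every cell of h
    # exactly once -- revisiting a cell would violate the disjointness of ∗.
    used = set()
    def walk(p):
        if p in used or p not in h:
            return False
        used.add(p)
        cell = h[p]
        if cell is None:          # leaf: p ↦ None
            return True
        l, r = cell               # interior: p ↦ Some(l, r)
        return walk(l) and walk(r)
    return walk(t) and used == set(h)

h = {1: (2, 3), 2: None, 3: None}
print(is_tree(1, h))        # True
shared = {1: (2, 2), 2: None}
print(is_tree(1, shared))   # False: node 2 appears under both children
```

The `shared` example previews the reading question below: a DAG with node sharing fails tree(t) exactly because the two subtrees are not in disjoint sub-heaps.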
Discuss reading question in small groups
----------------------------------------
Suppose we made the tree (described on p91, middle column) share nodes whenever
an inserted tree matched one already present, saving memory by storing a pointer
to the existing tree. With such an implementation what does `tree(τ1, root_1) *
tree(τ2, root_2)` mean? Does modifying `root_1` affect just `τ1` (as before), or
also `τ2`?
SL representation predicates
----------------------------
We can do more than memory safety and relate the heap to an abstract
representation. For example, we can talk about representing a list of numbers as
a linked list:
listP(x, l) := match l with
| [] => emp
| a::l' => ∃ z. x ↦ (a, z) ∗ listP(z, l')
Now we can relate imperative procedures like list prepend, list append, list length:
{listP(x, l)} list_length(x) {λ r. r = length l ∧ listP(x, l)}
{listP(x, l)} list_prepend(x, a) {λ r. listP(r, a::l)}
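The representation predicate itself is executable in the dict-model (an illustrative sketch, not a verification tool): each cell holds a (value, next) pair, and the empty abstract list demands the empty heaplet.

```python
def listP(x, l, h):
    # h |= listP(x, l): h is exactly a linked list starting at x whose
    # cells hold the values of l in order.
    if l == []:
        return h == {}                     # emp
    if x not in h:
        return False
    a, z = h[x]                            # x ↦ (a, z)
    rest = {k: v for k, v in h.items() if k != x}
    return a == l[0] and listP(z, l[1:], rest)

h = {10: (7, 20), 20: (8, None)}
print(listP(10, [7, 8], h))   # True
print(listP(10, [7], h))      # False: the cell at address 20 is left over
```

The second query shows why this relates the heap to the abstract list so tightly: any unaccounted-for cell falsifies the predicate.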
Here's a more complex example, where we write [tree t els] when there's a binary
search tree starting at the pointer t which logically holds elements in [els]:
```
Theorem tree_unfold (t:loc) (els:gset Z) :
tree t els ⊣⊢
(⌜els = ∅⌝ ∧ t ↦ NONEV) ∨
(∃ left_els right_els (key: Z) l r,
⌜els = {[key]} ∪ left_els ∪ right_els⌝ ∗
⌜∀ e, e ∈ left_els → e < key⌝ ∗
⌜∀ e, e ∈ right_els → key < e⌝ ∗
t ↦ SOMEV (#key, (#l, #r)) ∗
▷ tree l left_els ∗ ▷ tree r right_els).
```
SL-based proofs
---------------
Rather than tightly formalize separation logic, I'll try to convey what it "feels"
like to do a proof in separation logic.
```
{x↦0 ∗ y↦0}
*x = y
{x↦y ∗ y↦0}
*y = x
{x↦y ∗ y↦x}
```
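The same two stores, run on a concrete dict-model of the heaplet, pass through exactly the asserted intermediate states (the addresses 100 and 200 are made up for illustration):

```python
x, y = 100, 200
h = {x: 0, y: 0}            # x ↦ 0 ∗ y ↦ 0: two disjoint cells
h[x] = y                    # *x = y, an in-place update of the x cell
assert h == {x: y, y: 0}    # x ↦ y ∗ y ↦ 0
h[y] = x                    # *y = x
assert h == {x: y, y: x}    # x ↦ y ∗ y ↦ x
print(h)
```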
A key idea: *x = y is like an "in-place" update to x↦0, turning it into x↦y.
This is almost like the actual behavior of the CPU instruction (it changes one
field in memory), but this is a logic system! For example, we use predicates
like x↦y ∗ y↦0, which are abstract (they use the variables x and y). One way to
think about these SL assertions is as abstract views of the heap, and the
abstract view behaves a lot like a physical heap because of disjointness.
This is all well and good, but the really interesting proofs come when we start
using predicates. Let's look at a proof of the deletetree example:
{tree(t)}
deletetree(t)
{emp}
two cases:
### case 1 (empty tree)
{t ↦ None}
Free(t)
{emp}
easy, follows directly from rule for Free
### case 2 (has children)
{t ↦ Some(l,r) ∗ tree(l) ∗ tree(r)}
deletetree(l)
{t ↦ Some(l,r) ∗ emp ∗ tree(r)}
deletetree(r)
{t ↦ Some(l,r) ∗ emp ∗ emp }
Free(t)
{emp}
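A direct implementation of deletetree on the dict-model makes the postcondition concrete: the heap literally ends empty. This is a sketch under the same modeling assumptions as before (leaf cells are `None`, interior cells are pairs, and Free is `del`).

```python
def deletetree(t, h):
    # Mirrors the two proof cases: recurse into both subtrees (consuming
    # tree(l) and tree(r)), then free the root cell itself.
    cell = h[t]
    if cell is not None:       # case 2: t ↦ Some(l, r)
        l, r = cell
        deletetree(l, h)       # {t ↦ Some(l,r) ∗ emp ∗ tree(r)}
        deletetree(r, h)       # {t ↦ Some(l,r) ∗ emp ∗ emp}
    del h[t]                   # Free(t)

h = {1: (2, 3), 2: None, 3: (4, 5), 4: None, 5: None}
deletetree(1, h)
print(h)  # {} -- the postcondition emp
```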
To appreciate why this proof matters, let's think about how we would do this
without separation logic. Let's say we still write down tree(t) as above:
tree(t, h) := h[t] = None ∨ ∃ l r. h[t] = Some(l, r) ∧ tree(l, h) ∧ tree(r, h)
This isn't quite good enough. We should probably add that t, l, and r are
distinct. Ok, let's add that.
Now let's say we want to prove {h: tree(t,h)} deletetree(t) {h': True}; note that it
says nothing about the heap afterward, just that deletetree is safe.
{h: h[t] = Some(l, r) ∧ tree(l,h) ∧ tree(r,h) where t,l,r distinct}
deletetree(l);
deletetree(r);
Free(t)
{h': True}
What is the effect of the first deletetree? Well, the spec doesn't say anything
at all about the heap afterward. We better strengthen it:
{h: tree(t,h)}
deletetree(t)
{h': ∀ p, h'[p] = h[p] if p is unreachable from t in h}
Now this is awful. After deleting the left subtree, how do we know what pointers
are reachable from tree(l,h)? How do we know that tree(r,h') is still true?
Actually, this tree definition doesn't even guarantee it - the tree could have
internal sharing, as long as it doesn't share directly from one level to the
next (our distinctness condition covers that).
SL in Coq
---------
I'll now show you how the deletetree proof works in Coq, in a particular library
called Iris. Iris defines a programming language and a separation logic for that
language. It also comes with the IPM (Iris Proof Mode), a DSL embedded in Coq
for carrying out Coq-style proofs within separation logic.
To understand the goals we're seeing in Coq, I need to introduce a couple of
things. The most important is how to read IPM goals.
Here's a typical Coq goal:
```
H1: P
H2: Q
------
P ∧ Q
```
We interpret this as "assuming the premises P and Q (which are named H1 and H2),
prove P ∧ Q". Then we issue tactics to gradually chip away at this proof. While
running tactics, hypotheses and the goal get transformed.
The IPM is like that, but it adds a new type of assumption:
```
x: loc
-----
"Hx": x ↦ #0
"Hy": y ↦ #3
--------------∗
y ↦ #3 ∗ x ↦ #0
```
The part above the first line is the "Coq context", the usual Coq assumptions.
The part between that line and the line ending in ∗ is the "spatial context":
these are separation-logic assumptions, and as you can see they are SL
assertions rather than Coq propositions.
The meaning of this goal is that x ↦ #0 ∗ y ↦ #3 ⊢ y ↦ #3 ∗ x ↦ #0. This is like
the Coq goal, with the crucial difference that the items in the "spatial
context" are connected by "and separately" rather than in Coq, where intuitively
they're combined with "and".
The other thing is that the way Iris encodes Hoare triples is using _weakest
preconditions_. WP e {Q} is a predicate such that if it holds in some heaplet
(that is, if (WP e {Q})(h) holds), then running e starting from h is safe and will deliver a
heaplet h' satisfying Q. This is a lot like a Hoare triple; in fact `P ⊢ WP e
{Q}` is equivalent to `{P} e {Q}`. Why is this useful? It lets us stick to
reasoning using separation logic predicates rather than reasoning about states.
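As a toy illustration of this reading (not Iris's actual definition, and with made-up names), here is a weakest precondition for a single store command in the dict-model. WP as a predicate on heaplets: the store must be safe, and Q must hold afterward.

```python
def wp_store(x, v, Q):
    # WP (*x = v) {Q}: holds in h iff the store is safe (x is allocated)
    # and Q holds in the updated heaplet.
    def holds(h):
        if x not in h:
            return False       # the store would fault, so not safe
        h2 = dict(h)
        h2[x] = v
        return Q(h2)
    return holds

Q = lambda h: h == {"x": 7}
print(wp_store("x", 7, Q)({"x": 0}))  # True: {x ↦ 0} ⊢ WP (*x = 7) {x ↦ 7}
print(wp_store("x", 7, Q)({}))        # False: no x cell, so not safe to run
```

The first query is exactly the equivalence mentioned above: the triple {x ↦ 0} *x = 7 {x ↦ 7} becomes an entailment into a WP predicate.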
You can view the code at
https://github.com/tchajed/seplogic-demo/blob/master/src/delete_tree.v.