Confidentiality
===============

security
  prev lecture: integrity
    prevent one process from tampering with another process
  this lecture: confidentiality
    prevent one process from getting confidential data from another process
    unlike previous topics, not well understood how to formally reason

example: OneDiskAPI
  give each user their own disk block
  want to make sure one user cannot get another user's data
  how to capture what confidentiality means?

challenge 1: many possible "attack routes"
  strawman: cannot run read() on protected data
    e.g., read of another user's data returns "permission denied"
  limited use: assumes specific "attack route" (i.e., direct call to read)
  might be hard to think through possible other attack routes
    maybe disk is an SSD that internally does deduplication of identical blocks
    exposes a separate call to get the number of blocks actually in use
    adversary can try to write different data, see when dedup kicks in

challenge 2: non-determinism
  this is what the paper means by "refinement does not propagate security"
  example: what if adversary calls read() on out-of-bounds sector?
  OneDiskAPI says any return value is allowed
  what if implementation returned user 0's data?
    allowed by the OneDiskAPI spec
    does not return "permission denied" because not user 0's block
    clearly discloses user 0's data
  another example: OneDiskAPI also allows a crash at any point
    implementation could crash depending on whether user 0's block is odd/even
  confidentiality interacts poorly with non-determinism
    any non-determinism is suspect
    non-det choices could be leaking confidential data!

better spec: non-interference
  regardless of what one user's block is, other processes run the same way
  not a safety property in the sense we discussed so far in class
    recall: safety property means can determine violation by looking at a trace
    cannot determine that data leaked by looking at one trace
  non-interference is a two-safety property
    can determine violation by looking at two traces
    specifically, two traces that differ just in one user's private data
    if another process observes different results in these traces, data leaked

need to pin down some issues to formalize this
  1. what is one user's confidential data?
  2. what does it mean for another process to observe different results?

this paper's approach to formalization of non-interference
  spec (and code) must be deterministic
  observation functions
  reason about confidentiality purely at the spec level
    don't need to re-prove the whole system to change the confidentiality spec

idea of "observation function" used to pin down both of the above issues
  observe(principal, state)
  but maybe easier to think of the two uses of obs. func. as different
  1. at the spec level, observation function defines what's "readable"
    indirectly answers question about other user's confidential data
    one user's confidential data should not be in another user's obs. func.
  2. at the code level, observation function defines what's exposed
    one user's (principal's) part of the externally visible actions
    e.g., console output of one process (mCertiKOS)
    e.g., network packets sent to one user (hypothetically in another system)
  the two are related:
    code-level observation must be a subset of spec-level observation
    i.e., spec must say the externally visible trace is observable
    AR(codeState, specState) -> O(codeState) \in O(specState)

proof goal: same externally visible actions
  start with two states that have same spec-level observer (for principal P)
  run the system in each of those two states
  the result better be the same externally visible actions (for principal P)

observation functions for our example
  need to make some externally visible actions
    let's add network packets sent to a user
    each process can send only to the process's user; no cross-user packets
  code observer for user U: packets sent to U
  spec observer for U's process: U's blocks plus packets sent to U

prove confidentiality by induction at the spec level
  consider one spec step at a time
  assume two spec states with same spec-level observation
  do the step
  resulting states must also agree on spec-level observation
  by induction, all states will agree on spec-level observation

why does this translate to the code level?
  start with two code-level states
  every code step corresponds to 0 or more spec steps
  0 spec steps:
    no externally visible actions
    because ext. visible actions included in spec observer
    and state didn't change and thus spec observer (ext actions) didn't change
  1+ spec steps:
    only one possible sequence of spec steps for each code state
      recall: spec must be deterministic
    each spec step says the spec observer is the same across two states
    spec observer includes code observer, thus ext. actions also the same
  subtle issue: termination
    maybe in one of the two states, code loops forever without stepping spec
    paper avoids this; see their reference on "infinite stuttering"

what if developer writes the wrong spec observation?
  might fail to prove: something is exposed but not captured by obs. func.
  might prove ok but allows too much to be exposed
    spec observer is assumed correct
    it's the security spec

what if developer writes the wrong code observation?
  might say there's too much stuff being exposed externally
    cause proofs to fail even though it's not a security problem
  might forget to capture some stuff being exposed externally
    e.g., forget to capture some packets or console output
    means system is leaking data that isn't captured in the model
    security theorem might be meaningless

proving confidentiality for processes in mCertiKOS
  what's the observer spec for a single process?
    virtual memory
    registers (if running)
    saved registers (if suspended)
    some state about child processes

  proof issue: virtual memory aliasing
    started in two states that appear equal (same virtual memory)
    but in one case, there was aliasing between pages
    resulted in two different states
    needed to strengthen invariant (no aliasing)

  interface issue: process IDs
    naive plan: global counter for child processes
    creating child process returns number of processes created so far
    can indirectly tell what's going on in another process!
    their solution: pre-allocate PIDs to everyone
    another solution: file descriptor-like naming of PIDs

  spec issue: suspended processes
    after context switch, might get resumed after different number of steps
    states are distinguishable: resumed in one, still suspended in another
      only known fact about starting states: this process observer is equal
      but another thread may be in different states, may call yield differently
    solution: "local" semantics that completely ignores steps other processes
      step by another thread in global semantics maps to 0 steps in local sem

non-interference in practice often looks a little different
  non-interference between processes, rather than with respect to an observer
  process A cannot interfere with process B
    regardless of what process A does, process B produces same results
  can think of this as both integrity and confidentiality
    process A is bad guy, process B is good guy: A cannot tamper with B
    process A is good guy, process B is bad guy: B can't deduce A's data
  one way of specifying which processes can interfere: information flow lattice
    partial order on processes that defines if A can interfere with B
    not commutative: could have one-way information flows
  simple example of lattice: military classification
    "unclassified" < "secret" < "top secret"
    maybe with categories (nuclear, foreign, ..)
  nice lattice property: transitivity
    if A cannot flow to C in the lattice, then cannot interfere directly
    but also, if A can interfere with some other B, then B cannot interfere with C
    due to partial order
  not in wide use
    often turns out to be too strict; want information flows to get stuff done
    relatedly, hard to eliminate information flow due to side channels
      underlying system (e.g., hardware) gets stuff done and isn't that strict

source code for this paper
  http://flint.cs.yale.edu/flint/publications/pldi16-security/
  code-level observer: [observe] in mcertikos/layerlib/ObservationImpl.v
  spec-level observer: [xobserve] in mcertikos/security/Security.v,
    which comes from [my_observe] in mcertikos/security/ObsEq.v
  subset property: [observe_obs_eq] in mcertikos/security/Security.v