Everest
=======

Why this paper?
  TLS is widely used for security.
  Long history of problems in TLS.
    Design flaws.
    Unclear abstractions.
    Implementation bugs: memory errors, incorrect crypto code, logic errors, ..
    [ Ref: https://www.mitls.org/pages/attacks/ ]
  Ambitious goal: specify and verify entire TLS stack.
  Might have an impact.
    Opportunity due to upgrade to TLS 1.3.
    Already parts are deployed.
    Firefox uses crypto code from Everest.
    Wireguard VPN uses crypto code from Everest.
    TLS 1.3 spec itself evolved due to efforts in formalizing it.

Interesting points:
  What are all of the components being verified?
  What are the specs for the different components?

TLS stack.
  Figure 1 from the paper.
  Top-level protocol: HTTPS, deals with requests, responses, URIs, headers, ..
  Certificates: X.509, encoded using ASN.1.
  TLS as the secure network transport.
  Cryptographic primitives: AES, SHA, ECDH, ..
  Networking, message encoding/decoding, ..

Main results so far:
  [ Ref: https://project-everest.github.io/ ]
  EverCrypt: high-performance crypto implementations.
    Vale (from earlier in the semester) is part of this.
    Pieces are deployed in Firefox, Wireguard, ..
  miTLS: partially verified TLS stack (TLS record layer).
    Had some impact on the TLS 1.3 specification.
  EverParse: verified parsers / serializers for binary formats.

Cryptographic primitives: functional specs.
  SHA, AES: spec is about bit shuffling.
  Poly1305: spec is about math in finite fields.
    Example from paper: written in terms of primitives for mod 2^130-5.
  ECDH: spec is about math on elliptic curves.
    Similarly, defined as math equations using infinite-precision arithmetic.

Proofs for crypto primitives.
  Standard refinement story, like in the lab assignments.
  Code implements crypto primitives in a more efficient way.
    E.g., poly1305 example from paper: 5 limbs of 26 bits eaech.
    More efficient for batching carry operations.
  Abstraction relation connects code states (limbs) to spec states (bignum).
  Refinement proof shows that implementation correctly computes results.

Challenge: generating efficient code for crypto.
  Paper: Low* to C.
    Write restricted code in F* that is amenable to extraction to C.
    Precise semantics in F*, similar to how lab code is implemented in Gallina.
  Earlier this semester: Vale (came out 2 years after this lecture's paper).
    Helps reason about low-level assembly code.
    Integrates assembly reasoning into Hoare logic proofs.
  EverCrypt: specialization of parameters, architecture platforms.
    General implementation with many parameter choices, optimizations.
    Proof covers any parameter choice or optimization choice.
    Specific instantiation picks particular parameters / optimizations.
    Partial evaluation gives efficient (and verified) code for those choices.

Challenge: leakage from crypto code.
  Timing information: code runtime may depend on secret (e.g., key).
  Common source (though not the only source): data-dependent operations.
    Access memory locations that depend on the key.
    Accesses to the same location likely faster due to caching.
    Adversary might infer information about key based on cache hits (faster)
      or cache misses (slower).
  This paper: ensure compilation does not introduce data-dependent accesses.
    Seems like a reasonable start.
  Vale paper: more elaborate model of side channels.
    Adversary gets to observe every instruction executed.
    Adversary gets to observe every memory address accessed.
    Strong model that subsumes many side channels and realistic adversaries.

Challenge: security specs and proofs.
  Functional specs do not directly imply or capture security.
  Secrecy means we want something closer to non-interference (ala CertiKOS).
  Tricky part 1: network protocols are interactive.
    Adversary could interact with our system: send messages, read messages, ..
  Tricky part 2: crypto is not perfect.
    There is some tiny but non-zero probability that adversary can decrypt.
    E.g., tiny probability that adversary can guess the key.
    Can't prove strict CertiKOS-style non-interference.

Cryptographic definition of security: simulating a game with adversary.
  We are building a box that provides some crypto API.
  Two possible worlds: real implementation vs. ideal functionality.
  Ideal functionality isn't executable but captures desired security.
  Adversary gets access to this box.
  Can adversary tell if the box is real or ideal?
  Goal: tiny probability of adversary guessing correctly.

Paper's example: AEAD (Authenticated Encryption with Associated Data).
  [ Aside: Associated Data is just extra payload that isn't encrypted. ]
  We provide the adversary with access to Encrypt() and Decrypt().
    Ideal encrypt and decrypt make up truly random ciphertexts.
    Ideal tracks legitimate encryptions, only decrypts those.
  Adversary can query them some number of times.
  Goal: prove that adversary can't distinguish real vs ideal.

Proof strategy: reduction to primitives with known security.
  Building blocks: encryption (ChaCha20), one-time-MAC (Poly1305).
  Axioms about building blocks: epsilon probability of adversary winning.

Specifying ideal functionality.
  Code is written with "if" statements that choose ideal vs real code.
    Ideal code is like similar to ghost state.
    At runtime, "if" statement collapses to real code.
  Developer needs to show similarity between the two "if" branches.
    "If" statements help keep code structure similar between two worlds.
    Roughly, proof shows that the only differences arise from primitives.
    And those have axioms about their epsilon level of security.

Network messages.
  What could be the specification for the parser / serializer?
    No crashes.
    Can decode any encoded message, preserving semantics.
    Signing or hashing: canonical encoding of a message.
    Avoid ambiguity: unique decodable encoding of a message.
    Semantics agrees with other parsers / serializers.
      Tricky to pin down unless using a common formal spec.
      Example issue: zero bytes in length-encoded fields in TLS certs.
  EverParse.
    Strict: parser must accept exactly one representation of each message.
    DSL for specifying languages.
    "Translation validation" style proof.
      DSL not formalized, and no proof that every DSL program will work.
      DSL translator emits proof of generated parser, checked by F*.
    Performance: zero-copy parsing.

TLS abstraction.
  Complicated API and guarantees.
    Not quite a single secure connection.
    Instead, sessions/streams with different security properties.
    Different cipher, certificates, etc.
  Example issue: re-negotiation.
    Protocol-level bug or overly-complicated API that libraries/apps misuse.
  Everest: TLS 1.3 record layer verification.
    Layer in TLS that (logically) manages keys, encrypts/authenticates messages.
    Exposes streams that are used to implement handshake, alerts, app data.
  Eventual goal: simple secure channel abstraction on top?
    Needs to expose enough information for certificates, HTTPS, ..
    Needs to capture real abstraction provided by TLS: sessions, negotiation, ..

Future work.
  HTTPS.
  Certificates.
  Linking all of these pieces together.
  What's the right abstraction?  RPC-like?
  How to formalize security properties?
  How to connect all of the proofs together?
    Security games for protocols.
    Standard refinement for parsers.
    Side-channel-resistance for low-level code.
    Etc.

Discussion / feedback on 6.826.
  What would you change?
  Papers.
  Labs.
  Lectures.
  Thoughts on an "applied verification" class?

Possibly relevant class next term: 6.822 by Adam Chlipala.
  Formal Reasoning About Programs.