SDN
===

What is this lecture about?
  Networks are an important problem domain.
    Systems rely on networks to operate correctly and securely.
    Complex: distributed policies, protocols, algorithms.
    Distributed system: concurrency, recovery from partial failures.
  SDN simplifies reasoning about network behavior.
    Centralized controller, policies, decisions.
    Tries to reduce the "distributed system" complexity of the network.
  Offers the possibility of formally verifying network correctness.
    Some interesting issues in defining correctness.
      Many switches.  In-flight packets.

What is this paper?
  Formalizes a framework in Coq for reasoning about SDNs,
    with an example SDN controller built in this framework.
  Focus of the paper is more on the framework than on the example controller.
    Not many details about the actual controller that was built.
    Some features of the framework / model are not used by the example controller.
  Related paper: "Abstractions for Network Update".
    [ Ref: http://www.cs.cornell.edu/~jnfoster/papers/consistent-updates-sigcomm12.pdf ]
    Lays out some bigger problems that motivate the need for formal reasoning.
    Motivates some of the seemingly less-necessary features of the framework.
  Useful as a forcing function to talk about SDN.
    So, this will be more of a discussion of SDN and how to formalize and verify it.

Traditional network view.
  Switches with many ports.
    (Historical terminology: Ethernet switches, IP routers.)
  Devices connect to switches.
  Links between switches, often in redundant topologies.
  Many goals that might need to be enforced.
    Reachability: deal with failures of links or switches.
    Resource allocation: ensure bandwidth for particular applications.
    Access control: isolate applications from one another for security.
    Support "middleboxes": logging, firewall/IDS, load balancer, ..
  Traditionally achieved in a distributed manner.
    Distributed routing protocols run on switches.
    Policies distributed across switches, into local per-switch fragments.
    Switches might have policies or protocols specified in different ways.
      Agreement only on the network protocol, not on configuration.

SDN overview.
  Adds a centralized controller.
    Alternative to the distributed protocols and policies of the traditional view.
    Single copy of the policies for access control, bandwidth, fault tolerance, ..
  Switches connect to the controller for decisions.
    When a packet comes in, the switch sends the packet to the controller.
    The controller responds telling the switch what to do with the packet.
  For performance, the controller can push specific rules into an on-switch cache.
    Rules specify what to do with packets matching a certain pattern.
    Works well if cache hits end up being the common case.
    (See the sketch of this reactive loop after the next section.)
  Something running "on top" of the controller changes policies over time.
    E.g., a fault-tolerance protocol might change routes on link failure.
    E.g., add or remove rules when new machines connect or servers fail.
    In practice, might be considered part of the controller.
    In the paper, factored out from the narrow view of what the controller is.

SDN benefits and downsides.
  Benefit: centralized control, simpler to reason about.
    No need to set up complex routing protocols or track per-switch policies.
  Benefit: interchangeable switches for recovery from switch failures.
    A switch has no interesting state, so it is easy to replace.
  Benefit: interchangeable switches between vendors.
  Downside/cost: requires a high-performance, highly available controller.
    In a serious deployment, need to replicate the controller.
    Distributed-system issues come back, though perhaps in a more focused form.
  Limitation: single organization, since there's a single controller.
    Doesn't work for the entire Internet: no agreement on a controller.
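To make the reactive controller/switch split concrete, here is a minimal
sketch in Python (my illustration, not from the paper; Pattern, Switch, and
Controller are made-up names, though packet_in echoes OpenFlow's packet-in
message):

  # Minimal sketch of the reactive SDN control loop; illustrative names.
  from dataclasses import dataclass
  from typing import Optional

  @dataclass(frozen=True)
  class Pattern:
      dst: Optional[str] = None          # None acts as a wildcard

      def matches(self, pkt):
          return self.dst is None or pkt["dst"] == self.dst

  class Switch:
      """Switch with a rule cache; unmatched packets go to the controller."""
      def __init__(self, controller):
          self.rules = []                # list of (Pattern, output ports)
          self.controller = controller

      def install(self, pattern, out_ports):
          self.rules.append((pattern, out_ports))

      def recv(self, pkt):
          for pattern, out_ports in self.rules:
              if pattern.matches(pkt):
                  return out_ports       # fast path: cached rule
          return self.controller.packet_in(self, pkt)   # slow path

  class Controller:
      """Centralized policy: destination -> output ports."""
      def __init__(self, policy):
          self.policy = policy

      def packet_in(self, switch, pkt):
          out_ports = self.policy.get(pkt["dst"], [])
          # Cache a rule so future packets to this dst skip the controller.
          switch.install(Pattern(dst=pkt["dst"]), out_ports)
          return out_ports

  ctl = Controller(policy={"h2": [2]})
  sw = Switch(ctl)
  sw.recv({"dst": "h2"})   # [2], decided by the controller; rule installed
  sw.recv({"dst": "h2"})   # [2], served from the switch's rule cache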
SDNs are widely used in practice.
  Some enterprise networks.
  Some data centers, though limited by the controller bottleneck (perf, avail).
  Cross-data-center WAN links within a single organization.

This paper's view of SDN.
  Controller has a "program" defining how the network should operate.
    Written in a language called NetCore.
      NetCore is a research tool, not widely used in itself.
      NetCore came from an overlapping set of authors with this paper.
    Packet predicates: input switch port, header values.
    Packet transformations (not described in this paper).
    Packet action: forward on some set of output ports.
    Combine programs together with "union" or "restrict".
      (See the NetCore-style sketch at the end of this section.)
  Something (not in this paper) sits on top of the controller.
    Has a higher-level policy set by the administrator / operator.
    Probably includes a routing protocol, dynamic rules, ..
    Constructs the NetCore program to run at any given time.
    Could have verified properties of this NetCore program.
      E.g., all HTTP traffic gets sent to some logging server.
      The verified-controller proof then tells us this property holds of the real network.
  Switches have a cache of rules: the flow table.
    Somewhat different structure than NetCore.
      Packet predicates.
      Some support for packet transformations.
      Priorities instead of "union" and "restrict".
      Actions determine output ports.
  OpenFlow: standard interface between controller and switches.
    Switch sends all unmatched packets to the controller.
    Controller can tell the switch what to do with a packet.
    Controller can install new entries in the switch's flow table.
  Simplified model of an OpenFlow switch, called Featherweight OpenFlow.
    The real OpenFlow protocol is somewhat more complicated.
    [ Ref: https://www.opennetworking.org/wp-content/uploads/2014/10/openflow-switch-v1.5.1.pdf ]

Controller as compiler.
  The NetCore program gets compiled and executed on actual switches.
  Controller's job: perform this compilation and execution.
  Similar to CompCert: ensure the actual execution matches the source program.
  New complication compared to CompCert: execution is part of the controller's job.
    Execution happens on multiple switches.
    The interface for controlling switches has lots of asynchrony.
  The program might change over time; need to change over to the new program.
    (Not described in this paper, and not part of the verified controller.)

Correctness of compilation from NetCore to actual switch execution.
  Externally visible events.
    An event is a packet arriving on an input port of a switch.
  Informal goal: preserve externally visible events across compilation.
  The correctness story seems a bit too simplistic in the paper, though.
  Puzzle 1: the paper claims bisimulation between NetCore and Featherweight OF.
    Bisimulation is trace equality, not just trace inclusion.
      Basic trace inclusion: every trace in FOF is also possible in NetCore.
      Bisimulation adds: every trace in NetCore is also possible in FOF.
    Why might this be a bad idea?
      Might impose too strict a requirement on the implementation.
      E.g., NetCore can process any queued incoming packet.
        A real switch might process packets in order.
        Some NetCore traces might not be realizable in an actual switch.
    Maybe what's going on is bisimulation to the FOF model, not the real switch.
      A real switch would prove trace inclusion to the FOF model?
  Puzzle 2: externally visible events don't include output packets.
    The actual implementation might be quite different than the NetCore "spec".
      E.g., broadcast packets to all connected hosts, or drop all packets.
      Either meets the formal requirement: same trace of processed input packets.
    Maybe the intended use of the spec is to have dummy "switches" for each host?
      This way, packets sent to a connected machine are inputs to that switch..
      Doesn't seem to be what they are doing, though.
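To give the flavor of a NetCore-style program, here is a minimal sketch
(my own rendering in Python, not the paper's NetCore syntax; Fwd, Union,
and Restrict are illustrative names): forwarding actions guarded by packet
predicates, combined with union and restrict. The HTTP-logging policy
mirrors the example property above:

  from dataclasses import dataclass
  from typing import Callable, Set

  Pred = Callable[[dict], bool]      # packet predicate over header values

  @dataclass
  class Fwd:
      """Forward packets matching pred on a set of output ports."""
      pred: Pred
      ports: Set[int]

      def eval(self, pkt):
          return self.ports if self.pred(pkt) else set()

  @dataclass
  class Union:
      """Union of two programs: take both programs' forwarding decisions."""
      left: object
      right: object

      def eval(self, pkt):
          return self.left.eval(pkt) | self.right.eval(pkt)

  @dataclass
  class Restrict:
      """Restrict a program to packets satisfying a predicate."""
      prog: object
      pred: Pred

      def eval(self, pkt):
          return self.prog.eval(pkt) if self.pred(pkt) else set()

  to_h2 = lambda pkt: pkt.get("dst") == "h2"
  http  = lambda pkt: pkt.get("tcp_dst") == 80

  # Forward traffic for h2 on port 2, and additionally send all HTTP
  # traffic to a logging server on port 9.
  prog = Union(Fwd(to_h2, {2}),
               Restrict(Fwd(lambda pkt: True, {9}), http))

  prog.eval({"dst": "h2", "tcp_dst": 80})   # {2, 9}
  prog.eval({"dst": "h2", "tcp_dst": 22})   # {2}

The controller's compilation job is to flatten such a program into
prioritized flow-table rules that compute the same output ports; see the
flow-table sketch at the end of these notes.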
Interesting extension of correctness: updating the NetCore program.
  Motivated by the "Abstractions for Network Update" paper (see above).
  What does correctness mean when the NetCore program changes?
    E.g., decide to change where traffic is sent, to re-balance load.
  Strawman: atomic change between NetCore programs.
    As if the switches were re-programmed instantaneously.
    Might not be the right guarantee: in-flight packets.
      A packet might be sent to a switch that no longer expects that packet.
  Better correctness definition: per-packet consistency.
    As if every packet was processed by either the old or the new NetCore program.
    Packets should finish processing with the same program that started them.
  One solution: flush all in-flight packets when re-configuring.
    Not a good idea in terms of availability.
    Might be tricky to do because switches and links don't expose a flush.
  Another solution: version all packets and NetCore programs.
    E.g., stick the version of the NetCore program into the VLAN tag.
    (See the two-phase update sketch at the end of these notes.)
  More sophisticated solutions depend on the programs being changed.

Controller structure.
  NetCore -> global flow table -> switches.
    Similar to CompCert's layering.
  Compile NetCore into a global flow table.
    Mostly translating unions and restrictions into rule priorities.
    Weird semantics in OpenFlow switches: if a pattern checks the TCP port
      number but doesn't check that proto=TCP, the TCP port number doesn't
      get checked at all.
      This paper's system prevents writing such patterns.
        Roughly, a pattern must check proto=TCP before checking the TCP port number.
      Seems like a sensible idea independent of verification.
    Simple optimization to reduce flow table size: subsumed entries.
      Remove low-priority entries that are subsumed by higher-priority entries.
      Apparently prior controllers had bugs in this.
        PANE: combining identical actions sometimes is not idempotent (cross-priority?).
  Installing flow table entries into switches.
    The OpenFlow protocol between controller and switch has lots of asynchrony.
      Switch -> controller: received unmatched packet.
      Controller -> switch: AddFlow | DelFlow | Barrier | send out packet.
      AddFlow and DelFlow requests can be reordered by the switch.
      Barrier waits for all outstanding AddFlow / DelFlow requests to finish.
    Prior controllers sometimes didn't get the barriers right.
    This paper's controller gets barriers right, but is somewhat simplistic:
      it issues a Barrier for every AddFlow command.
      [ Ref: https://github.com/frenetic-lang/featherweight-openflow/blob/master/coq/FwOF/FwOFSafeWire.v#L28 ]
  In principle, packets might be processed by the switch or by the controller.
    If the switch doesn't have a matching flow entry, it will send the packet to the controller.
    This paper's controller installs full flow entries on all switches.

Limitations: seems like this might not apply to real networks.
  The definition of correctness seems too simplistic.
    No output packets: do we need fake "end-host switches" for correctness?
    No possibility for reduced non-determinism in the FOF switch.
      Perhaps OK if an actual switch proves a one-way simulation to the FOF model?
  The controller model seems too simplistic.
    Static NetCore program running on the controller, no dynamic updates.
    Installs full flow tables in the switch.
    The FOF model of the switch has infinite flow table space.
  Simplistic implementation.
    Controller materializes the full flow table in memory.
    No on-demand installation of flow table entries.
    Limited performance experiments: can full materialization scale?
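Returning to per-packet consistency: a minimal sketch, assuming packets
carry a version tag (e.g., in the VLAN tag), of the two-phase update
mechanism from "Abstractions for Network Update". This is illustrative
code, not from the verified controller:

  # Sketch of per-packet-consistent update via packet versioning.
  class Switch:
      def __init__(self):
          self.stamp_version = 0   # version stamped onto packets at ingress
          self.rules = {}          # (version, match) -> action

      def install(self, version, match, action):
          self.rules[(version, match)] = action

      def remove_version(self, version):
          self.rules = {k: v for k, v in self.rules.items()
                        if k[0] != version}

  def two_phase_update(switches, ingress_switches, new_version, new_program):
      # Phase 1: install the new program's rules on every switch, guarded
      # by the new version tag. Packets stamped with the old version still
      # match only old-version rules.
      for sw in switches:
          for match, action in new_program[sw]:
              sw.install(new_version, match, action)
          # In real OpenFlow, send a Barrier here so all AddFlow commands
          # complete before any packet can carry the new version.

      # Phase 2: flip ingress switches to stamp the new version onto
      # incoming packets. In-flight packets keep their old version tag and
      # finish under the old rules: per-packet consistency.
      for sw in ingress_switches:
          sw.stamp_version = new_version

      # Eventually, after in-flight old-version packets have drained,
      # garbage-collect the old rules: sw.remove_version(old_version).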
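And a minimal sketch of flow-table semantics: the highest-priority matching
entry wins, unmatched packets go to the controller, and subsumed
lower-priority entries can be pruned. My illustration of the idea, not the
paper's verified algorithm:

  # A flow table is a list of (priority, pattern, action) entries.
  def lookup(table, pkt):
      for prio, pattern, action in sorted(table, key=lambda e: -e[0]):
          if all(pkt.get(field) == value for field, value in pattern.items()):
              return action
      return "to_controller"       # no match: send packet to controller

  def subsumes(general, specific):
      # general subsumes specific if it matches every packet specific does:
      # each field general constrains, specific constrains identically.
      return all(specific.get(f) == v for f, v in general.items())

  def prune(table):
      # Drop a lower-priority entry if a higher-priority entry with the
      # SAME action subsumes it; it can never change the lookup result.
      # (Prior controllers reportedly had bugs in this optimization.)
      keep = []
      for prio, pat, act in sorted(table, key=lambda e: -e[0]):
          if not any(act == act2 and subsumes(pat2, pat)
                     for _, pat2, act2 in keep):
              keep.append((prio, pat, act))
      return keep

  table = [
      # Note: checks proto=tcp before tcp_dst, per the paper's restriction.
      (300, {"proto": "tcp", "tcp_dst": 80}, "fwd:2,9"),
      (200, {"dst": "h2"},                   "fwd:2"),
      (100, {"dst": "h2", "proto": "tcp"},   "fwd:2"),  # subsumed by prio 200
  ]
  lookup(table, {"dst": "h2", "proto": "tcp", "tcp_dst": 80})  # 'fwd:2,9'
  len(prune(table))                                            # 2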