Jakub Libosvar 3327db80be ovs-fw: Clear conntrack information before egress pipeline

In case where Neutron logical port is placed directly to hypervisor,
hypervisor does a conntrack lookup before packets reach OVS integration
bridge. This patch introduces a rule with high priority that is placed
at the beginning of the egress pipeline. This rule removes conntrack
information from all packets if conntrack information is present. Then
packets continue in the egress pipeline.

That means all packets in egress pipeline are not tracked and ovs
firewall can do a lookup in correct zone. As for ingress pipeline, it
distinguishes between tracked - which are packets coming from egress
pipeline, and not tracked, which are inbound packets coming not from a
local port.

Change-Id: Ia4f524adce2b5ee6d98d3921cfb03d56ad6d0813
Closes-bug: #1747082

2018-03-14 14:27:40 +00:00

26 KiB

Raw Blame History

Open vSwitch Firewall Driver

The OVS driver has the same API as the current iptables firewall driver, keeping the state of security groups and ports inside of the firewall. Class SGPortMap was created to keep state consistent, and maps from ports to security groups and vice-versa. Every port and security group is represented by its own object encapsulating the necessary information.

Note: Open vSwitch firewall driver uses register 5 for marking flow related to port and register 6 which defines network and is used for conntrack zones.

Firewall API calls

There are two main calls performed by the firewall driver in order to either create or update a port with security groups - prepare_port_filter and update_port_filter. Both methods rely on the security group objects that are already defined in the driver and work similarly to their iptables counterparts. The definition of the objects will be described later in this document. prepare_port_filter must be called only once during port creation, and it defines the initial rules for the port. When the port is updated, all filtering rules are removed, and new rules are generated based on the available information about security groups in the driver.

Security group rules can be defined in the firewall driver by calling update_security_group_rules, which rewrites all the rules for a given security group. If a remote security group is changed, then update_security_group_members is called to determine the set of IP addresses that should be allowed for this remote security group. Calling this method will not have any effect on existing instance ports. In other words, if the port is using security groups and its rules are changed by calling one of the above methods, then no new rules are generated for this port. update_port_filter must be called for the changes to take effect.

All the machinery above is controlled by security group RPC methods, which mean the firewall driver doesn't have any logic of which port should be updated based on the provided changes, it only accomplishes actions when called from the controller.

OpenFlow rules

At first, every connection is split into ingress and egress processes based on the input or output port respectively. Each port contains the initial hardcoded flows for ARP, DHCP and established connections, which are accepted by default. To detect established connections, a flow must by marked by conntrack first with an action=ct() rule. An accepted flow means that ingress packets for the connection are directly sent to the port, and egress packets are left to be normally switched by the integration bridge.

Connections that are not matched by the above rules are sent to either the ingress or egress filtering table, depending on its direction. The reason the rules are based on security group rules in separate tables is to make it easy to detect these rules during removal.

Security group rules are treated differently for those without a remote group ID and those with a remote group ID. A security group rule without a remote group ID is expanded into several OpenFlow rules by the method create_flows_from_rule_and_port. A security group rule with a remote group ID is expressed by three sets of flows. The first two are conjunctive flows which will be described in the next section. The third set matches on the conjunction IDs and does accept actions.

Flow priorities for security group rules

The OpenFlow spec says a packet should not match against multiple flows at the same priority¹. The firewall driver uses 8 levels of priorities to achieve this. The method flow_priority_offset calculates a priority for a given security group rule. The use of priorities is essential with conjunction flows, which will be described later in the conjunction flows examples.

Uses of conjunctive flows

With a security group rule with a remote group ID, flows that match on nw_src for remote_group_id addresses and match on dl_dst for port MAC addresses are needed (for ingress rules; likewise for egress rules). Without conjunction, this results in O(n*m) flows where n and m are number of ports in the remote group ID and the port security group, respectively.

A conj_id is allocated for each (remote_group_id, security_group_id, direction, ethertype, flow_priority_offset) tuple. The class ConjIdMap handles the mapping. The same conj_id is shared between security group rules if multiple rules belong to the same tuple above.

Conjunctive flows consist of 2 dimensions. Flows that belong to the dimension 1 of 2 are generated by the method create_flows_for_ip_address and are in charge of IP address based filtering specified by their remote group IDs. Flows that belong to the dimension 2 of 2 are generated by the method create_flows_from_rule_and_port and modified by the method substitute_conjunction_actions, which represents the portion of the rule other than its remote group ID.

Those dimension 2 of 2 flows are per port and contain no remote group information. When there are multiple security group rules for a port, those flows can overlap. To avoid such a situation, flows are sorted and fed to merge_port_ranges or merge_common_rules methods to rearrange them.

Rules example with explanation:

The following example presents two ports on the same host. They have different security groups and there is icmp traffic allowed from first security group to the second security group. Ports have following attributes:

Port 1
  - plugged to the port 1 in OVS bridge
  - ip address: 192.168.0.1
  - mac address: fa:16:3e:a4:22:10
  - security group 1: can send icmp packets out
  - allowed address pair: 10.0.0.1/32, fa:16:3e:8c:84:13

Port 2
  - plugged to the port 2 in OVS bridge
  - ip address: 192.168.0.2
  - mac address: fa:16:3e:24:57:c7
  - security group 2:
     - can receive icmp packets from security group 1
     - can receive tcp packets from security group 1
     - can receive tcp packets to port 80 from security group 2
     - can receive ip packets from security group 3
  - allowed address pair: 10.1.0.0/24, fa:16:3e:8c:84:14

table 0 contains a low priority rule to continue packets processing in table 60 aka TRANSIENT table. table 0 is left for use to other features that take precedence over firewall, e.g. DVR. The only requirement is that after feature is done with its processing, it needs to pass packets for processing to the TRANSIENT table. This TRANSIENT table distinguishes the traffic to ingress or egress and loads to register 5 value identifying port traffic. Egress flow is determined by switch port number and ingress flow is determined by destination mac address. register 6 contains port tag to isolate connections into separate conntrack zones.

table=60,  priority=100,in_port=1 actions=load:0x1->NXM_NX_REG5[],load:0x284->NXM_NX_REG6[],resubmit(,71)
table=60,  priority=100,in_port=2 actions=load:0x2->NXM_NX_REG5[],load:0x284->NXM_NX_REG6[],resubmit(,71)
table=60,  priority=90,dl_vlan=0x284,dl_dst=fa:16:3e:a4:22:10 actions=load:0x1->NXM_NX_REG5[],load:0x284->NXM_NX_REG6[],resubmit(,81)
table=60,  priority=90,dl_vlan=0x284,dl_dst=fa:16:3e:8c:84:13 actions=load:0x1->NXM_NX_REG5[],load:0x284->NXM_NX_REG6[],resubmit(,81)
table=60,  priority=90,dl_vlan=0x284,dl_dst=fa:16:3e:24:57:c7 actions=load:0x2->NXM_NX_REG5[],load:0x284->NXM_NX_REG6[],resubmit(,81)
table=60,  priority=90,dl_vlan=0x284,dl_dst=fa:16:3e:8c:84:14 actions=load:0x2->NXM_NX_REG5[],load:0x284->NXM_NX_REG6[],resubmit(,81)
table=60,  priority=0 actions=NORMAL

Following table 71 implements arp spoofing protection, ip spoofing protection, allows traffic for obtaining ip addresses (dhcp, dhcpv6, slaac, ndp) for egress traffic and allows arp replies. Also identifies not tracked connections which are processed later with information obtained from conntrack. Notice the zone=NXM_NX_REG6[0..15] in actions when obtaining information from conntrack. It says every port has its own conntrack zone defined by value in register 6. It's there to avoid accepting established traffic that belongs to different port with same conntrack parameters.

The very first rule in table 71 is a rule removing conntrack information for a use-case where Neutron logical port is placed directly to the hypervisor. In such case kernel does conntrack lookup before packet reaches Open vSwitch bridge. Tracked packets are sent back for processing by the same table after conntrack information is cleared.

::: table=71, priority=110,ct_state=+trk actions=ct_clear,resubmit(,71)

Rules below allow ICMPv6 traffic for multicast listeners, neighbour solicitation and neighbour advertisement.

table=71, priority=95,icmp6,reg5=0x1,in_port=1,icmp_type=130 actions=resubmit(,91)
table=71, priority=95,icmp6,reg5=0x1,in_port=1,icmp_type=131 actions=resubmit(,91)
table=71, priority=95,icmp6,reg5=0x1,in_port=1,icmp_type=132 actions=resubmit(,91)
table=71, priority=95,icmp6,reg5=0x1,in_port=1,icmp_type=135 actions=resubmit(,91)
table=71, priority=95,icmp6,reg5=0x1,in_port=1,icmp_type=136 actions=resubmit(,91)
table=71, priority=95,icmp6,reg5=0x2,in_port=2,icmp_type=130 actions=resubmit(,91)
table=71, priority=95,icmp6,reg5=0x2,in_port=2,icmp_type=131 actions=resubmit(,91)
table=71, priority=95,icmp6,reg5=0x2,in_port=2,icmp_type=132 actions=resubmit(,91)
table=71, priority=95,icmp6,reg5=0x2,in_port=2,icmp_type=135 actions=resubmit(,91)
table=71, priority=95,icmp6,reg5=0x2,in_port=2,icmp_type=136 actions=resubmit(,91)

Following rules implement arp spoofing protection

table=71, priority=95,arp,reg5=0x1,in_port=1,dl_src=fa:16:3e:a4:22:10,arp_spa=192.168.0.1 actions=resubmit(,91)
table=71, priority=95,arp,reg5=0x1,in_port=1,dl_src=fa:16:3e:8c:84:13,arp_spa=10.0.0.1 actions=resubmit(,91)
table=71, priority=95,arp,reg5=0x2,in_port=2,dl_src=fa:16:3e:24:57:c7,arp_spa=192.168.0.2 actions=resubmit(,91)
table=71, priority=95,arp,reg5=0x2,in_port=2,dl_src=fa:16:3e:8c:84:14,arp_spa=10.1.0.0/24 actions=resubmit(,91)

DHCP and DHCPv6 traffic is allowed to instance but DHCP servers are blocked on instances.

table=71, priority=80,udp,reg5=0x1,in_port=1,tp_src=68,tp_dst=67 actions=resubmit(,73)
table=71, priority=80,udp6,reg5=0x1,in_port=1,tp_src=546,tp_dst=547 actions=resubmit(,73)
table=71, priority=70,udp,reg5=0x1,in_port=1,tp_src=67,tp_dst=68 actions=resubmit(,93)
table=71, priority=70,udp6,reg5=0x1,in_port=1,tp_src=547,tp_dst=546 actions=resubmit(,93)
table=71, priority=80,udp,reg5=0x2,in_port=2,tp_src=68,tp_dst=67 actions=resubmit(,73)
table=71, priority=80,udp6,reg5=0x2,in_port=2,tp_src=546,tp_dst=547 actions=resubmit(,73)
table=71, priority=70,udp,reg5=0x2,in_port=2,tp_src=67,tp_dst=68 actions=resubmit(,93)
table=71, priority=70,udp6,reg5=0x2,in_port=2,tp_src=547,tp_dst=546 actions=resubmit(,93)

Flowing rules obtain conntrack information for valid ip and mac address combinations. All other packets are dropped.

table=71, priority=65,ip,reg5=0x1,in_port=1,dl_src=fa:16:3e:a4:22:10,nw_src=192.168.0.1 actions=ct(table=72,zone=NXM_NX_REG6[0..15])
table=71, priority=65,ip,reg5=0x1,in_port=1,dl_src=fa:16:3e:8c:84:13,nw_src=10.0.0.1 actions=ct(table=72,zone=NXM_NX_REG6[0..15])
table=71, priority=65,ip,reg5=0x2,in_port=2,dl_src=fa:16:3e:24:57:c7,nw_src=192.168.0.2 actions=ct(table=72,zone=NXM_NX_REG6[0..15])
table=71, priority=65,ip,reg5=0x2,in_port=2,dl_src=fa:16:3e:8c:84:14,nw_src=10.1.0.0/24 actions=ct(table=72,zone=NXM_NX_REG6[0..15])
table=71, priority=65,ipv6,reg5=0x1,in_port=1,dl_src=fa:16:3e:a4:22:10,ipv6_src=fe80::f816:3eff:fea4:2210 actions=ct(table=72,zone=NXM_NX_REG6[0..15])
table=71, priority=65,ipv6,reg5=0x2,in_port=2,dl_src=fa:16:3e:24:57:c7,ipv6_src=fe80::f816:3eff:fe24:57c7 actions=ct(table=72,zone=NXM_NX_REG6[0..15])
table=71, priority=10,reg5=0x1,in_port=1 actions=resubmit(,93)
table=71, priority=10,reg5=0x2,in_port=2 actions=resubmit(,93)
table=71, priority=0 actions=drop

table 72 accepts only established or related connections, and implements rules defined by the security group. As this egress connection might also be an ingress connection for some other port, it's not switched yet but eventually processed by ingress pipeline.

All established or new connections defined by security group rule are accepted, which will be explained later. All invalid packets are dropped. In case below we allow all icmp egress traffic.

table=72, priority=75,ct_state=+est-rel-rpl,icmp,reg5=0x1 actions=resubmit(,73)
table=72, priority=75,ct_state=+new-est,icmp,reg5=0x1 actions=resubmit(,73)
table=72, priority=50,ct_state=+inv+trk actions=resubmit(,93)

Important on the flows below is the ct_mark=0x1. Such value have flows that were marked as not existing anymore by rule introduced later. Those are typically connections that were allowed by some security group rule and the rule was removed.

table=72, priority=50,ct_mark=0x1,reg5=0x1 actions=resubmit(,93)
table=72, priority=50,ct_mark=0x1,reg5=0x2 actions=resubmit(,93)

All other connections that are not marked and are established or related are allowed.

table=72, priority=50,ct_state=+est-rel+rpl,ct_zone=644,ct_mark=0,reg5=0x1 actions=resubmit(,91)
table=72, priority=50,ct_state=+est-rel+rpl,ct_zone=644,ct_mark=0,reg5=0x2 actions=resubmit(,91)
table=72, priority=50,ct_state=-new-est+rel-inv,ct_zone=644,ct_mark=0,reg5=0x1 actions=resubmit(,91)
table=72, priority=50,ct_state=-new-est+rel-inv,ct_zone=644,ct_mark=0,reg5=0x2 actions=resubmit(,91)

In the following flows are marked established connections that weren't matched in the previous flows, which means they don't have accepting security group rule anymore.

table=72, priority=40,ct_state=-est,reg5=0x1 actions=resubmit(,93)
table=72, priority=40,ct_state=+est,reg5=0x1 actions=ct(commit,zone=NXM_NX_REG6[0..15],exec(load:0x1->NXM_NX_CT_MARK[]))
table=72, priority=40,ct_state=-est,reg5=0x2 actions=resubmit(,93)
table=72, priority=40,ct_state=+est,reg5=0x2 actions=ct(commit,zone=NXM_NX_REG6[0..15],exec(load:0x1->NXM_NX_CT_MARK[]))
table=72, priority=0 actions=drop

In following table 73 are all detected ingress connections sent to ingress pipeline. Since the connection was already accepted by egress pipeline, all remaining egress connections are sent to normal switching.

table=73, priority=100,reg6=0x284,dl_dst=fa:16:3e:a4:22:10 actions=load:0x1->NXM_NX_REG5[],resubmit(,81)
table=73, priority=100,reg6=0x284,dl_dst=fa:16:3e:8c:84:13 actions=load:0x1->NXM_NX_REG5[],resubmit(,81)
table=73, priority=100,reg6=0x284,dl_dst=fa:16:3e:24:57:c7 actions=load:0x2->NXM_NX_REG5[],resubmit(,81)
table=73, priority=100,reg6=0x284,dl_dst=fa:16:3e:8c:84:14 actions=load:0x2->NXM_NX_REG5[],resubmit(,81)
table=73, priority=90,ct_state=+new-est,reg5=0x1 actions=ct(commit,zone=NXM_NX_REG6[0..15]),resubmit(,91)
table=73, priority=90,ct_state=+new-est,reg5=0x2 actions=ct(commit,zone=NXM_NX_REG6[0..15]),resubmit(,91)
table=73, priority=80,reg5=0x1 actions=resubmit(,91)
table=73, priority=80,reg5=0x2 actions=resubmit(,91)
table=73, priority=0 actions=drop

table 81 is similar to table 71, allows basic ingress traffic for obtaining ip address and arp queries. Note that vlan tag must be removed by adding strip_vlan to actions list, prior to injecting packet directly to port. Not tracked packets are sent to obtain conntrack information.

table=81, priority=100,arp,reg5=0x1 actions=strip_vlan,output:1,resubmit(,92)
table=81, priority=100,arp,reg5=0x2 actions=strip_vlan,output:2,resubmit(,92)
table=81, priority=100,icmp6,reg5=0x1,icmp_type=130 actions=strip_vlan,output:1,resubmit(,92)
table=81, priority=100,icmp6,reg5=0x1,icmp_type=131 actions=strip_vlan,output:1,resubmit(,92)
table=81, priority=100,icmp6,reg5=0x1,icmp_type=132 actions=strip_vlan,output:1,resubmit(,92)
table=81, priority=100,icmp6,reg5=0x1,icmp_type=135 actions=strip_vlan,output:1,resubmit(,92)
table=81, priority=100,icmp6,reg5=0x1,icmp_type=136 actions=strip_vlan,output:1,resubmit(,92)
table=81, priority=100,icmp6,reg5=0x2,icmp_type=130 actions=strip_vlan,output:2,resubmit(,92)
table=81, priority=100,icmp6,reg5=0x2,icmp_type=131 actions=strip_vlan,output:2,resubmit(,92)
table=81, priority=100,icmp6,reg5=0x2,icmp_type=132 actions=strip_vlan,output:2,resubmit(,92)
table=81, priority=100,icmp6,reg5=0x2,icmp_type=135 actions=strip_vlan,output:2,resubmit(,92)
table=81, priority=100,icmp6,reg5=0x2,icmp_type=136 actions=strip_vlan,output:2,resubmit(,92)
table=81, priority=95,udp,reg5=0x1,tp_src=67,tp_dst=68 actions=strip_vlan,output:1,resubmit(,92)
table=81, priority=95,udp6,reg5=0x1,tp_src=547,tp_dst=546 actions=strip_vlan,output:1,resubmit(,92)
table=81, priority=95,udp,reg5=0x2,tp_src=67,tp_dst=68 actions=strip_vlan,output:2,resubmit(,92)
table=81, priority=95,udp6,reg5=0x2,tp_src=547,tp_dst=546 actions=strip_vlan,output:2,resubmit(,92)
table=81, priority=90,ct_state=-trk,ip,reg5=0x1 actions=ct(table=82,zone=NXM_NX_REG6[0..15])
table=81, priority=90,ct_state=-trk,ipv6,reg5=0x1 actions=ct(table=82,zone=NXM_NX_REG6[0..15])
table=81, priority=90,ct_state=-trk,ip,reg5=0x2 actions=ct(table=82,zone=NXM_NX_REG6[0..15])
table=81, priority=90,ct_state=-trk,ipv6,reg5=0x2 actions=ct(table=82,zone=NXM_NX_REG6[0..15])
table=81, priority=80,ct_state=+trk,reg5=0x1 actions=resubmit(,82)
table=81, priority=80,ct_state=+trk,reg5=0x2 actions=resubmit(,82)
table=81, priority=0 actions=drop

Similarly to table 72, table 82 accepts established and related connections. In this case we allow all icmp traffic coming from security group 1 which is in this case only port 1. The first four flows match on the ip addresses, and the next two flows match on the icmp protocol. These six flows define conjunction flows, and the next two define actions for them.

table=82, priority=71,ct_state=+est-rel-rpl,ip,reg6=0x284,nw_src=192.168.0.1 actions=conjunction(18,1/2)
table=82, priority=71,ct_state=+est-rel-rpl,ip,reg6=0x284,nw_src=10.0.0.1 actions=conjunction(18,1/2)
table=82, priority=71,ct_state=+new-est,ip,reg6=0x284,nw_src=192.168.0.1 actions=conjunction(19,1/2)
table=82, priority=71,ct_state=+new-est,ip,reg6=0x284,nw_src=10.0.0.1 actions=conjunction(19,1/2)
table=82, priority=71,ct_state=+est-rel-rpl,icmp,reg5=0x2 actions=conjunction(18,2/2)
table=82, priority=71,ct_state=+new-est,icmp,reg5=0x2 actions=conjunction(19,2/2)
table=82, priority=71,conj_id=18,ct_state=+est-rel-rpl,ip,reg5=0x2 actions=strip_vlan,output:2,resubmit(,92)
table=82, priority=71,conj_id=19,ct_state=+new-est,ip,reg5=0x2 actions=ct(commit,zone=NXM_NX_REG6[0..15]),strip_vlan,output:2,resubmit(,92)
table=82, priority=50,ct_state=+inv+trk actions=resubmit(,93)

There are some more security group rules with remote group IDs. Next we look at TCP related ones. Excerpt of flows that correspond to those rules are:

table=82, priority=73,ct_state=+est-rel-rpl,tcp,reg5=0x2,tp_dst=0x60/0xffe0 actions=conjunction(22,2/2)
table=82, priority=73,ct_state=+new-est,tcp,reg5=0x2,tp_dst=0x60/0xffe0 actions=conjunction(23,2/2)
table=82, priority=73,ct_state=+est-rel-rpl,tcp,reg5=0x2,tp_dst=0x40/0xfff0 actions=conjunction(22,2/2)
table=82, priority=73,ct_state=+new-est,tcp,reg5=0x2,tp_dst=0x40/0xfff0 actions=conjunction(23,2/2)
table=82, priority=73,ct_state=+est-rel-rpl,tcp,reg5=0x2,tp_dst=0x58/0xfff8 actions=conjunction(22,2/2)
table=82, priority=73,ct_state=+new-est,tcp,reg5=0x2,tp_dst=0x58/0xfff8 actions=conjunction(23,2/2)
table=82, priority=73,ct_state=+est-rel-rpl,tcp,reg5=0x2,tp_dst=0x54/0xfffc actions=conjunction(22,2/2)
table=82, priority=73,ct_state=+new-est,tcp,reg5=0x2,tp_dst=0x54/0xfffc actions=conjunction(23,2/2)
table=82, priority=73,ct_state=+est-rel-rpl,tcp,reg5=0x2,tp_dst=0x52/0xfffe actions=conjunction(22,2/2)
table=82, priority=73,ct_state=+new-est,tcp,reg5=0x2,tp_dst=0x52/0xfffe actions=conjunction(23,2/2)
table=82, priority=73,ct_state=+est-rel-rpl,tcp,reg5=0x2,tp_dst=80 actions=conjunction(22,2/2),conjunction(14,2/2)
table=82, priority=73,ct_state=+est-rel-rpl,tcp,reg5=0x2,tp_dst=81 actions=conjunction(22,2/2)
table=82, priority=73,ct_state=+new-est,tcp,reg5=0x2,tp_dst=80 actions=conjunction(23,2/2),conjunction(15,2/2)
table=82, priority=73,ct_state=+new-est,tcp,reg5=0x2,tp_dst=81 actions=conjunction(23,2/2)

Only dimension 2/2 flows are shown here, as the other are similar to the previous ICMP example. There are many more flows but only the port ranges that covers from 64 to 127 are shown for brevity.

The conjunction IDs 14 and 15 correspond to packets from the security group 1, and the conjunction IDs 22 and 23 correspond to those from the security group 2. These flows are from the following security group rules,

- can receive tcp packets from security group 1
- can receive tcp packets to port 80 from security group 2

and these rules have been processed by merge_port_ranges into:

- can receive tcp packets to port != 80 from security group 1
- can receive tcp packets to port 80 from security group 1 or 2

before translating to flows so that there is only one matching flow even when the TCP destination port is 80.

The remaining is a L4 protocol agnostic rule.

table=82, priority=70,ct_state=+est-rel-rpl,ip,reg5=0x2 actions=conjunction(24,2/2)
table=82, priority=70,ct_state=+new-est,ip,reg5=0x2 actions=conjunction(25,2/2)

Any IP packet that matches the previous TCP flows matches one of these flows, but the corresponding security group rules have different remote group IDs. Unlike the above TCP example, there's no convenient way of expressing protocol != TCP or icmp_code != 1. So the OVS firewall uses a different priority than the previous TCP flows so as not to mix up them.

The mechanism for dropping connections that are not allowed anymore is the same as in table 72.

table=82, priority=50,ct_mark=0x1,reg5=0x1 actions=resubmit(,93)
table=82, priority=50,ct_mark=0x1,reg5=0x2 actions=resubmit(,93)
table=82, priority=50,ct_state=+est-rel+rpl,ct_zone=644,ct_mark=0,reg5=0x1 actions=strip_vlan,output:1,resubmit(,92)
table=82, priority=50,ct_state=+est-rel+rpl,ct_zone=644,ct_mark=0,reg5=0x2 actions=strip_vlan,output:2,resubmit(,92)
table=82, priority=50,ct_state=-new-est+rel-inv,ct_zone=644,ct_mark=0,reg5=0x1 actions=strip_vlan,output:1,resubmit(,92)
table=82, priority=50,ct_state=-new-est+rel-inv,ct_zone=644,ct_mark=0,reg5=0x2 actions=strip_vlan,output:2,resubmit(,92)
table=82, priority=40,ct_state=-est,reg5=0x1 actions=resubmit(,93)
table=82, priority=40,ct_state=+est,reg5=0x1 actions=ct(commit,zone=NXM_NX_REG6[0..15],exec(load:0x1->NXM_NX_CT_MARK[]))
table=82, priority=40,ct_state=-est,reg5=0x2 actions=resubmit(,93)
table=82, priority=40,ct_state=+est,reg5=0x2 actions=ct(commit,zone=NXM_NX_REG6[0..15],exec(load:0x1->NXM_NX_CT_MARK[]))
table=82, priority=0 actions=drop

Note: Conntrack zones on a single node are now based on network to which port is plugged in. That makes a difference between traffic on hypervisor only and east-west traffic. For example, if port has a VIP that was migrated to a port on different node, then new port won't contain conntrack information about previous traffic that happened with VIP.

Using OpenFlow in conjunction with OVS firewall

There are three tables where packets are sent once they get through the OVS firewall pipeline. The tables can be used by other mechanisms using OpenFlow that are supposed to work with the OVS firewall. Packets sent to table 91 are considered accepted by the egress pipeline and won't be processed further. NORMAL action is used by default in this table. Packets sent to table 92 were processed by the ingress filtering pipeline. As packets from the ingress filtering pipeline were injected to its destination, table 92 receives copies of those packets and therefore default action is drop. Finally, packets sent to table 93 were filtered by the firewall and should be dropped. Default action is drop in this table.

Future work

Create fullstack tests with tunneling enabled

During the update of firewall rules, we can use bundles to make the changes atomic

Upgrade path from iptables hybrid driver

During an upgrade, the agent will need to re-plug each instance's tap device into the integration bridge while trying to not break existing connections. One of the following approaches can be taken:

1) Pause the running instance in order to prevent a short period of time where its network interface does not have firewall rules. This can happen due to the firewall driver calling OVS to obtain information about OVS the port. Once the instance is paused and no traffic is flowing, we can delete the qvo interface from integration bridge, detach the tap device from the qbr bridge and plug the tap device back into the integration bridge. Once this is done, the firewall rules are applied for the OVS tap interface and the instance is started from its paused state.

2) Set drop rules for the instance's tap interface, delete the qbr bridge and related veths, plug the tap device into the integration bridge, apply the OVS firewall rules and finally remove the drop rules for the instance.

3) Compute nodes can be upgraded one at a time. A free node can be switched to use the OVS firewall, and instances from other nodes can be live-migrated to it. Once the first node is evacuated, its firewall driver can be then be switched to the OVS driver.

Although OVS seems to magically handle overlapping flows under some cases, we shouldn't rely on that.↩︎

26 KiB Raw Blame History