Subnet Pools and Address Scopes
===============================

This page discusses subnet pools and address scopes.

Subnet Pools
------------

Learn about subnet pools by watching the summit talk given in Vancouver [#]_.

.. [#] http://www.youtube.com/watch?v=QqP8yBUUXBM&t=6m12s

Subnet pools were added in Kilo. They are relatively simple. A SubnetPool has
any number of SubnetPoolPrefix objects associated with it. These prefixes are in
CIDR format. Each CIDR is a piece of the address space that is available for
allocation.

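As a rough illustration of this data structure, the sketch below models a pool
as a list of CIDR prefixes using Python's standard ``ipaddress`` module.
``SubnetPoolSketch``, ``add_prefix`` and ``covers`` are hypothetical names, not
Neutron's SubnetPool model, and the one-family-per-pool check is an assumption
of the sketch.

.. code-block:: python

    import ipaddress
    from dataclasses import dataclass, field


    @dataclass
    class SubnetPoolSketch:
        """Hypothetical stand-in for a SubnetPool and its SubnetPoolPrefix rows."""

        prefixes: list = field(default_factory=list)

        def add_prefix(self, cidr):
            # ip_network() rejects malformed CIDRs such as "10.0.0.1/8" (host bits set).
            net = ipaddress.ip_network(cidr)
            # Assumption for this sketch: all prefixes in a pool share one family.
            if self.prefixes and net.version != self.prefixes[0].version:
                raise ValueError("a pool's prefixes must share one address family")
            self.prefixes.append(net)

        def covers(self, cidr):
            """Return True if the CIDR lies entirely within one pool prefix."""
            net = ipaddress.ip_network(cidr)
            return any(net.version == p.version and net.subnet_of(p)
                       for p in self.prefixes)


    pool = SubnetPoolSketch()
    pool.add_prefix("10.10.0.0/16")
    pool.add_prefix("10.20.0.0/16")
    print(pool.covers("10.10.5.0/24"))  # True
    print(pool.covers("10.30.0.0/24"))  # False
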
Subnet pools support IPv6 just as well as IPv4.

The Subnet model object now has a subnetpool_id attribute whose default is null
for backward compatibility. The subnetpool_id attribute stores the UUID of the
subnet pool that acted as the source for the address range of a particular
subnet.

When creating a subnet, the subnetpool_id can optionally be specified. If it
is, the 'cidr' field is not required. If 'cidr' is specified, it will be
allocated from the pool, assuming the pool includes it and hasn't already
allocated any part of it. If 'cidr' is left out, then the prefixlen attribute
can be specified. If it is not, the default prefix length will be taken from
the subnet pool. Think of it this way: the allocation logic always needs to
know the size of the subnet desired. It can pull it from a specific CIDR, the
prefixlen, or the pool default. A specific CIDR is optional, and the allocation
will try to honor it if provided. The request will fail if it can't honor it.

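The precedence described above can be sketched as a small function. This is a
hypothetical helper (``resolve_request`` and its parameters are not Neutron
APIs) that assumes the pool's prefixes and its existing allocations are
available as ``ipaddress`` network objects. It only decides what is being
requested; choosing a free block is left to the algorithm described under
Allocation below.

.. code-block:: python

    import ipaddress


    def resolve_request(pool_prefixes, allocated, default_prefixlen,
                        cidr=None, prefixlen=None):
        """Return (specific_network_or_None, prefixlen) for a subnet request.

        Precedence: a specific CIDR, then an explicit prefixlen, then the
        pool's default prefix length.
        """
        if cidr is not None:
            net = ipaddress.ip_network(cidr)
            # The pool must include the requested CIDR ...
            if not any(p.version == net.version and net.subnet_of(p)
                       for p in pool_prefixes):
                raise ValueError(f"{net} is not inside the subnet pool")
            # ... and must not have allocated any part of it already.
            if any(net.overlaps(a) for a in allocated):
                raise ValueError(f"part of {net} is already allocated")
            return net, net.prefixlen
        if prefixlen is not None:
            return None, prefixlen
        return None, default_prefixlen


    pool = [ipaddress.ip_network("10.0.0.0/16")]
    allocated = [ipaddress.ip_network("10.0.1.0/24")]
    print(resolve_request(pool, allocated, 24, cidr="10.0.2.0/24"))  # (IPv4Network('10.0.2.0/24'), 24)
    print(resolve_request(pool, allocated, 24, prefixlen=26))        # (None, 26)
    print(resolve_request(pool, allocated, 24))                      # (None, 24)

Requesting ``10.0.1.128/25`` in this example fails because part of it is
already allocated, which is the "can't honor it" case.
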
Subnet pools do not allow overlap of subnets.

Subnet Pool Quotas
~~~~~~~~~~~~~~~~~~

A quota mechanism was provided for subnet pools. It is different from other
quota mechanisms in Neutron because it doesn't count instances of first-class
objects. Instead, it counts how much of the address space is used.

For IPv4, it made reasonable sense to count quota in terms of individual
addresses. So, if you're allowed exactly one /24, your quota should be set to
256. Three /26s (64 addresses each) would consume 192. This mechanism
encourages more efficient use of the IPv4 space, which will be increasingly
important when working with globally routable addresses.

For IPv6, the smallest viable subnet in Neutron is a /64. There is no reason
to allocate a subnet of any other size for use on a Neutron network. It would
look pretty funny to set a quota of 18446744073709551616, the number of
addresses in a /64, to allow one /64 subnet. To avoid this, we count IPv6 quota
in terms of /64s. So, a quota of 3 allows three /64 subnets. When we need to
allocate something smaller in the future, we will need to ensure that the code
can handle non-integer quota consumption.

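The two counting rules can be written down as a small helper. This is a sketch
under stated assumptions: ``quota_units`` is a hypothetical name rather than
Neutron's quota code, and it simply refuses IPv6 prefixes longer than /64
instead of handling fractional consumption.

.. code-block:: python

    import ipaddress


    def quota_units(cidr):
        """Quota consumed by one subnet under the counting rules above.

        IPv4 subnets consume one unit per address; IPv6 subnets consume one
        unit per /64 they contain.
        """
        net = ipaddress.ip_network(cidr)
        if net.version == 4:
            return net.num_addresses
        if net.prefixlen > 64:
            # Anything smaller than a /64 would need non-integer quota.
            raise ValueError("IPv6 subnets smaller than a /64 are not handled")
        return 2 ** (64 - net.prefixlen)


    print(quota_units("10.0.0.0/24"))                               # 256
    print(sum(quota_units(c) for c in
              ("10.0.0.0/26", "10.0.0.64/26", "10.0.0.128/26")))    # 192
    print(quota_units("2001:db8::/64"))                             # 1
    print(ipaddress.ip_network("2001:db8::/64").num_addresses)      # 18446744073709551616
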
Allocation
~~~~~~~~~~

Allocation is done in a way that aims to minimize fragmentation of the pool.
The relevant code is here [#]_. First, the available prefixes are computed
using a set difference: pool - allocations. The result is compacted [#]_ and
then sorted by size. The subnet is then allocated from the smallest available
prefix that is large enough to accommodate the request.

.. [#] neutron/ipam/subnet_alloc.py (_allocate_any_subnet)
.. [#] http://pythonhosted.org/netaddr/api.html#netaddr.IPSet.compact

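The allocation steps above can be sketched with Python's standard ``ipaddress``
module. This is an illustration, not the code in ``_allocate_any_subnet``: the
helper names are hypothetical, ``ipaddress.collapse_addresses`` stands in for
netaddr's ``IPSet.compact()``, and all prefixes are assumed to belong to a
single address family.

.. code-block:: python

    import ipaddress


    def available_prefixes(pool, allocations):
        """Pool minus allocations, compacted and sorted smallest block first."""
        free = list(pool)
        for used in allocations:
            remaining = []
            for block in free:
                if block.subnet_of(used):
                    continue  # the whole block is allocated
                if used.subnet_of(block):
                    # Carve the allocation out of the block.
                    remaining.extend(block.address_exclude(used))
                else:
                    remaining.append(block)  # CIDRs nest or are disjoint
            free = remaining
        # Merge adjacent blocks, similar in spirit to IPSet.compact().
        compacted = ipaddress.collapse_addresses(free)
        # Longest prefix (smallest block) first.
        return sorted(compacted, key=lambda net: net.prefixlen, reverse=True)


    def allocate_any(pool, allocations, prefixlen):
        """Take the first subnet of the smallest free block that can hold it."""
        for block in available_prefixes(pool, allocations):
            if prefixlen >= block.prefixlen:
                return next(block.subnets(new_prefix=prefixlen))
        raise ValueError("no free prefix is large enough")


    pool = [ipaddress.ip_network("10.0.0.0/16")]
    used = [ipaddress.ip_network("10.0.0.0/24")]
    print(allocate_any(pool, used, 26))  # 10.0.1.0/26
    print(allocate_any(pool, used, 23))  # 10.0.2.0/23

Taking the smallest block that fits leaves the large free blocks (here the /17
and /18) intact for later, larger requests.
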
Address Scopes
--------------

Before subnet pools or address scopes, it was impossible to tell if a network
address was routable in a certain context because the address was given
explicitly on subnet creation and wasn't validated against any other addresses.
Address scopes are meant to solve this by putting control over the address
space in the hands of an authority: the address scope owner. Address scopes
make use of the existing SubnetPool concept for allocation.

Address scopes are "the thing within which address overlap is not allowed" and
thus provide more flexible control as well as decoupling of address overlap
from tenancy.

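To make "no overlap within a scope" concrete, the check can be pictured as
running when a subnet pool is associated with an address scope. The sketch
below is hypothetical (``check_scope_overlap`` is not a Neutron function) and
assumes the CIDRs of the scope's other pools are already known. Prefixes in
other scopes are never compared, which is how overlap is decoupled from tenancy.

.. code-block:: python

    import ipaddress


    def check_scope_overlap(scope_prefixes, new_pool_prefixes):
        """Reject a pool whose prefixes overlap prefixes already in the scope.

        scope_prefixes: CIDRs of all subnet pools already in the address scope.
        new_pool_prefixes: CIDRs of the pool being associated with that scope.
        """
        existing = [ipaddress.ip_network(c) for c in scope_prefixes]
        for cidr in new_pool_prefixes:
            net = ipaddress.ip_network(cidr)
            for other in existing:
                if net.overlaps(other):
                    raise ValueError(f"{net} overlaps {other} in this address scope")


    check_scope_overlap(["10.0.0.0/16"], ["10.1.0.0/16"])       # accepted
    # check_scope_overlap(["10.0.0.0/16"], ["10.0.128.0/20"])   # raises ValueError
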
Prior to the Mitaka release, there was implicitly only a single 'shared'
address scope. Arbitrary address overlap was allowed, making it pretty much a
"free for all". To make things seem somewhat sane, normal tenants are not able
to use routers to cross-plug networks from different tenants, and NAT was used
between internal networks and external networks. It was almost as if each
tenant had a private address scope.

The problem is that this model cannot support use cases where NAT is not
desired or supported (e.g. IPv6) or where we want to allow different tenants to
cross-plug their networks.

An AddressScope covers only one address family, but address scopes work
equally well for IPv4 and IPv6.

Routing
~~~~~~~

The reference implementation honors address scopes. Within an address scope,
addresses route freely (barring any FW rules or other external restrictions).
Between scopes, routing is prevented unless address translation is used. For
now, NAT is the only way traffic crosses scope boundaries: the 1-1 NAT of
floating IPs and, as described in the L3 Agent section below, the default SNAT
to the external network.

.. TODO (Carl) Implement NAT for floating ips crossing scopes
.. TODO (Carl) Implement SNAT for crossing scopes

RPC
~~~

The L3 agent in the reference implementation needs to know the address scope
for each port on each router in order to map ingress traffic correctly.

Each subnet from the same address family on a network is required to be from
the same subnet pool. Therefore, the address scope will also be the same. If
this were not the case, it would be more difficult to match ingress traffic on
a port with the appropriate scope. It may be counter-intuitive, but L3 address
scopes need to be anchored to some sort of non-L3 thing (e.g. an L2 interface)
in the topology in order to determine the scope of ingress traffic. For now,
we use ports/networks. In the future, we may be able to distinguish by
something else, such as the remote MAC address.

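The one-pool-per-family constraint can be sketched as a check over a network's
subnets. This is a hypothetical helper, not Neutron's validation code; it
assumes each subnet is a dict carrying the ``ip_version`` and
``subnetpool_id`` fields, with ``None`` meaning the subnet was not allocated
from a pool.

.. code-block:: python

    from collections import defaultdict


    def network_pool_per_family(subnets):
        """Return {ip_version: subnetpool_id} for a network's subnets.

        Raises ValueError if two subnets of the same family came from
        different pools, since no single address scope could then be derived.
        """
        pools = defaultdict(set)
        for subnet in subnets:
            pools[subnet["ip_version"]].add(subnet["subnetpool_id"])
        result = {}
        for version, pool_ids in pools.items():
            if len(pool_ids) > 1:
                raise ValueError(
                    f"IPv{version} subnets come from pools {sorted(pool_ids, key=str)}")
            result[version] = next(iter(pool_ids))
        return result


    subnets = [
        {"ip_version": 4, "subnetpool_id": "pool-a"},
        {"ip_version": 4, "subnetpool_id": "pool-a"},
        {"ip_version": 6, "subnetpool_id": None},
    ]
    print(network_pool_per_family(subnets))  # {4: 'pool-a', 6: None}
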
The address scope ID is set on each port in a dict under the 'address_scopes'
attribute. The scope is distinct per address family. If the attribute does
not appear, it is assumed to be null for both families. A value of null means
that the addresses are in the "implicit" address scope, which holds all
addresses that don't have an explicit one. All subnets that existed in Neutron
before address scopes existed fall here.

Here is an example of how the JSON will look in the context of a router port::

    "address_scopes": {
        "4": "d010a0ea-660e-4df4-86ca-ae2ed96da5c1",
        "6": null
    },

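A consumer of this attribute, such as the L3 agent, has to pick the entry that
matches an address's family. The helper below is a hypothetical sketch, not the
agent's code; it assumes the port has already been decoded from JSON into a
Python dict, so ``null`` has become ``None``.

.. code-block:: python

    import ipaddress


    def scope_for_address(port, ip_address):
        """Return the address scope ID for an address on a port, or None.

        None stands for the implicit scope: it is returned when the port has
        no 'address_scopes' attribute or the entry for the family is null.
        """
        version = ipaddress.ip_address(ip_address).version
        scopes = port.get("address_scopes") or {}
        # The keys are the strings "4" and "6", as in the JSON above.
        return scopes.get(str(version))


    port = {"address_scopes": {"4": "d010a0ea-660e-4df4-86ca-ae2ed96da5c1",
                               "6": None}}
    print(scope_for_address(port, "172.16.0.3"))   # d010a0ea-660e-4df4-86ca-ae2ed96da5c1
    print(scope_for_address(port, "2001:db8::1"))  # None
    print(scope_for_address({}, "10.0.0.1"))       # None
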
To implement floating IPs crossing scope boundaries, the L3 agent needs to know
the target scope of the floating IP. The fixed address is not enough to
disambiguate because, theoretically, there could be overlapping addresses from
different scopes. The scope is computed [#]_ from the floating IP's fixed port
and attached to the floating IP dict under the 'fixed_ip_address_scope'
attribute. Here's what the JSON looks like (trimmed)::

    {
        ...
        "floating_ip_address": "172.24.4.4",
        "fixed_ip_address": "172.16.0.3",
        "fixed_ip_address_scope": "d010a0ea-660e-4df4-86ca-ae2ed96da5c1",
        ...
    }

.. [#] neutron/db/l3_db.py (_get_sync_floating_ips)

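Why the lookup goes through the fixed port can be illustrated with a sketch.
This is hypothetical code, not ``_get_sync_floating_ips``: it assumes floating
IP dicts carry ``fixed_ip_address`` and ``port_id``, and that ports are
available in a dict keyed by port ID with the 'address_scopes' attribute
described above. Two floating IPs with identical fixed addresses still resolve
to different scopes.

.. code-block:: python

    import ipaddress


    def add_fixed_ip_scopes(floating_ips, ports_by_id):
        """Set 'fixed_ip_address_scope' on each floating IP dict, in place.

        The scope is read from the floating IP's fixed port ('port_id'), not
        derived from 'fixed_ip_address' alone, because the same fixed address
        can exist in more than one scope.
        """
        for fip in floating_ips:
            fixed_ip = fip.get("fixed_ip_address")
            if fixed_ip is None:
                fip["fixed_ip_address_scope"] = None  # not associated with a port
                continue
            port = ports_by_id.get(fip.get("port_id")) or {}
            scopes = port.get("address_scopes") or {}
            version = ipaddress.ip_address(fixed_ip).version
            fip["fixed_ip_address_scope"] = scopes.get(str(version))
        return floating_ips


    ports = {
        "port-1": {"address_scopes": {"4": "scope-a", "6": None}},
        "port-2": {"address_scopes": {"4": "scope-b", "6": None}},
    }
    fips = [
        {"floating_ip_address": "172.24.4.4", "fixed_ip_address": "172.16.0.3",
         "port_id": "port-1"},
        {"floating_ip_address": "172.24.4.5", "fixed_ip_address": "172.16.0.3",
         "port_id": "port-2"},
    ]
    for fip in add_fixed_ip_scopes(fips, ports):
        print(fip["floating_ip_address"], fip["fixed_ip_address_scope"])
    # 172.24.4.4 scope-a
    # 172.24.4.5 scope-b
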
Model
~~~~~

The model for subnet pools and address scopes can be found in
neutron/db/models_v2.py and neutron/db/address_scope_db.py. This document
won't go over all of the details, but it is worth noting how they relate to
existing Neutron objects. The existing Neutron subnet now optionally
references a single subnet pool, which in turn optionally references a single
address scope::

    +----------------+        +------------------+        +--------------+
    | Subnet         |        | SubnetPool       |        | AddressScope |
    +----------------+        +------------------+        +--------------+
    | subnetpool_id  +------> | address_scope_id +------> |              |
    |                |        |                  |        |              |
    |                |        |                  |        |              |
    |                |        |                  |        |              |
    +----------------+        +------------------+        +--------------+

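Resolving a subnet's address scope means following those two optional
references. The sketch below uses plain dataclasses as simplified stand-ins for
the database models; ``SubnetRow``, ``SubnetPoolRow`` and ``scope_of_subnet``
are hypothetical names. A subnet without a pool, or a pool without a scope,
resolves to ``None``, matching the "implicit" scope described in the RPC
section above.

.. code-block:: python

    from dataclasses import dataclass
    from typing import Dict, Optional


    @dataclass
    class SubnetPoolRow:
        id: str
        address_scope_id: Optional[str] = None  # a pool need not belong to a scope


    @dataclass
    class SubnetRow:
        id: str
        subnetpool_id: Optional[str] = None  # null for subnets created without a pool


    def scope_of_subnet(subnet: SubnetRow,
                        pools: Dict[str, SubnetPoolRow]) -> Optional[str]:
        """Follow Subnet -> SubnetPool -> AddressScope; None is the implicit scope."""
        if subnet.subnetpool_id is None:
            return None
        return pools[subnet.subnetpool_id].address_scope_id


    pools = {"pool-a": SubnetPoolRow("pool-a", "scope-1"),
             "pool-b": SubnetPoolRow("pool-b")}
    print(scope_of_subnet(SubnetRow("subnet-1", "pool-a"), pools))  # scope-1
    print(scope_of_subnet(SubnetRow("subnet-2", "pool-b"), pools))  # None
    print(scope_of_subnet(SubnetRow("subnet-3"), pools))            # None
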
L3 Agent
~~~~~~~~

The L3 agent is limited in its support for multiple address scopes. Within a
router in the reference implementation, traffic is marked on ingress with the
address scope corresponding to the network it is coming from. If that traffic
would route to an interface in a different address scope, the traffic is
blocked unless an exception is made.

One exception is made for floating IP traffic. When traffic is headed to a
floating IP, DNAT is applied and the traffic is allowed to route to the private
IP address, potentially crossing the address scope boundary. When traffic
flows from an internal port to the external network and a floating IP is
assigned, that traffic is also allowed.

Another exception is made for traffic from an internal network to the external
network when SNAT is enabled. In this case, SNAT to the router's fixed IP
address is applied to the traffic. However, SNAT is not used if the external
network has an explicit address scope assigned and it matches the internal
network's. In that case, traffic routes straight through without NAT, because
the internal network's addresses are viable on the external network.

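The forwarding rules in this section can be summarized as a decision function.
This is a simplified, hypothetical model: ``router_action`` and its arguments
are invented for illustration, firewall rules are ignored, and the reference
implementation enforces these rules with ingress marking and NAT inside the
router rather than with a function like this. The fall-through behavior when
SNAT is disabled on the external gateway is an assumption of the sketch.

.. code-block:: python

    from typing import Optional


    def router_action(ingress_scope: Optional[str], egress_scope: Optional[str], *,
                      egress_external: bool = False, to_floating_ip: bool = False,
                      src_has_floating_ip: bool = False,
                      snat_enabled: bool = False) -> str:
        """Classify a forwarding decision as "route", "dnat", "floating-ip",
        "snat" or "block". None stands for the implicit address scope."""
        if to_floating_ip:
            return "dnat"  # DNAT to the private IP may cross the scope boundary
        if egress_external:
            if src_has_floating_ip:
                return "floating-ip"  # outbound from a port with a floating IP
            if egress_scope is not None and egress_scope == ingress_scope:
                return "route"  # explicit matching scope: no NAT needed
            if snat_enabled:
                return "snat"  # SNAT to the router's fixed IP address
        # Everything else follows the plain rule: same scope routes, else blocked.
        return "route" if ingress_scope == egress_scope else "block"


    print(router_action("scope-a", "scope-a"))                      # route
    print(router_action("scope-a", "scope-b"))                      # block
    print(router_action("scope-a", "scope-b", to_floating_ip=True)) # dnat
    print(router_action("scope-a", "scope-a", egress_external=True,
                        snat_enabled=True))                         # route
    print(router_action(None, None, egress_external=True,
                        snat_enabled=True))                         # snat

The last case shows why the external network's scope must be explicit: two
implicit scopes compare equal, yet SNAT is still applied.
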
The reference implementation has limitations. Even with multiple address
scopes, a router implementation is unable to connect to two networks with
overlapping IP addresses. There are two reasons for this.

First, a single routing table is used inside the namespace. An implementation
using multiple routing tables has been in the works, but there are some
unresolved issues with it.

Second, the default SNAT feature cannot be supported with the current Linux
conntrack implementation unless a double NAT is used (one NAT to get from the
address scope to an intermediate address specific to the scope and a second NAT
to get from that intermediate address to an external address). Single NAT
won't work if there are duplicate addresses across the scopes.

Due to these complications, the router will still refuse to connect to
overlapping subnets. We can look into an implementation that overcomes these
limitations in the future.

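The refusal amounts to an overlap test against the subnets already attached to
the router. The sketch below is hypothetical (``check_new_router_subnet`` is
not Neutron's code) and assumes the router's attached subnets are available as
CIDR strings. Address scopes are deliberately ignored, reflecting the
limitations above.

.. code-block:: python

    import ipaddress


    def check_new_router_subnet(attached_cidrs, new_cidr):
        """Refuse to attach a subnet that overlaps one already on the router."""
        new_net = ipaddress.ip_network(new_cidr)
        for cidr in attached_cidrs:
            net = ipaddress.ip_network(cidr)
            if net.overlaps(new_net):  # False for mixed IPv4/IPv6 pairs
                raise ValueError(f"{new_net} overlaps {net}, already on this router")


    check_new_router_subnet(["10.0.0.0/24", "2001:db8::/64"], "10.0.1.0/24")  # accepted
    # check_new_router_subnet(["10.0.0.0/24"], "10.0.0.128/25")  # raises ValueError
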