neutron/doc/source/devref/provisioning_blocks.rst
Kevin Benton b672c26cb4 Add provisioning blocks to status ACTIVE transition
Sometimes an object requires multiple disjoint actors to complete
a set of tasks before the status of the object should be transitioned
to ACTIVE. The main example of this is when a port is being created.
The L2 agent has to do its business to wire up the VIF, but at the same
time the DHCP agent has to setup the DHCP reservation. This led to
Nova booting the VM when the L2 agent was done even though the DHCP
agent may have been nowhere near ready.

This patch introduces a provisioning blocks mechansim that allows the
entities to be tracked that need to be involved to make a transition
to ACTIVE happen. See the devref in the dependent patch for a high-level
view of how this works.

The ML2 code is updated to use this new mechanism to prevent updating
the port status to ACTIVE without both the DHCP agent and L2 agent
reporting that the port is ready.

The DHCP RPC API required a version bump to allow the port ready
notification.

This also adds a devref doc for the provisioning_blocks
module with a high-level overview of how it works in addition
to a detailed description of how it is used specifically with
ML2, the L2 agents, and the DHCP agents.

Closes-Bug: #1453350
Change-Id: Id85ff6de1a14a550ab50baf4f79d3130af3680c8
2016-05-11 11:03:09 -07:00

160 lines
7.2 KiB
ReStructuredText

..
Licensed under the Apache License, Version 2.0 (the "License"); you may
not use this file except in compliance with the License. You may obtain
a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations
under the License.
Convention for heading levels in Neutron devref:
======= Heading 0 (reserved for the title in a document)
------- Heading 1
~~~~~~~ Heading 2
+++++++ Heading 3
''''''' Heading 4
(Avoid deeper levels because they do not render well.)
Composite Object Status via Provisioning Blocks
===============================================
We use the STATUS field on objects to indicate when a resource is ready
by setting it to ACTIVE so external systems know when it's safe to use
that resource. Knowing when to set the status to ACTIVE is simple when
there is only one entity responsible for provisioning a given object.
When that entity has finishing provisioning, we just update the STATUS
directly to active. However, there are resources in Neutron that require
provisioning by multiple asynchronous entities before they are ready to
be used so managing the transition to the ACTIVE status becomes more
complex. To handle these cases, Neutron has the provisioning_blocks
module to track the entities that are still provisioning a resource.
The main example of this is with ML2, the L2 agents and the DHCP agents.
When a port is created and bound to a host, it's placed in the DOWN
status. The L2 agent now has to setup flows, security group rules, etc
for the port and the DHCP agent has to setup a DHCP reservation for
the port's IP and MAC. Before the transition to ACTIVE, both agents
must complete their work or the port user (e.g. Nova) may attempt to
use the port and not have connectivity. To solve this, the
provisioning_blocks module is used to track the provisioning state
of each agent and the status is only updated when both complete.
High Level View
---------------
To make use of the provisioning_blocks module, provisioning components
should be added whenever there is work to be done by another entity
before an object's status can transition to ACTIVE. This is
accomplished by calling the add_provisioning_component method for
each entity. Then as each entity finishes provisioning the object,
the provisioning_complete must be called to lift the provisioning
block.
When the last provisioning block is removed, the provisioning_blocks
module will trigger a callback notification containing the object ID
for the object's resource type with the event PROVISIONING_COMPLETE.
A subscriber to this event can now update the status of this object
to ACTIVE or perform any other necessary actions.
A normal state transition will look something like the following:
1. Request comes in to create an object
2. Logic on the Neutron server determines which entities are required
to provision the object and adds a provisioning component for each
entity for that object.
3. A notification is emitted to the entities so they start their work.
4. Object is returned to the API caller in the DOWN (or BUILD) state.
5. Each entity tells the server when it has finished provisioning the
object. The server calls provisioning_complete for each entity that
finishes.
6. When provisioning_complete is called on the last remaining entity,
the provisioning_blocks module will emit an event indicating that
provisioning has completed for that object.
7. A subscriber to this event on the server will then update the status
of the object to ACTIVE to indicate that it is fully provisioned.
For a more concrete example, see the section below.
ML2, L2 agents, and DHCP agents
-------------------------------
ML2 makes use of the provisioning_blocks module to prevent the status
of ports from being transitioned to ACTIVE until both the L2 agent and
the DHCP agent have finished wiring a port.
When a port is created or updated, the following happens to register
the DHCP agent's provisioning blocks:
1. The subnet_ids are extracted from the fixed_ips field of the port
and then ML2 checks to see if DHCP is enabled on any of the subnets.
2. The configuration for the DHCP agents hosting the network are looked
up to ensure that at least one of them is new enough to report back
that it has finished setting up the port reservation.
3. If either of the preconditions above fail, a provisioning block for
the DHCP agent is not added and any existing DHCP agent blocks for
that port are cleared to ensure the port isn't blocked waiting for an
event that will never happen.
4. If the preconditions pass, a provisioning block is added for the port
under the 'DHCP' entity.
When a port is created or updated, the following happens to register the
L2 agent's provisioning blocks:
1. If the port is not bound, nothing happens because we don't know yet
if an L2 agent is involved so we have to wait until a port update that
binds it.
2. Once the port is bound, the agent based mechanism drivers will check
if they have an agent on the bound host and if the VNIC type belongs
to the mechanism driver, a provisioning block is added for the port
under the 'L2 Agent' entity.
Once the DHCP agent has finished setting up the reservation, it calls
dhcp_ready_on_ports via the RPC API with the port ID. The DHCP RPC
handler receives this and calls 'provisioning_complete' in the
provisioning module with the port ID and the 'DHCP' entity to remove
the provisioning block.
Once the L2 agent has finished setting up the reservation, it calls
the normal update_device_list (or update_device_up) via the RPC API.
The RPC callbacks handler calls 'provisioning_complete' with the
port ID and the 'L2 Agent' entity to remove the provisioning block.
On the 'provisioning_complete' call that removes the last record,
the provisioning_blocks module emits a callback PROVISIONING_COMPLETE
event with the port ID. A function subscribed to this in ML2 then calls
update_port_status to set the port to ACTIVE.
At this point the normal notification is emitted to Nova allowing the
VM to be unpaused.
In the event that the DHCP or L2 agent is down, the port will not
transition to the ACTIVE status (as is the case now if the L2 agent
is down). Agents must account for this by telling the server that
wiring has been completed after configuring everything during
startup. This ensures that ports created on offline agents (or agents
that crash and restart) eventually become active.
To account for server instability, the notifications about port wiring
be complete must use RPC calls so the agent gets a positive
acknowledgement from the server and it must keep retrying until either
the port is deleted or it is successful.
If an ML2 driver immediately places a bound port in the ACTIVE state
(e.g. after calling a backend in update_port_postcommit), this patch
will not have any impact on that process.
References
----------
.. [#] Provisioning Blocks Module: http://git.openstack.org/cgit/openstack/neutron/tree/neutron/db/provisioning_blocks.py