Merge "devref: add upgrade strategy page"
commit
c5cf4f54be
@ -20,6 +20,7 @@
      ''''''' Heading 4
      (Avoid deeper levels because they do not render well.)

.. _alembic_migrations:

Alembic Migrations
==================

@ -182,10 +182,7 @@ Backward compatibility
Document common pitfalls as well as good practices done when extending the RPC Interfaces.

* The Neutron upgrade path requires the server to support the previous version of
  the agent. Any changes to the existing RPC methods must be compatible with the
  previous version of the agent. Otherwise a version bump is required and the old
  method must be kept under the previous version RPC endpoint.
* Make yourself familiar with :ref:`Upgrade review guidelines <upgrade_review_guidelines>`.


Scalability issues

@ -69,6 +69,7 @@ Neutron Internals
   oslo-incubator
   callbacks
   dns_order
   upgrade

Testing
-------

@ -95,6 +95,8 @@ This class implements the server side of the interface. The
oslo_messaging.Target() defined says that this class currently implements
version 1.1 of the interface.

.. _rpc_versioning:

Versioning
----------

@ -21,6 +21,8 @@
      (Avoid deeper levels because they do not render well.)


.. _rpc_callbacks:

Neutron Messaging Callback System
=================================

250
doc/source/devref/upgrade.rst
Normal file
@ -0,0 +1,250 @@
..
      Licensed under the Apache License, Version 2.0 (the "License"); you may
      not use this file except in compliance with the License. You may obtain
      a copy of the License at

          http://www.apache.org/licenses/LICENSE-2.0

      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
      License for the specific language governing permissions and limitations
      under the License.


      Convention for heading levels in Neutron devref:
      ======= Heading 0 (reserved for the title in a document)
      ------- Heading 1
      ~~~~~~~ Heading 2
      +++++++ Heading 3
      ''''''' Heading 4
      (Avoid deeper levels because they do not render well.)

.. note::

   Much of this document discusses upgrade considerations for the Neutron
   reference implementation using Neutron's agents. It's expected that each
   Neutron plugin provides its own documentation that discusses upgrade
   considerations specific to that choice of backend. For example, OVN does
   not use Neutron agents, but does have a local controller that runs on each
   compute node. OVN supports rolling upgrades, but information about how that
   works should be covered in the documentation for networking-ovn, the OVN
   Neutron plugin.

Upgrade strategy
================

There are two general upgrade scenarios supported by Neutron:

#. All services are shut down, code upgraded, then all services are started again.
#. Services are upgraded gradually, based on operator service windows.

The latter is the preferred way to upgrade an OpenStack cloud, since it allows
for more granularity and less service downtime. This scenario is usually called
'rolling upgrade'.

Rolling upgrade
---------------

Rolling upgrades imply that during some interval of time there will be services
of different code versions running and interacting in the same cloud. This puts
multiple constraints on the software:

#. older services should be able to talk with newer services.
#. older services should not require the database to have an older schema
   (otherwise newer services that require the newer schema would not work).

`More info on rolling upgrades in OpenStack
<http://governance.openstack.org/reference/tags/assert_supports-rolling-upgrade.html>`_.

Those requirements are achieved in Neutron by:

#. having backwards compatibility code in the Neutron server to deal with
   older messaging payloads, if the Neutron backend makes use of Neutron agents.
#. isolating a single service that accesses the database (neutron-server).

To simplify the matter, it's always assumed that the order of service upgrades
is as follows:

#. first, all neutron-servers are upgraded.
#. then, if applicable, neutron agents are upgraded.

This approach allows us to avoid backwards compatibility code on the agent side
and is in line with other OpenStack projects that support rolling upgrades
(specifically, nova).

Server upgrade
~~~~~~~~~~~~~~

Neutron-server is the very first component that should be upgraded to the new
code. It's also the only component that relies on the new database schema being
present; other components communicate with the cloud through AMQP and hence do
not depend on a particular database state.

Database upgrades are implemented with alembic migration chains.

Database upgrade is split into two parts:

#. ``neutron-db-manage upgrade --expand``
#. ``neutron-db-manage upgrade --contract``

Each part represents a separate alembic branch.

:ref:`More info on alembic scripts <alembic_migrations>`.

The former step can be executed while old neutron-server code is running. The
latter step requires *all* neutron-server instances to be shut down. Once it's
complete, neutron-servers can be started again.

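The ordering constraint described above can be sketched as a small check over a
list of steps. This is only an illustration: the step names and the
``validate_server_upgrade`` helper are hypothetical, and the requirement that
expand runs before contract is an assumption drawn from the order in which the
two commands are listed.

```python
# Hypothetical step labels; only the two neutron-db-manage commands come
# from the text above.
EXPAND = "neutron-db-manage upgrade --expand"
CONTRACT = "neutron-db-manage upgrade --contract"
STOP = "stop all neutron-server instances"
START = "start neutron-server instances"


def validate_server_upgrade(steps):
    """Return True if the steps respect the expand/contract constraints.

    Raises ValueError if contract runs while servers are up, or (by
    assumption) before expand has been applied.
    """
    servers_running = True  # the old code is running when the upgrade begins
    expanded = False
    for step in steps:
        if step == STOP:
            servers_running = False
        elif step == START:
            servers_running = True
        elif step == EXPAND:
            # Safe while old neutron-server code is still running.
            expanded = True
        elif step == CONTRACT:
            if servers_running:
                raise ValueError(
                    "contract requires all neutron-server instances "
                    "to be shut down")
            if not expanded:
                raise ValueError("expand must be applied before contract")
        else:
            raise ValueError("unknown step: %r" % (step,))
    return True
```

For example, ``validate_server_upgrade([EXPAND, STOP, CONTRACT, START])``
passes, while a plan that runs the contract step without stopping the servers
is rejected.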

Agents upgrade
~~~~~~~~~~~~~~

.. note::

   This section does not apply when the cloud does not use AMQP agents to
   provide networking services to instances. In that case, other
   backend-specific upgrade instructions may also apply.

Once neutron-server services are restarted with the new database schema and the
new code, it's time to upgrade Neutron agents.

Note that in the meantime, neutron-server should be able to serve AMQP messages
sent by older versions of agents which are part of the cloud.

The recommended order of agent upgrade (per node) is:

#. first, L2 agents (openvswitch, linuxbridge, sr-iov).
#. then, all other agents (L3, DHCP, Metadata, ...).

The rationale for this order is that the L2 agent is usually responsible for
wiring ports for other agents to use, so it's better to allow it to do its job
first and then proceed with other agents that will use the already configured
ports for their needs.

Each network/compute node can have its own upgrade schedule that is independent
of other nodes.

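The recommended per-node order can be expressed as a stable sort that puts L2
agents first. This is a minimal sketch; the ``agent_upgrade_order`` helper and
the short agent names used as identifiers are hypothetical.

```python
# L2 agent names as listed above; the identifiers are illustrative only.
L2_AGENTS = {"openvswitch", "linuxbridge", "sr-iov"}


def agent_upgrade_order(agents_on_node):
    """Return the agents of one node with L2 agents first.

    sorted() is stable, so agents keep their relative order within the
    L2 group and within the group of all other agents.
    """
    return sorted(agents_on_node, key=lambda name: name not in L2_AGENTS)
```

Because each node has its own schedule, the helper is meant to be applied to
the agents of a single node at a time.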

AMQP considerations
+++++++++++++++++++

Since it's always assumed that the neutron-server component is upgraded before
agents, only the former should handle both old and new RPC versions.

The implication is that code handling oslo.messaging ``UnsupportedVersion``
exceptions does not belong in agent code.

:ref:`More information about RPC versioning <rpc_versioning>`.

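As a rough illustration of why the server side can serve older agents, the
usual RPC version compatibility rule (same major version, requested minor
version not newer than the implemented one) can be written as a small
function. This is a standalone sketch of that rule, not a call into
oslo.messaging, and the function name is an assumption.

```python
def version_is_compatible(implemented, requested):
    """Check a requested "major.minor" version against the implemented one.

    A request is compatible when the major versions match and the requested
    minor version is not newer than the implemented minor version.
    """
    imp_major, imp_minor = (int(part) for part in implemented.split("."))
    req_major, req_minor = (int(part) for part in requested.split("."))
    return imp_major == req_major and req_minor <= imp_minor
```

Under this rule, a server implementing version 1.1 of an interface accepts
requests from agents that still send version 1.0, but not requests for 1.2 or
2.0.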

Interface signature
'''''''''''''''''''

An RPC interface is defined by its name, version, and (named) arguments that
it accepts. There are no strict guarantees that arguments will have expected
types or meaning, as long as they are serializable.

Message content versioning
''''''''''''''''''''''''''

To provide better compatibility guarantees for rolling upgrades, RPC interfaces
can also define a specific format for the arguments they accept. In the
OpenStack world, this is usually implemented using the oslo.versionedobjects
library, and relying on the library to define the serialized form of arguments
that are passed over the AMQP wire.

Note that Neutron has *not* adopted the oslo.versionedobjects library for its
RPC interfaces yet (except for the QoS feature).

:ref:`More information about RPC callbacks used for QoS <rpc_callbacks>`.

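The idea of versioning message content can be sketched without any library:
the sender produces a serialized form that matches the version the receiver
understands. Everything in this example is hypothetical (the payload fields,
the version numbers, and the ``serialize_port`` helper); it does not use or
reproduce the oslo.versionedobjects API.

```python
def _as_tuple(version):
    """Turn a "major.minor" string into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def serialize_port(port, target_version):
    """Serialize a port dict for a peer that understands target_version."""
    payload = {
        "version": target_version,
        "id": port["id"],
        "name": port["name"],
    }
    # Hypothetical: the 'qos_policy_id' field was added in version 1.1,
    # so it is left out of payloads sent to 1.0 peers.
    if _as_tuple(target_version) >= (1, 1):
        payload["qos_policy_id"] = port.get("qos_policy_id")
    return payload
```

An older peer then receives only the fields it knows about, while a newer peer
receives the full payload.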

Networking backends
~~~~~~~~~~~~~~~~~~~

Backend software upgrade should not result in any data plane disruptions.
For example, the Open vSwitch L2 agent should not reset flows or rewire ports;
the Neutron L3 agent should not delete namespaces left by an older version of
the agent; the Neutron DHCP agent should not require immediate DHCP lease
renewal; etc.

The same considerations apply to setups that do not rely on agents. For
example, the OpenDaylight or OVN controller should not break data plane
connectivity during its upgrade process.

Upgrade testing
---------------

`Grenade <https://github.com/openstack-dev/grenade>`_ is the OpenStack project
that is designed to validate upgrade scenarios.

Currently, only the offline (non-rolling) upgrade scenario is validated in the
Neutron gate. The scenario consists of the following steps:

#. the 'old' cloud is set up using latest stable release code
#. all services are stopped
#. code is updated to the patch under review
#. new database migration scripts are applied, if needed
#. all services are started
#. the 'new' cloud is validated with a subset of tempest tests

The scenario validates that no configuration option names are changed in one
cycle. More generally, it validates that the 'new' cloud is capable of running
using the 'old' configuration files. It also validates that database migration
scripts can be executed.

The scenario does *not* validate AMQP versioning compatibility.

Other projects (for example Nova) have so-called 'partial' grenade jobs where
some services are left running using the old version of code. Such a job would
be needed in the Neutron gate to validate rolling upgrades for the project.
Until then, it is up to reviewers to catch compatibility issues in patches on
review.

Another gap in testing concerns the split migration script branches. It's
assumed that an 'old' cloud can successfully run after 'expand' migration
scripts from the 'new' cloud are applied to its database; but it's not
validated in gate.

.. _upgrade_review_guidelines:

Review guidelines
-----------------

There are several upgrade related gotchas that should be tracked by reviewers.

First things first, a general piece of advice to reviewers: make sure new code
does not violate requirements set by the `global OpenStack deprecation policy
<http://governance.openstack.org/reference/tags/assert_follows-standard-deprecation.html>`_.

Now to specifics:

#. Configuration options:

   * options should not be dropped from the tree without waiting for the
     deprecation period (currently it's one development cycle long) and
     without a deprecation message being issued when the deprecated option
     is used.
   * option values should not change their meaning between releases.

#. Data plane:

   * agent restart should not result in data plane disruption (no Open vSwitch
     ports reset; no network namespaces deleted; no device names changed).

#. RPC versioning:

   * no RPC version major number should be bumped before all agents have had a
     chance to upgrade (meaning, at least one release cycle is needed before
     compatibility code to handle old clients is stripped from the tree).
   * no compatibility code should be added to the agent side of AMQP interfaces.
   * server code should be able to handle all previous versions of agents,
     unless the major version of an interface is bumped.
   * no RPC interface arguments should change their meaning, or names.
   * new arguments added to RPC interfaces should not be mandatory. It means
     that the server should be able to handle old requests, without the new
     argument specified. Also, if the argument is not passed, the old behaviour
     before the addition of the argument should be retained.

#. Database migrations:

   * migration code should be split into two branches (contract, expand) as
     needed. No code that is unsafe to execute while neutron-server is running
     should be added to the expand branch.
   * if possible, contract migrations should be minimized or avoided to reduce
     the time when API endpoints must be down during database upgrade.

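The guideline that new RPC arguments must not be mandatory can be sketched
with a server-side method that gives the new argument a default value and
keeps the old behaviour when it is absent. The class, method, and argument
names below are hypothetical and are not taken from the Neutron code base.

```python
class PortRpcEndpoint(object):
    """Hypothetical server-side RPC endpoint, for illustration only."""

    def update_port_status(self, context, port_id, status, host=None):
        """Handle a status update from an agent.

        'host' stands for an argument added in a newer minor version of
        the interface. Older agents do not send it, so it defaults to
        None and the result then matches the old behaviour.
        """
        result = {"port_id": port_id, "status": status}
        if host is not None:
            # New behaviour, only when the newer argument is provided.
            result["host"] = host
        return result
```

An older agent calling ``update_port_status(ctx, "p1", "ACTIVE")`` gets the
same result as before the argument was introduced, while a newer agent can
pass ``host="node-1"`` to opt in to the new behaviour.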