From e80648a0d0eba210ae70abe3837a3d411d876ad5 Mon Sep 17 00:00:00 2001
From: Ihar Hrachyshka
Date: Wed, 4 Nov 2015 17:53:16 +0100
Subject: [PATCH] devref: add upgrade strategy page

The page is intended to describe current upgrade features Neutron supports,
lay out potential improvements, describe testing strategy for existing and
planned upgrade features, and provide guidelines to reviewers on where to
look for potential upgrade breakages in proposed patches.

Change-Id: I22e55bb2fe32b32d12fa5889b91ecb9f92b3e6a6
---
 doc/source/devref/alembic_migrations.rst | 1 +
 doc/source/devref/effective_neutron.rst | 5 +-
 doc/source/devref/index.rst | 1 +
 doc/source/devref/rpc_api.rst | 2 +
 doc/source/devref/rpc_callbacks.rst | 2 +
 doc/source/devref/upgrade.rst | 250 +++++++++++++++++++++++
 6 files changed, 257 insertions(+), 4 deletions(-)
 create mode 100644 doc/source/devref/upgrade.rst

diff --git a/doc/source/devref/alembic_migrations.rst b/doc/source/devref/alembic_migrations.rst
index 8762b689e69..09a998cfc5d 100644
--- a/doc/source/devref/alembic_migrations.rst
+++ b/doc/source/devref/alembic_migrations.rst
@@ -20,6 +20,7 @@
 ''''''' Heading 4
 (Avoid deeper levels because they do not render well.)

+.. _alembic_migrations:

 Alembic Migrations
 ==================
diff --git a/doc/source/devref/effective_neutron.rst b/doc/source/devref/effective_neutron.rst
index cf72dd29586..bbe918189ec 100644
--- a/doc/source/devref/effective_neutron.rst
+++ b/doc/source/devref/effective_neutron.rst
@@ -182,10 +182,7 @@ Backward compatibility
 Document common pitfalls as well as good practices done when extending the RPC
 Interfaces.

-* The Neutron upgrade path requires the server to support the previous version of
-  the agent. Any changes to the existing RPC methods must be compatible with the
-  previous version of the agent. Otherwise a version bump is required and the old
-  method must be kept under the previous version RPC endpoint.
+* Make yourself familiar with :ref:`Upgrade review guidelines `.

 Scalability issues
diff --git a/doc/source/devref/index.rst b/doc/source/devref/index.rst
index 8636770f6f6..02c7e29cfbe 100644
--- a/doc/source/devref/index.rst
+++ b/doc/source/devref/index.rst
@@ -68,6 +68,7 @@ Neutron Internals
    oslo-incubator
    callbacks
    dns_order
+   upgrade

 Testing
 -------
diff --git a/doc/source/devref/rpc_api.rst b/doc/source/devref/rpc_api.rst
index 1a5fca034b7..6c6438802c3 100644
--- a/doc/source/devref/rpc_api.rst
+++ b/doc/source/devref/rpc_api.rst
@@ -95,6 +95,8 @@ This class implements the server side of the interface. The
 oslo_messaging.Target() defined says that this class currently implements
 version 1.1 of the interface.

+.. _rpc_versioning:
+
 Versioning
 ==========
diff --git a/doc/source/devref/rpc_callbacks.rst b/doc/source/devref/rpc_callbacks.rst
index 97ca772aec4..808c60b2dca 100644
--- a/doc/source/devref/rpc_callbacks.rst
+++ b/doc/source/devref/rpc_callbacks.rst
@@ -21,6 +21,8 @@
 (Avoid deeper levels because they do not render well.)


+.. _rpc_callbacks:
+
 Neutron Messaging Callback System
 =================================
diff --git a/doc/source/devref/upgrade.rst b/doc/source/devref/upgrade.rst
new file mode 100644
index 00000000000..21808917f5b
--- /dev/null
+++ b/doc/source/devref/upgrade.rst
@@ -0,0 +1,250 @@
..
      Licensed under the Apache License, Version 2.0 (the "License"); you may
      not use this file except in compliance with the License. You may obtain
      a copy of the License at

          http://www.apache.org/licenses/LICENSE-2.0

      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
      License for the specific language governing permissions and limitations
      under the License.

      Convention for heading levels in Neutron devref:
      ======= Heading 0 (reserved for the title in a document)
      ------- Heading 1
      ~~~~~~~ Heading 2
      +++++++ Heading 3
      ''''''' Heading 4
      (Avoid deeper levels because they do not render well.)

.. note::

   Much of this document discusses upgrade considerations for the Neutron
   reference implementation using Neutron's agents. Each Neutron plugin is
   expected to provide its own documentation covering upgrade considerations
   specific to its backend. For example, OVN does not use Neutron agents, but
   does have a local controller that runs on each compute node. OVN supports
   rolling upgrades, but how that works should be covered in the
   documentation for networking-ovn, the OVN Neutron plugin.

Upgrade strategy
================

Neutron supports two general upgrade scenarios:

#. All services are shut down, the code is upgraded, and then all services
   are started again.
#. Services are upgraded gradually, based on operator service windows.

The latter is the preferred way to upgrade an OpenStack cloud, since it allows
for finer granularity and less service downtime. This scenario is usually
called a 'rolling upgrade'.

Rolling upgrade
---------------

A rolling upgrade implies that, for some interval of time, services running
different code versions coexist and interact in the same cloud. This places
several constraints on the software:

#. older services should be able to talk to newer services;
#. older services should not require the database to have an older schema
   (otherwise newer services that require the newer schema would not work).

`More info on rolling upgrades in OpenStack
`_.

Neutron meets these requirements by:

#. providing backwards compatibility code in neutron-server to deal with older
   messaging payloads, if the Neutron backend makes use of Neutron agents;
#. isolating a single service that accesses the database (neutron-server).

To keep things simple, the following order of service upgrades is always
assumed:

#. first, all neutron-server instances are upgraded;
#. then, if applicable, Neutron agents are upgraded.

This approach allows us to avoid backwards compatibility code on the agent
side, and it is in line with other OpenStack projects that support rolling
upgrades (specifically, Nova).

Server upgrade
~~~~~~~~~~~~~~

The neutron-server service is the very first component that should be upgraded
to the new code. It is also the only component that relies on the new
database schema being present; other components communicate with the cloud
through AMQP and hence do not depend on any particular database state.

Database upgrades are implemented with alembic migration chains.

The database upgrade is split into two parts:

#. ``neutron-db-manage upgrade --expand``
#. ``neutron-db-manage upgrade --contract``

Each part represents a separate alembic branch.

:ref:`More info on alembic scripts `.

The expand step can be executed while the old neutron-server code is still
running. The contract step requires *all* neutron-server instances to be shut
down. Once it is complete, the neutron-server instances can be started again.

Agents upgrade
~~~~~~~~~~~~~~

.. note::

   This section does not apply when the cloud does not use AMQP agents to
   provide networking services to instances. In that case, other
   backend-specific upgrade instructions may also apply.

Once the neutron-server services are restarted with the new database schema
and the new code, it is time to upgrade the Neutron agents.

Note that in the meantime, neutron-server should be able to serve AMQP
messages sent by the older versions of agents that are still part of the
cloud.

The recommended order of agent upgrades (per node) is:

#. first, L2 agents (openvswitch, linuxbridge, sr-iov);
#. then, all other agents (L3, DHCP, Metadata, ...).
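
The ordering rules above (expand migrations while old servers still run,
contract migrations only once every neutron-server instance is stopped, all
servers before any agent, and the L2 agent first on each node) can be made
concrete with a small plan checker. The following is a minimal Python sketch
under those assumptions; it is not part of Neutron or of any deployment tool,
and the function ``check_upgrade_plan``, the action names and the agent type
strings are hypothetical, chosen only for illustration.

.. code-block:: python

    # Hypothetical sketch, not part of Neutron: it encodes the upgrade order
    # described in this document as a checker for a rolling-upgrade plan.

    # Illustrative agent type names treated as L2 agents.
    L2_AGENT_TYPES = {'openvswitch', 'linuxbridge', 'sriov'}


    def check_upgrade_plan(steps, servers):
        """Return a list of ordering problems found in 'steps' (empty if none).

        steps: ordered (action, arg) tuples. 'expand' and 'contract' take
            arg=None (neutron-db-manage upgrade --expand / --contract);
            'stop_server' and 'start_server' take a server name;
            'upgrade_agent' takes a (node, agent_type) tuple.
        servers: names of all neutron-server instances, initially running
            the old code.
        """
        problems = []
        running = set(servers)   # servers currently up (old or new code)
        upgraded = set()         # servers restarted with the new code
        expanded = contracted = False
        # Nodes whose L2 agent is upgraded at some point in the plan.
        l2_nodes = {arg[0] for action, arg in steps
                    if action == 'upgrade_agent' and arg[1] in L2_AGENT_TYPES}
        l2_done = set()

        for action, arg in steps:
            if action == 'expand':
                # Expand migrations may run while old servers are still up.
                expanded = True
            elif action == 'stop_server':
                running.discard(arg)
            elif action == 'contract':
                if not expanded:
                    problems.append('contract run before expand')
                if running:
                    # Contract migrations require *all* servers to be down.
                    problems.append('contract run while servers are up: %s'
                                    % ', '.join(sorted(running)))
                contracted = True
            elif action == 'start_server':
                if not contracted:
                    problems.append('%s started before the database upgrade '
                                    'completed' % arg)
                running.add(arg)
                upgraded.add(arg)
            elif action == 'upgrade_agent':
                node, agent_type = arg
                if set(servers) - upgraded:
                    # Agents only move once every server runs the new code.
                    problems.append('%s agent on %s upgraded before all servers'
                                    % (agent_type, node))
                if agent_type in L2_AGENT_TYPES:
                    l2_done.add(node)
                elif node in l2_nodes and node not in l2_done:
                    # Recommended per-node order: the L2 agent goes first.
                    problems.append('%s agent on %s upgraded before its L2 agent'
                                    % (agent_type, node))
            else:
                raise ValueError('unknown action: %r' % (action,))
        return problems


    if __name__ == '__main__':
        plan = [
            ('expand', None),
            ('stop_server', 'controller-1'),
            ('stop_server', 'controller-2'),
            ('contract', None),
            ('start_server', 'controller-1'),
            ('start_server', 'controller-2'),
            ('upgrade_agent', ('compute-1', 'openvswitch')),
            ('upgrade_agent', ('compute-1', 'dhcp')),
        ]
        print(check_upgrade_plan(plan, ['controller-1', 'controller-2']))  # prints []

The sketch reports a non-L2 agent upgraded before the L2 agent on the same
node as a problem, even though the text above only recommends that order; a
real tool might treat it as a warning instead.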

The rationale for the agent upgrade order is that the L2 agent is usually
responsible for wiring ports for other agents to use, so it is better to let
it do its job first and then proceed with the other agents, which use the
already configured ports for their needs.

Each network or compute node can have its own upgrade schedule, independent
of other nodes.

AMQP considerations
+++++++++++++++++++

Since neutron-server is always assumed to be upgraded before the agents, only
the server needs to handle both old and new RPC versions.

This implies that code handling oslo.messaging ``UnsupportedVersion``
exceptions does not belong in the agents.

:ref:`More information about RPC versioning `.

Interface signature
'''''''''''''''''''

An RPC interface is defined by its name, its version, and the (named)
arguments it accepts. Beyond the arguments being serializable, there are no
strict guarantees that they have the expected types or meaning.

Message content versioning
''''''''''''''''''''''''''

To provide better compatibility guarantees for rolling upgrades, RPC
interfaces could also define a specific format for the arguments they accept.
In the OpenStack world, this is usually implemented with the
oslo.versionedobjects library, relying on it to define the serialized form of
the arguments that are passed over the AMQP wire.

Note that Neutron has *not* yet adopted the oslo.versionedobjects library for
its RPC interfaces (except for the QoS feature).

:ref:`More information about RPC callbacks used for QoS `.

Networking backends
~~~~~~~~~~~~~~~~~~~

A backend software upgrade should not result in any data plane disruption.
For example, the Open vSwitch L2 agent should not reset flows or rewire ports,
the Neutron L3 agent should not delete namespaces left by an older version of
the agent, and the Neutron DHCP agent should not require an immediate DHCP
lease renewal.

The same considerations apply to setups that do not rely on agents.
For example, the OpenDaylight or OVN controller should not break data plane
connectivity during its upgrade process.

Upgrade testing
---------------

`Grenade `_ is the OpenStack project
that is designed to validate upgrade scenarios.

Currently, only the offline (non-rolling) upgrade scenario is validated in the
Neutron gate. It consists of the following steps:

#. the 'old' cloud is set up using the latest stable release code;
#. all services are stopped;
#. the code is updated to the patch under review;
#. new database migration scripts are applied, if needed;
#. all services are started;
#. the 'new' cloud is validated with a subset of tempest tests.

The scenario validates that no configuration option names are changed within
one cycle. More generally, it validates that the 'new' cloud is capable of
running with the 'old' configuration files. It also validates that the
database migration scripts can be executed.

The scenario does *not* validate AMQP versioning compatibility.

Other projects (for example, Nova) have so-called 'partial' grenade jobs, where
some services are left running with the old version of the code. Such a job
would be needed in the Neutron gate to validate rolling upgrades for the
project. Until then, it is up to reviewers to catch compatibility issues in
patches under review.

Another gap in testing concerns the split migration script branches. It is
assumed that an 'old' cloud can keep running successfully after the 'expand'
migration scripts from the 'new' cloud are applied to its database, but this
is not validated in the gate.

.. _upgrade_review_guidelines:

Review guidelines
-----------------

There are several upgrade-related gotchas that reviewers should watch for.

First, a general piece of advice for reviewers: make sure new code does not
violate the requirements set by the `global OpenStack deprecation policy
`_.

Now to the specifics:

#. Configuration options:

   * options should not be dropped from the tree without waiting out the
     deprecation period (currently one development cycle long), during which
     a deprecation message is issued if the deprecated option is used.
   * option values should not change their meaning between releases.

#. Data plane:

   * an agent restart should not result in data plane disruption (no Open
     vSwitch ports reset, no network namespaces deleted, no device names
     changed).

#. RPC versioning:

   * the major version number of an RPC interface should not be bumped before
     all agents have had a chance to upgrade (meaning that at least one
     release cycle is needed before compatibility code for old clients is
     stripped from the tree).
   * no compatibility code should be added to the agent side of AMQP
     interfaces.
   * server code should be able to handle all previous versions of agents,
     unless the major version of an interface is bumped.
   * RPC interface arguments should not change their meaning or names.
   * new arguments added to RPC interfaces should not be mandatory: the server
     should be able to handle old requests that do not specify the new
     argument. Also, if the argument is not passed, the behaviour from before
     the argument was added should be retained.

#. Database migrations:

   * migration code should be split into two branches (expand, contract) as
     needed. No code that is unsafe to execute while neutron-server is running
     should be added to the expand branch.
   * if possible, contract migrations should be minimized or avoided, to reduce
     the time during which API endpoints must be down for the database
     upgrade.
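
The RPC versioning guidelines above can be illustrated with a minimal,
self-contained Python sketch that does not depend on oslo.messaging. It assumes
plain ``major.minor`` version strings; ``version_is_compatible``,
``HypotheticalDeviceRpcCallback`` and its ``host`` argument are hypothetical
names chosen for illustration, not Neutron code. In real code the endpoint
version would be declared through the ``oslo_messaging.Target()`` mechanism
described in the RPC API devref, and oslo.messaging would perform the version
check itself.

.. code-block:: python

    # Hypothetical sketch: none of these names come from Neutron.


    def version_is_compatible(implemented, requested):
        """Return True if a server implementing 'implemented' can serve a
        client that tags its messages with version 'requested'.

        Assumes 'major.minor' strings: the major numbers must match and the
        requested minor number must not be newer than the implemented one.
        """
        imp_major, imp_minor = (int(x) for x in implemented.split('.'))
        req_major, req_minor = (int(x) for x in requested.split('.'))
        return imp_major == req_major and req_minor <= imp_minor


    class HypotheticalDeviceRpcCallback(object):
        """Server-side endpoint that keeps serving older agents.

        API version history:
            1.0 - Initial version.
            1.1 - get_device_details() accepts an optional 'host' argument.
        """

        RPC_API_VERSION = '1.1'

        def __init__(self, ports):
            # Mapping of device name -> {'host': ..., 'details': {...}}.
            self._ports = ports

        def get_device_details(self, context, device, host=None):
            # 'host' was added in 1.1, so it must be optional: 1.0 agents
            # never send it. When it is omitted, keep the 1.0 behaviour and
            # return the details regardless of where the port is bound.
            port = self._ports.get(device)
            if port is None:
                return {'device': device}
            if host is not None and port['host'] != host:
                # 1.1 behaviour: hide details of ports bound to another host.
                return {'device': device}
            details = dict(port['details'])
            details['device'] = device
            return details


    if __name__ == '__main__':
        callback = HypotheticalDeviceRpcCallback(
            {'tap1': {'host': 'compute-1', 'details': {'network_id': 'net-a'}}})
        # A 1.0 agent omits 'host' and gets the pre-1.1 behaviour.
        print(callback.get_device_details(None, 'tap1'))
        # An upgraded 1.1 agent passes 'host'.
        print(callback.get_device_details(None, 'tap1', host='compute-2'))

Because the new argument defaults to ``None`` and that value reproduces the
old behaviour, a 1.1 server keeps answering 1.0 agents unchanged, while a
2.0 server would be rejected by the compatibility check until all agents
have moved on.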