Add upgrades guideline for operators

This patch set will add upgrades guideline for operators. Change-Id: Ic61e4e11ec50c221038a007559bd67fe48a3ac65 Implements: blueprint heat-rolling-upgrades
2017-06-20 22:35:50 +07:00 · 2017-06-20 22:35:50 +07:00 · 10a1161dfa
commit 10a1161dfa
parent cc4fdcef05
2 changed files with 159 additions and 0 deletions
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@ -89,6 +89,7 @@ Operating Heat
    getting_started/on_ubuntu
    operating_guides/scale_deployment
    operating_guides/httpd
+    operating_guides/upgrades_guide
    man/index

 Developing Heat
--- a/doc/source/operating_guides/upgrades_guide.rst
+++ b/doc/source/operating_guides/upgrades_guide.rst
@ -0,0 +1,158 @@
+..
+      Licensed under the Apache License, Version 2.0 (the "License"); you may
+      not use this file except in compliance with the License. You may obtain
+      a copy of the License at
+
+          http://www.apache.org/licenses/LICENSE-2.0
+
+      Unless required by applicable law or agreed to in writing, software
+      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+      License for the specific language governing permissions and limitations
+      under the License.
+
+==================
+Upgrades Guideline
+==================
+
+This document outlines several steps and notes for operators to reference when
+upgrading their heat from previous versions of OpenStack.
+
+.. note::
+
+  This document is only tested in the case of upgrading between sequential
+  releases.
+
+
+Plan to upgrade
+===============
+
+* Read and ensure you understand the `release notes
+  <https://docs.openstack.org/releasenotes/heat/>`_ for the next release.
+
+* Make a backup of your database.
+
+* Upgrades are only supported one series at a time, or within a series.
+
+Cold Upgrades
+=============
+
+Heat already supports "cold-upgrades" [1]_, where the heat services have to be
+down during the upgrade. For time-consuming upgrades, it may be unacceptable
+for the services to be unavailable for a long period of time. This type of
+upgrade is quite simple, follow the bellow steps:
+
+1. Stop all heat-api and heat-engine services.
+
+2. Uninstall old code.
+
+3. Install new code.
+
+4. Update configurations.
+
+5. Do Database sync (most time-consuming step)
+
+6. Start all heat-api and heat-engine services.
+
+Rolling Upgrades
+================
+
+.. note::
+
+  Rolling Upgrade is supported since Pike, which means operators can rolling
+  upgrade Heat services from Ocata to Pike release with minimal downtime.
+
+A rolling upgrade would provide a better experience for the users and
+operators of the cloud. A rolling upgrade would allow individual heat-api and
+heat-engine services to be upgraded one at a time, with the rest of the
+services still available. This upgrade would have minimal downtime. Please
+check specs about rolling upgrades [2]_.
+
+Prerequisites
+-------------
+
+* Multiple Heat nodes.
+
+* A load balancer or some other type of redirection device is being used in
+  front of nodes that run heat-api services in such a way that a node can be
+  dropped out of rotation. That node continues running the Heat services
+  (heat-api or heat-engine) but is no longer having requests routed to it.
+
+Procedure
+---------
+
+These following steps are the process to upgrade Heat with minimal downtime:
+
+1. Install the code for the next version of Heat either in a virtual
+   environment or a separate control plane node, including all the python
+   dependencies.
+
+2. Using the newly installed heat code, run the following command to sync the
+   database up to the most recent version. These schema change operations
+   should have minimal or no effect on performance, and should not cause any
+   operations to fail.
+
+    .. code-block:: bash
+
+      heat-manage db_sync
+
+3. At this point, new columns and tables may exist in the database. These DB
+   schema changes are done in a way that both the N and N+1 release can
+   perform operations against the same schema.
+
+4. Create a new rabbitmq vhost for the new release and change the
+   transport_url configuration in heat.conf file to be:
+
+   ``transport_url = rabbit://<user>:<password>@<host>:5672/<new_vhost>``
+
+   for all upgrade services.
+
+5. Stop heat-engine gracefully, Heat has supported graceful shutdown features
+   [2]_. Then start new heat-engine with new code (and corresponding
+   configuration).
+
+   .. note::
+
+      Remember to do Step 4, this would ensure that the existing engines
+      would not communicate with the new engine.
+
+6. A heat-api service is then upgraded and started with the new rabbitmq
+   vhost.
+
+   .. note::
+
+      The second way to do this step is switch heat-api service to use new
+      vhost first (but remember not to shut down heat-api) and upgrade it.
+
+7. The above process can be followed till all heat-api and heat-engine
+   services are upgraded.
+
+   .. note::
+
+      Make sure that all heat-api services has been upgraded before you
+      start to upgrade the last heat-engine service.
+
+   .. warning::
+
+      With the convergence architecture, whenever a resource completes the
+      engine will send RPC messages to another (or the same) engine to start
+      work on the next resource(s) to be processed. If the last engine is
+      going to be shut down gracefully, it will finish what it is working on,
+      which may post more messages to queues. It means the graceful shutdown
+      does not wait for queues to drain. The shutdown leaves some messages
+      unprocessed and any IN_PROGRESS stacks would get stuck without any
+      forward progress. The operator must be careful when shutting down the
+      last engine, make sure queues have no unprocessed messages before
+      doing it. The operator can check the queues directly with RabbitMQ's
+      management plugin [3].
+
+8. Once all services are upgraded, double check the DB and services
+
+References
+==========
+
+.. [1] https://governance.openstack.org/tc/reference/tags/assert_supports-upgrade.html
+
+.. [2] https://review.openstack.org/#/c/407989/
+
+.. [3] http://www.rabbitmq.com/management.html