diff --git a/doc/source/devref/replication.rst b/doc/source/devref/replication.rst
index 1b078c993a1..f9420bb0d18 100644
--- a/doc/source/devref/replication.rst
+++ b/doc/source/devref/replication.rst
@@ -1,64 +1,88 @@
 Replication
-============
+===========
 
-How to implement replication features in a backend driver.
+For backend devices that offer replication features, Cinder provides a common
+mechanism for exposing that functionality on a per volume basis while still
+trying to allow flexibility for the varying implementations and requirements
+of all the different backend devices.
 
-For backend devices that offer replication features, Cinder
-provides a common mechanism for exposing that functionality
-on a volume per volume basis while still trying to allow
-flexibility for the varying implementation and requirements
-of all the different backend devices.
+There are two sides to Cinder's replication feature: the core mechanism and
+the driver specific functionality. This document only covers the driver side,
+with the aim of helping vendors implement this functionality in their drivers
+in a way that is consistent with all other drivers.
 
-Most of the configuration is done via the cinder.conf file
-under the driver section and through the use of volume types.
+Although we'll be focusing on the driver implementation, there will also be
+some mention of deployment configurations, to give developers a clear picture
+and to help them avoid implementing custom solutions for things that were
+meant to be handled via the cloud configuration.
 
-NOTE:
-This implementation is intended to solve a specific use case.
-It's critical that you read the Use Cases section of the spec
-here:
+Overview
+--------
+
+As a general rule replication is enabled and configured via the cinder.conf
+file under the driver's section, and volume replication is requested through
+the use of volume types.
+
+*NOTE*: The current replication implementation is v2.1 and it's meant to solve
+a very specific use case, the "smoking hole" scenario. It's critical that you
+read the Use Cases section of the spec here:
 https://specs.openstack.org/openstack/cinder-specs/specs/mitaka/cheesecake.html
 
-Config file examples
---------------------
+From a user's perspective volumes will be created using specific volume types,
+even if it is the default volume type, and they will either be replicated or
+not, which will be reflected in the ``replication_status`` field of the
+volume. So, in order to know whether a snapshot is replicated, we have to
+check its volume.
 
-The cinder.conf file is used to specify replication config info
-for a specific driver. There is no concept of managed vs unmanaged,
-ALL replication configurations are expected to work by using the same
-driver. In other words, rather than trying to perform any magic
-by changing host entries in the DB for a Volume etc, all replication
-targets are considered "unmanaged" BUT if a failover is issued, it's
-the drivers responsibility to access replication volumes on the replicated
-backend device.
+After the loss of the primary storage site all operations on the resources
+will fail and VMs will no longer have access to the data. It is at that point
+that the Cloud Administrator will issue the ``failover-host`` command to make
+the cinder-volume service perform the failover.
 
-This results in no changes for the end-user. For example, He/She can
-still issue an attach call to a replicated volume that has been failed
-over, and the driver will still receive the call BUT the driver will
-need to figure out if it needs to redirect the call to the a different
-backend than the default or not.
+After the failover is completed, the Cinder volume service will start using
+the failed-over secondary storage site for all operations and the user will
+once again be able to perform actions on all the resources that were
+replicated, while all other resources will be in error status since they are
+no longer available.
 
-Information regarding if the backend is in a failed over state should
-be stored in the driver, and in the case of a restart, the service
-entry in the DB will have the replication status info and pass it
-in during init to allow the driver to be set in the correct state.
+Storage Device configuration
+----------------------------
 
-In the case of a failover event, and a volume was NOT of type
-replicated, that volume will now be UNAVAILABLE and any calls
-to access that volume should return a VolumeNotFound exception.
+Most storage devices will require configuration changes to enable the
+replication functionality, and this configuration process is vendor and
+storage device specific, so it is not covered by the Cinder core replication
+functionality.
 
-**replication_device**
+It is up to the vendors whether they want to handle this device configuration
+in the Cinder driver or as a manual process, but the most common approach is
+to keep this configuration logic out of Cinder and have the Cloud
+Administrators follow a specific guide to enable replication on the storage
+device before configuring the cinder volume service.
 
-Is a multi-dict opt, that should be specified
-for each replication target device the admin would
-like to configure.
+Service configuration
+---------------------
 
-*NOTE:*
+The way to enable and configure replication is common to all drivers and it is
+done via the ``replication_device`` configuration option that goes in the
+driver's specific section of the ``cinder.conf`` configuration file.
 
-There is one standardized and REQUIRED key in the config
-entry, all others are vendor-unique:
+``replication_device`` is a multi dictionary option that should be specified
+for each replication target device the admin wants to configure.
 
-* backend_id:
+While all drivers use the same ``replication_device`` configuration option,
+this doesn't mean that they will all have the same data, as there is only one
+standardized and **REQUIRED** key in the configuration entry; all others are
+vendor specific:
 
-An example driver config for a device with multiple replication targets
+- backend_id:
+
+The value of the ``backend_id`` key is used to uniquely identify, within the
+driver, each of the secondary sites, although values can be reused in
+different driver sections.
+
+These unique identifiers will be used by the failover mechanism as well as in
+the driver initialization process, and the only requirement is that they must
+never have the value "default".
+
+An example driver configuration for a device with multiple replication targets
 is show below::
 
     .....
@@ -76,95 +100,376 @@ is show below::
     replication_device = backend_id:vendor-id-1,unique_key:val....
     replication_device = backend_id:vendor-id-2,unique_key:val....
 
-In this example the result is self.configuration.get('replication_device) with the list::
+In this example the result of calling
+``self.configuration.safe_get('replication_device')`` within the driver is the
+following list::
 
     [{backend_id: vendor-id-1, unique_key: val1},
-     {backend_id: vendor-id-2, unique_key: val1}]
+     {backend_id: vendor-id-2, unique_key: val2}]
 
+It is expected that if a driver is configured with multiple replication
+targets, replicated volumes are actually replicated on **all targets**.
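+
+As a purely illustrative sketch of how a driver could consume this option
+during startup, a driver might build a map of its targets in ``do_setup``.
+The ``_replication_targets`` attribute is a hypothetical name, not part of any
+base class, and the usual ``from cinder import exception`` import is assumed::
+
+    def do_setup(self, context):
+        # 'replication_device' entries come from the driver's section of
+        # cinder.conf, one dictionary per configured target.
+        devices = self.configuration.safe_get('replication_device') or []
+        self._replication_targets = {}
+        for device in devices:
+            backend_id = device['backend_id']
+            # "default" is reserved for failback requests, so it must never
+            # be used as a target identifier.
+            if backend_id == 'default':
+                raise exception.InvalidConfigurationValue(
+                    option='replication_device', value=backend_id)
+            self._replication_targets[backend_id] = device
+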
-
-Volume Types / Extra Specs
----------------------------
-In order for a user to specify they'd like a replicated volume, there needs to be
-a corresponding Volume Type created by the Cloud Administrator.
-
-There's a good deal of flexibility by using volume types. The scheduler can
-send the create request to a backend that provides replication by simply
-providing the replication=enabled key to the extra-specs of the volume type.
-
-For example, if the type was set to simply create the volume on any (or if you only had one)
-backend that supports replication, the extra-specs entry would be::
-
-    {replication: enabled}
-
-Additionally you could provide additional details using scoped keys::
-
-    {replication: enabled, replication_type: async, replication_count: 2,
-    replication_targets: [fake_id1, fake_id2]}
-
-It's up to the driver to parse the volume type info on create and set things up
-as requested. While the scoping key can be anything, it's strongly recommended that all
-backends utilize the same key (replication) for consistency and to make things easier for
-the Cloud Administrator.
-
-Additionally it's expected that if a backend is configured with 3 replication
-targets, that if a volume of type replication=enabled is issued against that
-backend then it will replicate to ALL THREE of the configured targets.
+Besides the specific replication device keys defined in ``replication_device``,
+a driver may also have additional normal configuration options in the driver
+section related to replication that allow Cloud Administrators to configure
+things like timeouts.
 
 Capabilities reporting
 ----------------------
-The following entries are expected to be added to the stats/capabilities update for
-replication configured devices::
+
+There are 2 new replication stats/capability keys that drivers supporting
+replication v2.1 should be reporting: ``replication_enabled`` and
+``replication_targets``::
 
     stats["replication_enabled"] = True|False
    stats["replication_targets"] = [...]
 
-NOTICE, we report configured replication targets via volume stats_update
-This information is added to the get_capabilities admin call.
+If a driver is behaving correctly we can expect the ``replication_targets``
+field to be populated whenever ``replication_enabled`` is set to ``True``, and
+to either be set to ``[]`` or be missing altogether when
+``replication_enabled`` is set to ``False``.
 
-Required methods
------------------
-The number of API methods associated with replication is intentionally very limited,
+The purpose of the ``replication_enabled`` field is to be used by the
+scheduler, via volume types, for volume creation and migrations.
 
-Admin only methods.
+As for the ``replication_targets`` field, it is only provided for
+informational purposes so it can be retrieved through the ``get_capabilities``
+admin REST API call, but it will not be used for validation at the API layer.
+That way Cloud Administrators will be able to know the available secondary
+sites they can fail over to.
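+
+As a purely illustrative sketch (``_update_volume_stats`` is the usual driver
+stats hook, while everything else below is a hypothetical simplification), a
+driver could populate these keys from its configured targets::
+
+    def _update_volume_stats(self):
+        # Only the replication related keys are shown; real drivers report
+        # many more capabilities in their stats.
+        targets = self.configuration.safe_get('replication_device') or []
+        self._stats['replication_enabled'] = bool(targets)
+        self._stats['replication_targets'] = [t['backend_id'] for t in targets]
+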
-They include::
+Volume Types / Extra Specs
+---------------------------
 
-    replication_failover(self, context, volumes)
+The way to control the creation of volumes on a cloud with backends that have
+replication enabled is, like with many other features, through the use of
+volume types.
 
-Additionally we have freeze/thaw methods that will act on the scheduler
-but may or may not require something from the driver::
+We won't go into the details of volume type creation, but suffice it to say
+that you will most likely want to use volume types to discriminate between
+replicated and non replicated volumes, and to be explicit about it so that non
+replicated volumes won't end up in a replicated backend.
+
+Since the driver is reporting the ``replication_enabled`` key, we just need to
+require it for replicated volume types by adding
+``replication_enabled='<is> True'`` to them, and to be explicit for all non
+replicated volume types with ``replication_enabled='<is> False'``.
+
+It's up to the driver to parse the volume type info on create and set things
+up as requested. While the scoping key can be anything, it's strongly
+recommended that all backends utilize the same key (replication) for
+consistency and to make things easier for the Cloud Administrator.
+
+Additional replication parameters can be supplied to the driver using vendor
+specific properties through the volume type's extra-specs so they can be used
+by the driver at volume creation or retype time.
+
+As mentioned above, it is up to the driver to parse the volume type info on
+create and retype to set things up as requested. A good pattern to get a
+custom parameter from a given volume instance is this::
+
+    extra_specs = getattr(volume.volume_type, 'extra_specs', {})
+    custom_param = extra_specs.get('custom_param', 'default_value')
+
+It may seem convoluted, but we must be careful when retrieving the
+``extra_specs`` from the ``volume_type`` field because it could be ``None``.
+
+Vendors should try to avoid obfuscating their custom properties and should
+expose them using the ``_init_vendor_properties`` method so they can be
+checked by the Cloud Administrator using the ``get_capabilities`` REST API.
+
+*NOTE*: For storage devices doing per backend/pool replication, the use of
+volume types is also recommended.
+
+Volume creation
+---------------
+
+Drivers are expected to honor the replication parameters set in the volume
+type during creation, retyping, or migration.
+
+When implementing the replication feature there are some driver methods that
+will most likely need modifications (if they are implemented in the driver,
+since some are optional) to make sure that the backend is replicating volumes
+that need to be replicated and not replicating those that don't need to be:
+
+- ``create_volume``
+- ``create_volume_from_snapshot``
+- ``create_cloned_volume``
+- ``retype``
+- ``clone_image``
+- ``migrate_volume``
+
+In these methods the driver will have to check the volume type to see if the
+volumes need to be replicated. We could use the same pattern described in the
+`Volume Types / Extra Specs`_ section::
+
+    def _is_replicated(self, volume):
+        specs = getattr(volume.volume_type, 'extra_specs', {})
+        return specs.get('replication_enabled') == '<is> True'
+
+But this is **not** the recommended mechanism; the ``is_replicated`` method
+available on volume and volume type versioned object instances should be used
+instead.
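+
+For example, a purely illustrative ``create_volume`` could branch on it
+directly (the ``self._client`` calls below are hypothetical placeholders for
+whatever API the backend provides)::
+
+    def create_volume(self, volume):
+        # Hypothetical backend call that creates the actual storage resource.
+        self._client.create_lun(volume.name, volume.size)
+        if volume.is_replicated():
+            # Hypothetical call that enables replication of this volume to
+            # all of the configured replication targets.
+            self._client.enable_replication(volume.name)
+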
+Drivers are expected to keep the ``replication_status`` field up to date and
+in sync with reality, usually as specified in the volume type. To do so, the
+implementations of the above mentioned methods should use the model update
+mechanism provided for each one of them. One must be careful, since the update
+mechanism may differ from one method to another.
+
+What this means is that most of these methods should be returning a
+``replication_status`` key with the value set to ``enabled`` in the model
+update dictionary if the volume type enables replication. There is no need to
+return the key with the value of ``disabled`` if it is not enabled, since that
+is the default value.
+
+In the case of the ``create_volume`` and ``retype`` methods there is no need
+to return the ``replication_status`` in the model update, since it has already
+been set by the scheduler on creation using the extra spec from the volume
+type. On ``migrate_volume`` there is no need either, since there is no change
+to the ``replication_status``.
+
+*NOTE*: For storage devices doing per backend/pool replication it is not
+necessary to check the volume type for the ``replication_enabled`` key, since
+all created volumes will be replicated. They are, however, expected to return
+the ``replication_status`` in all those methods, including ``create_volume``,
+because the driver may receive a volume creation request without the
+replication enabled extra spec, in which case the ``replication_status`` will
+not have been set correctly and the driver needs to correct it.
+
+Besides the ``replication_status`` field that drivers need to update, there
+are other fields in the database related to the replication mechanism that the
+drivers can use:
+
+- ``replication_extended_status``
+- ``replication_driver_data``
+
+These are string fields with a maximum size of 255 characters, and they are
+available for drivers to use internally as they see fit for their normal
+replication operation. They can be assigned in the model update and used later
+by the driver, for example during the failover.
+
+To avoid using magic strings, drivers must use the values defined by the
+``ReplicationStatus`` class in the ``cinder/objects/fields.py`` file, which
+are:
+
+- ``ERROR``: When setting up the replication failed on creation, retype, or
+  migration. This should be accompanied by the volume status ``error``.
+- ``ENABLED``: When the volume is being replicated.
+- ``DISABLED``: When the volume is not being replicated.
+- ``FAILED_OVER``: After a volume has been successfully failed over.
+- ``FAILOVER_ERROR``: When there was an error during the failover of this
+  volume.
+- ``NOT_CAPABLE``: When we failed over but the volume was not replicated.
+
+The first 3 statuses revolve around volume creation and the last 3 around the
+failover mechanism.
+
+The only status that should not be used for the volume's
+``replication_status`` is the ``FAILING_OVER`` status.
+
+Whenever we refer to values of the ``replication_status`` in this document we
+are referring to the ``ReplicationStatus`` attributes and not a literal
+string, so ``ERROR`` means ``cinder.objects.fields.ReplicationStatus.ERROR``
+and not the string "ERROR".
+
+Failover
+--------
+
+This is the mechanism used to instruct the cinder volume service to fail over
+to a secondary/target device.
+
+Keep in mind the use case is that the primary backend has died a horrible
+death and is no longer valid, so any volumes that were on the primary and were
+not being replicated will no longer be available.
+
+The method definition required from the driver to implement the failover
+mechanism is as follows::
+
+    def failover_host(self, context, volumes, secondary_id=None):
+
+There are several things that are expected of this method:
+
+- Promotion of a secondary storage device to primary
+- Generating the model updates
+- Changing internally to access the secondary storage device for all future
+  requests.
+
+If no secondary storage device is provided to the driver via the
+``backend_id`` argument (it is equal to ``None``), then it is up to the driver
+to choose which storage device to fail over to. In this regard it is important
+that the driver takes into consideration that it could be failing over from a
+secondary (there was a prior failover request), so it should discard the
+current target from the selection.
+
+If the ``secondary_id`` is not a valid one, the driver is expected to raise
+``InvalidReplicationTarget``; for any other non recoverable errors during a
+failover, the driver should raise ``UnableToFailOver`` or any child of the
+``VolumeDriverException`` class and revert to a state where the previous
+backend is in use.
+
+The failover method in the driver will receive a list of replicated volumes
+that need to be failed over. Replicated volumes passed to the driver may have
+diverse ``replication_status`` values, but they will always be one of:
+``ENABLED``, ``FAILED_OVER``, or ``FAILOVER_ERROR``.
+
+The driver must return a 2-tuple with the new storage device target id as the
+first element and a list of dictionaries with the model updates required for
+the volumes as the second, so that the driver can perform future actions on
+those volumes now that they need to be accessed in a different location.
+
+It's not a requirement for the driver to return model updates for all the
+volumes, or even for any of them; it can return ``None`` or an empty list if
+there's no update necessary. But if elements are returned in the model update
+list, then it is a requirement that each of the dictionaries contains 2
+key-value pairs, ``volume_id`` and ``updates``, like this::
+
+    [{
+        'volume_id': volumes[0].id,
+        'updates': {
+            'provider_id': new_provider_id1,
+            ...
+        },
+     },
+     {
+        'volume_id': volumes[1].id,
+        'updates': {
+            'provider_id': new_provider_id2,
+            'replication_status': fields.ReplicationStatus.FAILOVER_ERROR,
+            ...
+        },
+    }]
+
+In these updates there is no need to set the ``replication_status`` to
+``FAILED_OVER`` if the failover was successful, as this will be performed by
+the manager by default, but it won't create additional DB queries if it is
+returned. It is however necessary to set it to ``FAILOVER_ERROR`` for those
+volumes that had errors during the failover.
+
+Drivers don't have to worry about snapshots or non replicated volumes, since
+the manager will take care of those in the following manner:
+
+- All non replicated volumes will have their current ``status`` field saved in
+  the ``previous_status`` field, the ``status`` field changed to ``error``,
+  and their ``replication_status`` set to ``NOT_CAPABLE``.
+- All snapshots from non replicated volumes will have their statuses changed
+  to ``error``.
+- All replicated volumes that failed on the failover will get their ``status``
+  changed to ``error``, their current ``status`` preserved in
+  ``previous_status``, and their ``replication_status`` set to
+  ``FAILOVER_ERROR``.
+- All snapshots from volumes that had errors during the failover will have
+  their statuses set to ``error``.
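+
+Putting the previous points together, a heavily simplified and purely
+illustrative ``failover_host`` sketch could look like the following, where
+``_pick_secondary_target``, ``_replication_targets``, ``_failover_volume``,
+and ``_active_backend_id`` are hypothetical driver internals, not part of any
+base class, and the usual ``exception``, ``fields``, and ``_`` imports are
+assumed::
+
+    def failover_host(self, context, volumes, secondary_id=None):
+        if secondary_id is None:
+            # Pick any configured target that is not the one currently in use.
+            secondary_id = self._pick_secondary_target()
+        elif secondary_id not in self._replication_targets:
+            reason = _('%s is not a configured target.') % secondary_id
+            raise exception.InvalidReplicationTarget(reason=reason)
+
+        model_updates = []
+        for volume in volumes:
+            try:
+                # Hypothetical call that promotes this volume's replica on the
+                # selected target and returns its new provider id.
+                new_provider_id = self._failover_volume(volume, secondary_id)
+                updates = {'provider_id': new_provider_id}
+            except Exception:
+                # The manager will adjust status/previous_status accordingly.
+                updates = {'replication_status':
+                           fields.ReplicationStatus.FAILOVER_ERROR}
+            model_updates.append({'volume_id': volume.id, 'updates': updates})
+
+        self._active_backend_id = secondary_id
+        return secondary_id, model_updates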
+
+Any model update request from the driver that changes the ``status`` field
+will trigger a change in the ``previous_status`` field to preserve the current
+status.
+
+Once the failover is completed, the driver should be pointing to the secondary
+and should be able to create and destroy volumes and snapshots as usual, and
+it is left to the Cloud Administrator's discretion whether resource modifying
+operations are allowed or not.
+
+Failback
+--------
+
+Drivers are not required to support failback, but they are required to raise
+an ``InvalidReplicationTarget`` exception if the failback is requested but not
+supported.
+
+The way to request the failback is quite simple: the driver will receive the
+argument ``secondary_id`` with the value of ``default``. That is why it is
+forbidden to use ``default`` as a ``backend_id`` in the target configuration
+in the cinder configuration file.
+
+Expected driver behavior is the same as the one explained in the `Failover`_
+section:
+
+- Promotion of the original primary to primary
+- Generating the model updates
+- Changing internally to access the original primary storage device for all
+  future requests.
+
+If the failback of any of the volumes fails, the driver must return
+``replication_status`` set to ``ERROR`` in the volume updates for those
+volumes. If they succeed, it is not necessary to change the
+``replication_status``, since the default behavior will be to set them to
+``ENABLED``, but it won't create additional DB queries if it is set.
+
+The manager will update resources in a slightly different way than in the
+failover case:
+
+- All non replicated volumes will not have any model modifications.
+- All snapshots from non replicated volumes will not have any model
+  modifications.
+- All replicated volumes that failed on the failback will get their ``status``
+  changed to ``error``, have their current ``status`` preserved in the
+  ``previous_status`` field, and their ``replication_status`` set to
+  ``FAILOVER_ERROR``.
+- All snapshots from volumes that had errors during the failback will have
+  their statuses set to ``error``.
+
+We can avoid using the "default" magic string by using the
+``FAILBACK_SENTINEL`` class attribute from the ``VolumeManager`` class.
+
+Initialization
+--------------
+
+It stands to reason that a failed over Cinder volume service may be restarted,
+so there needs to be a way for a driver to know on start which storage device
+should be used to access the resources.
+
+So, to let drivers know which storage device they should use, the manager
+passes the ``active_backend_id`` argument to their ``__init__`` method during
+the initialization phase of the driver. The default value is ``None``, which
+means that the default (primary) storage device should be used.
+
+Drivers should store this value if they will need it later, as the base driver
+does not store it; for example, it is needed to discard the current target
+when a failover is requested while we are already in a failover state, as
+mentioned above.
+
+Freeze / Thaw
+-------------
+
+In many cases, after a failover has been completed we'll want to allow changes
+to the data in the volumes as well as some operations like attach and detach,
+while other operations that modify the number of existing resources, like
+delete or create, are not allowed.
+
+And that is where the freezing mechanism comes in; freezing a backend puts the
+control plane of the specific Cinder volume service into a read only state, or
+at least most of it, while allowing the data plane to proceed as usual.
+
+While this will mostly be handled by the Cinder core code, drivers are
+informed when the freezing mechanism is enabled or disabled via these 2
+calls::
 
     freeze_backend(self, context)
     thaw_backend(self, context)
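+
+A backend that does need to react to these calls could, as a purely
+illustrative sketch, simply record the state so that other driver methods can
+check it later (the ``_frozen`` attribute is hypothetical, not part of any
+base class)::
+
+    def freeze_backend(self, context):
+        # Remember that the control plane is frozen so that other driver
+        # methods can refuse resource modifying operations if needed.
+        self._frozen = True
+
+    def thaw_backend(self, context):
+        # Return to normal operation.
+        self._frozen = False
+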
-**replication_failover**
+In most cases the driver may not need to do anything, and then it doesn't need
+to define any of these methods as long as it's a child class of the ``BaseVD``
+class, which already implements them as no-ops.
 
-Used to instruct the backend to fail over to the secondary/target device.
-If not secondary is specified (via backend_id argument) it's up to the driver
-to choose which device to failover to. In the case of only a single
-replication target this argument should be ignored.
+Raising a ``VolumeDriverException`` exception in any of these methods will
+result in a 500 status code response being returned to the caller and the
+manager will not log the exception, so it's up to the driver to log the error
+if it is appropriate.
 
-Note that ideally drivers will know how to update the volume reference properly so that Cinder is now
-pointing to the secondary. Also, while it's not required, at this time; ideally the command would
-act as a toggle, allowing to switch back and forth between primary and secondary and back to primary.
+If the driver wants to give a more meaningful error response, then it can
+raise other exceptions that have different status codes.
 
-Keep in mind the use case is that the backend has died a horrible death and is
-no longer valid. Any volumes that were on the primary and NOT of replication
-type should now be unavailable.
+When creating the ``freeze_backend`` and ``thaw_backend`` driver methods we
+must remember that this is a Cloud Administrator operation, so we can return
+errors that reveal internals of the cloud, for example the type of storage
+device. We must also use the appropriate internationalization translation
+methods when raising exceptions: for ``VolumeDriverException`` no translation
+is necessary, since the manager doesn't log it or return it to the user in any
+way, but any other exception should use the ``_()`` translation method since
+it will be returned to the REST API caller.
 
-NOTE: We do not expect things like create requests to go to the driver and
-magically create volumes on the replication target. The concept is that the
-backend is lost, and we're just providing a DR mechanism to preserve user data
-for volumes that were specified as such via type settings.
+For example, if a storage device doesn't support the thaw operation when
+failed over, then it should raise an ``Invalid`` exception::
 
-**freeze_backend**
-
-Puts a backend host/service into a R/O state for the control plane. For
-example if a failover is issued, it is likely desirable that while data access
-to existing volumes is maintained, it likely would not be wise to continue
-doing things like creates, deletes, extends etc.
-
-**thaw_backend**
-
-Clear frozen control plane on a backend.
+    def thaw_backend(self, context):
+        if self.failed_over:
+            msg = _('Thaw is not supported by driver XYZ.')
+            raise exception.Invalid(msg)