cinder/doc/source/admin/volume-backups.rst
Andrew Bogott b661d115f5 Added documentation about backup_file_size about memory usage
I just spent a while chasing phantom OOM crashes in the backup
agent; I'm hoping this comment will help future users avoid
my problem.

Change-Id: I58ff10d212283b032cabf623021d3955111ff8f0
2022-04-14 14:21:04 -05:00

11 KiB

Back up and restore volumes and snapshots

The openstack command-line interface provides the tools for creating a volume backup. You can restore a volume from a backup as long as the backup's associated database information (or backup metadata) is intact in the Block Storage database.

Run this command to create a backup of a volume:

$ openstack volume backup create [--incremental] [--force] VOLUME

Where VOLUME is the name or ID of the volume, incremental is a flag that indicates whether an incremental backup should be performed, and force is a flag that allows or disallows backup of a volume when the volume is attached to an instance.

Without the incremental flag, a full backup is created by default. With the incremental flag, an incremental backup is created.

Without the force flag, the volume will be backed up only if its status is available. With the force flag, the volume will be backed up whether its status is available or in-use. A volume is in-use when it is attached to an instance. The backup of an in-use volume means your data is crash consistent. The force flag is False by default.

Note

The force flag is new in OpenStack Liberty.

The incremental backup is based on a parent backup which is an existing backup with the latest timestamp. The parent backup can be a full backup or an incremental backup depending on the timestamp.

Note

The first backup of a volume has to be a full backup. Attempting to do an incremental backup without any existing backups will fail. There is an is_incremental flag that indicates whether a backup is incremental when showing details on the backup. Another flag, has_dependent_backups, returned when showing backup details, will indicate whether the backup has dependent backups. If it is true, attempting to delete this backup will fail.

A new configure option backup_swift_block_size is introduced into cinder.conf for the default Swift backup driver. This is the size in bytes that changes are tracked for incremental backups. The existing backup_swift_object_size option, the size in bytes of Swift backup objects, has to be a multiple of backup_swift_block_size. The default is 32768 for backup_swift_block_size, and the default is 52428800 for backup_swift_object_size.

The configuration option backup_swift_enable_progress_timer in cinder.conf is used when backing up the volume to Object Storage back end. This option enables or disables the timer. It is enabled by default to send the periodic progress notifications to the Telemetry service.

This command also returns a backup ID. Use this backup ID when restoring the volume:

$ openstack volume backup restore BACKUP_ID VOLUME_ID

When restoring from a full backup, it is a full restore.

When restoring from an incremental backup, a list of backups is built based on the IDs of the parent backups. A full restore is performed based on the full backup first, then restore is done based on the incremental backup, laying on top of it in order.

You can view a backup list with the openstack volume backup list command. Optional arguments to clarify the status of your backups include: running --name, --status, and --volume to filter through backups by the specified name, status, or volume-id. Search with --all-projects for details of the projects associated with the listed backups.

Because volume backups are dependent on the Block Storage database, you must also back up your Block Storage database regularly to ensure data recovery.

Note

Alternatively, you can export and save the metadata of selected volume backups. Doing so precludes the need to back up the entire Block Storage database. This is useful if you need only a small subset of volumes to survive a catastrophic database failure.

If you specify a UUID encryption key when setting up the volume specifications, the backup metadata ensures that the key will remain valid when you back up and restore the volume.

For more information about how to export and import volume backup metadata, see the section called volume_backups_export_import.

By default, the swift object store is used for the backup repository.

If instead you want to use an NFS export as the backup repository, add the following configuration options to the [DEFAULT] section of the cinder.conf file and restart the Block Storage services:

backup_driver = cinder.backup.drivers.nfs
backup_share = HOST:EXPORT_PATH

For the backup_share option, replace HOST with the DNS resolvable host name or the IP address of the storage server for the NFS share, and EXPORT_PATH with the path to that share. If your environment requires that non-default mount options be specified for the share, set these as follows:

backup_mount_options = MOUNT_OPTIONS

MOUNT_OPTIONS is a comma-separated string of NFS mount options as detailed in the NFS man page.

There are several other options whose default values may be overridden as appropriate for your environment:

backup_compression_algorithm = zlib
backup_sha_block_size_bytes = 32768
backup_file_size = 1999994880

The option backup_compression_algorithm can be set to zlib, bz2, zstd or none. The value none can be a useful setting when the server providing the share for the backup repository itself performs deduplication or compression on the backup data.

The option backup_file_size must be a multiple of backup_sha_block_size_bytes. It is effectively the maximum file size to be used, given your environment, to hold backup data. Volumes larger than this will be stored in multiple files in the backup repository. backup_file_size also determines the buffer size used to produce backup files; on smaller hosts it may need to be scaled down to avoid OOM issues. The backup_sha_block_size_bytes option determines the size of blocks from the cinder volume being backed up on which digital signatures are calculated in order to enable incremental backup capability.

You also have the option of resetting the state of a backup. When creating or restoring a backup, sometimes it may get stuck in the creating or restoring states due to problems like the database or rabbitmq being down. In situations like these resetting the state of the backup can restore it to a functional status.

Run this command to restore the state of a backup:

$ cinder backup-reset-state [--state STATE] BACKUP_ID-1 BACKUP_ID-2 ...

Run this command to create a backup of a snapshot:

$ openstack volume backup create [--incremental] [--force] \
  [--snapshot SNAPSHOT_ID] VOLUME

Where VOLUME is the name or ID of the volume, SNAPSHOT_ID is the ID of the volume's snapshot.

Cancelling

Since Liberty it is possible to cancel an ongoing backup operation on any of the Chunked Backup type of drivers such as Swift, NFS, Google, GlusterFS, and Posix.

To issue a backup cancellation on a backup we must request a force delete on the backup.

$ openstack volume backup delete --force BACKUP_ID

Note

The policy on force delete defaults to admin only.

Even if the backup is immediately deleted, and therefore no longer appears in the listings, the cancellation may take a little bit longer, so please check the status of the source resource to see when it stops being "backing-up".

Note

Before Pike the "backing-up" status would always be stored in the volume, even when backing up a snapshot, so when backing up a snapshot any delete operation on the snapshot that followed a cancellation could result in an error if the snapshot was still mapped. Polling on the volume to stop being "backing-up" prior to the deletion is required to ensure success.

Since Rocky it is also possible to cancel an ongoing restoring operation on any of the Chunked Backup type of drivers.

To issue a backup restoration cancellation we need to alter its status to anything other than restoring. We strongly recommend using the "error" state to avoid any confusion on whether the restore was successful or not.

$ openstack volume backup set --state error BACKUP_ID

Warning

After a restore operation has started, if it is then cancelled, the destination volume is useless, as there is no way of knowing how much data, or if any, was actually restored, hence our recommendation of using the "error" state.

backup_max_operations

With this configuration option will let us select the maximum number of operations, backup and restore, that can be performed concurrently.

This option has a default value of 15, which means that we can have 15 concurrent backups, or 15 concurrent restores, or any combination of backups and restores as long as the sum of the 2 operations don't exceed 15.

The concurrency limitation of this configuration option is also enforced when we run multiple processes for the same backup service using the backup_workers configuration option. It is not a per process restriction, but global to the service, so we won't be able to run backup_max_operations on each one of the processes, but on all the running processes from the same backup service.

Backups and restore operations are both CPU and memory intensive, but thanks to this option we can limit the concurrency and prevent DoS attacks or just service disruptions caused by many concurrent requests that lead to Out of Memory (OOM) kills.

The amount of memory (RAM) used during the operation depends on the configured chunk size as well as the compression ratio achieved on the data during the operation.

Example:

Let's have a look at how much memory would be needed if we use the default backup chunk size (~1.86 GB) while doing a restore to an RBD volume from a non Ceph backend (Swift, NFS etc).

In a restore operation the worst case scenario, from the memory point of view, is when the compression ratio is close to 0% (the compressed data chunk is almost the same size as the uncompressed data).

In this case the memory usage would be ~5.58 GB of data for each chunk: ~5.58 GB = read buffer + decompressed buffer + write buffer used by the librbd library = ~1.86 GB + 1.86 GB + 1.86 GB

For 15 concurrent restore operations, the cinder-backup service will require ~83.7 GB of memory.

Similar calculations can be done for environment specific scenarios and this config option can be set accordingly.