Back up your Block Storage disks

While you can use the snapshot functionality (using LVM snapshot), you can also back up your volumes. The advantage of this method is that it reduces the size of the backup: only existing data is backed up, instead of the entire volume. For this example, assume that a 100 GB volume has been created for an instance, while only 4 gigabytes are used. This process backs up only those 4 gigabytes, with the following tools:

lvm2, which directly manipulates the volumes.
kpartx, which discovers the partition table created inside the instance.
tar, which creates a minimum-sized backup.
sha1sum, which calculates the backup checksum, to check its consistency.

1- Create a snapshot of a used volume

In order to back up our volume, we first need to create a snapshot of it. An LVM snapshot is an exact copy of a logical volume, which contains data in a frozen state. This prevents data corruption, because the data in the snapshot cannot change while the backup is being created. Remember that volumes created through nova volume-create exist as LVM logical volumes.

Before creating the snapshot, ensure that you have enough space to save it. As a precaution, you should have at least twice as much space as the potential snapshot size. If insufficient space is available, there is a risk that the snapshot could become corrupted.

Use the following command to obtain a list of all volumes:

    $ lvdisplay

In this example, we will refer to a volume called volume-00000001, which is a 10 GB volume. This process can be applied to all volumes, no matter their size. At the end of the section, we will present a script that you could use to create scheduled backups. The script itself exploits what we discuss here.
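The free-space precaution from step 1 can be sketched as a small shell check. This is only an illustration: the function name has_snapshot_headroom and its two arguments (free space and planned snapshot size, both in whole gigabytes) are hypothetical and not part of any tool mentioned here.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the free space is at least twice
# the planned snapshot size. Both arguments are whole gigabytes.
has_snapshot_headroom() {
    free_gb=$1
    snapshot_gb=$2
    [ "$free_gb" -ge $((snapshot_gb * 2)) ]
}

# Example: 25 GB free, 10 GB snapshot planned.
if has_snapshot_headroom 25 10; then
    echo "enough space for the snapshot"
else
    echo "not enough space for the snapshot"
fi
```

Assuming the volume group is named nova-volumes, as in this example, the free space could be read with vgs --noheadings --nosuffix --units g -o vg_free nova-volumes. That command prints a decimal value, which would need to be rounded down before it is passed to the helper.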
First, create the snapshot; this can be done while the volume is attached to an instance:

    $ lvcreate --size 10G --snapshot --name volume-00000001-snapshot /dev/nova-volumes/volume-00000001

The --snapshot option tells LVM that we want a snapshot of an already existing volume. The command includes the size of the space reserved for the snapshot volume, the name of the snapshot, and the path of an already existing volume (in most cases, the path will be /dev/nova-volumes/$volume_name).

The snapshot size does not have to match the size of the original volume. The --size parameter designates the space that LVM reserves for the snapshot volume. As a precaution, the size should be the same as that of the original volume, even if we know the whole space is not currently used by the snapshot.

We now have a full snapshot, and it took only a few seconds! Run lvdisplay again to verify the snapshot. You should now see your snapshot:

    --- Logical volume ---
    LV Name                /dev/nova-volumes/volume-00000001
    VG Name                nova-volumes
    LV UUID                gI8hta-p21U-IW2q-hRN1-nTzN-UC2G-dKbdKr
    LV Write Access        read/write
    LV snapshot status     source of
                           /dev/nova-volumes/volume-00000026-snap [active]
    LV Status              available
    # open                 1
    LV Size                15,00 GiB
    Current LE             3840
    Segments               1
    Allocation             inherit
    Read ahead sectors     auto
    - currently set to     256
    Block device           251:13

    --- Logical volume ---
    LV Name                /dev/nova-volumes/volume-00000001-snap
    VG Name                nova-volumes
    LV UUID                HlW3Ep-g5I8-KGQb-IRvi-IRYU-lIKe-wE9zYr
    LV Write Access        read/write
    LV snapshot status     active destination for /dev/nova-volumes/volume-00000026
    LV Status              available
    # open                 0
    LV Size                15,00 GiB
    Current LE             3840
    COW-table size         10,00 GiB
    COW-table LE           2560
    Allocated to snapshot  0,00%
    Snapshot chunk size    4,00 KiB
    Segments               1
    Allocation             inherit
    Read ahead sectors     auto
    - currently set to     256
    Block device           251:14

2- Partition table discovery

To use that snapshot with the tar program, we first need to mount our partition on the Block Storage
server.

kpartx is a small utility that discovers partition tables and maps them. It can be used to view partitions created inside the instance. Without access to the partitions created inside the instance, we won't be able to see their content and create efficient backups.

    $ kpartx -av /dev/nova-volumes/volume-00000001-snapshot

If no errors are displayed, the tool was able to find and map the partition table. On a Debian-based distribution, you can install kpartx with apt-get install kpartx.

You can easily check the partition table map by running the following command:

    $ ls /dev/mapper/nova*

You should now see a partition called nova--volumes-volume--00000001--snapshot1.

If you created more than one partition on that volume, you should see several partitions accordingly; for example, nova--volumes-volume--00000001--snapshot2, nova--volumes-volume--00000001--snapshot3, and so forth.

We can now mount our partition:

    $ mount /dev/mapper/nova--volumes-volume--00000001--snapshot1 /mnt

If there are no errors, you have successfully mounted the partition. You should now be able to directly access the data that was created inside the instance.

If you receive a message asking you to specify a partition, or if you are unable to mount it (despite a well-specified filesystem), there could be two causes:

You didn't allocate enough space for the snapshot.
kpartx was unable to discover the partition table.

Allocate more space to the snapshot and try the process again.

3- Use tar in order to create archives

Now that the volume has been mounted, you can create a backup of it:

    $ tar --exclude="lost+found" --exclude="some/data/to/exclude" -czf /backup/destination/volume-00000001.tar.gz -C /mnt/ .

This command creates a tar.gz file in /backup/destination that contains the data under /mnt, and data only. This ensures that you do not waste space by backing up empty sectors.

4- Checksum calculation

You should always have the checksum for your backup files. The checksum is a unique identifier for a file.
When you transfer that same file over the network, you can run another checksum calculation. If the checksums are different, the file is corrupted; the checksum thus provides a way to ensure that your file has not been corrupted during its transfer.

The following command runs a checksum for our file, and saves the result to a file:

    $ sha1sum volume-00000001.tar.gz > volume-00000001.checksum

Be aware that sha1sum should be used carefully, since the time required for the calculation is directly proportional to the file's size. For files larger than ~4-6 gigabytes, and depending on your CPU, the process may take a long time.

5- After work cleaning

Now that we have an efficient and consistent backup, the following commands will clean up the file system.

Unmount the volume:

    $ umount /mnt

Delete the partition mappings:

    $ kpartx -dv /dev/nova-volumes/volume-00000001-snapshot

Remove the snapshot:

    $ lvremove -f /dev/nova-volumes/volume-00000001-snapshot

And voila :) You can now repeat these steps for every volume you have.

6- Automate your backups

Because you can expect that more and more volumes will be allocated to your Block Storage service, you may want to automate your backups. A script can assist you with this task. The script performs the operations from the previous example, but also provides a mail report and runs the backup based on the backups_retention_days setting. It is meant to be launched from the server which runs the Block Storage component.

Here is an example of a mail report:

    Backup Start Time - 07/10 at 01:00:01
    Current retention - 7 days

    The backup volume is mounted. Proceed...
    Removing old backups... : /BACKUPS/EBS-VOL/volume-00000019/volume-00000019_28_09_2011.tar.gz
    /BACKUPS/EBS-VOL/volume-00000019 - 0 h 1 m and 21 seconds. Size - 3,5G

    The backup volume is mounted. Proceed...
    Removing old backups... : /BACKUPS/EBS-VOL/volume-0000001a/volume-0000001a_28_09_2011.tar.gz
    /BACKUPS/EBS-VOL/volume-0000001a - 0 h 4 m and 15 seconds. Size - 6,9G
    ---------------------------------------
    Total backups size - 267G - Used space : 35%
    Total execution time - 1 h 75 m and 35 seconds

The script can also SSH to your instances and run a mysqldump inside them. For this to work, ensure that connections to the instances with the nova project's keys are enabled. If you don't want to run the mysqldumps, you can turn off this functionality by adding enable_mysql_dump=0 to the script.
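Two of the steps above lend themselves to small helpers: verifying an archive against the checksum file from step 4, and removing archives that have aged past a retention period. The sketch below is an illustration only and is not taken from the backup script. The function names verify_backup and prune_old_backups are hypothetical, and the code assumes GNU sha1sum and find.

```shell
#!/bin/sh
# Hypothetical helpers; names and arguments are illustrative only.

# Verify an archive against the checksum file written by sha1sum.
# Run this from the directory that holds both files, because the
# checksum file records the archive name as it was given to sha1sum.
# Returns 0 when the checksum matches, non-zero otherwise.
verify_backup() {
    checksum_file=$1
    sha1sum --check --status "$checksum_file"
}

# Delete .tar.gz archives older than the retention period (in days).
# find counts age in whole days, so +7 matches files whose last
# modification is at least 8 days old.
prune_old_backups() {
    backup_dir=$1
    retention_days=$2
    find "$backup_dir" -type f -name '*.tar.gz' \
        -mtime +"$retention_days" -delete
}
```

For instance, verify_backup volume-00000001.checksum could be run after copying the archive and its checksum file to another host, and prune_old_backups /BACKUPS/EBS-VOL 7 would apply a 7-day retention to the directory shown in the sample report.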