diff --git a/doc/arch-design-draft/source/design-compute/design-compute-storage.rst b/doc/arch-design-draft/source/design-compute/design-compute-storage.rst
index af5e6577b5..c525478a61 100644
--- a/doc/arch-design-draft/source/design-compute/design-compute-storage.rst
+++ b/doc/arch-design-draft/source/design-compute/design-compute-storage.rst
@@ -1,13 +1,10 @@
-===========================
+==========================
 Instance storage solutions
-===========================
+==========================
 
-As part of the procurement for a compute cluster, you must specify some
-storage for the disk on which the instantiated instance runs. There are
-three main approaches to providing this temporary-style storage, and it
-is important to understand the implications of the choice.
-
-They are:
+As part of the architecture design for a compute cluster, you must specify some
+storage for the disk on which the instantiated instance runs. There are three
+main approaches to providing temporary storage:
 
 * Off compute node storage—shared file system
 * On compute node storage—shared file system
@@ -16,34 +13,38 @@ They are:
 In general, the questions you should ask when selecting storage are as
 follows:
 
-* What is the platter count you can achieve?
-* Do more spindles result in better I/O despite network access?
-* Which one results in the best cost-performance scenario you are aiming for?
-* How do you manage the storage operationally?
+* What are my workloads?
+* Do my workloads have IOPS requirements?
+* Are there read, write, or random access performance requirements?
+* What is my forecast for the scaling of storage for compute?
+* What storage is my enterprise currently using? Can it be re-purposed?
+* How do I manage the storage operationally?
 
-Many operators use separate compute and storage hosts. Compute services
-and storage services have different requirements, and compute hosts
-typically require more CPU and RAM than storage hosts. Therefore, for a
-fixed budget, it makes sense to have different configurations for your
-compute nodes and your storage nodes. Compute nodes will be invested in
-CPU and RAM, and storage nodes will be invested in block storage.
+Many operators use separate compute and storage hosts instead of a
+hyperconverged solution. Compute services and storage services have different
+requirements, and compute hosts typically require more CPU and RAM than storage
+hosts. Therefore, for a fixed budget, it makes sense to have different
+configurations for your compute nodes and your storage nodes. Compute nodes
+will be invested in CPU and RAM, and storage nodes will be invested in block
+storage.
 
-However, if you are more restricted in the number of physical hosts you
-have available for creating your cloud and you want to be able to
-dedicate as many of your hosts as possible to running instances, it
-makes sense to run compute and storage on the same machines.
+However, if you are more restricted in the number of physical hosts you have
+available for creating your cloud and you want to be able to dedicate as many
+of your hosts as possible to running instances, it makes sense to run compute
+and storage on the same machines or use an existing storage array that is
+available.
 
 The three main approaches to instance storage are provided in the next
 few sections.
 
-Off compute node storage—shared file system
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Non-compute node based shared file system
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 In this option, the disks storing the running instances are hosted in
 servers outside of the compute nodes.
 
 If you use separate compute and storage hosts, you can treat your
-compute hosts as "stateless." As long as you do not have any instances
+compute hosts as "stateless". As long as you do not have any instances
 currently running on a compute host, you can take it offline or wipe it
 completely without having any effect on the rest of your cloud. This
 simplifies maintenance for the compute hosts.
@@ -60,6 +61,7 @@ The main disadvantages to this approach are:
 * Depending on design, heavy I/O usage from some instances can affect
   unrelated instances.
 * Use of the network can decrease performance.
+* Scalability can be affected by network architecture.
 
 On compute node storage—shared file system
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -79,36 +81,37 @@ However, this option has several disadvantages:
 * The chassis size of the compute node can limit the number of spindles
   able to be used in a compute node.
 * Use of the network can decrease performance.
+* Loss of compute nodes decreases storage availability for all hosts.
 
 On compute node storage—nonshared file system
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-In this option, each compute node is specified with enough disks to
-store the instances it hosts.
+In this option, each compute node is specified with enough disks to store the
+instances it hosts.
 
 There are two main advantages:
 
-* Heavy I/O usage on one compute node does not affect instances on
-  other compute nodes.
-* Direct I/O access can increase performance.
+* Heavy I/O usage on one compute node does not affect instances on other
+  compute nodes. Direct I/O access can increase performance.
+* Each host can have different storage profiles for hosts aggregation and
+  availability zones.
 
-This has several disadvantages:
+There are several disadvantages:
 
-* If a compute node fails, the instances running on that node are lost.
+* If a compute node fails, the data associated with the instances running on
+  that node is lost.
 * The chassis size of the compute node can limit the number of spindles
   able to be used in a compute node.
 * Migrations of instances from one node to another are more complicated
   and rely on features that may not continue to be developed.
 * If additional storage is required, this option does not scale.
 
-Running a shared file system on a storage system apart from the computes
-nodes is ideal for clouds where reliability and scalability are the most
-important factors. Running a shared file system on the compute nodes
-themselves may be best in a scenario where you have to deploy to
-preexisting servers for which you have little to no control over their
-specifications. Running a nonshared file system on the compute nodes
-themselves is a good option for clouds with high I/O requirements and
-low concern for reliability.
+Running a shared file system on a storage system apart from the compute nodes
+is ideal for clouds where reliability and scalability are the most important
+factors. Running a shared file system on the compute nodes themselves may be
+best in a scenario where you have to deploy to pre-existing servers for which
+you have little to no control over their specifications or have specific
+storage performance needs but do not have a need for persistent storage.
 
 Issues with live migration
 --------------------------
@@ -123,7 +126,14 @@ Live migration can also be done with nonshared storage, using a feature
 known as *KVM live block migration*. While an earlier implementation of
 block-based migration in KVM and QEMU was considered unreliable, there
 is a newer, more reliable implementation of block-based live migration
-as of QEMU 1.4 and libvirt 1.0.2 that is also compatible with OpenStack.
+as of the Mitaka release.
+
+Live migration and block migration still have some issues:
+
+* Error reporting has received some attention in Mitaka and Newton but there
+  are improvements needed.
+* Live migration resource tracking issues.
+* Live migration of rescued images.
 
 Choice of file system
 ---------------------