[arch-design-draft] Compute design - storage solutions updated

- Updated content in compute arch storage solutions
- This completes the Ocata updates for compute-design in arch-design-draft

Change-Id: I1b6b484b7b76b5bd9ff05bf7a7de1340f43e4376
Implements: blueprint arch-guide-restructure-ocata
Ben Silverman 2017-02-20 00:08:08 -05:00 committed by KATO Tomoyuki
parent 2190c10503
commit 682de53f10


@@ -1,13 +1,10 @@
-===========================
+==========================
 Instance storage solutions
-===========================
+==========================
 
-As part of the procurement for a compute cluster, you must specify some
-storage for the disk on which the instantiated instance runs. There are
-three main approaches to providing this temporary-style storage, and it
-is important to understand the implications of the choice.
-
-They are:
+As part of the architecture design for a compute cluster, you must specify some
+storage for the disk on which the instantiated instance runs. There are three
+main approaches to providing temporary storage:
 
 * Off compute node storage—shared file system
 * On compute node storage—shared file system
@@ -16,34 +13,38 @@ They are:
 In general, the questions you should ask when selecting storage are as
 follows:
 
-* What is the platter count you can achieve?
-* Do more spindles result in better I/O despite network access?
-* Which one results in the best cost-performance scenario you are aiming for?
-* How do you manage the storage operationally?
+* What are my workloads?
+* Do my workloads have IOPS requirements?
+* Are there read, write, or random access performance requirements?
+* What is my forecast for the scaling of storage for compute?
+* What storage is my enterprise currently using? Can it be re-purposed?
+* How do I manage the storage operationally?
 
-Many operators use separate compute and storage hosts. Compute services
-and storage services have different requirements, and compute hosts
-typically require more CPU and RAM than storage hosts. Therefore, for a
-fixed budget, it makes sense to have different configurations for your
-compute nodes and your storage nodes. Compute nodes will be invested in
-CPU and RAM, and storage nodes will be invested in block storage.
+Many operators use separate compute and storage hosts instead of a
+hyperconverged solution. Compute services and storage services have different
+requirements, and compute hosts typically require more CPU and RAM than storage
+hosts. Therefore, for a fixed budget, it makes sense to have different
+configurations for your compute nodes and your storage nodes. Compute nodes
+will be invested in CPU and RAM, and storage nodes will be invested in block
+storage.
 
-However, if you are more restricted in the number of physical hosts you
-have available for creating your cloud and you want to be able to
-dedicate as many of your hosts as possible to running instances, it
-makes sense to run compute and storage on the same machines.
+However, if you are more restricted in the number of physical hosts you have
+available for creating your cloud and you want to be able to dedicate as many
+of your hosts as possible to running instances, it makes sense to run compute
+and storage on the same machines or use an existing storage array that is
+available.
 
 The three main approaches to instance storage are provided in the next
 few sections.
 
-Off compute node storage—shared file system
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Non-compute node based shared file system
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 In this option, the disks storing the running instances are hosted in
 servers outside of the compute nodes.
 
 If you use separate compute and storage hosts, you can treat your
-compute hosts as "stateless." As long as you do not have any instances
+compute hosts as "stateless". As long as you do not have any instances
 currently running on a compute host, you can take it offline or wipe it
 completely without having any effect on the rest of your cloud. This
 simplifies maintenance for the compute hosts.
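
The off-compute-node option described above is commonly realized by backing
the hypervisor instance directory with shared storage, which is what keeps the
compute hosts stateless. The following is a minimal sketch, not part of this
change, assuming an NFS export at nfs.example.com:/export/nova_instances (a
placeholder) and the default nova ``instances_path`` of
/var/lib/nova/instances:

.. code-block:: console

   # mkdir -p /var/lib/nova/instances
   # mount -t nfs4 nfs.example.com:/export/nova_instances /var/lib/nova/instances
   # echo "nfs.example.com:/export/nova_instances /var/lib/nova/instances nfs4 defaults 0 0" >> /etc/fstab

With the instance directory shared, a compute host that has no running
instances can be wiped or taken offline without touching instance disks, and
live migration does not have to copy disks over the network.
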
@@ -60,6 +61,7 @@ The main disadvantages to this approach are:
 * Depending on design, heavy I/O usage from some instances can affect
   unrelated instances.
 * Use of the network can decrease performance.
+* Scalability can be affected by network architecture.
 
 On compute node storage—shared file system
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -79,36 +81,37 @@ However, this option has several disadvantages:
 * The chassis size of the compute node can limit the number of spindles
   able to be used in a compute node.
 * Use of the network can decrease performance.
+* Loss of compute nodes decreases storage availability for all hosts.
 
 On compute node storage—nonshared file system
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-In this option, each compute node is specified with enough disks to
-store the instances it hosts.
+In this option, each compute node is specified with enough disks to store the
+instances it hosts.
 
 There are two main advantages:
 
-* Heavy I/O usage on one compute node does not affect instances on
-  other compute nodes.
-* Direct I/O access can increase performance.
+* Heavy I/O usage on one compute node does not affect instances on other
+  compute nodes. Direct I/O access can increase performance.
+* Each host can have different storage profiles for hosts aggregation and
+  availability zones.
 
-This has several disadvantages:
+There are several disadvantages:
 
-* If a compute node fails, the instances running on that node are lost.
+* If a compute node fails, the data associated with the instances running on
+  that node is lost.
 * The chassis size of the compute node can limit the number of spindles
   able to be used in a compute node.
 * Migrations of instances from one node to another are more complicated
   and rely on features that may not continue to be developed.
 * If additional storage is required, this option does not scale.
 
-Running a shared file system on a storage system apart from the computes
-nodes is ideal for clouds where reliability and scalability are the most
-important factors. Running a shared file system on the compute nodes
-themselves may be best in a scenario where you have to deploy to
-preexisting servers for which you have little to no control over their
-specifications. Running a nonshared file system on the compute nodes
-themselves is a good option for clouds with high I/O requirements and
-low concern for reliability.
+Running a shared file system on a storage system apart from the compute nodes
+is ideal for clouds where reliability and scalability are the most important
+factors. Running a shared file system on the compute nodes themselves may be
+best in a scenario where you have to deploy to pre-existing servers for which
+you have little to no control over their specifications or have specific
+storage performance needs but do not have a need for persistent storage.
 
 Issues with live migration
 --------------------------
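
The new bullet about per-host storage profiles maps onto host aggregates and
availability zones in the Compute scheduler. One way to express it, sketched
here with placeholder names (an aggregate ssd-nodes, a property key disk,
hosts compute01 and compute02, a flavor m1.ssd) and assuming the
AggregateInstanceExtraSpecsFilter scheduler filter is enabled:

.. code-block:: console

   $ openstack aggregate create --zone nova ssd-nodes
   $ openstack aggregate set --property disk=ssd ssd-nodes
   $ openstack aggregate add host ssd-nodes compute01
   $ openstack aggregate add host ssd-nodes compute02
   $ openstack flavor set --property aggregate_instance_extra_specs:disk=ssd m1.ssd

Instances booted from the m1.ssd flavor are then scheduled only to hosts that
carry the matching storage profile, while other flavors can still land
anywhere.
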
@@ -123,7 +126,14 @@ Live migration can also be done with nonshared storage, using a feature
 known as *KVM live block migration*. While an earlier implementation of
 block-based migration in KVM and QEMU was considered unreliable, there
 is a newer, more reliable implementation of block-based live migration
-as of QEMU 1.4 and libvirt 1.0.2 that is also compatible with OpenStack.
+as of the Mitaka release.
+
+Live migration and block migration still have some issues:
+
+* Error reporting has received some attention in Mitaka and Newton but there
+  are improvements needed.
+* Live migration resource tracking issues.
+* Live migration of rescued images.
 
 Choice of file system
 ---------------------
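
The KVM live block migration mentioned above is driven from the client rather
than configured in the guide itself. A hedged example, assuming an instance
named vm1 and a destination host compute02 (both placeholders); the exact
flags vary between client releases, so check the client help output first:

.. code-block:: console

   $ nova live-migration --block-migrate vm1 compute02
   $ openstack server migrate --live compute02 --block-migration vm1

Both forms ask the hypervisor to copy the instance's local disks to the
destination during the migration, which is what allows live migration without
shared instance storage.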