[arch-design-draft] Compute design - storage solutions updated

- Updated content in compute arch storage solutions
- This completes the Ocata updates for compute-design in arch-design-draft

Change-Id: I1b6b484b7b76b5bd9ff05bf7a7de1340f43e4376
Implements: blueprint arch-guide-restructure-ocata
Ben Silverman 2017-02-20 00:08:08 -05:00 committed by KATO Tomoyuki
parent 2190c10503
commit 682de53f10


@@ -1,13 +1,10 @@
-===========================
+==========================
 Instance storage solutions
-===========================
+==========================
 
-As part of the procurement for a compute cluster, you must specify some
-storage for the disk on which the instantiated instance runs. There are
-three main approaches to providing this temporary-style storage, and it
-is important to understand the implications of the choice.
-
-They are:
+As part of the architecture design for a compute cluster, you must specify some
+storage for the disk on which the instantiated instance runs. There are three
+main approaches to providing temporary storage:
 
 * Off compute node storage—shared file system
 * On compute node storage—shared file system
@@ -16,34 +13,38 @@ They are:
 In general, the questions you should ask when selecting storage are as
 follows:
 
-* What is the platter count you can achieve?
-* Do more spindles result in better I/O despite network access?
-* Which one results in the best cost-performance scenario you are aiming for?
-* How do you manage the storage operationally?
+* What are my workloads?
+* Do my workloads have IOPS requirements?
+* Are there read, write, or random access performance requirements?
+* What is my forecast for the scaling of storage for compute?
+* What storage is my enterprise currently using? Can it be re-purposed?
+* How do I manage the storage operationally?
 
-Many operators use separate compute and storage hosts. Compute services
-and storage services have different requirements, and compute hosts
-typically require more CPU and RAM than storage hosts. Therefore, for a
-fixed budget, it makes sense to have different configurations for your
-compute nodes and your storage nodes. Compute nodes will be invested in
-CPU and RAM, and storage nodes will be invested in block storage.
+Many operators use separate compute and storage hosts instead of a
+hyperconverged solution. Compute services and storage services have different
+requirements, and compute hosts typically require more CPU and RAM than storage
+hosts. Therefore, for a fixed budget, it makes sense to have different
+configurations for your compute nodes and your storage nodes. Compute nodes
+will be invested in CPU and RAM, and storage nodes will be invested in block
+storage.
 
-However, if you are more restricted in the number of physical hosts you
-have available for creating your cloud and you want to be able to
-dedicate as many of your hosts as possible to running instances, it
-makes sense to run compute and storage on the same machines.
+However, if you are more restricted in the number of physical hosts you have
+available for creating your cloud and you want to be able to dedicate as many
+of your hosts as possible to running instances, it makes sense to run compute
+and storage on the same machines or use an existing storage array that is
+available.
 
 The three main approaches to instance storage are provided in the next
 few sections.
 
-Off compute node storage—shared file system
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Non-compute node based shared file system
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 In this option, the disks storing the running instances are hosted in
 servers outside of the compute nodes.
 
 If you use separate compute and storage hosts, you can treat your
-compute hosts as "stateless." As long as you do not have any instances
+compute hosts as "stateless". As long as you do not have any instances
 currently running on a compute host, you can take it offline or wipe it
 completely without having any effect on the rest of your cloud. This
 simplifies maintenance for the compute hosts.
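
The off-compute-node option described above is commonly realized by backing
the hypervisor instance directory with shared storage, which is what keeps the
compute hosts stateless. The following is a minimal sketch, not part of this
change, assuming an NFS export at nfs.example.com:/export/nova_instances (a
placeholder) and the default nova ``instances_path`` of
/var/lib/nova/instances:

.. code-block:: console

   # mkdir -p /var/lib/nova/instances
   # mount -t nfs4 nfs.example.com:/export/nova_instances /var/lib/nova/instances
   # echo "nfs.example.com:/export/nova_instances /var/lib/nova/instances nfs4 defaults 0 0" >> /etc/fstab

With the instance directory shared, a compute host that has no running
instances can be wiped or taken offline without touching instance disks, and
live migration does not have to copy disks over the network.
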
@@ -60,6 +61,7 @@ The main disadvantages to this approach are:
 * Depending on design, heavy I/O usage from some instances can affect
   unrelated instances.
 * Use of the network can decrease performance.
+* Scalability can be affected by network architecture.
 
 On compute node storage—shared file system
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -79,36 +81,37 @@ However, this option has several disadvantages:
 * The chassis size of the compute node can limit the number of spindles
   able to be used in a compute node.
 * Use of the network can decrease performance.
+* Loss of compute nodes decreases storage availability for all hosts.
 
 On compute node storage—nonshared file system
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-In this option, each compute node is specified with enough disks to
-store the instances it hosts.
+In this option, each compute node is specified with enough disks to store the
+instances it hosts.
 
 There are two main advantages:
 
-* Heavy I/O usage on one compute node does not affect instances on
-  other compute nodes.
-* Direct I/O access can increase performance.
+* Heavy I/O usage on one compute node does not affect instances on other
+  compute nodes. Direct I/O access can increase performance.
+* Each host can have different storage profiles for hosts aggregation and
+  availability zones.
 
-This has several disadvantages:
+There are several disadvantages:
 
-* If a compute node fails, the instances running on that node are lost.
+* If a compute node fails, the data associated with the instances running on
+  that node is lost.
 * The chassis size of the compute node can limit the number of spindles
   able to be used in a compute node.
 * Migrations of instances from one node to another are more complicated
   and rely on features that may not continue to be developed.
 * If additional storage is required, this option does not scale.
 
-Running a shared file system on a storage system apart from the computes
-nodes is ideal for clouds where reliability and scalability are the most
-important factors. Running a shared file system on the compute nodes
-themselves may be best in a scenario where you have to deploy to
-preexisting servers for which you have little to no control over their
-specifications. Running a nonshared file system on the compute nodes
-themselves is a good option for clouds with high I/O requirements and
-low concern for reliability.
+Running a shared file system on a storage system apart from the compute nodes
+is ideal for clouds where reliability and scalability are the most important
+factors. Running a shared file system on the compute nodes themselves may be
+best in a scenario where you have to deploy to pre-existing servers for which
+you have little to no control over their specifications or have specific
+storage performance needs but do not have a need for persistent storage.
 
 Issues with live migration
 --------------------------
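
The new bullet about per-host storage profiles maps onto host aggregates and
availability zones in the Compute scheduler. One way to express it, sketched
here with placeholder names (an aggregate ssd-nodes, a property key disk,
hosts compute01 and compute02, a flavor m1.ssd) and assuming the
AggregateInstanceExtraSpecsFilter scheduler filter is enabled:

.. code-block:: console

   $ openstack aggregate create --zone nova ssd-nodes
   $ openstack aggregate set --property disk=ssd ssd-nodes
   $ openstack aggregate add host ssd-nodes compute01
   $ openstack aggregate add host ssd-nodes compute02
   $ openstack flavor set --property aggregate_instance_extra_specs:disk=ssd m1.ssd

Instances booted from the m1.ssd flavor are then scheduled only to hosts that
carry the matching storage profile, while other flavors can still land
anywhere.
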
@@ -123,7 +126,14 @@ Live migration can also be done with nonshared storage, using a feature
 known as *KVM live block migration*. While an earlier implementation of
 block-based migration in KVM and QEMU was considered unreliable, there
 is a newer, more reliable implementation of block-based live migration
-as of QEMU 1.4 and libvirt 1.0.2 that is also compatible with OpenStack.
+as of the Mitaka release.
+
+Live migration and block migration still have some issues:
+
+* Error reporting has received some attention in Mitaka and Newton but there
+  are improvements needed.
+* Live migration resource tracking issues.
+* Live migration of rescued images.
 
 Choice of file system
 ---------------------
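
The KVM live block migration mentioned above is driven from the client rather
than configured in the guide itself. A hedged example, assuming an instance
named vm1 and a destination host compute02 (both placeholders); the exact
flags vary between client releases, so check the client help output first:

.. code-block:: console

   $ nova live-migration --block-migrate vm1 compute02
   $ openstack server migrate --live compute02 --block-migration vm1

Both forms ask the hypervisor to copy the instance's local disks to the
destination during the migration, which is what allows live migration without
shared instance storage.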