[arch-design-draft] Compute design - storage solutions updated

- Updated content in compute arch storage solutions
- This completes the Ocata updates for compute-design in arch-design-draft

Change-Id: I1b6b484b7b76b5bd9ff05bf7a7de1340f43e4376
Implements: blueprint arch-guide-restructure-ocata
Ben Silverman 2017-02-20 00:08:08 -05:00 committed by KATO Tomoyuki
parent 2190c10503
commit 682de53f10

@@ -1,13 +1,10 @@
===========================
==========================
Instance storage solutions
===========================
==========================
As part of the procurement for a compute cluster, you must specify some
storage for the disk on which the instantiated instance runs. There are
three main approaches to providing this temporary-style storage, and it
is important to understand the implications of the choice.
They are:
As part of the architecture design for a compute cluster, you must specify some
storage for the disk on which the instantiated instance runs. There are three
main approaches to providing temporary storage:
* Off compute node storage—shared file system
* On compute node storage—shared file system
@@ -16,34 +13,38 @@ They are:
In general, the questions you should ask when selecting storage are as
follows:
* What is the platter count you can achieve?
* Do more spindles result in better I/O despite network access?
* Which one results in the best cost-performance scenario you are aiming for?
* How do you manage the storage operationally?
* What are my workloads?
* Do my workloads have IOPS requirements?
* Are there read, write, or random access performance requirements?
* What is my forecast for the scaling of storage for compute?
* What storage is my enterprise currently using? Can it be re-purposed?
* How do I manage the storage operationally?
Many operators use separate compute and storage hosts. Compute services
and storage services have different requirements, and compute hosts
typically require more CPU and RAM than storage hosts. Therefore, for a
fixed budget, it makes sense to have different configurations for your
compute nodes and your storage nodes. Compute nodes will be invested in
CPU and RAM, and storage nodes will be invested in block storage.
Many operators use separate compute and storage hosts instead of a
hyperconverged solution. Compute services and storage services have different
requirements, and compute hosts typically require more CPU and RAM than storage
hosts. Therefore, for a fixed budget, it makes sense to have different
configurations for your compute nodes and your storage nodes. Compute nodes
will be invested in CPU and RAM, and storage nodes will be invested in block
storage.
However, if you are more restricted in the number of physical hosts you
have available for creating your cloud and you want to be able to
dedicate as many of your hosts as possible to running instances, it
makes sense to run compute and storage on the same machines.
However, if you are more restricted in the number of physical hosts you have
available for creating your cloud and you want to be able to dedicate as many
of your hosts as possible to running instances, it makes sense to run compute
and storage on the same machines or use an existing storage array that is
available.
The three main approaches to instance storage are provided in the next
few sections.
Off compute node storage—shared file system
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Non-compute node based shared file system
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In this option, the disks storing the running instances are hosted in
servers outside of the compute nodes.
If you use separate compute and storage hosts, you can treat your
compute hosts as "stateless." As long as you do not have any instances
compute hosts as "stateless". As long as you do not have any instances
currently running on a compute host, you can take it offline or wipe it
completely without having any effect on the rest of your cloud. This
simplifies maintenance for the compute hosts.
@@ -60,6 +61,7 @@ The main disadvantages to this approach are:
* Depending on design, heavy I/O usage from some instances can affect
unrelated instances.
* Use of the network can decrease performance.
* Scalability can be affected by network architecture.
On compute node storage—shared file system
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -79,36 +81,37 @@ However, this option has several disadvantages:
* The chassis size of the compute node can limit the number of spindles
able to be used in a compute node.
* Use of the network can decrease performance.
* Loss of compute nodes decreases storage availability for all hosts.
On compute node storage—nonshared file system
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In this option, each compute node is specified with enough disks to
store the instances it hosts.
In this option, each compute node is specified with enough disks to store the
instances it hosts.
There are two main advantages:
* Heavy I/O usage on one compute node does not affect instances on
other compute nodes.
* Direct I/O access can increase performance.
* Heavy I/O usage on one compute node does not affect instances on other
compute nodes. Direct I/O access can increase performance.
* Each host can have different storage profiles for hosts aggregation and
availability zones.
This has several disadvantages:
There are several disadvantages:
* If a compute node fails, the instances running on that node are lost.
* If a compute node fails, the data associated with the instances running on
that node is lost.
* The chassis size of the compute node can limit the number of spindles
able to be used in a compute node.
* Migrations of instances from one node to another are more complicated
and rely on features that may not continue to be developed.
* If additional storage is required, this option does not scale.
Running a shared file system on a storage system apart from the computes
nodes is ideal for clouds where reliability and scalability are the most
important factors. Running a shared file system on the compute nodes
themselves may be best in a scenario where you have to deploy to
preexisting servers for which you have little to no control over their
specifications. Running a nonshared file system on the compute nodes
themselves is a good option for clouds with high I/O requirements and
low concern for reliability.
Running a shared file system on a storage system apart from the compute nodes
is ideal for clouds where reliability and scalability are the most important
factors. Running a shared file system on the compute nodes themselves may be
best in a scenario where you have to deploy to pre-existing servers for which
you have little to no control over their specifications or have specific
storage performance needs but do not have a need for persistent storage.
Issues with live migration
--------------------------
@@ -123,7 +126,14 @@ Live migration can also be done with nonshared storage, using a feature
known as *KVM live block migration*. While an earlier implementation of
block-based migration in KVM and QEMU was considered unreliable, there
is a newer, more reliable implementation of block-based live migration
as of QEMU 1.4 and libvirt 1.0.2 that is also compatible with OpenStack.
as of the Mitaka release.
Live migration and block migration still have some issues:
* Error reporting has received some attention in Mitaka and Newton but there
are improvements needed.
* Live migration resource tracking issues.
* Live migration of rescued images.
Choice of file system
---------------------