[arch-design-draft] Compute design - storage solutions updated
- Updated content in compute arch storage solutions
- This completes the Ocata updates for compute-design in arch-design-draft

Change-Id: I1b6b484b7b76b5bd9ff05bf7a7de1340f43e4376
Implements: blueprint arch-guide-restructure-ocata
parent 2190c10503
commit 682de53f10
@@ -1,13 +1,10 @@
-===========================
+==========================
 Instance storage solutions
-===========================
+==========================
 
-As part of the procurement for a compute cluster, you must specify some
-storage for the disk on which the instantiated instance runs. There are
-three main approaches to providing this temporary-style storage, and it
-is important to understand the implications of the choice.
-
-They are:
+As part of the architecture design for a compute cluster, you must specify some
+storage for the disk on which the instantiated instance runs. There are three
+main approaches to providing temporary storage:
 
 * Off compute node storage—shared file system
 * On compute node storage—shared file system
@@ -16,34 +13,38 @@ They are:
 In general, the questions you should ask when selecting storage are as
 follows:
 
-* What is the platter count you can achieve?
-* Do more spindles result in better I/O despite network access?
-* Which one results in the best cost-performance scenario you are aiming for?
-* How do you manage the storage operationally?
+* What are my workloads?
+* Do my workloads have IOPS requirements?
+* Are there read, write, or random access performance requirements?
+* What is my forecast for the scaling of storage for compute?
+* What storage is my enterprise currently using? Can it be re-purposed?
+* How do I manage the storage operationally?
 
-Many operators use separate compute and storage hosts. Compute services
-and storage services have different requirements, and compute hosts
-typically require more CPU and RAM than storage hosts. Therefore, for a
-fixed budget, it makes sense to have different configurations for your
-compute nodes and your storage nodes. Compute nodes will be invested in
-CPU and RAM, and storage nodes will be invested in block storage.
+Many operators use separate compute and storage hosts instead of a
+hyperconverged solution. Compute services and storage services have different
+requirements, and compute hosts typically require more CPU and RAM than storage
+hosts. Therefore, for a fixed budget, it makes sense to have different
+configurations for your compute nodes and your storage nodes. Compute nodes
+will be invested in CPU and RAM, and storage nodes will be invested in block
+storage.
 
-However, if you are more restricted in the number of physical hosts you
-have available for creating your cloud and you want to be able to
-dedicate as many of your hosts as possible to running instances, it
-makes sense to run compute and storage on the same machines.
+However, if you are more restricted in the number of physical hosts you have
+available for creating your cloud and you want to be able to dedicate as many
+of your hosts as possible to running instances, it makes sense to run compute
+and storage on the same machines or use an existing storage array that is
+available.
 
 The three main approaches to instance storage are provided in the next
 few sections.
 
-Off compute node storage—shared file system
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Non-compute node based shared file system
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 In this option, the disks storing the running instances are hosted in
 servers outside of the compute nodes.
 
 If you use separate compute and storage hosts, you can treat your
-compute hosts as "stateless." As long as you do not have any instances
+compute hosts as "stateless". As long as you do not have any instances
 currently running on a compute host, you can take it offline or wipe it
 completely without having any effect on the rest of your cloud. This
 simplifies maintenance for the compute hosts.
@@ -60,6 +61,7 @@ The main disadvantages to this approach are:
 * Depending on design, heavy I/O usage from some instances can affect
   unrelated instances.
 * Use of the network can decrease performance.
+* Scalability can be affected by network architecture.
 
 On compute node storage—shared file system
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -79,36 +81,37 @@ However, this option has several disadvantages:
 * The chassis size of the compute node can limit the number of spindles
   able to be used in a compute node.
 * Use of the network can decrease performance.
+* Loss of compute nodes decreases storage availability for all hosts.
 
 On compute node storage—nonshared file system
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-In this option, each compute node is specified with enough disks to
-store the instances it hosts.
+In this option, each compute node is specified with enough disks to store the
+instances it hosts.
 
 There are two main advantages:
 
-* Heavy I/O usage on one compute node does not affect instances on
-  other compute nodes.
-* Direct I/O access can increase performance.
+* Heavy I/O usage on one compute node does not affect instances on other
+  compute nodes. Direct I/O access can increase performance.
+* Each host can have different storage profiles for hosts aggregation and
+  availability zones.
 
-This has several disadvantages:
+There are several disadvantages:
 
-* If a compute node fails, the instances running on that node are lost.
+* If a compute node fails, the data associated with the instances running on
+  that node is lost.
 * The chassis size of the compute node can limit the number of spindles
   able to be used in a compute node.
 * Migrations of instances from one node to another are more complicated
   and rely on features that may not continue to be developed.
 * If additional storage is required, this option does not scale.
 
-Running a shared file system on a storage system apart from the computes
-nodes is ideal for clouds where reliability and scalability are the most
-important factors. Running a shared file system on the compute nodes
-themselves may be best in a scenario where you have to deploy to
-preexisting servers for which you have little to no control over their
-specifications. Running a nonshared file system on the compute nodes
-themselves is a good option for clouds with high I/O requirements and
-low concern for reliability.
+Running a shared file system on a storage system apart from the compute nodes
+is ideal for clouds where reliability and scalability are the most important
+factors. Running a shared file system on the compute nodes themselves may be
+best in a scenario where you have to deploy to pre-existing servers for which
+you have little to no control over their specifications or have specific
+storage performance needs but do not have a need for persistent storage.
 
 Issues with live migration
 --------------------------
@@ -123,7 +126,14 @@ Live migration can also be done with nonshared storage, using a feature
 known as *KVM live block migration*. While an earlier implementation of
 block-based migration in KVM and QEMU was considered unreliable, there
 is a newer, more reliable implementation of block-based live migration
-as of QEMU 1.4 and libvirt 1.0.2 that is also compatible with OpenStack.
+as of the Mitaka release.
+
+Live migration and block migration still have some issues:
+
+* Error reporting has received some attention in Mitaka and Newton but there
+  are improvements needed.
+* Live migration resource tracking issues.
+* Live migration of rescued images.
 
 Choice of file system
 ---------------------