[arch-design-draft] Compute design - storage solutions updated
- Updated content in compute arch storage solutions
- This completes the Ocata updates for compute-design in arch-design-draft

Change-Id: I1b6b484b7b76b5bd9ff05bf7a7de1340f43e4376
Implements: blueprint arch-guide-restructure-ocata
parent 2190c10503
commit 682de53f10
@@ -1,13 +1,10 @@
-===========================
+==========================
 Instance storage solutions
-===========================
+==========================

-As part of the procurement for a compute cluster, you must specify some
-storage for the disk on which the instantiated instance runs. There are
-three main approaches to providing this temporary-style storage, and it
-is important to understand the implications of the choice.
-
-They are:
+As part of the architecture design for a compute cluster, you must specify some
+storage for the disk on which the instantiated instance runs. There are three
+main approaches to providing temporary storage:

 * Off compute node storage—shared file system
 * On compute node storage—shared file system
@@ -16,34 +13,38 @@ They are:
 In general, the questions you should ask when selecting storage are as
 follows:

-* What is the platter count you can achieve?
-* Do more spindles result in better I/O despite network access?
-* Which one results in the best cost-performance scenario you are aiming for?
-* How do you manage the storage operationally?
+* What are my workloads?
+* Do my workloads have IOPS requirements?
+* Are there read, write, or random access performance requirements?
+* What is my forecast for the scaling of storage for compute?
+* What storage is my enterprise currently using? Can it be re-purposed?
+* How do I manage the storage operationally?

-Many operators use separate compute and storage hosts. Compute services
-and storage services have different requirements, and compute hosts
-typically require more CPU and RAM than storage hosts. Therefore, for a
-fixed budget, it makes sense to have different configurations for your
-compute nodes and your storage nodes. Compute nodes will be invested in
-CPU and RAM, and storage nodes will be invested in block storage.
+Many operators use separate compute and storage hosts instead of a
+hyperconverged solution. Compute services and storage services have different
+requirements, and compute hosts typically require more CPU and RAM than storage
+hosts. Therefore, for a fixed budget, it makes sense to have different
+configurations for your compute nodes and your storage nodes. Compute nodes
+will be invested in CPU and RAM, and storage nodes will be invested in block
+storage.

-However, if you are more restricted in the number of physical hosts you
-have available for creating your cloud and you want to be able to
-dedicate as many of your hosts as possible to running instances, it
-makes sense to run compute and storage on the same machines.
+However, if you are more restricted in the number of physical hosts you have
+available for creating your cloud and you want to be able to dedicate as many
+of your hosts as possible to running instances, it makes sense to run compute
+and storage on the same machines or use an existing storage array that is
+available.

 The three main approaches to instance storage are provided in the next
 few sections.

-Off compute node storage—shared file system
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Non-compute node based shared file system
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 In this option, the disks storing the running instances are hosted in
 servers outside of the compute nodes.

 If you use separate compute and storage hosts, you can treat your
-compute hosts as "stateless." As long as you do not have any instances
+compute hosts as "stateless". As long as you do not have any instances
 currently running on a compute host, you can take it offline or wipe it
 completely without having any effect on the rest of your cloud. This
 simplifies maintenance for the compute hosts.
@@ -60,6 +61,7 @@ The main disadvantages to this approach are:
 * Depending on design, heavy I/O usage from some instances can affect
   unrelated instances.
 * Use of the network can decrease performance.
+* Scalability can be affected by network architecture.

 On compute node storage—shared file system
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -79,36 +81,37 @@ However, this option has several disadvantages:
 * The chassis size of the compute node can limit the number of spindles
   able to be used in a compute node.
 * Use of the network can decrease performance.
+* Loss of compute nodes decreases storage availability for all hosts.

 On compute node storage—nonshared file system
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-In this option, each compute node is specified with enough disks to
-store the instances it hosts.
+In this option, each compute node is specified with enough disks to store the
+instances it hosts.

 There are two main advantages:

-* Heavy I/O usage on one compute node does not affect instances on
-  other compute nodes.
-* Direct I/O access can increase performance.
+* Heavy I/O usage on one compute node does not affect instances on other
+  compute nodes. Direct I/O access can increase performance.
+* Each host can have different storage profiles for hosts aggregation and
+  availability zones.

-This has several disadvantages:
+There are several disadvantages:

-* If a compute node fails, the instances running on that node are lost.
+* If a compute node fails, the data associated with the instances running on
+  that node is lost.
 * The chassis size of the compute node can limit the number of spindles
   able to be used in a compute node.
 * Migrations of instances from one node to another are more complicated
   and rely on features that may not continue to be developed.
 * If additional storage is required, this option does not scale.

-Running a shared file system on a storage system apart from the computes
-nodes is ideal for clouds where reliability and scalability are the most
-important factors. Running a shared file system on the compute nodes
-themselves may be best in a scenario where you have to deploy to
-preexisting servers for which you have little to no control over their
-specifications. Running a nonshared file system on the compute nodes
-themselves is a good option for clouds with high I/O requirements and
-low concern for reliability.
+Running a shared file system on a storage system apart from the compute nodes
+is ideal for clouds where reliability and scalability are the most important
+factors. Running a shared file system on the compute nodes themselves may be
+best in a scenario where you have to deploy to pre-existing servers for which
+you have little to no control over their specifications or have specific
+storage performance needs but do not have a need for persistent storage.

 Issues with live migration
 --------------------------
@@ -123,7 +126,14 @@ Live migration can also be done with nonshared storage, using a feature
 known as *KVM live block migration*. While an earlier implementation of
 block-based migration in KVM and QEMU was considered unreliable, there
 is a newer, more reliable implementation of block-based live migration
-as of QEMU 1.4 and libvirt 1.0.2 that is also compatible with OpenStack.
+as of the Mitaka release.

+Live migration and block migration still have some issues:
+
+* Error reporting has received some attention in Mitaka and Newton but there
+  are improvements needed.
+* Live migration resource tracking issues.
+* Live migration of rescued images.
+
 Choice of file system
 ---------------------