Consolidate User Requirements in Arch Guide

Collects user requirement information from various chapters in the Architecture Design Guide and consolidates them into a single Customer Requirements chapter.

Change-Id: If15c053da58f00bbba9e51424fb4d772e677a5e9
Closes-bug: #1548149
Implements: blueprint archguide-mitaka-reorg

2016-03-08 16:40:08 +11:00
commit 3097d5395b
parent ae5046c43c
10 changed files with 604 additions and 22 deletions
--- a/doc/arch-design-draft/source/customer-requirements-business-considerations.rst
+++ b/doc/arch-design-draft/source/customer-requirements-business-considerations.rst
@@ -0,0 +1,147 @@
 =======================
 Business considerations
 =======================
 Cost
 ~~~~
 Financial factors are a primary concern for any organization. Cost
 considerations may influence the type of cloud that you build.
 For example, a general purpose cloud is unlikely to be the most
 cost-effective environment for specialized applications.
 Unless business needs dictate that cost is a critical factor,
 cost should not be the sole consideration when choosing or designing a cloud.
 As a general guideline, increasing the complexity of a cloud architecture
 increases the cost of building and maintaining it. For example, a hybrid or
 multi-site cloud architecture involving multiple vendors and technical
 architectures may require higher setup and operational costs because of the
 need for more sophisticated orchestration and brokerage tools than in other
 architectures. However, overall operational costs might be lower by virtue of
 using a cloud brokerage tool to deploy the workloads to the most cost-effective
 platform.
 Consider the following cost categories when designing a cloud:
 *  Compute resources
 *  Networking resources
 *  Replication
 *  Storage
 *  Management
 *  Operational costs
 It is also important to consider how costs will increase as your cloud
 scales. Choices that have a negligible impact in small systems may considerably
 increase costs in large systems. In these cases, it is important to minimize
 capital expenditure (CapEx) at all layers of the stack. Operators of massively
 scalable OpenStack clouds require the use of dependable commodity hardware and
 freely available open source software components to reduce deployment costs and
 operational expenses. Initiatives like OpenCompute (more information available
 at http://www.opencompute.org) provide additional information and pointers.
 Factors to consider include power, cooling, and the physical design of the
 chassis. Through customization, it is possible to optimize your hardware and
 systems for specific types of workloads when working at scale.
 Time-to-market
 ~~~~~~~~~~~~~~
 The ability to deliver services or products within a flexible time
 frame is a common business factor when building a cloud. Allowing users to
 self-provision and gain access to compute, network, and
 storage resources on-demand may decrease time-to-market for new products
 and applications.
 You must balance the time required to build a new cloud platform against the
 time saved by migrating users away from legacy platforms. In some cases,
 existing infrastructure may influence your architecture choices. For example,
 using multiple cloud platforms may be a good option when there is an existing
 investment in several applications, as it could be faster to tie the
 investments together rather than migrating the components and refactoring them
 to a single platform.
 Revenue opportunity
 ~~~~~~~~~~~~~~~~~~~
 Revenue opportunities vary based on the intent and use case of the cloud.
 The requirements of a commercial, customer-facing product are often very
 different from an internal, private cloud. You must consider what features
 make your design most attractive to your users.
 Compliance and geo-location
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
 An organization may have certain legal obligations and regulatory
 compliance measures which could require certain workloads or data to not
 be located in certain regions. See :ref:`legal-requirements`.
 Compliance considerations are particularly important for multi-site clouds.
 Considerations include:
 - federal legal requirements
 - local jurisdictional legal and compliance requirements
 - image consistency and availability
 - storage replication and availability (both block and file/object storage)
 - authentication, authorization, and auditing (AAA)
 Geographical considerations may also impact the cost of building or leasing
 data centers. Considerations include:
 - floor space
 - floor weight
 - rack height and type
 - environmental considerations
 - power usage and power usage efficiency (PUE)
 - physical security
 Auditing
 ~~~~~~~~
 A well-considered auditing plan is essential for quickly finding issues.
 Keeping track of changes made to security groups and tenant changes can be
 useful in rolling back the changes if they affect production. For example,
 if all security group rules for a tenant disappeared, the ability to quickly
 track down the issue would be important for operational and legal reasons.
 Security
 ~~~~~~~~
 The importance of security varies based on the type of organization using
 a cloud. For example, government and financial institutions often have
 very high security requirements. Security should be implemented according to
 asset, threat, and vulnerability risk assessment matrices.
 See :ref:`security-requirements`.
 Service level agreements
 ~~~~~~~~~~~~~~~~~~~~~~~~
 Service level agreements (SLA) must be developed in conjunction with business,
 technical, and legal input. Small, private clouds may operate under an informal
 SLA, but hybrid or public clouds generally require more formal agreements with
 their users.
 For a user of a massively scalable OpenStack public cloud, there are no
 expectations for control over security, performance, or availability. Users
 expect only SLAs related to uptime of API services, and very basic SLAs for
 services offered. It is the user's responsibility to address these issues on
 their own. The exception to this expectation is the rare case of a massively
 scalable cloud infrastructure built for a private or government organization
 that has specific requirements.
 High performance systems have SLA requirements for a minimum quality of service
 with regard to guaranteed uptime, latency, and bandwidth. The level of the
 SLA can have a significant impact on the network architecture and
 requirements for redundancy in the systems.
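 To make the effect of an uptime SLA on redundancy planning concrete, the
 following minimal sketch converts an uptime percentage into a downtime budget.
 The function name ``allowed_downtime_minutes`` and the 30-day measurement
 period are assumptions chosen for illustration; they are not taken from this
 guide or from any OpenStack service.

```python
def allowed_downtime_minutes(uptime_percent: float, period_days: int = 30) -> float:
    """Return the downtime budget, in minutes, implied by an uptime SLA.

    Hypothetical helper: the 30-day default period is an assumption.
    """
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    if period_days <= 0:
        raise ValueError("period_days must be positive")
    total_minutes = period_days * 24 * 60
    # The fraction of the period that may be spent down is (1 - uptime).
    return total_minutes * (1 - uptime_percent / 100)


if __name__ == "__main__":
    for sla in (99.0, 99.9, 99.99):
        budget = allowed_downtime_minutes(sla)
        print(f"{sla}% uptime over 30 days allows about {budget:.1f} minutes of downtime")
```

 Each additional "nine" shrinks the downtime budget by a factor of ten, which
 is one reason stricter SLAs tend to require more redundant network paths and
 systems.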
 Hybrid cloud designs must accommodate differences in SLAs between providers,
 and consider their enforceability.
--- a/doc/arch-design-draft/source/customer-requirements-performance-considerations.rst
+++ b/doc/arch-design-draft/source/customer-requirements-performance-considerations.rst
@@ -0,0 +1,232 @@
 ==========================
 Performance considerations
 ==========================
 Performance is a critical consideration when designing any cloud, and becomes
 increasingly important as size and complexity grow. While single-site, private
 clouds can be closely controlled, multi-site and hybrid deployments require
 more careful planning to reduce problems such as network latency between sites.
 For example, you should consider the time required to
 run a workload in different clouds and methods for reducing this time.
 This may require moving data closer to applications or applications
 closer to the data they process, and grouping functionality so that
 connections that require low latency take place over a single cloud
 rather than spanning clouds.
 This may also require a CMP that can determine which cloud can most
 efficiently run which types of workloads.
 Using native OpenStack tools can help improve performance.
 For example, you can use Telemetry to measure performance and the
 Orchestration service (heat) to react to changes in demand.
 .. note::
   Orchestration requires special client configurations to integrate
   with Amazon Web Services. For other types of clouds, use CMP features.
 Cloud resource deployment
 The cloud user expects repeatable, dependable, and deterministic processes
 for launching and deploying cloud resources. You could deliver this through
 a web-based interface or publicly available API endpoints. All appropriate
 options for requesting cloud resources must be available through some type
 of user interface, a command-line interface (CLI), or API endpoints.
 Consumption model
 Cloud users expect a fully self-service and on-demand consumption model.
 When an OpenStack cloud reaches the massively scalable size, expect
 consumption as a service in each and every way.
 * Everything must be capable of automation, from the compute, storage, and
   networking hardware to the installation and configuration of the supporting
   software. Manual processes are
   impractical in a massively scalable OpenStack design architecture.
 * Massively scalable OpenStack clouds require extensive metering and
   monitoring functionality to maximize the operational efficiency by keeping
   the operator informed about the status and state of the infrastructure. This
   includes full scale metering of the hardware and software status. A
   corresponding framework of logging and alerting is also required to store
   and enable operations to act on the meters provided by the metering and
   monitoring solutions. The cloud operator also needs a solution that uses the
   data provided by the metering and monitoring solution to provide capacity
   planning and capacity trending analysis.
 Location
 For many use cases the proximity of the user to their workloads has a
 direct influence on the performance of the application and therefore
 should be taken into consideration in the design. Certain applications
 require zero to minimal latency that can only be achieved by deploying
 the cloud in multiple locations. These locations could be in different
 data centers, cities, countries or geographical regions, depending on
 the user requirement and location of the users.
 Input-Output requirements
 Input-Output performance requirements require researching and
 modeling before deciding on a final storage framework. Running
 benchmarks for Input-Output performance provides a baseline for
 expected performance levels. If these tests include details, then
 the resulting data can help model behavior and results during
 different workloads. Running scripted smaller benchmarks during the
 lifecycle of the architecture helps record the system health at
 different points in time. The data from these scripted benchmarks
 assist in future scoping and gaining a deeper understanding of an
 organization's needs.
 Scale
 Scaling storage solutions in a storage-focused OpenStack
 architecture design is driven by initial requirements, including
 :term:`IOPS`, capacity, bandwidth, and future needs. Planning
 capacity based on projected needs over the course of a budget cycle
 is important for a design. The architecture should balance cost and
 capacity, while also allowing flexibility to implement new
 technologies and methods as they become available.
 Network considerations
 ~~~~~~~~~~~~~~~~~~~~~~
 It is important to consider the functionality, security, scalability,
 availability, and testability of the network when choosing a CMP and cloud
 provider.
 * Decide on a network framework and design minimum functionality tests.
  This ensures testing and functionality persist during and after
  upgrades.
 * Scalability across multiple cloud providers may dictate which underlying
  network framework you choose in different cloud providers.
  It is important to present the network API functions and to verify
  that functionality persists across all cloud endpoints chosen.
 * High availability implementations vary in functionality and design.
  Examples of some common methods are active-hot-standby, active-passive,
  and active-active.
  Development of high availability and test frameworks is necessary to
  ensure understanding of functionality and limitations.
 * Consider the security of data between the client and the endpoint,
  and of traffic that traverses the multiple clouds.
 For example, degraded video streams and low quality VoIP sessions negatively
 impact user experience and may lead to productivity and economic loss.
 Network misconfigurations
 Configuring incorrect IP addresses, VLANs, and routers can cause
 outages to areas of the network or, in the worst-case scenario, the
 entire cloud infrastructure. Automate network configurations to
 minimize the opportunity for operator error as it can cause
 disruptive problems.
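 As one illustration of automating such checks, the following minimal sketch
 validates an interface definition before it is applied. It uses only the
 Python standard library ``ipaddress`` module. The function name
 ``validate_interface`` and its parameters are hypothetical, and the accepted
 VLAN ID range of 1 to 4094 assumes IEEE 802.1Q tagging.

```python
import ipaddress
from typing import List


def validate_interface(address: str, subnet: str, vlan_id: int) -> List[str]:
    """Return a list of problems found in one interface definition.

    Hypothetical pre-deployment check; an empty list means no problem found.
    """
    try:
        # ip_network() is strict by default and rejects subnets with host bits set.
        network = ipaddress.ip_network(subnet)
    except ValueError as exc:
        return [f"invalid subnet {subnet!r}: {exc}"]

    errors = []
    try:
        ip = ipaddress.ip_address(address)
    except ValueError as exc:
        errors.append(f"invalid address {address!r}: {exc}")
    else:
        if ip not in network:
            errors.append(f"address {ip} is outside subnet {network}")

    # Assumption: 802.1Q VLAN IDs, where 0 and 4095 are reserved.
    if not 1 <= vlan_id <= 4094:
        errors.append(f"VLAN ID {vlan_id} is outside the range 1-4094")
    return errors


if __name__ == "__main__":
    for problem in validate_interface("10.0.1.5", "10.0.0.0/24", 5000):
        print(problem)
```

 Running checks of this kind in a deployment pipeline rejects a bad value
 before it reaches a device, rather than after it has caused an outage.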
 Capacity planning
 Cloud networks require management for capacity and growth over time.
 Capacity planning includes the purchase of network circuits and
 hardware that can potentially have lead times measured in months or
 years.
 Network tuning
 Configure cloud networks to minimize link loss, packet loss, packet
 storms, broadcast storms, and loops.
 Single Point Of Failure (SPOF)
 Consider high availability at the physical and environmental layers.
 If there is a single point of failure due to only one upstream link,
 or only one power supply, an outage can become unavoidable.
 Complexity
 An overly complex network design can be difficult to maintain and
 troubleshoot. While device-level configuration can ease maintenance
 concerns and automated tools can handle overlay networks, avoid or
 document non-traditional interconnects between functions and
 specialized hardware to prevent outages.
 Non-standard features
 There are additional risks that arise from configuring the cloud
 network to take advantage of vendor specific features. One example
 is multi-link aggregation (MLAG) used to provide redundancy at the
 aggregator switch level of the network. MLAG is not a standard and,
 as a result, each vendor has their own proprietary implementation of
 the feature. MLAG architectures are not interoperable across switch
 vendors, which leads to vendor lock-in and can delay or prevent component
 upgrades.
 Dynamic resource expansion or bursting
 An application that requires additional resources may suit a multiple
 cloud architecture. For example, a retailer needs additional resources
 during the holiday season, but does not want to add private cloud
 resources to meet the peak demand.
 The user can accommodate the increased load by bursting to
 a public cloud for these peak load periods. These bursts could be
 for long or short cycles ranging from hourly to yearly.
 Consistency of images and templates across different sites
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 It is essential that the deployment of instances is consistent across
 different sites and built into the infrastructure. If OpenStack
 Object Storage is used as a back end for the Image service, it is
 possible to create repositories of consistent images across multiple
 sites. Having central endpoints with multiple storage nodes allows
 consistent centralized storage for every site.
 Not using a centralized object store increases the operational overhead
 of maintaining a consistent image library. This could include
 development of a replication mechanism to handle the transport of images
 and the changes to the images across multiple sites.
 Migration, availability, site loss and recovery
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Outages can cause partial or full loss of site functionality. Strategies
 should be implemented to understand and plan for recovery scenarios.
 *  The deployed applications need to continue to function and, more
   importantly, you must consider the impact on the performance and
   reliability of the application when a site is unavailable.
 *  It is important to understand what happens to the replication of
   objects and data between the sites when a site goes down. If this
   causes queues to start building up, consider how long these queues
   can safely exist until an error occurs.
 *  After an outage, ensure the method for resuming proper operations of
   a site is implemented when it comes back online. We recommend you
   architect the recovery to avoid race conditions.
 Disaster recovery and business continuity
 Cheaper storage makes the public cloud suitable for maintaining
 backup applications.
 Migration scenarios
 Hybrid cloud architecture enables the migration of
 applications between different clouds.
 Provider availability or implementation details
 Business changes can affect provider availability.
 Likewise, changes in a provider's service can disrupt
 a hybrid cloud environment or increase costs.
 Provider API changes
 Consumers of external clouds rarely have control over provider
 changes to APIs, and changes can break compatibility.
 Using only the most common and basic APIs can minimize potential conflicts.
 Image portability
  As of the Kilo release, there is no common image format that is
  usable by all clouds. Conversion or recreation of images is necessary
  if migrating between clouds. To simplify deployment, use the smallest
  and simplest images feasible, install only what is necessary, and
  use a deployment manager such as Chef or Puppet. Do not use golden
  images to speed up the process unless you repeatedly deploy the same
  images on the same cloud.
 API differences
  Avoid using a hybrid cloud deployment with more than just
  OpenStack (or with different versions of OpenStack) as API changes
  can cause compatibility issues.
 Business or technical diversity
 Organizations leveraging cloud-based services can embrace business
 diversity and utilize a hybrid cloud design to spread their
 workloads across multiple cloud providers. This ensures that
 no single cloud provider is the sole host for an application.
--- a/doc/arch-design-draft/source/customer-requirements-usage-considerations.rst
+++ b/doc/arch-design-draft/source/customer-requirements-usage-considerations.rst
@@ -0,0 +1,194 @@
 ====================
 Usage considerations
 ====================
 Application readiness
 ~~~~~~~~~~~~~~~~~~~~~
 Some applications are tolerant of a lack of synchronized object
 storage, while others may need those objects to be replicated and
 available across regions. Understanding how the cloud implementation
 impacts new and existing applications is important for risk mitigation,
 and the overall success of a cloud project. Applications may have to be
 written or rewritten for an infrastructure with little to no redundancy,
 or with the cloud in mind.
 Application momentum
 Businesses with existing applications may find that it is
 more cost effective to integrate applications on multiple
 cloud platforms than to migrate them to a single platform.
 No predefined usage model
 The lack of a pre-defined usage model enables the user to run a wide
 variety of applications without having to know the application
 requirements in advance. This provides a degree of independence and
 flexibility that no other cloud scenarios are able to provide.
 On-demand and self-service application
 By definition, a cloud provides end users with the ability to
 self-provision computing power, storage, networks, and software in a
 simple and flexible way. The user must be able to scale their
 resources up to a substantial level without disrupting the
 underlying host operations. One of the benefits of using a general
 purpose cloud architecture is the ability to start with limited
 resources and increase them over time as the user demand grows.
 Cloud type
 ~~~~~~~~~~
 Public cloud
 For a company interested in building a commercial public cloud
 offering based on OpenStack, the general purpose architecture model
 might be the best choice. Designers are not always going to know the
 purposes or workloads for which the end users will use the cloud.
 Internal consumption (private) cloud
 Organizations need to determine if it is logical to create their own
 clouds internally. Using a private cloud, organizations are able to
 maintain complete control over architectural and cloud components.
 Hybrid cloud
 Users may want to combine using the internal cloud with access
 to an external cloud. If that case is likely, it might be worth
 exploring the possibility of taking a multi-cloud approach with
 regard to at least some of the architectural elements.
 Tools
 ~~~~~
 Complex clouds, in particular hybrid clouds, may require tools to
 facilitate working across multiple clouds.
 Broker between clouds
 Brokering software evaluates relative costs between different
 cloud platforms. Cloud Management Platforms (CMP)
 allow the designer to determine the right location for the
 workload based on predetermined criteria.
 Facilitate orchestration across the clouds
 CMPs simplify the migration of application workloads between
 public, private, and hybrid cloud platforms.
 We recommend using cloud orchestration tools for managing a diverse
 portfolio of systems and applications across multiple cloud platforms.
 Workload considerations
 ~~~~~~~~~~~~~~~~~~~~~~~
 A workload can be a single application or a suite of applications
 that work together. It can also be a duplicate set of applications that
 need to run on multiple cloud environments.
 In a hybrid cloud deployment, the same workload often needs to function
 equally well on radically different public and private cloud environments.
 The architecture needs to address these potential conflicts,
 complexity, and platform incompatibilities.
 Federated hypervisor and instance management
 Adding self-service, charge back, and transparent delivery of
 the resources from a federated pool can be cost effective.
 In a hybrid cloud environment, this is a particularly important
 consideration. Look for a cloud that provides cross-platform
 hypervisor support and robust instance management tools.
 Application portfolio integration
 An enterprise cloud delivers efficient application portfolio
 management and deployments by leveraging self-service features
 and rules according to use.
 Integrating existing cloud environments is a common driver
 when building hybrid cloud architectures.
 Capacity planning
 ~~~~~~~~~~~~~~~~~
 Capacity and the placement of workloads are key design considerations
 for clouds. One of the primary reasons many organizations use a hybrid cloud
 is to increase capacity without making large capital investments.
 The long-term capacity plan for these designs must
 incorporate growth over time to prevent permanent consumption of more
 expensive external clouds. To avoid this scenario, account for future
 applications' capacity requirements and plan growth appropriately.
 It is difficult to predict the amount of load a particular
 application might incur if the number of users fluctuates, or the
 application experiences an unexpected increase in use.
 It is possible to define application requirements in terms of
 vCPU, RAM, bandwidth, or other resources and plan appropriately.
 However, other clouds might not use the same meter or even the same
 oversubscription rates.
 Oversubscription is a method to emulate more capacity than
 may physically be present. For example, a physical hypervisor node with 32 GB
 RAM may host 24 instances, each provisioned with 2 GB RAM.
 As long as all 24 instances do not concurrently use 2 full
 gigabytes, this arrangement works well.
 However, some hosts take oversubscription to extremes and,
 as a result, performance can be inconsistent.
 If at all possible, determine what the oversubscription rates
 of each host are and plan capacity accordingly.
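 The arithmetic behind the example above can be sketched as follows. The
 function names are hypothetical, and the calculation ignores the memory that
 the hypervisor itself consumes, which is a simplifying assumption.

```python
def oversubscription_ratio(physical_gb: float, instances: int,
                           gb_per_instance: float) -> float:
    """Return provisioned RAM divided by physical RAM (hypothetical helper)."""
    if physical_gb <= 0:
        raise ValueError("physical_gb must be positive")
    return (instances * gb_per_instance) / physical_gb


def average_usable_gb(physical_gb: float, instances: int) -> float:
    """Return the average RAM each instance can use before the host is full."""
    if instances <= 0:
        raise ValueError("instances must be positive")
    return physical_gb / instances


if __name__ == "__main__":
    # Values from the example: a 32 GB node hosting 24 instances of 2 GB each.
    ratio = oversubscription_ratio(32, 24, 2)
    print(f"RAM oversubscription ratio: {ratio:.2f}:1")
    print(f"Average usable RAM per instance: {average_usable_gb(32, 24):.2f} GB")
```

 In this example, 48 GB of RAM is provisioned against 32 GB of physical RAM,
 a ratio of 1.5:1. The arrangement holds only while average usage stays below
 roughly 1.33 GB per instance.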
 Utilization
 ~~~~~~~~~~~
 A CMP must be aware of what workloads are running, where they are
 running, and their preferred utilizations.
 For example, in most cases it is desirable to run as many workloads
 internally as possible, utilizing other resources only when necessary.
 On the other hand, situations exist in which the opposite is true,
 such as when an internal cloud is only for development and stressing
 it is undesirable. A cost model of various scenarios and
 consideration of internal priorities helps with this decision.
 To improve efficiency, automate these decisions when possible.
 The Telemetry service (ceilometer) provides information on the usage
 of various OpenStack components. Note the following:
 * If Telemetry must retain a large amount of data, for
  example when monitoring a large or active cloud, we recommend
  using a NoSQL back end such as MongoDB.
 * You must monitor connections to non-OpenStack clouds
  and report this information to the CMP.
 Authentication
 ~~~~~~~~~~~~~~
 We recommend a single authentication domain rather than a
 separate implementation for each and every site. This requires an
 authentication mechanism that is highly available and distributed to
 ensure continuous operation. Authentication server locality might be
 required and should be planned for.
 Storage
 ~~~~~~~
 OpenStack compatibility
 Interoperability and integration with OpenStack can be paramount in
 deciding on a storage hardware and storage management platform.
 Interoperability and integration include factors such as OpenStack
 Block Storage interoperability, OpenStack Object Storage
 compatibility, and hypervisor compatibility (which affects the
 ability to use storage for ephemeral instance storage).
 Storage management
 You must address a range of storage management-related
 considerations in the design of a storage-focused OpenStack cloud.
 These considerations include, but are not limited to, backup
 strategy (and restore strategy, since a backup that cannot be
 restored is useless), data valuation-hierarchical storage
 management, retention strategy, data placement, and workflow
 automation.
 Data grids
 Data grids are helpful when answering questions around data
 valuation. Data grids improve decision making through correlation of
 access patterns, ownership, and business-unit revenue with other
 metadata values to deliver actionable information about data.
--- a/doc/arch-design-draft/source/customer-requirements.rst
+++ b/doc/arch-design-draft/source/customer-requirements.rst
@@ -0,0 +1,14 @@
 =====================
 Customer requirements
 =====================
 A customer's business requirements impact cloud design. These requirements
 can be broken down into three general areas: business considerations,
 usage considerations, and performance considerations.
 .. toctree::
   :maxdepth: 2
   customer-requirements-business-considerations.rst
   customer-requirements-usage-considerations.rst
   customer-requirements-performance-considerations.rst
--- a/doc/arch-design-draft/source/high-availability.rst
+++ b/doc/arch-design-draft/source/high-availability.rst
@@ -1,3 +1,5 @@
 .. _high-availability:
 =================
 High availability
 =================
@@ -186,4 +188,3 @@ for applications to perform well.
   When running embedded object store methods, ensure that you do not
   instigate extra data replication as this may cause performance issues.
--- a/doc/arch-design-draft/source/index.rst
+++ b/doc/arch-design-draft/source/index.rst
@@ -26,7 +26,7 @@ Contents
   introduction.rst
   identifying-stakeholders.rst
   functional-requirements.rst
-   user-requirements.rst
+   customer-requirements.rst
   operator-requirements.rst
   capacity-planning-scaling.rst
   high-availability.rst
@@ -40,4 +40,3 @@ Search in this guide
 ~~~~~~~~~~~~~~~~~~~~
 * :ref:`search`
--- a/doc/arch-design-draft/source/introduction.rst
+++ b/doc/arch-design-draft/source/introduction.rst
@@ -37,16 +37,16 @@ developing cloud architecture design documents. The sections covered are:
 *  :doc:`Functional requirements <functional-requirements>`: Information for
   SMEs on deployment methods and how they will affect deployment cost.
-*  :doc:`User requirements<user-requirements>`: Information for SMEs on
+*  :doc:`Customer requirements <customer-requirements>`: Information for SMEs
-   business and technical requirements.
+   on business and technical requirements.
 *  :doc:`Operator requirements <operator-requirements>`: Information on
   :term:`Service Level Agreement (SLA)` considerations, selecting the right
   hardware for servers and switches, and integration with external
   :term:`identity provider`.
-*  :doc:`Capacity planning and scaling<capacity-planning-scaling>`: Information
+*  :doc:`Capacity planning and scaling <capacity-planning-scaling>`:
-   on storage and networking.
+   Information on storage and networking.
 *  :doc:`High Availability <high-availability>`: Separation of data plane and
   control plane, and how to eliminate single points of failure.
--- a/doc/arch-design-draft/source/legal-requirements.rst
+++ b/doc/arch-design-draft/source/legal-requirements.rst
@@ -1,3 +1,5 @@
 .. _legal-requirements:
 ==================
 Legal requirements
 ==================
--- a/doc/arch-design-draft/source/security-requirements.rst
+++ b/doc/arch-design-draft/source/security-requirements.rst
@@ -1,3 +1,5 @@
 .. _security-requirements:
 =====================
 Security requirements
 =====================
--- a/doc/arch-design-draft/source/user-requirements.rst
+++ b/doc/arch-design-draft/source/user-requirements.rst
@@ -1,9 +0,0 @@
 =================
 User requirements
 =================
 .. toctree::
   :maxdepth: 2