diff --git a/doc/source/admin_guide.rst b/doc/source/admin_guide.rst index 91ee2d00c3..e5ea68f9fb 100644 --- a/doc/source/admin_guide.rst +++ b/doc/source/admin_guide.rst @@ -496,133 +496,65 @@ When you specify a policy the containers created also include the policy index, thus even when running a container_only report, you will need to specify the policy not using the default. ------------------------------------ -Geographically Distributed Clusters ------------------------------------ +----------------------------------------------- +Geographically Distributed Swift Considerations +----------------------------------------------- -Swift's default configuration is currently designed to work in a -single region, where a region is defined as a group of machines with -high-bandwidth, low-latency links between them. However, configuration -options exist that make running a performant multi-region Swift -cluster possible. +Swift provides two features that may be used to distribute replicas of objects +across multiple geographically distributed data-centers: with +:doc:`overview_global_cluster` object replicas may be dispersed across devices +from different data-centers by using `regions` in ring device descriptors; with +:doc:`overview_container_sync` objects may be copied between independent Swift +clusters in each data-center. The operation and configuration of each are +described in their respective documentation. The following points should be +considered when selecting the feature that is most appropriate for a particular +use case: -For the rest of this section, we will assume a two-region Swift -cluster: region 1 in San Francisco (SF), and region 2 in New York -(NY). Each region shall contain within it 3 zones, numbered 1, 2, and -3, for a total of 6 zones. + #. Global Clusters allows the distribution of object replicas across + data-centers to be controlled by the cluster operator on a per-policy basis, + since the distribution is determined by the assignment of devices from + each data-center in each policy's ring file. With Container Sync the end + user controls the distribution of objects across clusters on a + per-container basis. -~~~~~~~~~~~~~ -read_affinity -~~~~~~~~~~~~~ + #. Global Clusters requires an operator to coordinate ring deployments across + multiple data-centers. Container Sync allows for independent management of + separate Swift clusters in each data-center, and for existing Swift + clusters to be used as peers in Container Sync relationships without + deploying new policies/rings. -This setting, combined with sorting_method setting, makes the proxy server prefer local backend servers for -GET and HEAD requests over non-local ones. For example, it is -preferable for an SF proxy server to service object GET requests -by talking to SF object servers, as the client will receive lower -latency and higher throughput. + #. Global Clusters seamlessly supports features that may rely on + cross-container operations such as large objects and versioned writes. + Container Sync requires the end user to ensure that all required + containers are sync'd for these features to work in all data-centers. -By default, Swift randomly chooses one of the three replicas to give -to the client, thereby spreading the load evenly. In the case of a -geographically-distributed cluster, the administrator is likely to -prioritize keeping traffic local over even distribution of results. -This is where the read_affinity setting comes in. + #.
Global Clusters makes objects available for GET or HEAD requests in both + data-centers even if a replica of the object has not yet been + asynchronously migrated between data-centers, by forwarding requests + between data-centers. Container Sync is unable to serve requests for an + object in a particular data-center until the asynchronous sync process has + copied the object to that data-center. -Example:: + #. Global Clusters may require less storage capacity than Container Sync to + achieve equivalent durability of objects in each data-center. Global + Clusters can restore replicas that are lost or corrupted in one + data-center using replicas from other data-centers. Container Sync + requires each data-center to independently manage the durability of + objects, which may result in each data-center storing more replicas than + with Global Clusters. - [app:proxy-server] - sorting_method = affinity - read_affinity = r1=100 - -This will make the proxy attempt to service GET and HEAD requests from -backends in region 1 before contacting any backends in region 2. -However, if no region 1 backends are available (due to replica -placement, failed hardware, or other reasons), then the proxy will -fall back to backend servers in other regions. - -Example:: - - [app:proxy-server] - sorting_method = affinity - read_affinity = r1z1=100, r1=200 - -This will make the proxy attempt to service GET and HEAD requests from -backends in region 1 zone 1, then backends in region 1, then any other -backends. If a proxy is physically close to a particular zone or -zones, this can provide bandwidth savings. For example, if a zone -corresponds to servers in a particular rack, and the proxy server is -in that same rack, then setting read_affinity to prefer reads from -within the rack will result in less traffic between the top-of-rack -switches. - -The read_affinity setting may contain any number of region/zone -specifiers; the priority number (after the equals sign) determines the -ordering in which backend servers will be contacted. A lower number -means higher priority. - -Note that read_affinity only affects the ordering of primary nodes -(see ring docs for definition of primary node), not the ordering of -handoff nodes. - -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -write_affinity and write_affinity_node_count -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -This setting makes the proxy server prefer local backend servers for -object PUT requests over non-local ones. For example, it may be -preferable for an SF proxy server to service object PUT requests -by talking to SF object servers, as the client will receive lower -latency and higher throughput. However, if this setting is used, note -that a NY proxy server handling a GET request for an object that was -PUT using write affinity may have to fetch it across the WAN link, as -the object won't immediately have any replicas in NY. However, -replication will move the object's replicas to their proper homes in -both SF and NY. - -Note that only object PUT requests are affected by the write_affinity -setting; POST, GET, HEAD, DELETE, OPTIONS, and account/container PUT -requests are not affected. - -This setting lets you trade data distribution for throughput. If -write_affinity is enabled, then object replicas will initially be -stored all within a particular region or zone, thereby decreasing the -quality of the data distribution, but the replicas will be distributed -over fast WAN links, giving higher throughput to clients. 
Note that -the replicators will eventually move objects to their proper, -well-distributed homes. - -The write_affinity setting is useful only when you don't typically -read objects immediately after writing them. For example, consider a -workload of mainly backups: if you have a bunch of machines in NY that -periodically write backups to Swift, then odds are that you don't then -immediately read those backups in SF. If your workload doesn't look -like that, then you probably shouldn't use write_affinity. - -The write_affinity_node_count setting is only useful in conjunction -with write_affinity; it governs how many local object servers will be -tried before falling back to non-local ones. - -Example:: - - [app:proxy-server] - write_affinity = r1 - write_affinity_node_count = 2 * replicas - -Assuming 3 replicas, this configuration will make object PUTs try -storing the object's replicas on up to 6 disks ("2 * replicas") in -region 1 ("r1"). Proxy server tries to find 3 devices for storing the -object. While a device is unavailable, it queries the ring for the 4th -device and so on until 6th device. If the 6th disk is still unavailable, -the last replica will be sent to other region. It doesn't mean there'll -have 6 replicas in region 1. - - -You should be aware that, if you have data coming into SF faster than -your replicators are transferring it to NY, then your cluster's data distribution -will get worse and worse over time as objects pile up in SF. If this -happens, it is recommended to disable write_affinity and simply let -object PUTs traverse the WAN link, as that will naturally limit the -object growth rate to what your WAN link can handle. + #. Global Clusters executes all account/container metadata updates + synchronously to account/container replicas in all data-centers, which may + incur delays when making updates across WANs. Container Sync only copies + objects between data-centers, and all Swift internal traffic is + confined to each data-center. + #. Global Clusters does not yet guarantee the availability of objects stored + in Erasure Coded policies when one data-center is offline. With Container + Sync the availability of objects in each data-center is independent of the + state of other data-centers once objects have been synced. Container Sync + also allows objects to be stored using different policy types in different + data-centers. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Checking handoff partition distribution diff --git a/doc/source/index.rst b/doc/source/index.rst index 4784d91337..dbe54e8a41 100644 --- a/doc/source/index.rst +++ b/doc/source/index.rst @@ -52,6 +52,7 @@ Overview and Concepts ratelimit overview_large_objects overview_object_versioning + overview_global_cluster overview_container_sync overview_expiring_objects cors diff --git a/doc/source/overview_global_cluster.rst b/doc/source/overview_global_cluster.rst new file mode 100644 index 0000000000..4a7e13b48c --- /dev/null +++ b/doc/source/overview_global_cluster.rst @@ -0,0 +1,133 @@ +=============== +Global Clusters +=============== + +-------- +Overview +-------- + +Swift's default configuration is currently designed to work in a +single region, where a region is defined as a group of machines with +high-bandwidth, low-latency links between them. However, configuration +options exist that make running a performant multi-region Swift +cluster possible. + +For the rest of this section, we will assume a two-region Swift +cluster: region 1 in San Francisco (SF), and region 2 in New York +(NY). Each region will contain 3 zones, numbered 1, 2, and 3, for a +total of 6 zones.
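+
+The placement of object replicas across the two regions is determined
+by the object ring, in which each device descriptor names a region as
+well as a zone. As a rough sketch only (the builder file name, IP
+addresses, ports and weights below are illustrative, not a complete
+set of commands), devices in the two regions might be added to the
+object ring builder like this::
+
+    swift-ring-builder object.builder add r1z1-10.0.1.1:6200/sda 100
+    swift-ring-builder object.builder add r2z1-10.1.1.1:6200/sda 100
+
+See the ring documentation for the full device descriptor syntax.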
+ +--------------------------- +Configuring Global Clusters +--------------------------- +~~~~~~~~~~~~~ +read_affinity +~~~~~~~~~~~~~ + +This setting, combined with the sorting_method setting, makes the proxy +server prefer local backend servers for GET and HEAD requests over +non-local ones. For example, it is preferable for an SF proxy server +to service object GET requests by talking to SF object servers, as the +client will receive lower latency and higher throughput. + +By default, Swift randomly chooses one of the three replicas to give +to the client, thereby spreading the load evenly. In the case of a +geographically-distributed cluster, the administrator is likely to +prioritize keeping traffic local over even distribution of results. +This is where the read_affinity setting comes in. + +Example:: + + [app:proxy-server] + sorting_method = affinity + read_affinity = r1=100 + +This will make the proxy attempt to service GET and HEAD requests from +backends in region 1 before contacting any backends in region 2. +However, if no region 1 backends are available (due to replica +placement, failed hardware, or other reasons), then the proxy will +fall back to backend servers in other regions. + +Example:: + + [app:proxy-server] + sorting_method = affinity + read_affinity = r1z1=100, r1=200 + +This will make the proxy attempt to service GET and HEAD requests from +backends in region 1 zone 1, then backends in region 1, then any other +backends. If a proxy is physically close to a particular zone or +zones, this can provide bandwidth savings. For example, if a zone +corresponds to servers in a particular rack, and the proxy server is +in that same rack, then setting read_affinity to prefer reads from +within the rack will result in less traffic between the top-of-rack +switches. + +The read_affinity setting may contain any number of region/zone +specifiers; the priority number (after the equals sign) determines the +ordering in which backend servers will be contacted. A lower number +means higher priority. + +Note that read_affinity only affects the ordering of primary nodes +(see ring docs for definition of primary node), not the ordering of +handoff nodes.
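+
+Because read_affinity expresses which backends a particular proxy
+server should prefer, each proxy server is normally configured with
+values appropriate to its own location. For example, while the SF
+(region 1) proxies above use read_affinity = r1=100, a NY (region 2)
+proxy might use the following (a sketch; adjust the priorities to
+your own topology)::
+
+    [app:proxy-server]
+    sorting_method = affinity
+    read_affinity = r2=100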
+ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +write_affinity and write_affinity_node_count +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +This setting makes the proxy server prefer local backend servers for +object PUT requests over non-local ones. For example, it may be +preferable for an SF proxy server to service object PUT requests +by talking to SF object servers, as the client will receive lower +latency and higher throughput. However, if this setting is used, note +that a NY proxy server handling a GET request for an object that was +PUT using write affinity may have to fetch it across the WAN link, as +the object won't immediately have any replicas in NY. Replication will +eventually move the object's replicas to their proper homes in both SF +and NY. + +Note that only object PUT requests are affected by the write_affinity +setting; POST, GET, HEAD, DELETE, OPTIONS, and account/container PUT +requests are not affected. + +This setting lets you trade data distribution for throughput. If +write_affinity is enabled, then object replicas will initially be +stored all within a particular region or zone, thereby decreasing the +quality of the data distribution, but the initial writes will travel +over fast local (LAN) links rather than the slower WAN links, giving +higher throughput to clients. Note that the replicators will +eventually move objects to their proper, well-distributed homes. + +The write_affinity setting is useful only when you don't typically +read objects immediately after writing them. For example, consider a +workload of mainly backups: if you have a bunch of machines in NY that +periodically write backups to Swift, then odds are that you don't then +immediately read those backups in SF. If your workload doesn't look +like that, then you probably shouldn't use write_affinity. + +The write_affinity_node_count setting is only useful in conjunction +with write_affinity; it governs how many local object servers will be +tried before falling back to non-local ones. + +Example:: + + [app:proxy-server] + write_affinity = r1 + write_affinity_node_count = 2 * replicas + +Assuming 3 replicas, this configuration will make object PUTs try +storing the object's replicas on up to 6 disks ("2 * replicas") in +region 1 ("r1"). The proxy server first looks for 3 devices in region +1 on which to store the object. If one of those devices is +unavailable, it asks the ring for a 4th local device, and so on up to +a 6th. If the 6th local device is also unavailable, the remaining +replica will be sent to a device in another region. This does not mean +that 6 replicas will be stored in region 1; only up to 6 local devices +are tried before falling back to non-local ones. + +You should be aware that, if you have data coming into SF faster than +your replicators are transferring it to NY, then your cluster's data +distribution will get worse and worse over time as objects pile up in SF. +If this happens, it is recommended to disable write_affinity and simply let +object PUTs traverse the WAN link, as that will naturally limit the +object growth rate to what your WAN link can handle.
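+
+If write affinity does suit your workload, the read and write affinity
+settings are typically combined in each proxy server's configuration.
+A sketch for an SF (region 1) proxy server in the example cluster
+above, with illustrative values that should be tuned for your own
+topology::
+
+    [app:proxy-server]
+    sorting_method = affinity
+    # prefer reads from region 1, then fall back to any other backends
+    read_affinity = r1=100
+    # send initial object PUTs to region 1 only
+    write_affinity = r1
+    # try up to 2 * replicas local devices before using non-local ones
+    write_affinity_node_count = 2 * replicas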