Add doc entry to check partition count

A high or increasing partition count due to storing handoffs can have severe side effects, and replication might never be able to catch up. This patch adds a note to the admin_guide on how to check this. Change-Id: Ib4e161d68f1a82236dbf5fac13ef9a13ac4bbf18
2016-06-09 06:17:22 +00:00 · 2016-06-09 06:17:22 +00:00 · 699953508a
commit 699953508a
parent 11c5ef7d22
1 changed files with 80 additions and 1 deletions
--- a/doc/source/admin_guide.rst
+++ b/doc/source/admin_guide.rst
@@ -617,13 +617,90 @@ have 6 replicas in region 1.
 You should be aware that, if you have data coming into SF faster than
-your link to NY can transfer it, then your cluster's data distribution
+your replicators are transferring it to NY, then your cluster's data distribution
 will get worse and worse over time as objects pile up in SF. If this
 happens, it is recommended to disable write_affinity and simply let
 object PUTs traverse the WAN link, as that will naturally limit the
 object growth rate to what your WAN link can handle.
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Checking handoff partition distribution
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 You can check if handoff partitions are piling up on a server by
 comparing the expected number of partitions with the actual number on
 your disks. First get the number of partitions that are currently
 assigned to a server using the ``dispersion`` command from
 ``swift-ring-builder``::
    swift-ring-builder sample.builder dispersion --verbose
    Dispersion is 0.000000, Balance is 0.000000, Overload is 0.00%
    Required overload is 0.000000%
    --------------------------------------------------------------------------
    Tier                           Parts      %    Max     0     1     2     3
    --------------------------------------------------------------------------
    r1                              8192   0.00      2     0     0  8192     0
    r1z1                            4096   0.00      1  4096  4096     0     0
    r1z1-172.16.10.1                4096   0.00      1  4096  4096     0     0
    r1z1-172.16.10.1/sda1           4096   0.00      1  4096  4096     0     0
    r1z2                            4096   0.00      1  4096  4096     0     0
    r1z2-172.16.10.2                4096   0.00      1  4096  4096     0     0
    r1z2-172.16.10.2/sda1           4096   0.00      1  4096  4096     0     0
    r1z3                            4096   0.00      1  4096  4096     0     0
    r1z3-172.16.10.3                4096   0.00      1  4096  4096     0     0
    r1z3-172.16.10.3/sda1           4096   0.00      1  4096  4096     0     0
    r1z4                            4096   0.00      1  4096  4096     0     0
    r1z4-172.16.20.4                4096   0.00      1  4096  4096     0     0
    r1z4-172.16.20.4/sda1           4096   0.00      1  4096  4096     0     0
    r2                              8192   0.00      2     0  8192     0     0
    r2z1                            4096   0.00      1  4096  4096     0     0
    r2z1-172.16.20.1                4096   0.00      1  4096  4096     0     0
    r2z1-172.16.20.1/sda1           4096   0.00      1  4096  4096     0     0
    r2z2                            4096   0.00      1  4096  4096     0     0
    r2z2-172.16.20.2                4096   0.00      1  4096  4096     0     0
    r2z2-172.16.20.2/sda1           4096   0.00      1  4096  4096     0     0
 As you can see from the output, each server should store 4096 partitions, and
 each region should store 8192 partitions. This example used a partition power
 of 13 and 3 replicas.
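 The 4096 figure can be reproduced from the ring parameters. The following is
 a minimal sketch, assuming all six devices carry equal weight; the function
 name is hypothetical and not part of Swift.

```python
def expected_parts_per_device(part_power, replicas, devices):
    """Replica assignments per device, assuming equally weighted devices."""
    # A partition power of 13 yields 2**13 = 8192 partitions.
    total_assignments = (2 ** part_power) * replicas
    return total_assignments // devices


print(expected_parts_per_device(13, 3, 6))  # 4096
```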
 With write_affinity enabled, expect a higher number of partitions on
 disk than the value reported by the ``swift-ring-builder`` dispersion
 command. The number of additional (handoff) partitions in region r1
 depends on your cluster size, the amount of incoming data, and the
 replication speed.
 Let's use the example from above with 6 nodes in 2 regions, and write_affinity
 configured to write to region r1 first. ``swift-ring-builder`` reported that
 each node should store 4096 partitions::
    Expected partitions for region r2:                                      8192
    Handoffs stored across 4 nodes in region r1:                 8192 / 4 = 2048
    Maximum number of partitions on each server in region r1: 2048 + 4096 = 6144
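 The same arithmetic can be written as a small helper for other cluster
 layouts. This is only a sketch: the function name is hypothetical, and it
 assumes that handoffs for the remote region spread evenly across the servers
 in the local region.

```python
def handoff_upper_bound(parts_per_server, local_servers, remote_servers):
    """Return (handoffs per local server, worst-case partitions per local server)."""
    # Partitions that belong in the remote region (r2 in the example).
    remote_parts = parts_per_server * remote_servers
    # Assumption: handoffs are distributed evenly over the local servers.
    handoffs = remote_parts // local_servers
    return handoffs, parts_per_server + handoffs


print(handoff_upper_bound(4096, local_servers=4, remote_servers=2))  # (2048, 6144)
```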
 In the worst case, handoff partitions in region r1 are populated with new
 object replicas faster than replication is able to move them to region r2.
 In that case you will see approximately 6144 partitions per
 server in region r1. Your actual number should be lower,
 between 4096 and 6144 partitions (preferably toward the lower end).
 Now count the number of object partitions on a given server in region r1,
 for example on 172.16.10.1.  Note that the pathnames might be
 different; ``/srv/node/`` is the default mount location, and ``objects``
 applies only to storage policy 0 (storage policy 1 would use
 ``objects-1`` and so on)::
    find -L /srv/node/ -maxdepth 3 -type d -wholename "*objects/*" | wc -l
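 If you want a per-device breakdown or need to check other storage policies,
 a short script can do a similar count. The following is a sketch, not a Swift
 tool: the function name and defaults are hypothetical, and it assumes that
 partition directories have purely numeric names, so unlike the ``find``
 command it skips any other directories.

```python
import os


def count_partitions(root="/srv/node", policy_index=0):
    """Count partition directories per device for one storage policy."""
    # Policy 0 uses "objects"; policy N uses "objects-N".
    datadir = "objects" if policy_index == 0 else "objects-%d" % policy_index
    counts = {}
    if not os.path.isdir(root):
        return counts
    for device in sorted(os.listdir(root)):
        path = os.path.join(root, device, datadir)
        if not os.path.isdir(path):
            continue
        # Assumption: partition directories are named with digits only.
        counts[device] = sum(
            1
            for name in os.listdir(path)
            if name.isdigit() and os.path.isdir(os.path.join(path, name))
        )
    return counts


if __name__ == "__main__":
    for device, parts in count_partitions().items():
        print(device, parts)
```

 Comparing the per-device numbers against the expected range makes it easier
 to spot a single disk that is accumulating handoffs.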
 If this number is consistently at the upper end of the expected partition
 number range (4096 to 6144) or is increasing, you should check your
 replication speed and consider disabling write_affinity.
 Please refer to the next section for how to collect metrics from Swift, and
 especially to :ref:`swift-recon -r <recon-replication>` for how to check
 replication stats.
 --------------------------------
 Cluster Telemetry and Monitoring
 --------------------------------
@@ -748,6 +825,8 @@ This information can also be queried via the swift-recon command line utility::
                            Time to wait for a response from a server
      --swiftdir=SWIFTDIR   Default = /etc/swift
 .. _recon-replication:
 For example, to obtain container replication info from all hosts in zone "3"::
    fhines@ubuntu:~$ swift-recon container -r --zone 3