Add doc entry to check partition count

A high or increasing partition count due to stored handoffs can have
some severe side effects, and replication might never be able to catch
up. This patch adds a note to the admin_guide on how to check this.

Change-Id: Ib4e161d68f1a82236dbf5fac13ef9a13ac4bbf18
Christian Schwede 2016-06-09 06:17:22 +00:00
parent 11c5ef7d22
commit 699953508a


@@ -617,13 +617,90 @@ have 6 replicas in region 1.
You should be aware that, if you have data coming into SF faster than
your replicators are transferring it to NY, then your cluster's data distribution
will get worse and worse over time as objects pile up in SF. If this
happens, it is recommended to disable write_affinity and simply let
object PUTs traverse the WAN link, as that will naturally limit the
object growth rate to what your WAN link can handle.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Checking handoff partition distribution
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can check whether handoff partitions are piling up on a server by
comparing the expected number of partitions with the actual number on
your disks. First, get the number of partitions currently assigned to
each server using the ``dispersion`` command of ``swift-ring-builder``::

    swift-ring-builder sample.builder dispersion --verbose
    Dispersion is 0.000000, Balance is 0.000000, Overload is 0.00%
    Required overload is 0.000000%
    --------------------------------------------------------------------------
    Tier                     Parts      %   Max     0     1     2     3
    --------------------------------------------------------------------------
    r1                        8192   0.00     2     0     0  8192     0
    r1z1                      4096   0.00     1  4096  4096     0     0
    r1z1-172.16.10.1          4096   0.00     1  4096  4096     0     0
    r1z1-172.16.10.1/sda1     4096   0.00     1  4096  4096     0     0
    r1z2                      4096   0.00     1  4096  4096     0     0
    r1z2-172.16.10.2          4096   0.00     1  4096  4096     0     0
    r1z2-172.16.10.2/sda1     4096   0.00     1  4096  4096     0     0
    r1z3                      4096   0.00     1  4096  4096     0     0
    r1z3-172.16.10.3          4096   0.00     1  4096  4096     0     0
    r1z3-172.16.10.3/sda1     4096   0.00     1  4096  4096     0     0
    r1z4                      4096   0.00     1  4096  4096     0     0
    r1z4-172.16.20.4          4096   0.00     1  4096  4096     0     0
    r1z4-172.16.20.4/sda1     4096   0.00     1  4096  4096     0     0
    r2                        8192   0.00     2     0  8192     0     0
    r2z1                      4096   0.00     1  4096  4096     0     0
    r2z1-172.16.20.1          4096   0.00     1  4096  4096     0     0
    r2z1-172.16.20.1/sda1     4096   0.00     1  4096  4096     0     0
    r2z2                      4096   0.00     1  4096  4096     0     0
    r2z2-172.16.20.2          4096   0.00     1  4096  4096     0     0
    r2z2-172.16.20.2/sda1     4096   0.00     1  4096  4096     0     0

As you can see from the output, each server should store 4096 partitions,
and each region should hold replicas of all 8192 partitions. This example
uses a partition power of 13 (2^13 = 8192 partitions) and 3 replicas.

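
The per-server value follows from simple arithmetic: 8192 partitions times
3 replicas gives 24576 partition replicas, spread over 6 devices. The
function below is a sketch of that calculation; ``expected_parts_per_device``
is a hypothetical helper, not part of Swift, and it assumes all devices have
equal weight, as in the sample ring above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Swift: expected number of partition
# replicas per device, assuming every device in the ring has equal weight.
expected_parts_per_device() {
    local part_power=$1 replicas=$2 devices=$3
    # A ring has 2^part_power partitions, each stored "replicas" times.
    local partitions=$(( 1 << part_power ))
    echo $(( partitions * replicas / devices ))
}

# Sample ring from above: part power 13, 3 replicas, 6 devices.
expected_parts_per_device 13 3 6   # prints 4096
```

With unequal device weights the ring assigns partitions roughly in
proportion to weight, so this even split is only a starting point.
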
With write_affinity enabled, expect a higher number of partitions on disk
than the value reported by the ``swift-ring-builder dispersion`` command.
The number of additional (handoff) partitions in region r1 depends on your
cluster size, the amount of incoming data, and the replication speed.

Let's use the example from above with 6 nodes in 2 regions, and
write_affinity configured to write to region r1 first. ``swift-ring-builder``
reported that each node should store 4096 partitions::

    Expected partitions for region r2: 8192
    Handoffs stored across 4 nodes in region r1: 8192 / 4 = 2048
    Maximum number of partitions on each server in region r1: 2048 + 4096 = 6144

In the worst case, handoff partitions in region r1 are populated with new
object replicas faster than replication can move them to region r2. In that
case you will see approximately 6144 partitions per server in region r1.
Your actual number should be lower, somewhere between 4096 and 6144
partitions (preferably closer to the lower end).

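
The upper bound is the ring-assigned count plus an even share of the remote
region's partitions. As a sketch, assuming handoffs are spread evenly across
all servers in the write_affinity region, the hypothetical ``handoff_range``
function below (not part of Swift) prints the expected minimum and maximum
for one server.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Swift. Prints "<min> <max>" partitions
# expected on one server in the write_affinity region, assuming handoffs
# for the remote region are spread evenly across the local servers.
#   $1 = partitions the ring assigns to this server (4096 in the example)
#   $2 = partitions the remote region should hold (8192 for r2)
#   $3 = number of servers in the local region (4 in r1)
handoff_range() {
    local assigned=$1 remote=$2 local_servers=$3
    local handoffs=$(( remote / local_servers ))
    echo "$assigned $(( assigned + handoffs ))"
}

handoff_range 4096 8192 4   # prints "4096 6144"
```
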
Now count the number of object partitions on a given server in region r1,
for example on 172.16.10.1. Note that the pathnames might be different:
``/srv/node/`` is the default mount location, and ``objects`` applies only
to storage policy 0 (storage policy 1 would use ``objects-1``, and so on)::

    find -L /srv/node/ -maxdepth 3 -type d -wholename "*objects/*" | wc -l

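
The ``find`` command reports a single total for all disks on the server. To
see whether handoffs pile up on one particular device, a per-device count can
help. The function below is a sketch, not a Swift tool: the name
``count_partitions_per_device`` is hypothetical, and it assumes the same
layout as above, with one directory per partition directly below
``<mount root>/<device>/objects``.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Swift: print "<device> <partitions>" for
# every device below a mount root. Defaults assume /srv/node and storage
# policy 0 ("objects"); pass e.g. "objects-1" as the second argument for
# storage policy 1.
count_partitions_per_device() {
    local root=${1:-/srv/node} objdir=${2:-objects}
    local dev count
    for dev in "$root"/*/; do
        # Skip devices without an object directory for this policy.
        [ -d "$dev$objdir" ] || continue
        # Each directory directly below objects/ is one partition.
        count=$(find -L "$dev$objdir" -mindepth 1 -maxdepth 1 -type d | wc -l)
        printf '%s %d\n' "$(basename "$dev")" "$count"
    done
}

# Example: count policy-0 partitions on the default mount root.
count_partitions_per_device /srv/node objects
```

For policy 0 the per-device values should normally add up to the total from
the ``find`` command above.
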
If this number is consistently at the upper end of the expected range
(4096 to 6144) or keeps increasing, you should check your replication
speed and consider disabling write_affinity.

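
To turn "upper end" and "increasing" into a repeatable check, you could
record the count periodically (for example from cron) and compare each sample
with the previous one. The function below is a hypothetical sketch: its name,
the 4096 to 6144 defaults taken from this example, and treating the upper half
of the range as "upper end" are all assumptions to adapt to your cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Swift: classify a partition count using
# the expected range from this example (defaults 4096..6144) and the
# previous sample. "Upper half of range" is an assumed threshold.
classify_partition_count() {
    local current=$1 previous=$2 low=${3:-4096} high=${4:-6144}
    if [ "$current" -gt "$high" ]; then
        echo "above expected range"
    elif [ "$current" -gt "$previous" ]; then
        echo "increasing"
    elif [ $(( current - low )) -gt $(( (high - low) / 2 )) ]; then
        echo "upper half of range"
    else
        echo "ok"
    fi
}

# Example: a count that grew since the previous sample.
classify_partition_count 5200 4900   # prints "increasing"
```

In practice, the current value would come from the ``find`` command above and
the previous value from your last stored sample.
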
Please refer to the next section on how to collect metrics from Swift, and
especially to :ref:`swift-recon -r <recon-replication>` on how to check
replication stats.

--------------------------------
Cluster Telemetry and Monitoring
--------------------------------
@@ -748,6 +825,8 @@ This information can also be queried via the swift-recon command line utility::
                            Time to wait for a response from a server
      --swiftdir=SWIFTDIR   Default = /etc/swift

.. _recon-replication:

For example, to obtain container replication info from all hosts in zone "3"::

    fhines@ubuntu:~$ swift-recon container -r --zone 3