Commit Graph

151 Commits

06cf5d298f Add databases_per_second to db daemons
Most daemons have a "go as fast as you can then sleep for 30 seconds"
strategy towards resource utilization; the object-updater and
object-auditor however have some "X_per_second" options that allow
operators much better control over how they spend their I/O budget.

This change extends that pattern into the account-replicator,
container-replicator, and container-sharder which have been known to peg
CPUs when they're not IO limited.
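
A minimal sketch of how the new option might be set (sections per the
daemons named above; the value is illustrative, not necessarily the
shipped default):

    [account-replicator]
    databases_per_second = 50

    [container-replicator]
    databases_per_second = 50

    [container-sharder]
    databases_per_second = 50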

Partial-Bug: #1784753
Change-Id: Ib7f2497794fa2f384a1a6ab500b657c624426384
2018-10-30 22:28:05 +00:00
904e7c97f1 Add more doc and test for cors_expose_headers option
In follow-up to the related change, mention the new
cors_expose_headers option (and other proxy-server.conf
options) in the CORS doc.

Add a test for the cors options being loaded into the
proxy server.

Improve CORS comments in docs.

Change-Id: I647d8f9e9cbd98de05443638628414b1e87d1a76
Related-Change: I5ca90a052f27c98a514a96ee2299bfa1b6d46334
2018-09-17 12:35:25 -07:00
cfeb32c66b Adding keep_idle config value to socket
Users can configure the KEEPIDLE time for sockets in a TCP connection.
The default value is the old value, which is 600.
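
A hedged example (the [DEFAULT] placement is an assumption; this commit
only states the option's purpose and default):

    [DEFAULT]
    keep_idle = 600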

Change-Id: Ib7fb166deb8a87ae4e97ba0671048b1ec079a2ef
Closes-Bug: #1759606
2018-09-15 01:30:53 +02:00
d5c532a94e object-updater: add concurrent updates
The object updater now supports two configuration settings:
"concurrency" and "updater_workers". The latter controls how many
worker processes are spawned, while the former controls how many
concurrent container updates are performed by each worker
process. This should speed the processing of async_pendings.

There is a change to the semantics of the configuration
options. Previously, "concurrency" controlled the number of worker
processes spawned, and "updater_workers" did not exist. I switched the
meanings for consistency with other configuration options. In the
object reconstructor, object replicator, object server, object
expirer, container replicator, container server, account replicator,
account server, and account reaper, "concurrency" refers to the number
of concurrent tasks performed within one process (for reference, the
container updater and object auditor use "concurrency" to mean number
of processes).

On upgrade, a node configured with concurrency=N will still handle
async updates N-at-a-time, but will do so using only one process
instead of N.

UpgradeImpact:

If you have a config file like this:

    [object-updater]
    concurrency = <N>

and you want to take advantage of faster updates, then do this:

    [object-updater]
    concurrency = 8  # the default; you can omit this line
    updater_workers = <N>

If you want updates to be processed exactly as before, do this:

    [object-updater]
    concurrency = 1
    updater_workers = <N>

Change-Id: I17e18088e61f664e1b9942d66423666d0cae1689
2018-06-13 17:39:34 -07:00
36dbd38e48 Add s3api headers to allowed_headers by default
Previously, these headers had to be added by operators to their
object-server.conf when enabling swift3 middleware. Since s3api
is now imported into swift we should go ahead and add these headers
by default too.

Change-Id: Ib82e175096716e42aecdab48f01f079e09da6a1d
Signed-off-by: Thiago da Silva <thiago@redhat.com>
2018-05-29 16:02:50 -04:00
c28004deb0 Multiprocess object replicator
Add a multiprocess mode to the object replicator. Setting the
"replicator_workers" setting to a positive value N will result in the
replicator using up to N worker processes to perform replication
tasks.
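
For example, a sketch of the new setting (value illustrative):

    [object-replicator]
    replicator_workers = 4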

At most one worker per disk will be spawned, so one can set
replicator_workers=99999999 to always get one worker per disk
regardless of the number of disks in each node. This is the same
behavior that the object reconstructor has.

Worker process logs will have a bit of information prepended so
operators can tell which messages came from which worker. It looks
like this:

  [worker 1/2 pid=16529] 154/154 (100.00%) partitions replicated in 1.02s (150.87/sec, 0s remaining)

The prefix is "[worker M/N pid=P] ", where M is the worker's index, N
is the total number of workers, and P is the process ID. Every message
from the replicator's logger will have the prefix; this includes
messages from down in diskfile, but does not include things printed to
stdout or stderr.

Drive-by fix: don't dump recon stats when replicating only certain
policies. When running the object replicator with replicator_workers >
0 and "--policies=X,Y,Z", the replicator would update recon stats
after running. Since it only ran on a subset of objects, it should not
update recon, much like it doesn't update recon when run with
--devices or --partitions.

Change-Id: I6802a9ad9f1f9b9dafb99d8b095af0fdbf174dc5
2018-04-24 04:05:08 +00:00
f64c00b00a Improve object-updater's stats logging
The object updater has five different stats, but its logging only told
you two of them (successes and failures), and it only told you after
finishing all the async_pendings for a device. If you have a cluster
that's been sick and has millions upon millions of async_pendings
lying around, then your object-updaters are frustratingly
silent. I've seen one cluster with around 8 million async_pendings per
disk where the object-updaters only emitted stats every 12 hours.

Yes, if you have StatsD logging set up properly, you can go look at
your graphs and get real-time feedback on what it's doing. If you
don't have that, all you get is a frustrating silence.

Now, the object updater tells you all of its stats (successes,
failures, quarantines due to bad pickles, unlinks, and errors), and it
tells you incremental progress every five minutes. The logging at the
end of a pass remains and has been expanded to also include all stats.

Also included is a small change to what counts as an error: unmounted
drives no longer do. The goal is that only abnormal things count as
errors, like permission problems, malformed filenames, and so
on. These are things that should never happen, but if they do, may
require operator intervention. Drives fail, so logging an error upon
encountering an unmounted drive is not useful.

Change-Id: Idbddd507f0b633d14dffb7a9834fce93a10359ab
2018-01-17 13:59:23 -08:00
fc12d63c76 Remove repeated text from deployment guide
Fix what appears to be a cut and paste error.

Change-Id: Iccf97ebbf75c8f97095a4493ea6a8beb074df099
2017-12-06 10:29:52 -08:00
e199192cae Replace replication_one_per_device by custom count
This commit replaces boolean replication_one_per_device by an integer
replication_concurrency_per_device. The new configuration parameter is
passed to utils.lock_path(), which now accepts as an argument a limit for
the number of locks that can be acquired for a specific path.

Instead of trying to lock path/.lock, utils.lock_path() now tries to lock
files path/.lock-X, where X is in the range (0, N), N being the limit for
the number of locks allowed for the path. The default value of limit is
set to 1.
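
A sketch of the replacement option (section placement in the
object-server config is an assumption; 1 matches the stated default):

    [app:object-server]
    replication_concurrency_per_device = 1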

Change-Id: I3c3193344c7a57a8a4fc7932d1b10e702efd3572
2017-10-24 16:17:41 +01:00
9a09641a7c Merge "Add cautionary note re delay_reaping in account-server.conf-sample" 2017-09-28 01:19:33 +00:00
93fc9d2de8 Add cautionary note re delay_reaping in account-server.conf-sample
Change-Id: I2c3eea783321338316eecf467d30ba0b3217256c
Related-Bug: #1514528
2017-09-27 22:52:47 +01:00
5c76b9e691 Add concurrent_gets to proxy.conf man page
Change-Id: Iab1beff4899d096936c0e5915f3ec32364b3e517
Closes-Bug: #1559347
2017-09-27 14:11:14 +01:00
1e79f828ad Remove all post_as_copy related code and configes
It was deprecated and we discussed on this topic in Denver PTG
for Queen cycle. Main motivation for this work is that deprecated
post_as_copy option and its gate blocks future symlink work.

Change-Id: I411893db1565864ed5beb6ae75c38b982a574476
2017-09-16 05:50:41 +00:00
c93c0c0c6e [Trivialfix]Fix typos in swift
Fix typos found in swift.

Change-Id: I52fad1a4882cec4456f22174b46d54e42ec66d97
2017-08-04 07:50:10 +00:00
701a172afa Add multiple worker processes strategy to reconstructor
This change adds a new Strategy concept to the daemon module similar to
how we manage WSGI workers.  We need to leverage multiple python
processes to get the concurrency properties we need.  More workers will
rebalance much faster on dense chassis with many devices.

Currently the default is still only one process, and no workers.  Set
reconstructor_workers in the [object-reconstructor] section to some
whole number <= the number of devices on a node to get that many
reconstructor workers.

Each worker will operate on a different subset of disks.
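
For example (value illustrative):

    [object-reconstructor]
    reconstructor_workers = 4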

Once mode works as before, but tends to update recon drops a little
more often.

If you change the rings, the strategy will shutdown workers and spawn
new ones.

You can kill the worker pids and the daemon strategy will respawn them.

New per-disk reconstructor stats are dumped to recon under the
object_reconstruction_per_disk key.  To maintain legacy compatibility
and replication monitoring based on cycle times they are aggregated
every stats_interval (default 5 mins).

Change-Id: I28925a37f3985c9082b5a06e76af4dc3ec813abe
2017-07-26 16:55:10 -07:00
5b10cf530b Add more structure to the deployment guide
Previously it was hard to navigate to a particular config section in
the deployment guide, and not possible to provide a link directly to
one section.

This patch makes each config section a heading so that it appears in
navigation tables and can be easily linked to. A list of config
sections is also added at the start of each server section.

Change-Id: Iecb0637fde521600a9163fa66b3dbdc176a71dff
Related-Bug: #1626290
2017-07-20 17:01:36 +01:00
9c5628b4f1 Add reconstructor section to deployment guide
Change-Id: I062998e813718828b7adf4e7c3f877b6a31633c0
Closes-Bug: #1626290
2017-07-20 11:40:17 +01:00
c3f6e82ae1 Merge "Write-affinity aware object deletion" 2017-07-06 14:00:05 +00:00
f1e1dbb80a Merge "Make eventlet.tpool's thread count configurable in object server" 2017-07-04 11:49:24 +00:00
831eb6e3ce Write-affinity aware object deletion
When deleting objects in a multi-region swift deployment with write
affinity configured, users always get a 404 when deleting an object
before it's replicated to the appropriate nodes.

This patch adds a config item 'write_affinity_handoff_delete_count' so
that operators can define how many local handoff nodes swift should
send requests to in order to get more candidates for the final response,
or by default just leave it to swift to calculate the appropriate number.
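
A hedged sketch of the new item in the proxy config (value illustrative;
omit it to let swift calculate the count as described above):

    [app:proxy-server]
    write_affinity = r1
    write_affinity_handoff_delete_count = 2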

Change-Id: Ic4ef82e4fc1a91c85bdbc6bf41705a76f16d1341
Closes-Bug: #1503161
2017-06-27 22:42:02 +12:00
d9c4913e3b Make eventlet.tpool's thread count configurable in object server
If you're running servers_per_port > 0 and threads_per_disk = 0 (as it
should be with servers_per_port on), each object-server process will
have 20 IO threads waiting around to service eventlet.tpool
calls. This is far too many; with servers_per_port, there's no real
benefit to having so many IO threads.

This commit makes it so that, when servers_per_port > 0, each object
server defaults to having one main thread and one IO thread.

Also, eventlet's tpool size is now configurable via the object-server
config file. If a tpool size is set, that's what we'll use regardless
of servers_per_port. This allows operators with an excess of threads
to remove some regardless of servers_per_port.
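
A hedged sketch (the option name eventlet_tpool_num_threads is an
assumption; the commit does not spell the name out here):

    [app:object-server]
    eventlet_tpool_num_threads = 1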

Change-Id: I8f8914b7e70f2510393eb7c5e6be9708631ac027
Closes-Bug: 1554233
2017-06-23 16:16:03 +10:00
2d18ecdf4b Merge "Replace slowdown option with *_per_second option" 2017-06-22 01:18:26 +00:00
a8bc94c7e3 Replace slowdown option with *_per_second option
The container and object updaters sleep "slowdown" (default 0.01) seconds
after every processed container/object. Because the time.sleep call adds
overhead, use ratelimit_sleep from common.utils instead, the same as the
auditors.
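
A hedged sketch of the resulting options (names assumed by analogy with
the auditor's *_per_second options; values illustrative):

    [object-updater]
    objects_per_second = 50

    [container-updater]
    containers_per_second = 50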

Change-Id: I362aa0f13c78ad03ce1f76ee0257b0646f981212
2017-06-16 19:22:00 +00:00
b9322a2f08 Merge "Add link from policies overview to per-policy proxy-server conf" 2017-05-30 19:13:56 +00:00
227cef9933 Add link from policies overview to per-policy proxy-server conf
- add proxy server per policy config as an optional
  step in the configuration of a policy, with link to
  the deployment guide

- add reverse link from deployment guide per-policy
  config doc section to storage policies docs

Drive-by fix an incorrect test comment

Change-Id: Ib95310193270a63c9d1e321c6e7de240e00b387f
Related-Change: I3f718f425f525baa80045ba067950c752bcaaefc
2017-05-26 10:41:35 +01:00
d487bf7fb1 Remove tempauth docs from deployment guide
Instead, link to the middleware list and auth overview, as well as
referring readers to proxy-server.conf-sample

TempAuth-related content that was previously in the deployment guide has
been moved to TempAuth's own docs, which have been cleaned up a bit.

Change-Id: I00070bb09294362c069f7ee9426ac570bc1b3ddb
2017-05-25 12:35:46 -07:00
45884c1102 Enable per policy proxy config options
This is an alternative approach to that proposed in [1]

Adds support for optional per-policy config sections
to be added in proxy-server.conf. This is highly desirable
to allow per-policy affinity options to be set for use with
duplicated EC policies [2] and composite rings [3].

Certain options found in per-policy conf sections will
override their equivalents that may be set in the
[app:proxy-server] section. Currently the options
handled that way are:

  sorting_method
  read_affinity
  write_affinity
  write_affinity_node_count

For example:

  [proxy-server:policy:0]
  sorting_method = affinity
  read_affinity = r1=100
  write_affinity = r1
  write_affinity_node_count = 1 * replicas

The corresponding attributes of the proxy-server Application
are now available from instances of an OverrideConf object
that is obtained from Application.get_policy_options(policy).

[1] Related-Change: I9104fc789ba85ab3ab5ccd34096125b482821389
[2] Related-Change: Idd155401982a2c48110c30b480966a863f6bd305
[3] Related-Change: I0d8928b55020592f8e75321d1f7678688301d797

Co-Authored-By: Kota Tsuyuzaki <tsuyuzaki.kota@lab.ntt.co.jp>
Change-Id: I3f718f425f525baa80045ba067950c752bcaaefc
2017-05-23 20:22:30 +01:00
f02ec4de81 Add read and write affinity options to deployment guide
Add entries for these options in the deployment guide and
make the text in proxy-server.conf-sample and man page
consistent.

Change-Id: I5854ddb3e5864ddbeaf9ac2c930bfafdb47517c3
2017-05-18 10:42:44 -07:00
21396bc106 keep consistent naming convention of swift and urls
Change-Id: Iddd4f69abf77a5c643ce8b164fc6cfd72c068229
2017-03-23 02:28:41 +00:00
9b47de3095 Enable cluster-wide CORS Expose-Headers setting
An operator offering a web UX to its customers might want to allow web
browsers to access some headers by default (e.g. X-Storage-Policy,
X-Container-Read, ...). This commit adds a new setting to the
proxy-server to allow some headers to be added cluster-wide to the CORS
header Access-Control-Expose-Headers.
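
For example, a sketch in proxy-server.conf (header list illustrative):

    [app:proxy-server]
    cors_expose_headers = X-Storage-Policy, X-Container-Read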

Change-Id: I5ca90a052f27c98a514a96ee2299bfa1b6d46334
2017-02-25 19:00:28 +01:00
63b351893d Merge "Default object_post_as_copy to False" 2017-01-24 20:58:34 +00:00
4ee20dba48 Default object_post_as_copy to False
Additionally, emit deprecation warnings when running POST-as-COPY

Change-Id: I11324e711057f7332577fd38f9bff82bdc6aac90
2017-01-20 12:37:01 -05:00
69f7be99a6 Move documented reclaim_age option to correct location
The reclaim_age is a DiskFile option; it doesn't make sense for two
different object services or nodes to use different values.

I also driveby cleanup the reclaim_age plumbing from get_hashes to
cleanup_ondisk_files since it's a method on the Manager and has access
to the configured reclaim_age.  This fixes a bug where finalize_put
wouldn't use the [DEFAULT]/object-server configured reclaim_age - which
is normally benign but leads to weird behavior on DELETE requests with
really small reclaim_age.

There's a couple of places in the replicator and reconstructor that
reach into their manager to borrow the reclaim_age when emptying out
the aborted PUTs that failed to cleanup their files in tmp - but that
timeout doesn't really need to be coupled with reclaim_age and that
method could have just as reasonably been implemented on the Manager.

UpgradeImpact: Previously the reclaim_age was documented to be
configurable in various object-* services config sections, but that did
not work correctly unless you also configured the option for the
object-server because of REPLICATE request rehash cleanup.  All object
services must use the same reclaim_age.  If you require a non-default
reclaim age it should be set in the [DEFAULT] section.  If there are
different non-default values, the greater should be used for all object
services and configured only in the [DEFAULT] section.

If you specify a reclaim_age value in any object related config you
should move it to *only* the [DEFAULT] section before you upgrade.  If
you configure a reclaim_age less than your consistency window you are
likely to be eaten by a Grue.
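
For example, a single cluster-wide setting would look like this (the
value shown is an illustrative one week, not necessarily your default):

    [DEFAULT]
    reclaim_age = 604800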

Closes-Bug: #1626296

Change-Id: I2b9189941ac29f6e3be69f76ff1c416315270916
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
2017-01-13 03:10:47 +00:00
9b98c89983 Revises 'url' to 'URL' and 'json' to 'JSON'
Change-Id: I44743fbb9bcbce3a50ed6770264ba0f4b17803d7
2016-09-30 22:21:03 +09:00
8bf2233b40 Documantation enhancements of nice/ionice feature
Based on comments from patch #238799.

Change-Id: I9455cf6dc7fd12fee62439ff3c5f3255287ab1be
2016-08-19 07:39:49 +02:00
ed772236c7 Change schedule priority of daemon/server in config
The goal is to allow the schedule priority and the I/O scheduling class
and priority of a daemon/server to be modified via configuration.
The setting is optional; the default keeps the current behaviour.

Use case:
Prioritize the object-server over the object-auditor, because all user
requests need to be served in peak hours and auditing can wait.
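
A hedged sketch of the use case (the option names nice_priority and
ionice_class are assumptions based on the feature description; values
illustrative):

    [object-auditor]
    nice_priority = 10
    ionice_class = idle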

Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
DocImpact
Change-Id: I1018a18f4706daabdb84574ffd9a58d831e68396
2016-08-10 23:56:15 +02:00
90627f903a Add region in ring structure & deployment guide
The deployment guide does not talk about the region. Also, it does not
specify that regions and zones need to be ints.

This patch adds a brief description of the region and changes the numbers
to ints. It also adds the region to the document that describes the ring
data structure.

Change-Id: I04ce42fb3e5c1f08e7f7ff6be23482cee8bdeb71
Partial-Bug: #1583551
2016-07-12 15:18:56 +00:00
521ec6b9b1 Merge "Add region in swift-ring-builder add" 2016-07-08 23:12:39 +00:00
54ed084234 Add region in swift-ring-builder add
In the swift deployment guide, region is missing from the syntax for
adding a new device with swift-ring-builder.

This patch adds region to the syntax.
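
For example, an add command including the region might look like this
(values illustrative):

    swift-ring-builder object.builder add r1z1-10.0.0.1:6200/sda1 100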

Change-Id: I43e247c92d461efd530c0f82ca3daddcb9e2ba5b
Closes-Bug: #1584127
2016-07-08 15:55:49 +00:00
6f230c7ea0 Fixed inconsistent naming conventions
Fixed naming conventions of Keystone, Swift and proxy servers in
the docs.

Change-Id: I294afd8d7bffa8c1fc299f5812effacb9ad08910
2016-07-07 21:40:21 +00:00
86e9e827ba add explicit HA info to the deployment guide
Change-Id: I7614952c523080fe50eaf839b54a8064439817ce
2016-05-31 11:27:43 -07:00
a403faadd4 Merge "Allow fallocate_reserve to be a percentage" 2016-05-12 08:18:39 +00:00
6a88f27eb0 Merge "Remove threads_per_disk setting" 2016-05-11 01:36:43 +00:00
cf48e75c25 change default ports for servers
Changing the recommended ports for Swift services
from ports 6000-6002 to unused ports 6200-6202,
so they do not conflict with X-Windows or other services.
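
In config terms this corresponds to bind_port values like these (the
service-to-port mapping shown is the conventional one, as an
illustration):

    # object-server.conf
    [DEFAULT]
    bind_port = 6200

    # container-server.conf
    [DEFAULT]
    bind_port = 6201

    # account-server.conf
    [DEFAULT]
    bind_port = 6202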

Updated SAIO docs.

DocImpact
Closes-Bug: #1521339
Change-Id: Ie1c778b159792c8e259e2a54cb86051686ac9d18
2016-04-29 14:47:38 -04:00
9d6a055b31 Remove threads_per_disk setting
This patch removes the threads_per_disk setting. It was already deprecated
and set to 0 by default, which effectively meant no per-disk thread pool
was used at all. Users are encouraged to use servers_per_port instead.

DocImpact

Change-Id: Ie76be5c8a74d60a1330627caace19e06d1b9383c
2016-04-28 12:06:24 -05:00
0da9da5131 Allow fallocate_reserve to be a percentage
Add the ability to set the fallocate_reserve value as a percentage.
This happens automatically when adding the '%' at the end of the value.
Having the ability to set a % of free space rather than a byte value is
useful, especially when drive sizes are heterogeneous.

The default for fallocate_reserve has been adjusted to 1%; having
fallocate_reserve set seems sensible for all deploys, and percentages are
far safer to default to than byte values (across drives of any size).
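
For example (1% per this commit's new default; a plain number would be
interpreted as bytes):

    [DEFAULT]
    fallocate_reserve = 1%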

Tests added for using fallocate_reserve as a percentage.

Duplicate tests for fallocate_reserve have been removed.

Docs updated to reflect the fallocate_reserve change.

Change-Id: I4aea613a708205c917e81d6b2861396655e73238
2016-04-23 08:02:00 -05:00
33f06dc48f Fixed Sphinx errors
doc/source/deployment_guide.rst:1372: ERROR: Malformed table.
swift/obj/diskfile.py:docstring of swift.obj.diskfile.BaseDiskFileManager.yield_hashes:13: ERROR: Unexpected indentation.
doc/source/ops_runbook/diagnose.rst:188: WARNING: Inline emphasis start-string without end-string.

Change-Id: Id20eb62eb5baebb3814e7af5676badb94f17dee5
2016-04-09 18:47:58 +02:00
4be3701805 Merge "Auditor will clean up stale rsync tempfiles" 2016-03-23 22:21:51 +00:00
1d03803a85 Auditor will clean up stale rsync tempfiles
DiskFile already fills in the _ondisk_info attribute when it tries to open
a diskfile - even if the DiskFile's fileset is not valid or deleted.
During this process the rsync tempfiles would be discovered and logged,
but no-one would attempt to clean them up - even if they were really old.

Instead of logging and ignoring unexpected files when validating a DiskFile
fileset, we'll add unexpected files to the unexpected key in the
_ondisk_info attribute.

With a little bit of re-organization in the auditor's object_audit method
to get things into a single return path we can add an unconditional check
for unexpected files and remove those that are "old enough".

Since the replicator will kill any rsync processes that are running longer
than the configured rsync_timeout we know that any rsync tempfiles older
than this can be deleted.

Split unlink_older_than in common.utils into two functions to allow an
explicit list of previously discovered paths to be passed in to avoid an
extra listdir.  Since the getmtime handling already ignores OSError,
there's less concern of a race condition where a previously discovered
unexpected file is reaped by rsync while we're attempting to clean it up.

Update some doc on the new config option.
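
A hedged sketch of the new option (the name rsync_tempfile_timeout, its
object-auditor placement and the "auto" default are assumptions; the
commit does not name the option here):

    [object-auditor]
    rsync_tempfile_timeout = auto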

Closes-Bug: #1554005

Change-Id: Id67681cb77f605e3491b8afcb9c69d769e154283
2016-03-23 19:34:34 +00:00
32847d2f48 Merge "Docs: Container sync does not require POST-as-COPY" 2016-03-23 17:08:26 +00:00