e199192cae
This commit replaces boolean replication_one_per_device by an integer replication_concurrency_per_device. The new configuration parameter is passed to utils.lock_path() which now accept as an argument a limit for the number of locks that can be acquired for a specific path. Instead of trying to lock path/.lock, utils.lock_path() now tries to lock files path/.lock-X, where X is in the range (0, N), N being the limit for the number of locks allowed for the path. The default value of limit is set to 1. Change-Id: I3c3193344c7a57a8a4fc7932d1b10e702efd3572
2323 lines
148 KiB
ReStructuredText
2323 lines
148 KiB
ReStructuredText
|
|
Deployment Guide
|
|
================
|
|
|
|
-----------------------
|
|
Hardware Considerations
|
|
-----------------------
|
|
|
|
Swift is designed to run on commodity hardware. At Rackspace, our storage
|
|
servers are currently running fairly generic 4U servers with 24 2T SATA
|
|
drives and 8 cores of processing power. RAID on the storage drives is not
|
|
required and not recommended. Swift's disk usage pattern is the worst
|
|
case possible for RAID, and performance degrades very quickly using RAID 5
|
|
or 6.
|
|
|
|
------------------
|
|
Deployment Options
|
|
------------------
|
|
|
|
The Swift services run completely autonomously, which provides for a lot of
|
|
flexibility when architecting the hardware deployment for Swift. The 4 main
|
|
services are:
|
|
|
|
#. Proxy Services
|
|
#. Object Services
|
|
#. Container Services
|
|
#. Account Services
|
|
|
|
The Proxy Services are more CPU and network I/O intensive. If you are using
|
|
10g networking to the proxy, or are terminating SSL traffic at the proxy,
|
|
greater CPU power will be required.
|
|
|
|
The Object, Container, and Account Services (Storage Services) are more disk
|
|
and network I/O intensive.
|
|
|
|
The easiest deployment is to install all services on each server. There is
|
|
nothing wrong with doing this, as it scales each service out horizontally.
|
|
|
|
At Rackspace, we put the Proxy Services on their own servers and all of the
|
|
Storage Services on the same server. This allows us to send 10g networking to
|
|
the proxy and 1g to the storage servers, and keep load balancing to the
|
|
proxies more manageable. Storage Services scale out horizontally as storage
|
|
servers are added, and we can scale overall API throughput by adding more
|
|
Proxies.
|
|
|
|
If you need more throughput to either Account or Container Services, they may
|
|
each be deployed to their own servers. For example you might use faster (but
|
|
more expensive) SAS or even SSD drives to get faster disk I/O to the databases.
|
|
|
|
A high-availability (HA) deployment of Swift requires that multiple proxy
|
|
servers are deployed and requests are load-balanced between them. Each proxy
|
|
server instance is stateless and able to respond to requests for the entire
|
|
cluster.
|
|
|
|
Load balancing and network design is left as an exercise to the reader,
|
|
but this is a very important part of the cluster, so time should be spent
|
|
designing the network for a Swift cluster.
|
|
|
|
|
|
---------------------
|
|
Web Front End Options
|
|
---------------------
|
|
|
|
Swift comes with an integral web front end. However, it can also be deployed
|
|
as a request processor of an Apache2 using mod_wsgi as described in
|
|
:doc:`Apache Deployment Guide <apache_deployment_guide>`.
|
|
|
|
.. _ring-preparing:
|
|
|
|
------------------
|
|
Preparing the Ring
|
|
------------------
|
|
|
|
The first step is to determine the number of partitions that will be in the
|
|
ring. We recommend that there be a minimum of 100 partitions per drive to
|
|
insure even distribution across the drives. A good starting point might be
|
|
to figure out the maximum number of drives the cluster will contain, and then
|
|
multiply by 100, and then round up to the nearest power of two.
|
|
|
|
For example, imagine we are building a cluster that will have no more than
|
|
5,000 drives. That would mean that we would have a total number of 500,000
|
|
partitions, which is pretty close to 2^19, rounded up.
|
|
|
|
It is also a good idea to keep the number of partitions small (relatively).
|
|
The more partitions there are, the more work that has to be done by the
|
|
replicators and other backend jobs and the more memory the rings consume in
|
|
process. The goal is to find a good balance between small rings and maximum
|
|
cluster size.
|
|
|
|
The next step is to determine the number of replicas to store of the data.
|
|
Currently it is recommended to use 3 (as this is the only value that has
|
|
been tested). The higher the number, the more storage that is used but the
|
|
less likely you are to lose data.
|
|
|
|
It is also important to determine how many zones the cluster should have. It is
|
|
recommended to start with a minimum of 5 zones. You can start with fewer, but
|
|
our testing has shown that having at least five zones is optimal when failures
|
|
occur. We also recommend trying to configure the zones at as high a level as
|
|
possible to create as much isolation as possible. Some example things to take
|
|
into consideration can include physical location, power availability, and
|
|
network connectivity. For example, in a small cluster you might decide to
|
|
split the zones up by cabinet, with each cabinet having its own power and
|
|
network connectivity. The zone concept is very abstract, so feel free to use
|
|
it in whatever way best isolates your data from failure. Each zone exists
|
|
in a region.
|
|
|
|
A region is also an abstract concept that may be used to distinguish between
|
|
geographically separated areas as well as can be used within same datacenter.
|
|
Regions and zones are referenced by a positive integer.
|
|
|
|
You can now start building the ring with::
|
|
|
|
swift-ring-builder <builder_file> create <part_power> <replicas> <min_part_hours>
|
|
|
|
This will start the ring build process creating the <builder_file> with
|
|
2^<part_power> partitions. <min_part_hours> is the time in hours before a
|
|
specific partition can be moved in succession (24 is a good value for this).
|
|
|
|
Devices can be added to the ring with::
|
|
|
|
swift-ring-builder <builder_file> add r<region>z<zone>-<ip>:<port>/<device_name>_<meta> <weight>
|
|
|
|
This will add a device to the ring where <builder_file> is the name of the
|
|
builder file that was created previously, <region> is the number of the region
|
|
the zone is in, <zone> is the number of the zone this device is in, <ip> is
|
|
the ip address of the server the device is in, <port> is the port number that
|
|
the server is running on, <device_name> is the name of the device on the server
|
|
(for example: sdb1), <meta> is a string of metadata for the device (optional),
|
|
and <weight> is a float weight that determines how many partitions are put on
|
|
the device relative to the rest of the devices in the cluster (a good starting
|
|
point is 100.0 x TB on the drive).Add each device that will be initially in the
|
|
cluster.
|
|
|
|
Once all of the devices are added to the ring, run::
|
|
|
|
swift-ring-builder <builder_file> rebalance
|
|
|
|
This will distribute the partitions across the drives in the ring. It is
|
|
important whenever making changes to the ring to make all the changes
|
|
required before running rebalance. This will ensure that the ring stays as
|
|
balanced as possible, and as few partitions are moved as possible.
|
|
|
|
The above process should be done to make a ring for each storage service
|
|
(Account, Container and Object). The builder files will be needed in future
|
|
changes to the ring, so it is very important that these be kept and backed up.
|
|
The resulting .tar.gz ring file should be pushed to all of the servers in the
|
|
cluster. For more information about building rings, running
|
|
swift-ring-builder with no options will display help text with available
|
|
commands and options. More information on how the ring works internally
|
|
can be found in the :doc:`Ring Overview <overview_ring>`.
|
|
|
|
.. _server-per-port-configuration:
|
|
|
|
-------------------------------
|
|
Running object-servers Per Disk
|
|
-------------------------------
|
|
|
|
The lack of true asynchronous file I/O on Linux leaves the object-server
|
|
workers vulnerable to misbehaving disks. Because any object-server worker can
|
|
service a request for any disk, and a slow I/O request blocks the eventlet hub,
|
|
a single slow disk can impair an entire storage node. This also prevents
|
|
object servers from fully utilizing all their disks during heavy load.
|
|
|
|
Another way to get full I/O isolation is to give each disk on a storage node a
|
|
different port in the storage policy rings. Then set the
|
|
:ref:`servers_per_port <object-server-default-options>`
|
|
option in the object-server config. NOTE: while the purpose of this config
|
|
setting is to run one or more object-server worker processes per *disk*, the
|
|
implementation just runs object-servers per unique port of local devices in the
|
|
rings. The deployer must combine this option with appropriately-configured
|
|
rings to benefit from this feature.
|
|
|
|
Here's an example (abbreviated) old-style ring (2 node cluster with 2 disks
|
|
each)::
|
|
|
|
Devices: id region zone ip address port replication ip replication port name
|
|
0 1 1 1.1.0.1 6200 1.1.0.1 6200 d1
|
|
1 1 1 1.1.0.1 6200 1.1.0.1 6200 d2
|
|
2 1 2 1.1.0.2 6200 1.1.0.2 6200 d3
|
|
3 1 2 1.1.0.2 6200 1.1.0.2 6200 d4
|
|
|
|
And here's the same ring set up for `servers_per_port`::
|
|
|
|
Devices: id region zone ip address port replication ip replication port name
|
|
0 1 1 1.1.0.1 6200 1.1.0.1 6200 d1
|
|
1 1 1 1.1.0.1 6201 1.1.0.1 6201 d2
|
|
2 1 2 1.1.0.2 6200 1.1.0.2 6200 d3
|
|
3 1 2 1.1.0.2 6201 1.1.0.2 6201 d4
|
|
|
|
When migrating from normal to `servers_per_port`, perform these steps in order:
|
|
|
|
#. Upgrade Swift code to a version capable of doing `servers_per_port`.
|
|
|
|
#. Enable `servers_per_port` with a > 0 value
|
|
|
|
#. Restart `swift-object-server` processes with a SIGHUP. At this point, you
|
|
will have the `servers_per_port` number of `swift-object-server` processes
|
|
serving all requests for all disks on each node. This preserves
|
|
availability, but you should perform the next step as quickly as possible.
|
|
|
|
#. Push out new rings that actually have different ports per disk on each
|
|
server. One of the ports in the new ring should be the same as the port
|
|
used in the old ring ("6200" in the example above). This will cover
|
|
existing proxy-server processes who haven't loaded the new ring yet. They
|
|
can still talk to any storage node regardless of whether or not that
|
|
storage node has loaded the ring and started object-server processes on the
|
|
new ports.
|
|
|
|
If you do not run a separate object-server for replication, then this setting
|
|
must be available to the object-replicator and object-reconstructor (i.e.
|
|
appear in the [DEFAULT] config section).
|
|
|
|
.. _general-service-configuration:
|
|
|
|
-----------------------------
|
|
General Service Configuration
|
|
-----------------------------
|
|
|
|
Most Swift services fall into two categories. Swift's wsgi servers and
|
|
background daemons.
|
|
|
|
For more information specific to the configuration of Swift's wsgi servers
|
|
with paste deploy see :ref:`general-server-configuration`.
|
|
|
|
Configuration for servers and daemons can be expressed together in the same
|
|
file for each type of server, or separately. If a required section for the
|
|
service trying to start is missing there will be an error. The sections not
|
|
used by the service are ignored.
|
|
|
|
Consider the example of an object storage node. By convention, configuration
|
|
for the object-server, object-updater, object-replicator, object-auditor, and
|
|
object-reconstructor exist in a single file ``/etc/swift/object-server.conf``::
|
|
|
|
[DEFAULT]
|
|
reclaim_age = 604800
|
|
|
|
[pipeline:main]
|
|
pipeline = object-server
|
|
|
|
[app:object-server]
|
|
use = egg:swift#object
|
|
|
|
[object-replicator]
|
|
|
|
[object-updater]
|
|
|
|
[object-auditor]
|
|
|
|
Swift services expect a configuration path as the first argument::
|
|
|
|
$ swift-object-auditor
|
|
Usage: swift-object-auditor CONFIG [options]
|
|
|
|
Error: missing config path argument
|
|
|
|
If you omit the object-auditor section this file could not be used as the
|
|
configuration path when starting the ``swift-object-auditor`` daemon::
|
|
|
|
$ swift-object-auditor /etc/swift/object-server.conf
|
|
Unable to find object-auditor config section in /etc/swift/object-server.conf
|
|
|
|
If the configuration path is a directory instead of a file all of the files in
|
|
the directory with the file extension ".conf" will be combined to generate the
|
|
configuration object which is delivered to the Swift service. This is
|
|
referred to generally as "directory based configuration".
|
|
|
|
Directory based configuration leverages ConfigParser's native multi-file
|
|
support. Files ending in ".conf" in the given directory are parsed in
|
|
lexicographical order. Filenames starting with '.' are ignored. A mixture of
|
|
file and directory configuration paths is not supported - if the configuration
|
|
path is a file only that file will be parsed.
|
|
|
|
The Swift service management tool ``swift-init`` has adopted the convention of
|
|
looking for ``/etc/swift/{type}-server.conf.d/`` if the file
|
|
``/etc/swift/{type}-server.conf`` file does not exist.
|
|
|
|
When using directory based configuration, if the same option under the same
|
|
section appears more than once in different files, the last value parsed is
|
|
said to override previous occurrences. You can ensure proper override
|
|
precedence by prefixing the files in the configuration directory with
|
|
numerical values.::
|
|
|
|
/etc/swift/
|
|
default.base
|
|
object-server.conf.d/
|
|
000_default.conf -> ../default.base
|
|
001_default-override.conf
|
|
010_server.conf
|
|
020_replicator.conf
|
|
030_updater.conf
|
|
040_auditor.conf
|
|
|
|
You can inspect the resulting combined configuration object using the
|
|
``swift-config`` command line tool
|
|
|
|
.. _general-server-configuration:
|
|
|
|
----------------------------
|
|
General Server Configuration
|
|
----------------------------
|
|
|
|
Swift uses paste.deploy (http://pythonpaste.org/deploy/) to manage server
|
|
configurations.
|
|
|
|
Default configuration options are set in the `[DEFAULT]` section, and any
|
|
options specified there can be overridden in any of the other sections BUT
|
|
ONLY BY USING THE SYNTAX ``set option_name = value``. This is the unfortunate
|
|
way paste.deploy works and I'll try to explain it in full.
|
|
|
|
First, here's an example paste.deploy configuration file::
|
|
|
|
[DEFAULT]
|
|
name1 = globalvalue
|
|
name2 = globalvalue
|
|
name3 = globalvalue
|
|
set name4 = globalvalue
|
|
|
|
[pipeline:main]
|
|
pipeline = myapp
|
|
|
|
[app:myapp]
|
|
use = egg:mypkg#myapp
|
|
name2 = localvalue
|
|
set name3 = localvalue
|
|
set name5 = localvalue
|
|
name6 = localvalue
|
|
|
|
The resulting configuration that myapp receives is::
|
|
|
|
global {'__file__': '/etc/mypkg/wsgi.conf', 'here': '/etc/mypkg',
|
|
'name1': 'globalvalue',
|
|
'name2': 'globalvalue',
|
|
'name3': 'localvalue',
|
|
'name4': 'globalvalue',
|
|
'name5': 'localvalue',
|
|
'set name4': 'globalvalue'}
|
|
local {'name6': 'localvalue'}
|
|
|
|
So, `name1` got the global value which is fine since it's only in the `DEFAULT`
|
|
section anyway.
|
|
|
|
`name2` got the global value from `DEFAULT` even though it appears to be
|
|
overridden in the `app:myapp` subsection. This is just the unfortunate way
|
|
paste.deploy works (at least at the time of this writing.)
|
|
|
|
`name3` got the local value from the `app:myapp` subsection because it is using
|
|
the special paste.deploy syntax of ``set option_name = value``. So, if you want
|
|
a default value for most app/filters but want to override it in one
|
|
subsection, this is how you do it.
|
|
|
|
`name4` got the global value from `DEFAULT` since it's only in that section
|
|
anyway. But, since we used the ``set`` syntax in the `DEFAULT` section even
|
|
though we shouldn't, notice we also got a ``set name4`` variable. Weird, but
|
|
probably not harmful.
|
|
|
|
`name5` got the local value from the `app:myapp` subsection since it's only
|
|
there anyway, but notice that it is in the global configuration and not the
|
|
local configuration. This is because we used the ``set`` syntax to set the
|
|
value. Again, weird, but not harmful since Swift just treats the two sets of
|
|
configuration values as one set anyway.
|
|
|
|
`name6` got the local value from `app:myapp` subsection since it's only there,
|
|
and since we didn't use the ``set`` syntax, it's only in the local
|
|
configuration and not the global one. Though, as indicated above, there is no
|
|
special distinction with Swift.
|
|
|
|
That's quite an explanation for something that should be so much simpler, but
|
|
it might be important to know how paste.deploy interprets configuration files.
|
|
The main rule to remember when working with Swift configuration files is:
|
|
|
|
.. note::
|
|
|
|
Use the ``set option_name = value`` syntax in subsections if the option is
|
|
also set in the ``[DEFAULT]`` section. Don't get in the habit of always
|
|
using the ``set`` syntax or you'll probably mess up your non-paste.deploy
|
|
configuration files.
|
|
|
|
--------------------
|
|
Common configuration
|
|
--------------------
|
|
|
|
An example of common configuration file can be found at etc/swift.conf-sample
|
|
|
|
The following configuration options are available:
|
|
|
|
=================== ========== =============================================
|
|
Option Default Description
|
|
------------------- ---------- ---------------------------------------------
|
|
max_header_size 8192 max_header_size is the max number of bytes in
|
|
the utf8 encoding of each header. Using 8192
|
|
as default because eventlet use 8192 as max
|
|
size of header line. This value may need to
|
|
be increased when using identity v3 API
|
|
tokens including more than 7 catalog entries.
|
|
See also include_service_catalog in
|
|
proxy-server.conf-sample (documented in
|
|
overview_auth.rst).
|
|
extra_header_count 0 By default the maximum number of allowed
|
|
headers depends on the number of max
|
|
allowed metadata settings plus a default
|
|
value of 32 for regular http headers.
|
|
If for some reason this is not enough (custom
|
|
middleware for example) it can be increased
|
|
with the extra_header_count constraint.
|
|
=================== ========== =============================================
|
|
|
|
---------------------------
|
|
Object Server Configuration
|
|
---------------------------
|
|
|
|
An Example Object Server configuration can be found at
|
|
etc/object-server.conf-sample in the source code repository.
|
|
|
|
The following configuration sections are available:
|
|
|
|
* :ref:`[DEFAULT] <object-server-default-options>`
|
|
* `[object-server]`_
|
|
* `[object-replicator]`_
|
|
* `[object-reconstructor]`_
|
|
* `[object-updater]`_
|
|
* `[object-auditor]`_
|
|
|
|
.. _object-server-default-options:
|
|
|
|
*********
|
|
[DEFAULT]
|
|
*********
|
|
|
|
================================ ========== ============================================
|
|
Option Default Description
|
|
-------------------------------- ---------- --------------------------------------------
|
|
swift_dir /etc/swift Swift configuration directory
|
|
devices /srv/node Parent directory of where devices are
|
|
mounted
|
|
mount_check true Whether or not check if the devices are
|
|
mounted to prevent accidentally writing
|
|
to the root device
|
|
bind_ip 0.0.0.0 IP Address for server to bind to
|
|
bind_port 6200 Port for server to bind to
|
|
bind_timeout 30 Seconds to attempt bind before giving up
|
|
backlog 4096 Maximum number of allowed pending
|
|
connections
|
|
workers auto Override the number of pre-forked workers
|
|
that will accept connections. If set it
|
|
should be an integer, zero means no fork.
|
|
If unset, it will try to default to the
|
|
number of effective cpu cores and fallback
|
|
to one. Increasing the number of workers
|
|
helps slow filesystem operations in one
|
|
request from negatively impacting other
|
|
requests, but only the
|
|
:ref:`servers_per_port
|
|
<server-per-port-configuration>` option
|
|
provides complete I/O isolation with no
|
|
measurable overhead.
|
|
servers_per_port 0 If each disk in each storage policy ring
|
|
has unique port numbers for its "ip"
|
|
value, you can use this setting to have
|
|
each object-server worker only service
|
|
requests for the single disk matching the
|
|
port in the ring. The value of this
|
|
setting determines how many worker
|
|
processes run for each port (disk) in the
|
|
ring. If you have 24 disks per server, and
|
|
this setting is 4, then each storage node
|
|
will have 1 + (24 * 4) = 97 total
|
|
object-server processes running. This
|
|
gives complete I/O isolation, drastically
|
|
reducing the impact of slow disks on
|
|
storage node performance. The
|
|
object-replicator and object-reconstructor
|
|
need to see this setting too, so it must
|
|
be in the [DEFAULT] section.
|
|
See :ref:`server-per-port-configuration`.
|
|
max_clients 1024 Maximum number of clients one worker can
|
|
process simultaneously (it will actually
|
|
accept(2) N + 1). Setting this to one (1)
|
|
will only handle one request at a time,
|
|
without accepting another request
|
|
concurrently.
|
|
disable_fallocate false Disable "fast fail" fallocate checks if
|
|
the underlying filesystem does not support
|
|
it.
|
|
log_name swift Label used when logging
|
|
log_facility LOG_LOCAL0 Syslog log facility
|
|
log_level INFO Logging level
|
|
log_address /dev/log Logging directory
|
|
log_max_line_length 0 Caps the length of log lines to the
|
|
value given; no limit if set to 0, the
|
|
default.
|
|
log_custom_handlers None Comma-separated list of functions to call
|
|
to setup custom log handlers.
|
|
log_udp_host Override log_address
|
|
log_udp_port 514 UDP log port
|
|
log_statsd_host None Enables StatsD logging; IPv4/IPv6
|
|
address or a hostname. If a
|
|
hostname resolves to an IPv4 and IPv6
|
|
address, the IPv4 address will be
|
|
used.
|
|
log_statsd_port 8125
|
|
log_statsd_default_sample_rate 1.0
|
|
log_statsd_sample_rate_factor 1.0
|
|
log_statsd_metric_prefix
|
|
eventlet_debug false If true, turn on debug logging for
|
|
eventlet
|
|
fallocate_reserve 1% You can set fallocate_reserve to the
|
|
number of bytes or percentage of disk
|
|
space you'd like fallocate to reserve,
|
|
whether there is space for the given
|
|
file size or not. Percentage will be used
|
|
if the value ends with a '%'. This is
|
|
useful for systems that behave badly when
|
|
they completely run out of space; you can
|
|
make the services pretend they're out of
|
|
space early.
|
|
conn_timeout 0.5 Time to wait while attempting to connect
|
|
to another backend node.
|
|
node_timeout 3 Time to wait while sending each chunk of
|
|
data to another backend node.
|
|
client_timeout 60 Time to wait while receiving each chunk of
|
|
data from a client or another backend node
|
|
network_chunk_size 65536 Size of chunks to read/write over the
|
|
network
|
|
disk_chunk_size 65536 Size of chunks to read/write to disk
|
|
container_update_timeout 1 Time to wait while sending a container
|
|
update on object update.
|
|
reclaim_age 604800 Time elapsed in seconds before the tombstone
|
|
file representing a deleted object can be
|
|
reclaimed. This is the maximum window for
|
|
your consistency engine. If a node that was
|
|
disconnected from the cluster because of a
|
|
fault is reintroduced into the cluster after
|
|
this window without having its data purged
|
|
it will result in dark data. This setting
|
|
should be consistent across all object
|
|
services.
|
|
nice_priority None Scheduling priority of server processes.
|
|
Niceness values range from -20 (most
|
|
favorable to the process) to 19 (least
|
|
favorable to the process). The default
|
|
does not modify priority.
|
|
ionice_class None I/O scheduling class of server processes.
|
|
I/O niceness class values are IOPRIO_CLASS_RT
|
|
(realtime), IOPRIO_CLASS_BE (best-effort),
|
|
and IOPRIO_CLASS_IDLE (idle).
|
|
The default does not modify class and
|
|
priority. Linux supports io scheduling
|
|
priorities and classes since 2.6.13 with
|
|
the CFQ io scheduler.
|
|
Work only with ionice_priority.
|
|
ionice_priority None I/O scheduling priority of server
|
|
processes. I/O niceness priority is
|
|
a number which goes from 0 to 7.
|
|
The higher the value, the lower the I/O
|
|
priority of the process. Work only with
|
|
ionice_class.
|
|
Ignored if IOPRIO_CLASS_IDLE is set.
|
|
================================ ========== ============================================
|
|
|
|
.. _object-server-options:
|
|
|
|
***************
|
|
[object-server]
|
|
***************
|
|
|
|
================================== ====================== ===============================================
|
|
Option Default Description
|
|
---------------------------------- ---------------------- -----------------------------------------------
|
|
use paste.deploy entry point for the
|
|
object server. For most cases,
|
|
this should be
|
|
`egg:swift#object`.
|
|
set log_name object-server Label used when logging
|
|
set log_facility LOG_LOCAL0 Syslog log facility
|
|
set log_level INFO Logging level
|
|
set log_requests True Whether or not to log each
|
|
request
|
|
set log_address /dev/log Logging directory
|
|
user swift User to run as
|
|
max_upload_time 86400 Maximum time allowed to upload an
|
|
object
|
|
slow 0 If > 0, Minimum time in seconds for a PUT or
|
|
DELETE request to complete. This is only
|
|
useful to simulate slow devices during testing
|
|
and development.
|
|
mb_per_sync 512 On PUT requests, sync file every
|
|
n MB
|
|
keep_cache_size 5242880 Largest object size to keep in
|
|
buffer cache
|
|
keep_cache_private false Allow non-public objects to stay
|
|
in kernel's buffer cache
|
|
allowed_headers Content-Disposition, Comma separated list of headers
|
|
Content-Encoding, that can be set in metadata on an object.
|
|
X-Delete-At, This list is in addition to
|
|
X-Object-Manifest, X-Object-Meta-* headers and cannot include
|
|
X-Static-Large-Object Content-Type, etag, Content-Length, or deleted
|
|
auto_create_account_prefix . Prefix used when automatically
|
|
creating accounts.
|
|
replication_server Configure parameter for creating
|
|
specific server. To handle all verbs,
|
|
including replication verbs, do not
|
|
specify "replication_server"
|
|
(this is the default). To only
|
|
handle replication, set to a True
|
|
value (e.g. "True" or "1").
|
|
To handle only non-replication
|
|
verbs, set to "False". Unless you
|
|
have a separate replication network, you
|
|
should not specify any value for
|
|
"replication_server".
|
|
replication_concurrency 4 Set to restrict the number of
|
|
concurrent incoming SSYNC
|
|
requests; set to 0 for unlimited
|
|
replication_concurrency_per_device 1 Set to restrict the number of
|
|
concurrent incoming SSYNC
|
|
requests per device; set to 0 for
|
|
unlimited requests per devices.
|
|
This can help control I/O to each
|
|
device. This does not override
|
|
replication_concurrency described
|
|
above, so you may need to adjust
|
|
both parameters depending on your
|
|
hardware or network capacity.
|
|
replication_lock_timeout 15 Number of seconds to wait for an
|
|
existing replication device lock
|
|
before giving up.
|
|
replication_failure_threshold 100 The number of subrequest failures
|
|
before the
|
|
replication_failure_ratio is
|
|
checked
|
|
replication_failure_ratio 1.0 If the value of failures /
|
|
successes of SSYNC
|
|
subrequests exceeds this ratio,
|
|
the overall SSYNC request
|
|
will be aborted
|
|
splice no Use splice() for zero-copy object
|
|
GETs. This requires Linux kernel
|
|
version 3.0 or greater. If you set
|
|
"splice = yes" but the kernel
|
|
does not support it, error messages
|
|
will appear in the object server
|
|
logs at startup, but your object
|
|
servers should continue to function.
|
|
nice_priority None Scheduling priority of server processes.
|
|
Niceness values range from -20 (most
|
|
favorable to the process) to 19 (least
|
|
favorable to the process). The default
|
|
does not modify priority.
|
|
ionice_class None I/O scheduling class of server processes.
|
|
I/O niceness class values are IOPRIO_CLASS_RT
|
|
(realtime), IOPRIO_CLASS_BE (best-effort),
|
|
and IOPRIO_CLASS_IDLE (idle).
|
|
The default does not modify class and
|
|
priority. Linux supports io scheduling
|
|
priorities and classes since 2.6.13 with
|
|
the CFQ io scheduler.
|
|
Work only with ionice_priority.
|
|
ionice_priority None I/O scheduling priority of server
|
|
processes. I/O niceness priority is
|
|
a number which goes from 0 to 7.
|
|
The higher the value, the lower the I/O
|
|
priority of the process. Work only with
|
|
ionice_class.
|
|
Ignored if IOPRIO_CLASS_IDLE is set.
|
|
eventlet_tpool_num_threads auto The number of threads in eventlet's thread pool.
|
|
Most IO will occur in the object server's main
|
|
thread, but certain "heavy" IO operations will
|
|
occur in separate IO threads, managed by
|
|
eventlet.
|
|
The default value is auto, whose actual value
|
|
is dependent on the servers_per_port value.
|
|
If servers_per_port is zero then it uses
|
|
eventlet's default (currently 20 threads).
|
|
If the servers_per_port is nonzero then it'll
|
|
only use 1 thread per process.
|
|
This value can be overridden with an integer
|
|
value.
|
|
================================== ====================== ===============================================
|
|
|
|
*******************
|
|
[object-replicator]
|
|
*******************
|
|
|
|
=========================== ======================== ================================
|
|
Option Default Description
|
|
--------------------------- ------------------------ --------------------------------
|
|
log_name object-replicator Label used when logging
|
|
log_facility LOG_LOCAL0 Syslog log facility
|
|
log_level INFO Logging level
|
|
log_address /dev/log Logging directory
|
|
daemonize yes Whether or not to run replication
|
|
as a daemon
|
|
interval 30 Time in seconds to wait between
|
|
replication passes
|
|
concurrency 1 Number of replication workers to
|
|
spawn
|
|
sync_method rsync The sync method to use; default
|
|
is rsync but you can use ssync to
|
|
try the EXPERIMENTAL
|
|
all-swift-code-no-rsync-callouts
|
|
method. Once ssync is verified as
|
|
or better than, rsync, we plan to
|
|
deprecate rsync so we can move on
|
|
with more features for
|
|
replication.
|
|
rsync_timeout 900 Max duration of a partition rsync
|
|
rsync_bwlimit 0 Bandwidth limit for rsync in kB/s.
|
|
0 means unlimited.
|
|
rsync_io_timeout 30 Timeout value sent to rsync
|
|
--timeout and --contimeout
|
|
options
|
|
rsync_compress no Allow rsync to compress data
|
|
which is transmitted to destination
|
|
node during sync. However, this
|
|
is applicable only when destination
|
|
node is in a different region
|
|
than the local one.
|
|
NOTE: Objects that are already
|
|
compressed (for example: .tar.gz,
|
|
.mp3) might slow down the syncing
|
|
process.
|
|
stats_interval 300 Interval in seconds between
|
|
logging replication statistics
|
|
handoffs_first false If set to True, partitions that
|
|
are not supposed to be on the
|
|
node will be replicated first.
|
|
The default setting should not be
|
|
changed, except for extreme
|
|
situations.
|
|
handoff_delete auto By default handoff partitions
|
|
will be removed when it has
|
|
successfully replicated to all
|
|
the canonical nodes. If set to an
|
|
integer n, it will remove the
|
|
partition if it is successfully
|
|
replicated to n nodes. The
|
|
default setting should not be
|
|
changed, except for extreme
|
|
situations.
|
|
node_timeout DEFAULT or 10 Request timeout to external
|
|
services. This uses what's set
|
|
here, or what's set in the
|
|
DEFAULT section, or 10 (though
|
|
other sections use 3 as the final
|
|
default).
|
|
http_timeout 60 Max duration of an http request.
|
|
This is for REPLICATE finalization
|
|
calls and so should be longer
|
|
than node_timeout.
|
|
lockup_timeout 1800 Attempts to kill all workers if
|
|
nothing replicates for
|
|
lockup_timeout seconds
|
|
rsync_module {replication_ip}::object Format of the rsync module where
|
|
the replicator will send data.
|
|
The configuration value can
|
|
include some variables that will
|
|
be extracted from the ring.
|
|
Variables must follow the format
|
|
{NAME} where NAME is one of: ip,
|
|
port, replication_ip,
|
|
replication_port, region, zone,
|
|
device, meta. See
|
|
etc/rsyncd.conf-sample for some
|
|
examples.
|
|
rsync_error_log_line_length 0 Limits how long rsync error log
|
|
lines are
|
|
ring_check_interval 15 Interval for checking new ring
|
|
file
|
|
recon_cache_path /var/cache/swift Path to recon cache
|
|
nice_priority None Scheduling priority of server
|
|
processes. Niceness values
|
|
range from -20 (most favorable
|
|
to the process) to 19 (least
|
|
favorable to the process).
|
|
The default does not modify
|
|
priority.
|
|
ionice_class None I/O scheduling class of server
|
|
processes. I/O niceness class
|
|
values are IOPRIO_CLASS_RT (realtime),
|
|
IOPRIO_CLASS_BE (best-effort),
|
|
and IOPRIO_CLASS_IDLE (idle).
|
|
The default does not modify
|
|
class and priority.
|
|
Linux supports io scheduling
|
|
priorities and classes since
|
|
2.6.13 with the CFQ io scheduler.
|
|
Work only with ionice_priority.
|
|
ionice_priority None I/O scheduling priority of server
|
|
processes. I/O niceness priority
|
|
is a number which goes from
|
|
0 to 7. The higher the value,
|
|
the lower the I/O priority of
|
|
the process.
|
|
Work only with ionice_class.
|
|
Ignored if IOPRIO_CLASS_IDLE
|
|
is set.
|
|
=========================== ======================== ================================
|
|
|
|
**********************
|
|
[object-reconstructor]
|
|
**********************
|
|
|
|
=========================== ======================== ================================
|
|
Option Default Description
|
|
--------------------------- ------------------------ --------------------------------
|
|
log_name object-reconstructor Label used when logging
|
|
log_facility LOG_LOCAL0 Syslog log facility
|
|
log_level INFO Logging level
|
|
log_address /dev/log Logging directory
|
|
daemonize yes Whether or not to run
|
|
reconstruction as a daemon
|
|
interval 30 Time in seconds to wait between
|
|
reconstruction passes
|
|
reconstructor_workers 0 Maximum number of worker processes
|
|
to spawn. Each worker will handle
|
|
a subset of devices. Devices will
|
|
be assigned evenly among the workers
|
|
so that workers cycle at similar
|
|
intervals (which can lead to fewer
|
|
workers than requested). You can not
|
|
have more workers than devices. If
|
|
you have no devices only a single
|
|
worker is spawned.
|
|
concurrency 1 Number of reconstruction threads to
|
|
spawn per reconstructor process.
|
|
stats_interval 300 Interval in seconds between
|
|
logging reconstruction statistics
|
|
handoffs_only false The handoffs_only mode option is for
|
|
special case emergency situations
|
|
during rebalance such as disk full in
|
|
the cluster. This option SHOULD NOT
|
|
BE CHANGED, except for extreme
|
|
situations. When handoffs_only mode
|
|
is enabled the reconstructor will
|
|
*only* revert fragments from handoff
|
|
nodes to primary nodes and will not
|
|
sync primary nodes with neighboring
|
|
primary nodes. This will force the
|
|
reconstructor to sync and delete
|
|
handoffs' fragments more quickly and
|
|
minimize the time of the rebalance by
|
|
limiting the number of rebuilds. The
|
|
handoffs_only option is only for
|
|
temporary use and should be disabled
|
|
as soon as the emergency situation
|
|
has been resolved.
|
|
node_timeout DEFAULT or 10 Request timeout to external
|
|
services. The value used is the value
|
|
set in this section, or the value set
|
|
in the DEFAULT section, or 10.
|
|
http_timeout 60 Max duration of an http request.
|
|
This is for REPLICATE finalization
|
|
calls and so should be longer
|
|
than node_timeout.
|
|
lockup_timeout 1800 Attempts to kill all threads if
|
|
no fragment has been reconstructed
|
|
for lockup_timeout seconds.
|
|
ring_check_interval 15 Interval for checking new ring
|
|
file
|
|
recon_cache_path /var/cache/swift Path to recon cache
|
|
nice_priority None Scheduling priority of server
|
|
processes. Niceness values
|
|
range from -20 (most favorable
|
|
to the process) to 19 (least
|
|
favorable to the process).
|
|
The default does not modify
|
|
priority.
|
|
ionice_class None I/O scheduling class of server
|
|
processes. I/O niceness class
|
|
values are IOPRIO_CLASS_RT (realtime),
|
|
IOPRIO_CLASS_BE (best-effort),
|
|
and IOPRIO_CLASS_IDLE (idle).
|
|
The default does not modify
|
|
class and priority.
|
|
Linux supports io scheduling
|
|
priorities and classes since
|
|
2.6.13 with the CFQ io scheduler.
|
|
Work only with ionice_priority.
|
|
ionice_priority None I/O scheduling priority of server
|
|
processes. I/O niceness priority
|
|
is a number which goes from
|
|
0 to 7. The higher the value,
|
|
the lower the I/O priority of
|
|
the process.
|
|
Work only with ionice_class.
|
|
Ignored if IOPRIO_CLASS_IDLE
|
|
is set.
|
|
=========================== ======================== ================================
|
|
|
|
****************
|
|
[object-updater]
|
|
****************
|
|
|
|
=================== =================== ==========================================
|
|
Option Default Description
|
|
------------------- ------------------- ------------------------------------------
|
|
log_name object-updater Label used when logging
|
|
log_facility LOG_LOCAL0 Syslog log facility
|
|
log_level INFO Logging level
|
|
log_address /dev/log Logging directory
|
|
interval 300 Minimum time for a pass to take
|
|
concurrency 1 Number of updater workers to spawn
|
|
node_timeout DEFAULT or 10 Request timeout to external services. This
|
|
uses what's set here, or what's set in the
|
|
DEFAULT section, or 10 (though other
|
|
sections use 3 as the final default).
|
|
objects_per_second 50 Maximum objects updated per second.
|
|
Should be tuned according to individual
|
|
system specs. 0 is unlimited.
|
|
slowdown 0.01 Time in seconds to wait between objects.
|
|
Deprecated in favor of objects_per_second.
|
|
recon_cache_path /var/cache/swift Path to recon cache
|
|
nice_priority None Scheduling priority of server processes.
|
|
Niceness values range from -20 (most
|
|
favorable to the process) to 19 (least
|
|
favorable to the process). The default
|
|
does not modify priority.
|
|
ionice_class None I/O scheduling class of server processes.
|
|
I/O niceness class values are IOPRIO_CLASS_RT
|
|
(realtime), IOPRIO_CLASS_BE (best-effort),
|
|
and IOPRIO_CLASS_IDLE (idle).
|
|
The default does not modify class and
|
|
priority. Linux supports io scheduling
|
|
priorities and classes since 2.6.13 with
|
|
the CFQ io scheduler.
|
|
Work only with ionice_priority.
|
|
ionice_priority None I/O scheduling priority of server
|
|
processes. I/O niceness priority is
|
|
a number which goes from 0 to 7.
|
|
The higher the value, the lower the I/O
|
|
priority of the process. Work only with
|
|
ionice_class.
|
|
Ignored if IOPRIO_CLASS_IDLE is set.
|
|
=================== =================== ==========================================
|
|
|
|
****************
|
|
[object-auditor]
|
|
****************
|
|
|
|
=========================== =================== ==========================================
|
|
Option Default Description
|
|
--------------------------- ------------------- ------------------------------------------
|
|
log_name object-auditor Label used when logging
|
|
log_facility LOG_LOCAL0 Syslog log facility
|
|
log_level INFO Logging level
|
|
log_address /dev/log Logging directory
|
|
log_time 3600 Frequency of status logs in seconds.
|
|
interval 30 Time in seconds to wait between
|
|
auditor passes
|
|
disk_chunk_size 65536 Size of chunks read during auditing
|
|
files_per_second 20 Maximum files audited per second per
|
|
auditor process. Should be tuned according
|
|
to individual system specs. 0 is unlimited.
|
|
bytes_per_second 10000000 Maximum bytes audited per second per
|
|
auditor process. Should be tuned according
|
|
to individual system specs. 0 is unlimited.
|
|
concurrency 1 The number of parallel processes to use
|
|
for checksum auditing.
|
|
zero_byte_files_per_second 50
|
|
object_size_stats
|
|
recon_cache_path /var/cache/swift Path to recon cache
|
|
rsync_tempfile_timeout auto Time elapsed in seconds before rsync
|
|
tempfiles will be unlinked. Config value
|
|
of "auto" try to use object-replicator's
|
|
rsync_timeout + 900 or fallback to 86400
|
|
(1 day).
|
|
nice_priority None Scheduling priority of server processes.
|
|
Niceness values range from -20 (most
|
|
favorable to the process) to 19 (least
|
|
favorable to the process). The default
|
|
does not modify priority.
|
|
ionice_class None I/O scheduling class of server processes.
|
|
I/O niceness class values are IOPRIO_CLASS_RT
|
|
(realtime), IOPRIO_CLASS_BE (best-effort),
|
|
and IOPRIO_CLASS_IDLE (idle).
|
|
The default does not modify class and
|
|
priority. Linux supports io scheduling
|
|
priorities and classes since 2.6.13 with
|
|
the CFQ io scheduler.
|
|
Work only with ionice_priority.
|
|
ionice_priority None I/O scheduling priority of server
|
|
processes. I/O niceness priority is
|
|
a number which goes from 0 to 7.
|
|
The higher the value, the lower the I/O
|
|
priority of the process. Work only with
|
|
ionice_class.
|
|
Ignored if IOPRIO_CLASS_IDLE is set.
|
|
=========================== =================== ==========================================
|
|
|
|
------------------------------
|
|
Container Server Configuration
|
|
------------------------------
|
|
|
|
An example Container Server configuration can be found at
|
|
etc/container-server.conf-sample in the source code repository.
|
|
|
|
The following configuration sections are available:
|
|
|
|
* :ref:`[DEFAULT] <container_server_default_options>`
|
|
* `[container-server]`_
|
|
* `[container-replicator]`_
|
|
* `[container-updater]`_
|
|
* `[container-auditor]`_
|
|
|
|
.. _container_server_default_options:
|
|
|
|
*********
|
|
[DEFAULT]
|
|
*********
|
|
|
|
=============================== ========== ============================================
|
|
Option Default Description
|
|
------------------------------- ---------- --------------------------------------------
|
|
swift_dir /etc/swift Swift configuration directory
|
|
devices /srv/node Parent directory of where devices are mounted
|
|
mount_check true Whether or not check if the devices are
|
|
mounted to prevent accidentally writing
|
|
to the root device
|
|
bind_ip 0.0.0.0 IP Address for server to bind to
|
|
bind_port 6201 Port for server to bind to
|
|
bind_timeout 30 Seconds to attempt bind before giving up
|
|
backlog 4096 Maximum number of allowed pending
|
|
connections
|
|
workers auto Override the number of pre-forked workers
|
|
that will accept connections. If set it
|
|
should be an integer, zero means no fork. If
|
|
unset, it will try to default to the number
|
|
of effective cpu cores and fallback to one.
|
|
Increasing the number of workers may reduce
|
|
the possibility of slow file system
|
|
operations in one request from negatively
|
|
impacting other requests. See
|
|
:ref:`general-service-tuning`.
|
|
max_clients 1024 Maximum number of clients one worker can
|
|
process simultaneously (it will actually
|
|
accept(2) N + 1). Setting this to one (1)
|
|
will only handle one request at a time,
|
|
without accepting another request
|
|
concurrently.
|
|
user swift User to run as
|
|
disable_fallocate false Disable "fast fail" fallocate checks if the
|
|
underlying filesystem does not support it.
|
|
log_name swift Label used when logging
|
|
log_facility LOG_LOCAL0 Syslog log facility
|
|
log_level INFO Logging level
|
|
log_address /dev/log Logging directory
|
|
log_max_line_length 0 Caps the length of log lines to the
|
|
value given; no limit if set to 0, the
|
|
default.
|
|
log_custom_handlers None Comma-separated list of functions to call
|
|
to setup custom log handlers.
|
|
log_udp_host Override log_address
|
|
log_udp_port 514 UDP log port
|
|
log_statsd_host None Enables StatsD logging; IPv4/IPv6
|
|
address or a hostname. If a
|
|
hostname resolves to an IPv4 and IPv6
|
|
address, the IPv4 address will be
|
|
used.
|
|
log_statsd_port 8125
|
|
log_statsd_default_sample_rate 1.0
|
|
log_statsd_sample_rate_factor 1.0
|
|
log_statsd_metric_prefix
|
|
eventlet_debug false If true, turn on debug logging for eventlet
|
|
fallocate_reserve 1% You can set fallocate_reserve to the
|
|
number of bytes or percentage of disk
|
|
space you'd like fallocate to reserve,
|
|
whether there is space for the given
|
|
file size or not. Percentage will be used
|
|
if the value ends with a '%'. This is
|
|
useful for systems that behave badly when
|
|
they completely run out of space; you can
|
|
make the services pretend they're out of
|
|
space early.
|
|
db_preallocation off If you don't mind the extra disk space usage
|
|
in overhead, you can turn this on to preallocate
|
|
disk space with SQLite databases to decrease
|
|
fragmentation.
|
|
nice_priority None Scheduling priority of server processes.
|
|
Niceness values range from -20 (most
|
|
favorable to the process) to 19 (least
|
|
favorable to the process). The default
|
|
does not modify priority.
|
|
ionice_class None I/O scheduling class of server processes.
|
|
I/O niceness class values are IOPRIO_CLASS_RT
|
|
(realtime), IOPRIO_CLASS_BE (best-effort),
|
|
and IOPRIO_CLASS_IDLE (idle).
|
|
The default does not modify class and
|
|
priority. Linux supports io scheduling
|
|
priorities and classes since 2.6.13
|
|
with the CFQ io scheduler.
|
|
Work only with ionice_priority.
|
|
ionice_priority None I/O scheduling priority of server processes.
|
|
I/O niceness priority is a number which
|
|
goes from 0 to 7. The higher the value,
|
|
the lower the I/O priority of the process.
|
|
Work only with ionice_class.
|
|
Ignored if IOPRIO_CLASS_IDLE is set.
|
|
=============================== ========== ============================================
|
|
|
|
******************
|
|
[container-server]
|
|
******************
|
|
|
|
============================== ================ ========================================
|
|
Option Default Description
|
|
------------------------------ ---------------- ----------------------------------------
|
|
use paste.deploy entry point for the
|
|
container server. For most cases, this
|
|
should be `egg:swift#container`.
|
|
set log_name container-server Label used when logging
|
|
set log_facility LOG_LOCAL0 Syslog log facility
|
|
set log_level INFO Logging level
|
|
set log_requests True Whether or not to log each
|
|
request
|
|
set log_address /dev/log Logging directory
|
|
node_timeout 3 Request timeout to external services
|
|
conn_timeout 0.5 Connection timeout to external services
|
|
allow_versions false Enable/Disable object versioning feature
|
|
auto_create_account_prefix . Prefix used when automatically
|
|
replication_server Configure parameter for creating
|
|
specific server. To handle all verbs,
|
|
including replication verbs, do not
|
|
specify "replication_server"
|
|
(this is the default). To only
|
|
handle replication, set to a True
|
|
value (e.g. "True" or "1").
|
|
To handle only non-replication
|
|
verbs, set to "False". Unless you
|
|
have a separate replication network, you
|
|
should not specify any value for
|
|
"replication_server".
|
|
nice_priority None Scheduling priority of server processes.
|
|
Niceness values range from -20 (most
|
|
favorable to the process) to 19 (least
|
|
favorable to the process). The default
|
|
does not modify priority.
|
|
ionice_class None I/O scheduling class of server processes.
|
|
I/O niceness class values are
|
|
IOPRIO_CLASS_RT (realtime),
|
|
IOPRIO_CLASS_BE (best-effort),
|
|
and IOPRIO_CLASS_IDLE (idle).
|
|
The default does not modify class and
|
|
priority. Linux supports io scheduling
|
|
priorities and classes since 2.6.13 with
|
|
the CFQ io scheduler.
|
|
Work only with ionice_priority.
|
|
ionice_priority None I/O scheduling priority of server
|
|
processes. I/O niceness priority is
|
|
a number which goes from 0 to 7.
|
|
The higher the value, the lower the I/O
|
|
priority of the process. Work only with
|
|
ionice_class.
|
|
Ignored if IOPRIO_CLASS_IDLE is set.
|
|
============================== ================ ========================================
|
|
|
|
**********************
|
|
[container-replicator]
|
|
**********************
|
|
|
|
================== =========================== =============================
|
|
Option Default Description
|
|
------------------ --------------------------- -----------------------------
|
|
log_name container-replicator Label used when logging
|
|
log_facility LOG_LOCAL0 Syslog log facility
|
|
log_level INFO Logging level
|
|
log_address /dev/log Logging directory
|
|
per_diff 1000 Maximum number of database
|
|
rows that will be sync'd in a
|
|
single HTTP replication
|
|
request. Databases with less
|
|
than or equal to this number
|
|
of differing rows will always
|
|
be sync'd using an HTTP
|
|
replication request rather
|
|
than using rsync.
|
|
max_diffs 100 Maximum number of HTTP
|
|
replication requests attempted
|
|
on each replication pass for
|
|
any one container. This caps
|
|
how long the replicator will
|
|
spend trying to sync a given
|
|
database per pass so the other
|
|
databases don't get starved.
|
|
concurrency 8 Number of replication workers
|
|
to spawn
|
|
interval 30 Time in seconds to wait
|
|
between replication passes
|
|
node_timeout 10 Request timeout to external
|
|
services
|
|
conn_timeout 0.5 Connection timeout to external
|
|
services
|
|
reclaim_age 604800 Time elapsed in seconds before
|
|
a container can be reclaimed
|
|
rsync_module {replication_ip}::container Format of the rsync module
|
|
where the replicator will send
|
|
data. The configuration value
|
|
can include some variables
|
|
that will be extracted from
|
|
the ring. Variables must
|
|
follow the format {NAME} where
|
|
NAME is one of: ip, port,
|
|
replication_ip,
|
|
replication_port, region,
|
|
zone, device, meta. See
|
|
etc/rsyncd.conf-sample for
|
|
some examples.
|
|
rsync_compress no Allow rsync to compress data
|
|
which is transmitted to
|
|
destination node during sync.
|
|
However, this is applicable
|
|
only when destination node is
|
|
in a different region than the
|
|
local one. NOTE: Objects that
|
|
are already compressed (for
|
|
example: .tar.gz, mp3) might
|
|
slow down the syncing process.
|
|
recon_cache_path /var/cache/swift Path to recon cache
|
|
nice_priority None Scheduling priority of server
|
|
processes. Niceness values
|
|
range from -20 (most favorable
|
|
to the process) to 19 (least
|
|
favorable to the process).
|
|
The default does not modify
|
|
priority.
|
|
ionice_class None I/O scheduling class of server
|
|
processes. I/O niceness class
|
|
values are
|
|
IOPRIO_CLASS_RT (realtime),
|
|
IOPRIO_CLASS_BE (best-effort),
|
|
and IOPRIO_CLASS_IDLE (idle).
|
|
The default does not modify
|
|
class and priority. Linux
|
|
supports io scheduling
|
|
priorities and classes since
|
|
2.6.13 with the CFQ io
|
|
scheduler.
|
|
Work only with ionice_priority.
|
|
ionice_priority None I/O scheduling priority of
|
|
server processes. I/O niceness
|
|
priority is a number which goes
|
|
from 0 to 7.
|
|
The higher the value, the lower
|
|
the I/O priority of the process.
|
|
Work only with ionice_class.
|
|
Ignored if IOPRIO_CLASS_IDLE
|
|
is set.
|
|
================== =========================== =============================
|
|
|
|
*******************
|
|
[container-updater]
|
|
*******************
|
|
|
|
======================== ================= ==================================
|
|
Option Default Description
|
|
------------------------ ----------------- ----------------------------------
|
|
log_name container-updater Label used when logging
|
|
log_facility LOG_LOCAL0 Syslog log facility
|
|
log_level INFO Logging level
|
|
log_address /dev/log Logging directory
|
|
interval 300 Minimum time for a pass to take
|
|
concurrency 4 Number of updater workers to spawn
|
|
node_timeout 3 Request timeout to external
|
|
services
|
|
conn_timeout 0.5 Connection timeout to external
|
|
services
|
|
containers_per_second 50 Maximum containers updated per second.
|
|
Should be tuned according to individual
|
|
system specs. 0 is unlimited.
|
|
|
|
slowdown 0.01 Time in seconds to wait between
|
|
containers. Deprecated in favor of
|
|
containers_per_second.
|
|
account_suppression_time 60 Seconds to suppress updating an
|
|
account that has generated an
|
|
error (timeout, not yet found,
|
|
etc.)
|
|
recon_cache_path /var/cache/swift Path to recon cache
|
|
nice_priority None Scheduling priority of server
|
|
processes. Niceness values range
|
|
from -20 (most favorable to the
|
|
process) to 19 (least favorable
|
|
to the process). The default does
|
|
not modify priority.
|
|
ionice_class None I/O scheduling class of server
|
|
processes. I/O niceness class
|
|
values are IOPRIO_CLASS_RT (realtime),
|
|
IOPRIO_CLASS_BE (best-effort),
|
|
and IOPRIO_CLASS_IDLE (idle).
|
|
The default does not modify class and
|
|
priority. Linux supports io scheduling
|
|
priorities and classes since 2.6.13 with
|
|
the CFQ io scheduler.
|
|
Work only with ionice_priority.
|
|
ionice_priority None I/O scheduling priority of server
|
|
processes. I/O niceness priority is
|
|
a number which goes from 0 to 7.
|
|
The higher the value, the lower
|
|
the I/O priority of the process.
|
|
Work only with ionice_class.
|
|
Ignored if IOPRIO_CLASS_IDLE is set.
|
|
======================== ================= ==================================
|
|
|
|
*******************
|
|
[container-auditor]
|
|
*******************
|
|
|
|
===================== ================= =======================================
|
|
Option Default Description
|
|
--------------------- ----------------- ---------------------------------------
|
|
log_name container-auditor Label used when logging
|
|
log_facility LOG_LOCAL0 Syslog log facility
|
|
log_level INFO Logging level
|
|
log_address /dev/log Logging directory
|
|
interval 1800 Minimum time for a pass to take
|
|
containers_per_second 200 Maximum containers audited per second.
|
|
Should be tuned according to individual
|
|
system specs. 0 is unlimited.
|
|
recon_cache_path /var/cache/swift Path to recon cache
|
|
nice_priority None Scheduling priority of server processes.
|
|
Niceness values range from -20 (most
|
|
favorable to the process) to 19 (least
|
|
favorable to the process). The default
|
|
does not modify priority.
|
|
ionice_class None I/O scheduling class of server processes.
|
|
I/O niceness class values are
|
|
IOPRIO_CLASS_RT (realtime),
|
|
IOPRIO_CLASS_BE (best-effort),
|
|
and IOPRIO_CLASS_IDLE (idle).
|
|
The default does not modify class and
|
|
priority. Linux supports io scheduling
|
|
priorities and classes since 2.6.13 with
|
|
the CFQ io scheduler.
|
|
Work only with ionice_priority.
|
|
ionice_priority None I/O scheduling priority of server
|
|
processes. I/O niceness priority is
|
|
a number which goes from 0 to 7.
|
|
The higher the value, the lower the I/O
|
|
priority of the process. Work only with
|
|
ionice_class.
|
|
Ignored if IOPRIO_CLASS_IDLE is set.
|
|
===================== ================= =======================================
|
|
|
|
----------------------------
|
|
Account Server Configuration
|
|
----------------------------
|
|
|
|
An example Account Server configuration can be found at
|
|
etc/account-server.conf-sample in the source code repository.
|
|
|
|
The following configuration sections are available:
|
|
|
|
* :ref:`[DEFAULT] <account_server_default_options>`
|
|
* `[account-server]`_
|
|
* `[account-replicator]`_
|
|
* `[account-auditor]`_
|
|
* `[account-reaper]`_
|
|
|
|
.. _account_server_default_options:
|
|
|
|
*********
|
|
[DEFAULT]
|
|
*********
|
|
|
|
=============================== ========== =============================================
|
|
Option Default Description
|
|
------------------------------- ---------- ---------------------------------------------
|
|
swift_dir /etc/swift Swift configuration directory
|
|
devices /srv/node Parent directory or where devices are mounted
|
|
mount_check true Whether or not check if the devices are
|
|
mounted to prevent accidentally writing
|
|
to the root device
|
|
bind_ip 0.0.0.0 IP Address for server to bind to
|
|
bind_port 6202 Port for server to bind to
|
|
bind_timeout 30 Seconds to attempt bind before giving up
|
|
backlog 4096 Maximum number of allowed pending
|
|
connections
|
|
workers auto Override the number of pre-forked workers
|
|
that will accept connections. If set it
|
|
should be an integer, zero means no fork. If
|
|
unset, it will try to default to the number
|
|
of effective cpu cores and fallback to one.
|
|
Increasing the number of workers may reduce
|
|
the possibility of slow file system
|
|
operations in one request from negatively
|
|
impacting other requests. See
|
|
:ref:`general-service-tuning`.
|
|
max_clients 1024 Maximum number of clients one worker can
|
|
process simultaneously (it will actually
|
|
accept(2) N + 1). Setting this to one (1)
|
|
will only handle one request at a time,
|
|
without accepting another request
|
|
concurrently.
|
|
user swift User to run as
|
|
db_preallocation off If you don't mind the extra disk space usage in
|
|
overhead, you can turn this on to preallocate
|
|
disk space with SQLite databases to decrease
|
|
fragmentation.
|
|
disable_fallocate false Disable "fast fail" fallocate checks if the
|
|
underlying filesystem does not support it.
|
|
log_name swift Label used when logging
|
|
log_facility LOG_LOCAL0 Syslog log facility
|
|
log_level INFO Logging level
|
|
log_address /dev/log Logging directory
|
|
log_max_line_length 0 Caps the length of log lines to the
|
|
value given; no limit if set to 0, the
|
|
default.
|
|
log_custom_handlers None Comma-separated list of functions to call
|
|
to setup custom log handlers.
|
|
log_udp_host Override log_address
|
|
log_udp_port 514 UDP log port
|
|
log_statsd_host None Enables StatsD logging; IPv4/IPv6
|
|
address or a hostname. If a
|
|
hostname resolves to an IPv4 and IPv6
|
|
address, the IPv4 address will be
|
|
used.
|
|
log_statsd_port 8125
|
|
log_statsd_default_sample_rate 1.0
|
|
log_statsd_sample_rate_factor 1.0
|
|
log_statsd_metric_prefix
|
|
eventlet_debug false If true, turn on debug logging for eventlet
|
|
fallocate_reserve 1% You can set fallocate_reserve to the
|
|
number of bytes or percentage of disk
|
|
space you'd like fallocate to reserve,
|
|
whether there is space for the given
|
|
file size or not. Percentage will be used
|
|
if the value ends with a '%'. This is
|
|
useful for systems that behave badly when
|
|
they completely run out of space; you can
|
|
make the services pretend they're out of
|
|
space early.
|
|
nice_priority None Scheduling priority of server processes.
|
|
Niceness values range from -20 (most
|
|
favorable to the process) to 19 (least
|
|
favorable to the process). The default
|
|
does not modify priority.
|
|
ionice_class None I/O scheduling class of server processes.
|
|
I/O niceness class values are IOPRIO_CLASS_RT
|
|
(realtime), IOPRIO_CLASS_BE (best-effort),
|
|
and IOPRIO_CLASS_IDLE (idle).
|
|
The default does not modify class and
|
|
priority. Linux supports io scheduling
|
|
priorities and classes since 2.6.13 with
|
|
the CFQ io scheduler.
|
|
Work only with ionice_priority.
|
|
ionice_priority None I/O scheduling priority of server processes.
|
|
I/O niceness priority is a number which
|
|
goes from 0 to 7. The higher the value,
|
|
the lower the I/O priority of the process.
|
|
Work only with ionice_class.
|
|
Ignored if IOPRIO_CLASS_IDLE is set.
|
|
=============================== ========== =============================================
|
|
|
|
****************
|
|
[account-server]
|
|
****************
|
|
|
|
============================= ============== ==========================================
|
|
Option Default Description
|
|
----------------------------- -------------- ------------------------------------------
|
|
use Entry point for paste.deploy for the account
|
|
server. For most cases, this should be
|
|
`egg:swift#account`.
|
|
set log_name account-server Label used when logging
|
|
set log_facility LOG_LOCAL0 Syslog log facility
|
|
set log_level INFO Logging level
|
|
set log_requests True Whether or not to log each
|
|
request
|
|
set log_address /dev/log Logging directory
|
|
auto_create_account_prefix . Prefix used when automatically
|
|
creating accounts.
|
|
replication_server Configure parameter for creating
|
|
specific server. To handle all verbs,
|
|
including replication verbs, do not
|
|
specify "replication_server"
|
|
(this is the default). To only
|
|
handle replication, set to a True
|
|
value (e.g. "True" or "1").
|
|
To handle only non-replication
|
|
verbs, set to "False". Unless you
|
|
have a separate replication network, you
|
|
should not specify any value for
|
|
"replication_server".
|
|
nice_priority None Scheduling priority of server processes.
|
|
Niceness values range from -20 (most
|
|
favorable to the process) to 19 (least
|
|
favorable to the process). The default
|
|
does not modify priority.
|
|
ionice_class None I/O scheduling class of server processes.
|
|
I/O niceness class values are IOPRIO_CLASS_RT
|
|
(realtime), IOPRIO_CLASS_BE (best-effort),
|
|
and IOPRIO_CLASS_IDLE (idle).
|
|
The default does not modify class and
|
|
priority. Linux supports io scheduling
|
|
priorities and classes since 2.6.13 with
|
|
the CFQ io scheduler.
|
|
Work only with ionice_priority.
|
|
ionice_priority None I/O scheduling priority of server
|
|
processes. I/O niceness priority is
|
|
a number which goes from 0 to 7.
|
|
The higher the value, the lower the I/O
|
|
priority of the process. Work only with
|
|
ionice_class.
|
|
Ignored if IOPRIO_CLASS_IDLE is set.
|
|
============================= ============== ==========================================
|
|
|
|
********************
|
|
[account-replicator]
|
|
********************
|
|
|
|
================== ========================= ===============================
|
|
Option Default Description
|
|
------------------ ------------------------- -------------------------------
|
|
log_name account-replicator Label used when logging
|
|
log_facility LOG_LOCAL0 Syslog log facility
|
|
log_level INFO Logging level
|
|
log_address /dev/log Logging directory
|
|
per_diff 1000 Maximum number of database rows
|
|
that will be sync'd in a single
|
|
HTTP replication request.
|
|
Databases with less than or
|
|
equal to this number of
|
|
differing rows will always be
|
|
sync'd using an HTTP replication
|
|
request rather than using rsync.
|
|
max_diffs 100 Maximum number of HTTP
|
|
replication requests attempted
|
|
on each replication pass for any
|
|
one container. This caps how
|
|
long the replicator will spend
|
|
trying to sync a given database
|
|
per pass so the other databases
|
|
don't get starved.
|
|
concurrency 8 Number of replication workers
|
|
to spawn
|
|
interval 30 Time in seconds to wait between
|
|
replication passes
|
|
node_timeout 10 Request timeout to external
|
|
services
|
|
conn_timeout 0.5 Connection timeout to external
|
|
services
|
|
reclaim_age 604800 Time elapsed in seconds before
|
|
an account can be reclaimed
|
|
rsync_module {replication_ip}::account Format of the rsync module where
|
|
the replicator will send data.
|
|
The configuration value can
|
|
include some variables that will
|
|
be extracted from the ring.
|
|
Variables must follow the format
|
|
{NAME} where NAME is one of: ip,
|
|
port, replication_ip,
|
|
replication_port, region, zone,
|
|
device, meta. See
|
|
etc/rsyncd.conf-sample for some
|
|
examples.
|
|
rsync_compress no Allow rsync to compress data
|
|
which is transmitted to
|
|
destination node during sync.
|
|
However, this is applicable only
|
|
when destination node is in a
|
|
different region than the local
|
|
one. NOTE: Objects that are
|
|
already compressed (for example:
|
|
.tar.gz, mp3) might slow down
|
|
the syncing process.
|
|
recon_cache_path /var/cache/swift Path to recon cache
|
|
nice_priority None Scheduling priority of server
|
|
processes. Niceness values
|
|
range from -20 (most favorable
|
|
to the process) to 19 (least
|
|
favorable to the process).
|
|
The default does not modify
|
|
priority.
|
|
ionice_class None I/O scheduling class of server
|
|
processes. I/O niceness class
|
|
values are IOPRIO_CLASS_RT
|
|
(realtime), IOPRIO_CLASS_BE
|
|
(best-effort), and IOPRIO_CLASS_IDLE
|
|
(idle).
|
|
The default does not modify
|
|
class and priority. Linux supports
|
|
io scheduling priorities and classes
|
|
since 2.6.13 with the CFQ io scheduler.
|
|
Work only with ionice_priority.
|
|
ionice_priority None I/O scheduling priority of server
|
|
processes. I/O niceness priority
|
|
is a number which goes from 0 to 7.
|
|
The higher the value, the lower
|
|
the I/O priority of the process.
|
|
Work only with ionice_class.
|
|
Ignored if IOPRIO_CLASS_IDLE
|
|
is set.
|
|
================== ========================= ===============================
|
|
|
|
*****************
|
|
[account-auditor]
|
|
*****************
|
|
|
|
==================== ================ =======================================
|
|
Option Default Description
|
|
-------------------- ---------------- ---------------------------------------
|
|
log_name account-auditor Label used when logging
|
|
log_facility LOG_LOCAL0 Syslog log facility
|
|
log_level INFO Logging level
|
|
log_address /dev/log Logging directory
|
|
interval 1800 Minimum time for a pass to take
|
|
accounts_per_second 200 Maximum accounts audited per second.
|
|
Should be tuned according to individual
|
|
system specs. 0 is unlimited.
|
|
recon_cache_path /var/cache/swift Path to recon cache
|
|
nice_priority None Scheduling priority of server processes.
|
|
Niceness values range from -20 (most
|
|
favorable to the process) to 19 (least
|
|
favorable to the process). The default
|
|
does not modify priority.
|
|
ionice_class None I/O scheduling class of server processes.
|
|
I/O niceness class values are
|
|
IOPRIO_CLASS_RT (realtime),
|
|
IOPRIO_CLASS_BE (best-effort),
|
|
and IOPRIO_CLASS_IDLE (idle).
|
|
The default does not modify class and
|
|
priority. Linux supports io scheduling
|
|
priorities and classes since 2.6.13 with
|
|
the CFQ io scheduler.
|
|
Work only with ionice_priority.
|
|
ionice_priority None I/O scheduling priority of server
|
|
processes. I/O niceness priority is
|
|
a number which goes from 0 to 7.
|
|
The higher the value, the lower the I/O
|
|
priority of the process. Work only with
|
|
ionice_class.
|
|
Ignored if IOPRIO_CLASS_IDLE is set.
|
|
==================== ================ =======================================
|
|
|
|
****************
|
|
[account-reaper]
|
|
****************
|
|
|
|
================== =============== =========================================
|
|
Option Default Description
|
|
------------------ --------------- -----------------------------------------
|
|
log_name account-reaper Label used when logging
|
|
log_facility LOG_LOCAL0 Syslog log facility
|
|
log_level INFO Logging level
|
|
log_address /dev/log Logging directory
|
|
concurrency 25 Number of replication workers to spawn
|
|
interval 3600 Minimum time for a pass to take
|
|
node_timeout 10 Request timeout to external services
|
|
conn_timeout 0.5 Connection timeout to external services
|
|
delay_reaping 0 Normally, the reaper begins deleting
|
|
account information for deleted accounts
|
|
immediately; you can set this to delay
|
|
its work however. The value is in seconds,
|
|
2592000 = 30 days, for example. The sum of
|
|
this value and the container-updater
|
|
``interval`` should be less than the
|
|
account-replicator ``reclaim_age``. This
|
|
ensures that once the account-reaper has
|
|
deleted a container there is sufficient
|
|
time for the container-updater to report
|
|
to the account before the account DB is
|
|
removed.
|
|
reap_warn_after 2892000 If the account fails to be be reaped due
|
|
to a persistent error, the account reaper
|
|
will log a message such as:
|
|
Account <name> has not been reaped since <date>
|
|
You can search logs for this message if
|
|
space is not being reclaimed after you
|
|
delete account(s). This is in addition to
|
|
any time requested by delay_reaping.
|
|
nice_priority None Scheduling priority of server processes.
|
|
Niceness values range from -20 (most
|
|
favorable to the process) to 19 (least
|
|
favorable to the process). The default
|
|
does not modify priority.
|
|
ionice_class None I/O scheduling class of server processes.
|
|
I/O niceness class values are IOPRIO_CLASS_RT
|
|
(realtime), IOPRIO_CLASS_BE (best-effort),
|
|
and IOPRIO_CLASS_IDLE (idle).
|
|
The default does not modify class and
|
|
priority. Linux supports io scheduling
|
|
priorities and classes since 2.6.13 with
|
|
the CFQ io scheduler.
|
|
Work only with ionice_priority.
|
|
ionice_priority None I/O scheduling priority of server
|
|
processes. I/O niceness priority is
|
|
a number which goes from 0 to 7.
|
|
The higher the value, the lower the I/O
|
|
priority of the process. Work only with
|
|
ionice_class.
|
|
Ignored if IOPRIO_CLASS_IDLE is set.
|
|
================== =============== =========================================
|
|
|
|
.. _proxy-server-config:
|
|
|
|
--------------------------
|
|
Proxy Server Configuration
|
|
--------------------------
|
|
|
|
An example Proxy Server configuration can be found at
|
|
etc/proxy-server.conf-sample in the source code repository.
|
|
|
|
The following configuration sections are available:
|
|
|
|
An example Account Server configuration can be found at
|
|
etc/account-server.conf-sample in the source code repository.
|
|
|
|
The following configuration sections are available:
|
|
|
|
* :ref:`[DEFAULT] <proxy_server_default_options>`
|
|
* `[proxy-server]`_
|
|
* Individual sections for `Proxy middlewares`_
|
|
|
|
.. _proxy_server_default_options:
|
|
|
|
*********
|
|
[DEFAULT]
|
|
*********
|
|
|
|
==================================== ======================== ========================================
|
|
Option Default Description
|
|
------------------------------------ ------------------------ ----------------------------------------
|
|
bind_ip 0.0.0.0 IP Address for server to
|
|
bind to
|
|
bind_port 80 Port for server to bind to
|
|
bind_timeout 30 Seconds to attempt bind before
|
|
giving up
|
|
backlog 4096 Maximum number of allowed pending
|
|
connections
|
|
swift_dir /etc/swift Swift configuration directory
|
|
workers auto Override the number of
|
|
pre-forked workers that will
|
|
accept connections. If set it
|
|
should be an integer, zero
|
|
means no fork. If unset, it
|
|
will try to default to the
|
|
number of effective cpu cores
|
|
and fallback to one. See
|
|
:ref:`general-service-tuning`.
|
|
max_clients 1024 Maximum number of clients one
|
|
worker can process
|
|
simultaneously (it will
|
|
actually accept(2) N +
|
|
1). Setting this to one (1)
|
|
will only handle one request at
|
|
a time, without accepting
|
|
another request
|
|
concurrently.
|
|
user swift User to run as
|
|
cert_file Path to the ssl .crt. This
|
|
should be enabled for testing
|
|
purposes only.
|
|
key_file Path to the ssl .key. This
|
|
should be enabled for testing
|
|
purposes only.
|
|
cors_allow_origin This is a list of hosts that
|
|
are included with any CORS
|
|
request by default and
|
|
returned with the
|
|
Access-Control-Allow-Origin
|
|
header in addition to what
|
|
the container has set.
|
|
strict_cors_mode True
|
|
cors_expose_headers This is a list of headers that
|
|
are included in the header
|
|
Access-Control-Expose-Headers
|
|
in addition to what the container
|
|
has set.
|
|
client_timeout 60
|
|
trans_id_suffix This optional suffix (default is empty)
|
|
that would be appended to the swift
|
|
transaction id allows one to easily
|
|
figure out from which cluster that
|
|
X-Trans-Id belongs to. This is very
|
|
useful when one is managing more than
|
|
one swift cluster.
|
|
log_name swift Label used when logging
|
|
log_facility LOG_LOCAL0 Syslog log facility
|
|
log_level INFO Logging level
|
|
log_headers False
|
|
log_address /dev/log Logging directory
|
|
log_max_line_length 0 Caps the length of log
|
|
lines to the value given;
|
|
no limit if set to 0, the
|
|
default.
|
|
log_custom_handlers None Comma separated list of functions
|
|
to call to setup custom log
|
|
handlers.
|
|
log_udp_host Override log_address
|
|
log_udp_port 514 UDP log port
|
|
log_statsd_host None Enables StatsD logging; IPv4/IPv6
|
|
address or a hostname. If a
|
|
hostname resolves to an IPv4 and IPv6
|
|
address, the IPv4 address will be
|
|
used.
|
|
log_statsd_port 8125
|
|
log_statsd_default_sample_rate 1.0
|
|
log_statsd_sample_rate_factor 1.0
|
|
log_statsd_metric_prefix
|
|
eventlet_debug false If true, turn on debug logging
|
|
for eventlet
|
|
|
|
expose_info true Enables exposing configuration
|
|
settings via HTTP GET /info.
|
|
admin_key Key to use for admin calls that
|
|
are HMAC signed. Default
|
|
is empty, which will
|
|
disable admin calls to
|
|
/info.
|
|
disallowed_sections swift.valid_api_versions Allows the ability to withhold
|
|
sections from showing up in the
|
|
public calls to /info. You can
|
|
withhold subsections by separating
|
|
the dict level with a ".".
|
|
expiring_objects_container_divisor 86400
|
|
expiring_objects_account_name expiring_objects
|
|
nice_priority None Scheduling priority of server
|
|
processes.
|
|
Niceness values range from -20 (most
|
|
favorable to the process) to 19 (least
|
|
favorable to the process). The default
|
|
does not modify priority.
|
|
ionice_class None I/O scheduling class of server
|
|
processes. I/O niceness class values
|
|
are IOPRIO_CLASS_RT (realtime),
|
|
IOPRIO_CLASS_BE (best-effort) and
|
|
IOPRIO_CLASS_IDLE (idle).
|
|
The default does not
|
|
modify class and priority. Linux
|
|
supports io scheduling priorities
|
|
and classes since 2.6.13 with
|
|
the CFQ io scheduler.
|
|
Work only with ionice_priority.
|
|
ionice_priority None I/O scheduling priority of server
|
|
processes. I/O niceness priority is
|
|
a number which goes from 0 to 7.
|
|
The higher the value, the lower
|
|
the I/O priority of the process.
|
|
Work only with ionice_class.
|
|
Ignored if IOPRIO_CLASS_IDLE is set.
|
|
==================================== ======================== ========================================
|
|
|
|
**************
|
|
[proxy-server]
|
|
**************
|
|
|
|
====================================== =============== =====================================
|
|
Option Default Description
|
|
-------------------------------------- --------------- -------------------------------------
|
|
use Entry point for paste.deploy for
|
|
the proxy server. For most
|
|
cases, this should be
|
|
`egg:swift#proxy`.
|
|
set log_name proxy-server Label used when logging
|
|
set log_facility LOG_LOCAL0 Syslog log facility
|
|
set log_level INFO Log level
|
|
set log_headers True If True, log headers in each
|
|
request
|
|
set log_handoffs True If True, the proxy will log
|
|
whenever it has to failover to a
|
|
handoff node
|
|
recheck_account_existence 60 Cache timeout in seconds to
|
|
send memcached for account
|
|
existence
|
|
recheck_container_existence 60 Cache timeout in seconds to
|
|
send memcached for container
|
|
existence
|
|
object_chunk_size 65536 Chunk size to read from
|
|
object servers
|
|
client_chunk_size 65536 Chunk size to read from
|
|
clients
|
|
memcache_servers 127.0.0.1:11211 Comma separated list of
|
|
memcached servers
|
|
ip:port or [ipv6addr]:port
|
|
memcache_max_connections 2 Max number of connections to
|
|
each memcached server per
|
|
worker
|
|
node_timeout 10 Request timeout to external
|
|
services
|
|
recoverable_node_timeout node_timeout Request timeout to external
|
|
services for requests that, on
|
|
failure, can be recovered
|
|
from. For example, object GET.
|
|
client_timeout 60 Timeout to read one chunk
|
|
from a client
|
|
conn_timeout 0.5 Connection timeout to
|
|
external services
|
|
error_suppression_interval 60 Time in seconds that must
|
|
elapse since the last error
|
|
for a node to be considered
|
|
no longer error limited
|
|
error_suppression_limit 10 Error count to consider a
|
|
node error limited
|
|
allow_account_management false Whether account PUTs and DELETEs
|
|
are even callable
|
|
account_autocreate false If set to 'true' authorized
|
|
accounts that do not yet exist
|
|
within the Swift cluster will
|
|
be automatically created.
|
|
max_containers_per_account 0 If set to a positive value,
|
|
trying to create a container
|
|
when the account already has at
|
|
least this maximum containers
|
|
will result in a 403 Forbidden.
|
|
Note: This is a soft limit,
|
|
meaning a user might exceed the
|
|
cap for
|
|
recheck_account_existence before
|
|
the 403s kick in.
|
|
max_containers_whitelist This is a comma separated list
|
|
of account names that ignore
|
|
the max_containers_per_account
|
|
cap.
|
|
rate_limit_after_segment 10 Rate limit the download of
|
|
large object segments after
|
|
this segment is downloaded.
|
|
rate_limit_segments_per_sec 1 Rate limit large object
|
|
downloads at this rate.
|
|
request_node_count 2 * replicas Set to the number of nodes to
|
|
contact for a normal request.
|
|
You can use '* replicas' at the
|
|
end to have it use the number
|
|
given times the number of
|
|
replicas for the ring being used
|
|
for the request.
|
|
swift_owner_headers <see the sample These are the headers whose
|
|
conf file for values will only be shown to
|
|
the list of swift_owners. The exact
|
|
default definition of a swift_owner is
|
|
headers> up to the auth system in use,
|
|
but usually indicates
|
|
administrative responsibilities.
|
|
sorting_method shuffle Storage nodes can be chosen at
|
|
random (shuffle), by using timing
|
|
measurements (timing), or by using
|
|
an explicit match (affinity).
|
|
Using timing measurements may allow
|
|
for lower overall latency, while
|
|
using affinity allows for finer
|
|
control. In both the timing and
|
|
affinity cases, equally-sorting nodes
|
|
are still randomly chosen to spread
|
|
load. This option may be overridden
|
|
in a per-policy configuration
|
|
section.
|
|
timing_expiry 300 If the "timing" sorting_method is
|
|
used, the timings will only be valid
|
|
for the number of seconds configured
|
|
by timing_expiry.
|
|
concurrent_gets off Use replica count number of
|
|
threads concurrently during a
|
|
GET/HEAD and return with the
|
|
first successful response. In
|
|
the EC case, this parameter only
|
|
affects an EC HEAD as an EC GET
|
|
behaves differently.
|
|
concurrency_timeout conn_timeout This parameter controls how long
|
|
to wait before firing off the
|
|
next concurrent_get thread. A
|
|
value of 0 would we fully concurrent,
|
|
any other number will stagger the
|
|
firing of the threads. This number
|
|
should be between 0 and node_timeout.
|
|
The default is conn_timeout (0.5).
|
|
nice_priority None Scheduling priority of server
|
|
processes.
|
|
Niceness values range from -20 (most
|
|
favorable to the process) to 19 (least
|
|
favorable to the process). The default
|
|
does not modify priority.
|
|
ionice_class None I/O scheduling class of server
|
|
processes. I/O niceness class values
|
|
are IOPRIO_CLASS_RT (realtime),
|
|
IOPRIO_CLASS_BE (best-effort),
|
|
and IOPRIO_CLASS_IDLE (idle).
|
|
The default does not modify class and
|
|
priority. Linux supports io scheduling
|
|
priorities and classes since 2.6.13
|
|
with the CFQ io scheduler.
|
|
Work only with ionice_priority.
|
|
ionice_priority None I/O scheduling priority of server
|
|
processes. I/O niceness priority is
|
|
a number which goes from 0 to 7.
|
|
The higher the value, the lower the
|
|
I/O priority of the process. Work
|
|
only with ionice_class.
|
|
Ignored if IOPRIO_CLASS_IDLE is set.
|
|
read_affinity None Specifies which backend servers to
|
|
prefer on reads; used in conjunction
|
|
with the sorting_method option being
|
|
set to 'affinity'. Format is a comma
|
|
separated list of affinity descriptors
|
|
of the form <selection>=<priority>.
|
|
The <selection> may be r<N> for
|
|
selecting nodes in region N or
|
|
r<N>z<M> for selecting nodes in
|
|
region N, zone M. The <priority>
|
|
value should be a whole number
|
|
that represents the priority to
|
|
be given to the selection; lower
|
|
numbers are higher priority.
|
|
Default is empty, meaning no
|
|
preference. This option may be
|
|
overridden in a per-policy
|
|
configuration section.
|
|
write_affinity None Specifies which backend servers to
|
|
prefer on writes. Format is a comma
|
|
separated list of affinity
|
|
descriptors of the form r<N> for
|
|
region N or r<N>z<M> for region N,
|
|
zone M. Default is empty, meaning no
|
|
preference. This option may be
|
|
overridden in a per-policy
|
|
configuration section.
|
|
write_affinity_node_count 2 * replicas The number of local (as governed by
|
|
the write_affinity setting) nodes to
|
|
attempt to contact first on writes,
|
|
before any non-local ones. The value
|
|
should be an integer number, or use
|
|
'* replicas' at the end to have it
|
|
use the number given times the number
|
|
of replicas for the ring being used
|
|
for the request. This option may be
|
|
overridden in a per-policy
|
|
configuration section.
|
|
write_affinity_handoff_delete_count auto The number of local (as governed by
|
|
the write_affinity setting) handoff
|
|
nodes to attempt to contact on
|
|
deletion, in addition to primary
|
|
nodes. Example: in geographically
|
|
distributed deployment, If replicas=3,
|
|
sometimes there may be 1 primary node
|
|
and 2 local handoff nodes in one region
|
|
holding the object after uploading but
|
|
before object replicated to the
|
|
appropriate locations in other regions.
|
|
In this case, include these handoff
|
|
nodes to send request when deleting
|
|
object could help make correct decision
|
|
for the response. The default value 'auto'
|
|
means Swift will calculate the number
|
|
automatically, the default value is
|
|
(replicas - len(local_primary_nodes)).
|
|
This option may be overridden in a
|
|
per-policy configuration section.
|
|
====================================== =============== =====================================
|
|
|
|
.. _proxy_server_per_policy_config:
|
|
|
|
************************
|
|
Per policy configuration
|
|
************************
|
|
|
|
Some proxy-server configuration options may be overridden for individual
|
|
:doc:`overview_policies` by including per-policy config section(s). These
|
|
options are:
|
|
|
|
- ``sorting_method``
|
|
- ``read_affinity``
|
|
- ``write_affinity``
|
|
- ``write_affinity_node_count``
|
|
- ``write_affinity_handoff_delete_count``
|
|
|
|
The per-policy config section name must be of the form::
|
|
|
|
[proxy-server:policy:<policy index>]
|
|
|
|
.. note::
|
|
|
|
The per-policy config section name should refer to the policy index, not
|
|
the policy name.
|
|
|
|
.. note::
|
|
|
|
The first part of proxy-server config section name must match the name of
|
|
the proxy-server config section. This is typically ``proxy-server`` as
|
|
shown above, but if different then the names of any per-policy config
|
|
sections must be changed accordingly.
|
|
|
|
The value of an option specified in a per-policy section will override any
|
|
value given in the proxy-server section for that policy only. Otherwise the
|
|
value of these options will be that specified in the proxy-server section.
|
|
|
|
For example, the following section provides policy-specific options for a
|
|
policy with index ``3``::
|
|
|
|
[proxy-server:policy:3]
|
|
sorting_method = affinity
|
|
read_affinity = r2=1
|
|
write_affinity = r2
|
|
write_affinity_node_count = 1 * replicas
|
|
write_affinity_handoff_delete_count = 2
|
|
|
|
.. note::
|
|
|
|
It is recommended that per-policy config options are *not* included in the
|
|
``[DEFAULT]`` section. If they are then the following behavior applies.
|
|
|
|
Per-policy config sections will inherit options in the ``[DEFAULT]``
|
|
section of the config file, and any such inheritance will take precedence
|
|
over inheriting options from the proxy-server config section.
|
|
|
|
Per-policy config section options will override options in the
|
|
``[DEFAULT]`` section. Unlike the behavior described under `General Server
|
|
Configuration`_ for paste-deploy ``filter`` and ``app`` sections, the
|
|
``set`` keyword is not required for options to override in per-policy
|
|
config sections.
|
|
|
|
For example, given the following settings in a config file::
|
|
|
|
[DEFAULT]
|
|
sorting_method = affinity
|
|
read_affinity = r0=100
|
|
write_affinity = r0
|
|
|
|
[app:proxy-server]
|
|
use = egg:swift#proxy
|
|
# use of set keyword here overrides [DEFAULT] option
|
|
set read_affinity = r1=100
|
|
# without set keyword, [DEFAULT] option overrides in a paste-deploy section
|
|
write_affinity = r1
|
|
|
|
[proxy-server:policy:0]
|
|
sorting_method = affinity
|
|
# set keyword not required here to override [DEFAULT] option
|
|
write_affinity = r1
|
|
|
|
would result in policy with index ``0`` having settings:
|
|
|
|
* ``read_affinity = r0=100`` (inherited from the ``[DEFAULT]`` section)
|
|
* ``write_affinity = r1`` (specified in the policy 0 section)
|
|
|
|
and any other policy would have the default settings of:
|
|
|
|
* ``read_affinity = r1=100`` (set in the proxy-server section)
|
|
* ``write_affinity = r0`` (inherited from the ``[DEFAULT]`` section)
|
|
|
|
*****************
|
|
Proxy Middlewares
|
|
*****************
|
|
|
|
Many features in Swift are implemented as middleware in the proxy-server
|
|
pipeline. See :doc:`middleware` and the ``proxy-server.conf-sample`` file for
|
|
more information. In particular, the use of some type of :doc:`authentication
|
|
and authorization middleware <overview_auth>` is highly recommended.
|
|
|
|
|
|
------------------------
|
|
Memcached Considerations
|
|
------------------------
|
|
|
|
Several of the Services rely on Memcached for caching certain types of
|
|
lookups, such as auth tokens, and container/account existence. Swift does
|
|
not do any caching of actual object data. Memcached should be able to run
|
|
on any servers that have available RAM and CPU. At Rackspace, we run
|
|
Memcached on the proxy servers. The `memcache_servers` config option
|
|
in the `proxy-server.conf` should contain all memcached servers.
|
|
|
|
-----------
|
|
System Time
|
|
-----------
|
|
|
|
Time may be relative but it is relatively important for Swift! Swift uses
|
|
timestamps to determine which is the most recent version of an object.
|
|
It is very important for the system time on each server in the cluster to
|
|
by synced as closely as possible (more so for the proxy server, but in general
|
|
it is a good idea for all the servers). At Rackspace, we use NTP with a local
|
|
NTP server to ensure that the system times are as close as possible. This
|
|
should also be monitored to ensure that the times do not vary too much.
|
|
|
|
.. _general-service-tuning:
|
|
|
|
----------------------
|
|
General Service Tuning
|
|
----------------------
|
|
|
|
Most services support either a `worker` or `concurrency` value in the
|
|
settings. This allows the services to make effective use of the cores
|
|
available. A good starting point to set the concurrency level for the proxy
|
|
and storage services to 2 times the number of cores available. If more than
|
|
one service is sharing a server, then some experimentation may be needed to
|
|
find the best balance.
|
|
|
|
At Rackspace, our Proxy servers have dual quad core processors, giving us 8
|
|
cores. Our testing has shown 16 workers to be a pretty good balance when
|
|
saturating a 10g network and gives good CPU utilization.
|
|
|
|
Our Storage server processes all run together on the same servers. These servers have
|
|
dual quad core processors, for 8 cores total. We run the Account, Container,
|
|
and Object servers with 8 workers each. Most of the background jobs are run at
|
|
a concurrency of 1, with the exception of the replicators which are run at a
|
|
concurrency of 2.
|
|
|
|
The `max_clients` parameter can be used to adjust the number of client
|
|
requests an individual worker accepts for processing. The fewer requests being
|
|
processed at one time, the less likely a request that consumes the worker's
|
|
CPU time, or blocks in the OS, will negatively impact other requests. The more
|
|
requests being processed at one time, the more likely one worker can utilize
|
|
network and disk capacity.
|
|
|
|
On systems that have more cores, and more memory, where one can afford to run
|
|
more workers, raising the number of workers and lowering the maximum number of
|
|
clients serviced per worker can lessen the impact of CPU intensive or stalled
|
|
requests.
|
|
|
|
The `nice_priority` parameter can be used to set program scheduling priority.
|
|
The `ionice_class` and `ionice_priority` parameters can be used to set I/O scheduling
|
|
class and priority on the systems that use an I/O scheduler that supports
|
|
I/O priorities. As at kernel 2.6.17 the only such scheduler is the Completely
|
|
Fair Queuing (CFQ) I/O scheduler. If you run your Storage servers all together
|
|
on the same servers, you can slow down the auditors or prioritize
|
|
object-server I/O via these parameters (but probably do not need to change
|
|
it on the proxy). It is a new feature and the best practices are still
|
|
being developed. On some systems it may be required to run the daemons as root.
|
|
For more info also see setpriority(2) and ioprio_set(2).
|
|
|
|
The above configuration setting should be taken as suggestions and testing
|
|
of configuration settings should be done to ensure best utilization of CPU,
|
|
network connectivity, and disk I/O.
|
|
|
|
-------------------------
|
|
Filesystem Considerations
|
|
-------------------------
|
|
|
|
Swift is designed to be mostly filesystem agnostic--the only requirement
|
|
being that the filesystem supports extended attributes (xattrs). After
|
|
thorough testing with our use cases and hardware configurations, XFS was
|
|
the best all-around choice. If you decide to use a filesystem other than
|
|
XFS, we highly recommend thorough testing.
|
|
|
|
For distros with more recent kernels (for example Ubuntu 12.04 Precise),
|
|
we recommend using the default settings (including the default inode size
|
|
of 256 bytes) when creating the file system::
|
|
|
|
mkfs.xfs /dev/sda1
|
|
|
|
In the last couple of years, XFS has made great improvements in how inodes
|
|
are allocated and used. Using the default inode size no longer has an
|
|
impact on performance.
|
|
|
|
For distros with older kernels (for example Ubuntu 10.04 Lucid),
|
|
some settings can dramatically impact performance. We recommend the
|
|
following when creating the file system::
|
|
|
|
mkfs.xfs -i size=1024 /dev/sda1
|
|
|
|
Setting the inode size is important, as XFS stores xattr data in the inode.
|
|
If the metadata is too large to fit in the inode, a new extent is created,
|
|
which can cause quite a performance problem. Upping the inode size to 1024
|
|
bytes provides enough room to write the default metadata, plus a little
|
|
headroom.
|
|
|
|
The following example mount options are recommended when using XFS::
|
|
|
|
mount -t xfs -o noatime,nodiratime,nobarrier,logbufs=8 /dev/sda1 /srv/node/sda
|
|
|
|
We do not recommend running Swift on RAID, but if you are using
|
|
RAID it is also important to make sure that the proper sunit and swidth
|
|
settings get set so that XFS can make most efficient use of the RAID array.
|
|
|
|
For a standard Swift install, all data drives are mounted directly under
|
|
``/srv/node`` (as can be seen in the above example of mounting ``/dev/sda1`` as
|
|
``/srv/node/sda``). If you choose to mount the drives in another directory,
|
|
be sure to set the `devices` config option in all of the server configs to
|
|
point to the correct directory.
|
|
|
|
The mount points for each drive in ``/srv/node/`` should be owned by the root user
|
|
almost exclusively (``root:root 755``). This is required to prevent rsync from
|
|
syncing files into the root drive in the event a drive is unmounted.
|
|
|
|
Swift uses system calls to reserve space for new objects being written into
|
|
the system. If your filesystem does not support `fallocate()` or
|
|
`posix_fallocate()`, be sure to set the `disable_fallocate = true` config
|
|
parameter in account, container, and object server configs.
|
|
|
|
Most current Linux distributions ship with a default installation of updatedb.
|
|
This tool runs periodically and updates the file name database that is used by
|
|
the GNU locate tool. However, including Swift object and container database
|
|
files is most likely not required and the periodic update affects the
|
|
performance quite a bit. To disable the inclusion of these files add the path
|
|
where Swift stores its data to the setting PRUNEPATHS in `/etc/updatedb.conf`::
|
|
|
|
PRUNEPATHS="... /tmp ... /var/spool ... /srv/node"
|
|
|
|
|
|
---------------------
|
|
General System Tuning
|
|
---------------------
|
|
|
|
Rackspace currently runs Swift on Ubuntu Server 10.04, and the following
|
|
changes have been found to be useful for our use cases.
|
|
|
|
The following settings should be in `/etc/sysctl.conf`::
|
|
|
|
# disable TIME_WAIT.. wait..
|
|
net.ipv4.tcp_tw_recycle=1
|
|
net.ipv4.tcp_tw_reuse=1
|
|
|
|
# disable syn cookies
|
|
net.ipv4.tcp_syncookies = 0
|
|
|
|
# double amount of allowed conntrack
|
|
net.ipv4.netfilter.ip_conntrack_max = 262144
|
|
|
|
To load the updated sysctl settings, run ``sudo sysctl -p``
|
|
|
|
A note about changing the TIME_WAIT values. By default the OS will hold
|
|
a port open for 60 seconds to ensure that any remaining packets can be
|
|
received. During high usage, and with the number of connections that are
|
|
created, it is easy to run out of ports. We can change this since we are
|
|
in control of the network. If you are not in control of the network, or
|
|
do not expect high loads, then you may not want to adjust those values.
|
|
|
|
----------------------
|
|
Logging Considerations
|
|
----------------------
|
|
|
|
Swift is set up to log directly to syslog. Every service can be configured
|
|
with the `log_facility` option to set the syslog log facility destination.
|
|
We recommended using syslog-ng to route the logs to specific log
|
|
files locally on the server and also to remote log collecting servers.
|
|
Additionally, custom log handlers can be used via the custom_log_handlers
|
|
setting.
|