If the global configuration option 'enable_open_expired' is set to true in the config, then the client will be able to make a request with the header 'x-open-expired' set to true in order to access an object that has expired, provided it is in its grace period. If this config flag is set to false, the client will not be able to access any expired objects, even with the header, which is the default behavior unless the flag is set. When a client sets a 'x-open-expired' header to a true value for a GET/HEAD/POST request the proxy will forward x-backend-open-expired to storage server. The storage server will allow clients that set x-backend-open-expired to open and read an object that has not yet been reaped by the object-expirer, even after the x-delete-at time has passed. The header is always ignored when used with temporary URLs. Co-Authored-By: Anish Kachinthaya <akachinthaya@nvidia.com> Related-Change: I106103438c4162a561486ac73a09436e998ae1f0 Change-Id: Ibe7dde0e3bf587d77e14808b169c02f8fb3dddb3
12 KiB
Expiring Object Support
The swift-object-expirer
offers scheduled deletion of
objects. The Swift client would use the X-Delete-At
or
X-Delete-After
headers during an object PUT
or
POST
and the cluster would automatically quit serving that
object at the specified time and would shortly thereafter remove the
object from the system.
The X-Delete-At
header takes a Unix Epoch timestamp, in
integer form; for example: 1317070737
represents
Mon Sep 26 20:58:57 2011 UTC
.
The X-Delete-After
header takes a positive integer
number of seconds. The proxy server that receives the request will
convert this header into an X-Delete-At
header using the
request timestamp plus the value given.
If both the X-Delete-At
and X-Delete-After
headers are sent with a request then the X-Delete-After
header will take precedence.
As expiring objects are added to the system, the object servers will
record the expirations in a hidden .expiring_objects
account for the swift-object-expirer
to handle later.
Usually, just one instance of the swift-object-expirer
daemon needs to run for a cluster. This isn't exactly automatic failover
high availability, but if this daemon doesn't run for a few hours it
should not be any real issue. The expired-but-not-yet-deleted objects
will still 404 Not Found
if someone tries to
GET
or HEAD
them and they'll just be deleted a
bit later when the daemon is restarted.
By default, the swift-object-expirer
daemon will run
with a concurrency of 1. Increase this value to get more concurrency. A
concurrency of 1 may not be enough to delete expiring objects in a
timely fashion for a particular Swift cluster.
It is possible to run multiple daemons to do different parts of the work if a single process with a concurrency of more than 1 is not enough (see the sample config file for details).
To run the swift-object-expirer
as multiple processes,
set processes
to the number of processes (either in the
config file or on the command line). Then run one process for each part.
Use process
to specify the part of the work to be done by a
process using the command line or the config. So, for example, if you'd
like to run three processes, set processes
to 3 and run
three processes with process
set to 0, 1, and 2 for the
three processes. If multiple processes are used, it's necessary to run
one for each part of the work or that part of the work will not be
done.
By default the daemon looks for two different config files. When
launching, the process searches for the [object-expirer]
section in the
/etc/swift/object-server.conf
config. If the section or
the config is missing it will then look for and use the
/etc/swift/object-expirer.conf
config. The latter config
file is considered deprecated and is searched for to aid in cluster
upgrades.
Delay Reaping of Objects from Disk
Swift's expiring object x-delete-at
feature can be used
to have the cluster reap user's objects automatically from disk on their
behalf when they no longer want them stored in their account. In some
cases it may be necessary to "intervene" in the expected expiration
process to prevent accidental or premature data loss if an object marked
for expiration should NOT be deleted immediately when it expires for
whatever reason. In these cases swift-object-expirer
offers
configuration of a delay_reaping
value on accounts and
containers, which provides a delay between when an object is marked for
deletion, or expired, and when it is actually reaped from disk. When
this is set in the object expirer config the object expirer leaves
expired objects on disk (and in container listings) for the
delay_reaping
time. After this delay has passed objects
will be reaped as normal.
The delay_reaping
value can be set either at an account
level or a container level. When set at an account level, the object
expirer will only reap objects within the account after the delay. A
container level delay_reaping
works similarly for
containers and overrides an account level delay_reaping
value.
The delay_reaping
values are set in the
[object-expirer]
section in either the object-server or
object-expirer config files. They are configured with dynamic config
option names prefixed with delay_reaping_<ACCT>
at
the account level and
delay_reaping_<ACCT>/<CNTR>
at the container
level, with the delay_reaping
value in seconds.
Here is an example of delay_reaping
configs in
theobject-expirer
section in the
object-server.conf
:
[object-expirer]
delay_reaping_AUTH_test = 300.0
delay_reaping_AUTH_test2 = 86400.0
delay_reaping_AUTH_test/test = 0.0
delay_reaping_AUTH_test/test2 = 600.0
Note
A container level delay_reaping
value does not require
an account level delay_reaping
value but overrides the
account level value for the same account if it exists. By default, no
delay_reaping
value is configured for any accounts or
containers.
Accessing Objects After Expiration
By default, objects that expire become inaccessible, even to the
account owner. The object may not have been deleted, but any
GET/HEAD/POST client request for the object will respond 404 Not Found
after the x-delete-at
timestamp has passed.
The swift-proxy-server
offers the ability to globally
configure a flag to allow requests to access expired objects that have
not yet been deleted. When this flag is enabled, a user can make a GET,
HEAD, or POST request with the header x-open-expired
set to
true to access the expired object.
The global configuration is an opt-in flag that can be set in the
[proxy-server]
section of the
proxy-server.conf
file. It is configured with a single flag
allow_open_expired
set to true or false. By default, this
flag is set to false.
Here is an example in the proxy-server
section in
proxy-server.conf
:
[proxy-server]
allow_open_expired = false
To discover whether this flag is set, you can send a
GET request to the /info
discoverability <discoverability>
path. This
will return configuration data in JSON format where the value of
allow_open_expired
is exposed.
When using a temporary URL to access the object, this feature is not enabled. This means that adding the header will not allow requests to temporary URLs to access expired objects.
Upgrading impact: General Task Queue vs Legacy Queue
The expirer daemon will be moving to a new general task-queue based
design that will divide the work across all object servers, as such only
expirers defined in the object-server config will be able to use the new
system. The parameters in both files are identical except for a new
option in the object-server [object-expirer]
section,
dequeue_from_legacy
which when set to True
will tell the expirer that in addition to using the new task queueing
system to also check the legacy (soon to be deprecated) queue.
Note
The new task-queue system has not been completed yet. So an expirer's
with dequeue_from_legacy
set to False
will
currently do nothing.
By default dequeue_from_legacy
will be
False
, it is necessary to be set to True
explicitly while migrating from the old expiring queue.
Any expirer using the old config
/etc/swift/object-expirer.conf
will not use the new general
task queue. It'll ignore the dequeue_from_legacy
and will
only check the legacy queue. Meaning it'll run as a legacy expirer.
Why is this important? If you are currently running object-expirers
on nodes that are not object storage nodes, then for the time being they
will still work but only by dequeuing from the old queue. When the new
general task queue is introduced, expirers will be required to run on
the object servers so that any new objects added can be removed. If
you're in this situation, you can safely setup the new expirer section
in the object-server.conf
to deal with the new queue and
leave the legacy expirers running elsewhere.
However, if your old expirers are running on the object-servers, the
most common topology, then you would add the new section to all object
servers, to deal the new queue. In order to maintain the same number of
expirers checking the legacy queue, pick the same number of nodes as you
previously had and turn on dequeue_from_legacy
on those
nodes only. Also note on these nodes you'd need to keep the legacy
process
and processes
options to maintain the
concurrency level for the legacy queue.
Note
Be careful not to enable dequeue_from_legacy
on too many
expirers as all legacy tasks are stored in a single hidden account and
the same hidden containers. On a large cluster one may inadvertently
overload the acccount/container servers handling the legacy expirer
queue.
Here is a quick sample of the object-expirer
section
required in the object-server.conf
:
[object-expirer]
# log_name = object-expirer
# log_facility = LOG_LOCAL0
# log_level = INFO
# log_address = /dev/log
#
interval = 300
# If this true, expirer execute tasks in legacy expirer task queue
dequeue_from_legacy = false
# processes can only be used in conjunction with `dequeue_from_legacy`.
# So this option is ignored if dequeue_from_legacy=false.
# processes is how many parts to divide the legacy work into, one part per
# process that will be doing the work
# processes set 0 means that a single legacy process will be doing all the work
# processes can also be specified on the command line and will override the
# config value
# processes = 0
# process can only be used in conjunction with `dequeue_from_legacy`.
# So this option is ignored if dequeue_from_legacy=false.
# process is which of the parts a particular legacy process will work on
# process can also be specified on the command line and will override the config
# value
# process is "zero based", if you want to use 3 processes, you should run
# processes with process set to 0, 1, and 2
# process = 0
report_interval = 300
# request_tries is the number of times the expirer's internal client will
# attempt any given request in the event of failure. The default is 3.
# request_tries = 3
# concurrency is the level of concurrency to use to do the work, this value
# must be set to at least 1
# concurrency = 1
# The expirer will re-attempt expiring if the source object is not available
# up to reclaim_age seconds before it gives up and deletes the entry in the
# queue.
# reclaim_age = 604800
And for completeness, here is a quick sample of the legacy
object-expirer.conf
file:
[DEFAULT]
# swift_dir = /etc/swift
# user = swift
# You can specify default log routing here if you want:
# log_name = swift
# log_facility = LOG_LOCAL0
# log_level = INFO
[object-expirer]
interval = 300
[pipeline:main]
pipeline = catch_errors cache proxy-server
[app:proxy-server]
use = egg:swift#proxy
# See proxy-server.conf-sample for options
[filter:cache]
use = egg:swift#memcache
# See proxy-server.conf-sample for options
[filter:catch_errors]
use = egg:swift#catch_errors
# See proxy-server.conf-sample for options
Note
When running legacy expirers, the daemon needs to run on a machine with access to all the backend servers in the cluster, but does not need proxy server or public access. The daemon will use its own internal proxy code instance to access the backend servers.