Adds specific disk file classes for EC policy types.
The new ECDiskFile and ECDiskFileWriter classes are used by the
ECDiskFileManager.
ECDiskFileManager is registered with the DiskFileRouter for use with
EC_POLICY type policies.
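A minimal sketch of the registration pattern this implies (the decorator
and attribute names here are assumptions, not necessarily the exact code):

    from swift.common.storage_policy import EC_POLICY
    from swift.obj.diskfile import DiskFileManager, DiskFileRouter, ECDiskFile

    @DiskFileRouter.register(EC_POLICY)
    class ECDiskFileManager(DiskFileManager):
        # the router hands back this manager for EC_POLICY policies,
        # which in turn builds ECDiskFile/ECDiskFileWriter instances
        diskfile_cls = ECDiskFile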
Refactors diskfile tests into BaseDiskFileMixin and BaseDiskFileManagerMixin
classes which are then extended in subclasses for the legacy
replication-type DiskFile* and ECDiskFile* classes.
Refactor to prefer use of a policy instance reference over a policy_index
int to refer to a policy.
Add additional verification to DiskFileManager.get_dev_path to validate the
device root with common.constraints.check_dir, even when mount_check is
disabled, for use on a virtual swift-all-in-one.
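A hedged sketch of the extra validation (not the exact diff):

    import os

    from swift.common.constraints import check_dir, check_mount

    def get_dev_path(self, device):
        # with mount_check on, require a real mount; with it off (e.g. a
        # virtual swift-all-in-one), still insist the device root is a dir
        if self.mount_check and not check_mount(self.devices, device):
            return None
        if not self.mount_check and not check_dir(self.devices, device):
            return None
        return os.path.join(self.devices, device)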
Co-Authored-By: Thiago da Silva <thiago@redhat.com>
Co-Authored-By: John Dickinson <me@not.mn>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Tushar Gohad <tushar.gohad@intel.com>
Co-Authored-By: Paul Luse <paul.e.luse@intel.com>
Co-Authored-By: Samuel Merritt <sam@swiftstack.com>
Co-Authored-By: Christian Schwede <christian.schwede@enovance.com>
Co-Authored-By: Yuan Zhou <yuan.zhou@intel.com>
Change-Id: I22f915160dc67a9e18f4738c1ddf068344e8ad5d
Wikipedia's list of common misspellings [1] has a machine-readable
version. This patch fixes those misspellings mentioned in the list
which don't have multiple correct variants (e.g. "accension", which can
be either "accession" or "ascension"); such misspellings are left
untouched. The list of changes was manually re-checked for false
positives.
[1] https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines
Change-Id: Ic9a5438629664f7cea216413a28acc0e8992da05
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Extracting large chunks of the PUT method into smaller
methods to improve maintainability and reuse of code.
Based on the work that Clay Gerrard started:
https://review.openstack.org/#/c/77812/
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Change-Id: Id479fc5b159a2782361ac4a6e4a6d8bbaee4fe85
Signed-off-by: Thiago da Silva <thiago@redhat.com>
If the tool used to send headers doesn't support empty headers (older
versions of curl), x-remove headers can be used to remove metadata.
The sync-key and sync-to metadata used by container-sync can now be removed
using x-remove headers.
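For illustration, a request along these lines clears the sync key
(endpoint, token and container name are placeholders):

    import requests

    # the header's presence is what matters; its value is ignored
    requests.post('http://server:8080/v1/AUTH_test/synced_container',
                  headers={'X-Auth-Token': 'AUTH_tk_placeholder',
                           'X-Remove-Container-Sync-Key': 'x'})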
Change-Id: I0edb4d5425a99d20a973aa4fceaf9af6c2ddecc0
The cached info dict did not include
the sysmeta. This patch fixes that and adds
a unit test.
Change-Id: I092200e76586af322ed4ff7d194a1034b1ca0433
When an account was not found, ContainerController would
return 404 unconditionally for a container GET or HEAD request,
without checking that the request was authorized.
This patch modifies the GETorHEAD method to first call any
callback method registered under 'swift.authorize' in the
request environ and prefer any response from that over the 404.
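A minimal sketch of the pattern (simplified, not the exact patch):

    # before returning 404 for a missing account, let any registered
    # auth callback veto with its own response (e.g. 401)
    if not self.account_info(self.account_name, req)[1]:
        if 'swift.authorize' in req.environ:
            aresp = req.environ['swift.authorize'](req)
            if aresp:
                return aresp
        return HTTPNotFound(request=req)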
Closes-Bug: 1415957
Change-Id: I4f41fd9e445238e14af74b6208885d83698cc08d
Meaningless cleanup of the day. In my defence, I hope the next guy
doesn't have to grep all over (the test is adjusted to make the intent clear).
Change-Id: I9e10dd977d4ca48db1393519068ce0e286705433
There is a standard LBYL race that can better be addressed by making the
EAFP case safer.
Capture 409 Conflict during expect on PUT
Similarly to how the proxy handles 412 on PUT, we will gather 409
responses in the proxy during _connect_put_node. Rather than skipping
backend servers that already have a synced copy of an object we will
accept their response and return 202 immediately.
This is particularly useful to internal clients who are using
X-Timestamp to sync transfers (e.g. container-sync and
container-reconciler).
No observable change in client-facing behavior except that swift is
faster to respond Accepted when it already has the data the client is
proposing to send.
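A hedged sketch of the expect handling (names are illustrative):

    from swift.common.http import HTTP_CONFLICT

    def _handle_expect(conn, node):
        # keep a 409 as a usable response instead of discarding the node,
        # so best_response can return 202 without re-sending the data
        resp = conn.getexpect()
        if resp.status == HTTP_CONFLICT:
            conn.resp = resp
            conn.node = node
        return conn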
Change-Id: Ie400d5bfd9ab28b290abce2e790889d78726095f
According to the documentation, dlo manifest files should not
be versioned. This patch fixes this issue and adds
some unit and functional tests for this scenario.
Change-Id: Ib5b29a19e1d577026deb50fc9d26064a8da81cd7
Signed-off-by: Thiago da Silva <thiago@redhat.com>
All GET or HEAD requests consistently error limit nodes that return 507
and increment errors for nodes responding with any other 5XX.
There were two places in the object PUT path where the proxy was error
limiting nodes and their behavior was inconsistent. During expect-100
connect we would only error_limit nodes on 507, and during response we
would increment errors for all 5XX series responses. This was pretty
hard to reason about, and the divergence in behavior was of questionable
value.
An audit of base controller highlighted where make_requests would apply
error_limit's on 507 but not increment errors on other 5XX responses.
Now anywhere we track errors on nodes we use error_limit on 507 and
error_occurred on any other 5XX series response. Additionally a Timeout
or Exception that is logged through exception_occurred will bump errors -
which is consistent with the approach in "Add Error Limiting to slow
nodes" [1].
1. https://review.openstack.org/#/c/112424/
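A minimal sketch of the now-uniform rule (message strings illustrative):

    from swift.common.http import HTTP_INSUFFICIENT_STORAGE, is_server_error

    def track_node_errors(app, node, status):
        # 507 error-limits the node; any other 5XX bumps its error counter
        if status == HTTP_INSUFFICIENT_STORAGE:
            app.error_limit(node, 'ERROR Insufficient Storage')
        elif is_server_error(status):
            app.error_occurred(node, 'ERROR %s from %s:%s/%s' % (
                status, node['ip'], node['port'], node['device']))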
Change-Id: I67e489d18afd6bdfc730bfdba76f85a2e3ca74f0
The proxy was storing the error count and last-error time in the
ring's internal data, specifically in the device dictionaries. This
works okay, but it means that whenever a ring changes, all the error
stats reset.
Now the error stats live in the proxy server object, so they survive a
ring reload.
Better yet, the error stats are now keyed off of the node's
IP/port/device triple, so if you have the same device in two rings
(like with multiple storage policies), then the error stats are
combined. If the proxy server sees a 507 for an object request in
policy X, then that will now result in that particular object disk
being error-limited for requests in policies Y and Z as well.
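A hedged sketch of the keying (exact format is illustrative):

    # same IP/port/device in two rings -> same key -> shared error stats
    def node_error_key(node):
        return '%(ip)s:%(port)s/%(device)s' % node

    # lives on the proxy server object, so it survives ring reloads
    node_errors = {}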
Change-Id: Icc72b68b99f37367bb16d43688e7e45327e3e022
When an X-Backend-Timestamp is available it is generally preferred
over a less specific value, and it sorts correctly against any X-Timestamp
values anyway.
Change-Id: I08b7eb37ab8bd6eb3afbb7dee44ed07a8c69b57e
The default MAX_FILE_SIZE of (5 * 2 ** 30 + 2) exceeds the
capacity of an int on 32 bit systems; adjust this default
down to (2 ** 30 + 2) if the default exceeds the maxsize of
an int on the platform running the tests.
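A minimal sketch of the guard (constants from the message above):

    import sys

    MAX_FILE_SIZE = 5 * 2 ** 30 + 2
    if MAX_FILE_SIZE > sys.maxsize:
        # 32-bit platform: fall back to a size that fits a native int
        MAX_FILE_SIZE = 2 ** 30 + 2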
Closes-Bug: #1378810
Change-Id: Iafa2ce90ceb2de4e968ad48580270c8c572a9c9c
This patch fixes the unit tests to remove the temporary directories
created during a run of the unit tests. Some unit tests did not tear
down whatever they had set up for the run, which would over time bloat
the tmp directory: around 49 tmp directories were left uncleared per
round of unit tests.
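A sketch of the cleanup pattern involved (test name is illustrative):

    import shutil
    import tempfile
    import unittest

    class TestExample(unittest.TestCase):
        def setUp(self):
            self.testdir = tempfile.mkdtemp()
            # registered immediately, so the dir is removed even when the
            # test fails partway through
            self.addCleanup(shutil.rmtree, self.testdir, ignore_errors=True)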
Change-Id: If591375ca9cc87d52c7c9c6dc16c9fb4b49e99fc
This change adds an optional overrides map to the make_requests method
in the base Controller class:

    def make_requests(self, req, ring, part, method, path, headers,
                      query_string='', overrides=None)

The overrides map is passed on to the best_response method. If it is set
and no quorum is reached, the overrides map is used to attempt to find a
quorum.
The overrides map is in the form:
{ <response>: <override response>, .. }
The ObjectController, in the DELETE method now passes an override map
to make_requests method in the base Controller class in the form of:
{ 404: 204 }
Statuses/responses that have been overridden are used in calculation
of the quorum but never returned to the user. They are replaced by:
(STATUS, '', '', '')
And left out of the search for best response.
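A hedged sketch of the DELETE wiring (argument names illustrative):

    resp = self.make_requests(req, obj_ring, partition, 'DELETE',
                              req.path_info, headers,
                              overrides={404: 204})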
Change-Id: Ibf969eac3a09d67668d5275e808ed626152dd7eb
Closes-Bug: 1318375
This adds a sanity check on x-delete headers as
part of the check_object_creation method.
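A hedged sketch of the kind of check added (details are illustrative):

    import time

    def check_delete_headers(headers):
        # X-Delete-At must be an integer timestamp in the future
        if 'x-delete-at' in headers:
            try:
                x_delete_at = int(headers['x-delete-at'])
            except ValueError:
                return 'Non-integer X-Delete-At'
            if x_delete_at < time.time():
                return 'X-Delete-At in past'
        return None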
Change-Id: If5069469e433189235b1178ea203b5c8a926f553
Signed-off-by: Thiago da Silva <thiago@redhat.com>
The proxy sends requests to all storage nodes, but it only needs a
quorum of them to respond before the proxy can, in turn, respond to
the client. Thus, it gets quorum, and then briefly waits to see if the
remainder of the storage nodes respond before giving up on them.
However, the proxy was not paying any attention to the responses from
the non-quorum storage nodes. This would lead to some odd behavior,
like a container PUT where the backends returned (201, 201, 202) would
become a 201 to the client, but the permutation (201, 202, 201) would
become 202. Further, on object PUT, if the last node responded with an
error code, that error wouldn't count towards error-limiting.
The fix is quite simple: after getting quorum, the make_requests()
method was calling a method that returns responses from the remainder
of the nodes, but it was ignoring that return value and making up
responses with dummy values instead. Now, prior to making up dummy
responses, the proxy first uses the responses it already has, and only
fills in dummy responses for nodes that really didn't respond in time.
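A hedged sketch of the fix (variable names illustrative):

    # collect the real straggler responses first, then pad only for
    # nodes that truly never answered
    for resp in late_responses:
        if resp:
            responses.append(resp)
    while len(responses) < number_of_nodes:
        responses.append((HTTP_SERVICE_UNAVAILABLE, '', '', ''))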
Change-Id: I0206b6b2272b0e7dcc80fb6c51840d8dae25cee2
If any node had an error on object PUT, the proxy would count the
error against the last-connected-to node instead of the one with the
error. Now it counts the error against the right node.
Change-Id: I884eb73cebe0c723473a6d5e390a148fcad0d3ed
When deleting versioned objects the proxy will try to restore the previous
copy. The COPY request will fail if the previous version has expired but
not yet been handled by the object-expirer.
This patch checks the COPY response for the previous copy; if it is
HTTP_NOT_FOUND (mostly because it has expired) the proxy will try to
restore the version before the previous one.
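A hedged sketch of the walk-back (helper name is hypothetical):

    for version in reversed(versions):
        copy_resp = restore_version(req, version)   # issues the COPY
        if copy_resp.status_int != HTTP_NOT_FOUND:
            break   # restored, or a real error worth surfacing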
Closes-Bug #1308446
Change-Id: I17f049ea3ef62723effae8086ec427f6e151cd9c
The current FakeServerConnection might cause a 499 error
in some unit tests because data sent via put() is
overwritten by each new chunk.
E.g. when conn.queue.put() is called twice or more in
an object PUT sequence, only the last chunk is used as
the body. This fixes it to join all chunks into the body.
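A minimal sketch of the fixed fake (simplified):

    class FakeServerConnection(object):
        def __init__(self):
            self.sent = []

        def put(self, chunk):
            # accumulate every chunk instead of keeping only the last
            self.sent.append(chunk)

        def body(self):
            return b''.join(self.sent)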
Change-Id: I463e9e2b454e3f3eb26950b3af4c8b8167a9a971
All the expiring objects for a given X-Delete-At are funnelled into the
same expiring object container; this can act as a bottleneck.
Change-Id: I288a177a7ae3e213c727a2a81fa76d4ef9cf7eb3
Adds the ability to copy objects between different accounts (on the server side).
Adds new header to `PUT` request:
`X-Copy-From-Account: <account name>`
Account name corresponds to the last part of storage URL.
Adds new header to `COPY` request:
`Destination-Account: <account name>`
Account name corresponds to the last part of storage URL.
If your storage URL is: http://server:8080/v1/AUTH_test
Then the account name is `AUTH_test`
These headers should be used alongside the `X-Copy-From` and `Destination` headers.
The legacy headers should specify the `<container name>/<object name>` path as usual.
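For illustration (the token and destination account are placeholders):

    import requests

    # server-side copy from AUTH_test into another account
    requests.put('http://server:8080/v1/AUTH_dest/backups/obj',
                 headers={'X-Auth-Token': 'AUTH_tk_placeholder',
                          'X-Copy-From-Account': 'AUTH_test',
                          'X-Copy-From': '/source_container/obj',
                          'Content-Length': '0'})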
DocImpact
Change-Id: I0285fe6a47df9e699ac20ae4a83b0bf23829e1e6
There was some duplication of code in both POST and PUT
methods to handle object-expiration headers.
A method was created to remove this duplication,
which should help with maintainability of code.
Change-Id: I85cc4a7b0d688760c97598d80b9e9a39288c5f34
Signed-off-by: Thiago da Silva <thiago@redhat.com>
The keystoneauth middleware supports cross-tenant access
control using the syntax <tenant>:<user> in container ACLs,
where <tenant> and <user> may currently be either a unique
id or a name. As a result of the keystone v3 API introducing
domains, names are no longer globally unique and are only
unique within a domain. The use of unqualified tenant and
user names in this ACL syntax is therefore not 'safe' in a
keystone v3 environment.
This patch modifies keystoneauth to restrict cross-tenant
ACL matching to use only ids for accounts that are not in
the default domain. For backwards compatibility,
names will still be matched in ACLs when both the requesting
user and tenant are known to be in the default domain AND the
account's tenant is also in the default domain (the default
domain being the domain to which existing tenants are
migrated).
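A minimal sketch of the resulting rule (the function is illustrative):

    def names_may_match(user_in_default_domain, tenant_in_default_domain,
                        account_in_default_domain):
        # names match only when every party is known to be in the
        # default domain; otherwise only ids are matched
        return (user_in_default_domain and tenant_in_default_domain
                and account_in_default_domain)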
Accounts existing prior to this patch are assumed to be for
tenants in the default domain. New accounts created using a
v2 token scoped on the tenant are also assumed to be in the
default domain. New accounts created using a v3 token scoped
on the tenant will learn their domain membership from the
token info. New accounts created using any unscoped token,
(i.e. with a reselleradmin role) will have unknown domain
membership and therefore be assumed to NOT be in the default
domain.
Despite this provision for backwards compatibility, names
must no longer be used when setting new ACLs in any account,
including new accounts in the default domain.
This change obviously impacts users accustomed to specifying
cross-tenant ACLs in terms of names, and further work will be
necessary to restore those use cases. Some ideas are
discussed under the bug report. With that caveat, this patch
removes the reported vulnerability when using
swift/keystoneauth with a keystone v3 API.
Note: to observe the new 'restricted' behaviour you will need
to set up keystone user(s) and tenant(s) in a non-default domain
and set auth_version = v3.0 in the auth_token middleware config
section of proxy-server.conf. You may also benefit from the
keystone v3 enabled swiftclient patch under review here:
https://review.openstack.org/#/c/91788/
DocImpact
blueprint keystone-v3-support
Closes-Bug: #1299146
Change-Id: Ib32df093f7450f704127da77ff06b595f57615cb
This patch takes a first step towards support
for object system metadata by enabling headers
in the x-object-sysmeta- namespace to be
persisted when objects are PUT. This should be
useful for other pending patches such as on
demand migration and server side encryption
(https://review.openstack.org/#/c/64430/ and
https://review.openstack.org/#/c/76578/1).
The x-object-sysmeta- namespace is already
reserved/protected by the gatekeeper and
passed through the proxy. This patch modifies
the object server to persist these headers
alongside user metadata when an object is
PUT.
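For illustration (endpoint and token are placeholders):

    import requests

    # sysmeta sent on PUT is persisted with the object on disk
    requests.put('http://server:8080/v1/AUTH_test/c/o',
                 data=b'object body',
                 headers={'X-Auth-Token': 'AUTH_tk_placeholder',
                          'X-Object-Sysmeta-Example': 'some value'})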
This patch will preserve existing object
system metadata and ignore any new system
metadata when handling object POSTs,
including POST-as-copy operations. Support
for modification of object system metadata
with a POST request requires further work
as discussed in the blueprint.
This patch will preserve existing object
system metadata and update it with new
system metadata when copying an object.
A new probe test is added which makes use of
the BrainSplitter class that has been moved
from test_container_merge_policy_index.py to
a new module brain.py.
blueprint object-system-metadata
Change-Id: If716bc15730b7322266ebff4ab8dd31e78e4b962
A long, long time ago, on a GET request, the proxy would go look on 3*
nodes for the requested thing. If one of the primary nodes was
error-limited, it'd look on two primaries and a handoff. Since this
indicated some failure somewhere, the proxy would emit a warning:
"Handoff requested (1)". If two primaries were down, there'd be a
second message "Handoff requested (2)", and so on.
Some StatsD messages were emitted too.
A somewhat shorter time ago (commit d79a67eb), the proxy started
looking into handoffs if it got 404s from the primaries. While this
was a good idea, it resulted in lots of "Handoff requested (N)" log spam;
you'd see these messages on every single 404. Also, the StatsD
handoff_count and handoff_all_count metrics shot way up and turned
into noise.
This commit restores the original intent (and usefulness) of the log
messages and StatsD metrics: if the proxy only looks at the normal
number of handoff nodes, nothing is logged. However, if a primary is
down, then the message "Handoff requested (1)" will be logged,
indicating that the proxy looked at one more handoff than it normally
would, and this happened because a primary node was error-limited.
Closes-Bug: 1297214
* or whatever the replica count was
Change-Id: If1b77c18c880b096e8ab1df3008db40ce313835d
If the container count per account is equal to or greater than
max_container_per_account, then all PUT requests fail and
403 is returned.
This is correct behaviour if the request is to create a new
container; however, if the container already exists the PUT should be
allowed, even when the max_container_per_account condition has been met.
This patch allows PUT requests for existing containers to be processed,
even if the container count >= max_container_per_account.
It indirectly resolves bug 1306711, since swift-client
internally issues a container PUT before it uploads an
object there.
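A hedged sketch of the changed check (not the exact diff):

    from swift.common.http import is_success

    # only reject the PUT when the container does not already exist
    if container_count >= self.app.max_containers_per_account:
        container_info = self.container_info(
            self.account_name, self.container_name, req)
        if not is_success(container_info.get('status')):
            return HTTPForbidden(
                request=req, body='Reached container limit of %s' %
                self.app.max_containers_per_account)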
Change-Id: I2dcf20b6feb27e346111466a565695eba4b4b1da
The get_part method is fast and stable given a consistent hash path
suffix/prefix, so there's no absolute requirement for the fake
implementation other than convenience. OTOH, removing the fake
implementation and fixing the tests that were relying on it should make
it easier to write better tests going forward and harder to hide bugs
that don't show up when using the fakes.
There may be some overhead when writing new tests that use the ring if
you're making assertions on partitions or paths, but with a part power
of zero it's normally trivially obvious when a 1 needs to be a 0 or vice
versa. Or you can just drop the assertions around the parts you were
faking anyway.
Change-Id: I8bfc388a04eff6491038991cdfd7686c9d961545
Per information filed in the bug (and related bugs referenced in the
report) it appears that in TestObjectController.test_client_timeout
a matching timeout value for both self.app.client_timeout
and SlowBody() appears to work; as expected, a smaller
value in self.app.client_timeout also works, while a larger
value fails. With that short test combined with speculation and
related fixes, it seems reasonable to merge what is suggested in
the bug report and drop self.app.client_timeout to 0.05.
Fixes Bug #1316716
Change-Id: Ib4c6d3bb275f6c50c62505c90656efa7ee566bc0
If upgrading from a non-storage-policy enabled version of
swift to a storage-policy enabled version it's possible that
memcached will have an info structure that does not contain
the 'storage_policy' key, resulting in an unhandled exception
during the lookup. The fix is to simply make sure we never
return the dict without a storage_policy key defined; if it
doesn't exist it's safe to make it '0', as this means you're
in the upgrade scenario and there's no other possibility.
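A one-line sketch of the guard:

    # a cache entry written before the upgrade implies policy 0
    info.setdefault('storage_policy', '0')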
Change-Id: If8e8f66d32819c5bfb2d1308e14643f3600ea6e9