- Move statsd client into its own module
- Move all logging functions into their own module
- Move all config functions into their own module
- Move all helper functions into their own module
Partial-Bug: #2015274
Change-Id: Ic4b5005e3efffa8dba17d91a41e46d5c68533f9a
We stuff the access key into the request path until we get back a
more-authoritative account name from auth. But it needs to be a WSGI
string when we do!
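For illustration, a WSGI string is a native str carrying raw bytes one
character per byte, latin-1 style; Swift's swob helpers make the round
trip explicit (a minimal sketch assuming a Swift dev environment; the
key value is made up):

    from swift.common.swob import str_to_wsgi, wsgi_to_str

    access_key = u'pr\u00e9fix-user'    # hypothetical non-ASCII access key
    wsgi_key = str_to_wsgi(access_key)  # chars now map 1:1 to UTF-8 bytes
    assert wsgi_to_str(wsgi_key) == access_key  # round-trips cleanly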
Closes-Bug: #2058748
Change-Id: I34adb8141cc9e62d17a27f01c63f40d1dd25991c
Any of these directories may get unlinked between when we saw them in
their parent's directory listing and when we go to descend.
Change-Id: I1dfc0ee1d9e70cb0600557cde980bd5880bd40b3
This change allows individual SLO segments to be downloaded by adding
an extra 'part-number' query parameter to the GET request. You can
also retrieve the Content-Length of an individual segment with a HEAD
request.
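For example, with the requests library (URL and token are made up;
part numbers count segments starting from 1):

    import requests

    url = 'http://saio:8080/v1/AUTH_test/c/slo-manifest'  # hypothetical
    headers = {'X-Auth-Token': 'AUTH_tk...'}              # hypothetical

    # GET just the second segment of the SLO
    resp = requests.get(url, headers=headers, params={'part-number': 2})

    # HEAD the same part to learn that segment's Content-Length
    head = requests.head(url, headers=headers, params={'part-number': 2})
    print(head.headers.get('Content-Length'))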
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Change-Id: I7af0dc9898ca35f042b52dd5db000072f2c7512e
Currently, when the MemcacheRing `_get_conns` method runs out of
memcached servers to try and so fails to yield anything, we log:
All memcached servers error-limited
However, this error message isn't entirely accurate. The method can
also fail because it couldn't connect to any of its memcached servers,
not just because they're error-limited.
Moreover, error-limiting of memcached servers can be disabled, in
which case this error message is a red herring.
Downstream we use an mcrouter client on each node, which itself talks
to a bunch of memcached servers, so in Swift's MemcacheRing we
configure only that one mcrouter client as a single server in the
ring, and we disable memcached error-limiting. When a node gets too
overloaded we've seen timeouts talking to the local mcrouter client,
which fires off misleading "error-limited" log messages.
Because it's possible to turn off error-limiting, the log line isn't
quite adequate anymore. So this patch changes it to:
No more memcached servers to try
Change-Id: I97fb4f3ee2ac45831aae14a782b2c6dc73e82d85
Currently, when the object-server serves a GET request and the
DiskFile reader iterates over disk file chunks, there is no explicit
eventlet sleep. When the network outpaces the slow disk IO, it's
possible for one large, slow GET request to starve the eventlet hub
and prevent other green threads from being scheduled for a long
period of time. To improve this, this patch adds a configurable sleep
parameter to the DiskFile reader, 'cooperative_period', with a
default value of 0 (disabled).
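A rough sketch of the mechanism (illustrative only, not the actual
DiskFile reader code):

    import eventlet

    def cooperative_iter(chunk_iter, period):
        # Yield to the hub every `period` chunks so one large, slow GET
        # can't monopolize the reactor; period=0 keeps the old behavior.
        count = 0
        for chunk in chunk_iter:
            if period > 0:
                count += 1
                if count >= period:
                    eventlet.sleep()  # cooperative yield
                    count = 0
            yield chunk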
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Change-Id: I80b04bad0601b6cd6caef35498f89d4ba70a4fd4
The existing test fails on macOS because the value of errno.ENODATA is
platform dependent. On macOS ENODATA is 96:
% man 2 intro|grep ENODATA
96 ENODATA No message available.
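A minimal illustration of the portable idiom (works on both
platforms):

    import errno

    # 61 on Linux, 96 on macOS -- compare against the constant, never
    # against a hard-coded number.
    err = OSError(errno.ENODATA, 'No data available')
    assert err.errno == errno.ENODATA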
Change-Id: Ibc760e641d4351ed771f2321dba27dc4e5b367c1
Object GET requests with a truthy X-Newest header are not resumed if a
backend request times out. The GetOrHeadHandler therefore uses the
regular node_timeout when waiting for a backend connection response,
rather than the possibly shorter recoverable_node_timeout. However,
previously while reading data from a backend response the
recoverable_node_timeout would still be used with X-Newest requests.
This patch simplifies GetOrHeadHandler to never use
recoverable_node_timeout when X-Newest is truthy.
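Conceptually, the timeout selection now reduces to something like
(paraphrased, not the literal patch):

    def body_read_timeout(node_timeout, recoverable_node_timeout, newest):
        # X-Newest GETs are never resumed, so there's no benefit to
        # giving up the read early with the shorter recoverable timeout.
        return node_timeout if newest else recoverable_node_timeout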
Change-Id: I326278ecb21465f519b281c9f6c2dedbcbb5ff14
Both GetOrHeadHandler (used for replicated policy GETs) and
ECFragGetter (used for EC policy GETs) have _get_next_response_part
methods that are very similar. This patch replaces them with a single
method in the common GetterBase superclass.
Both classes are modified to use *only* the Request instance passed to
their constructors. Previously their entry methods
(GetOrHeadHandler.get_working_response and
ECFragGetter.response_parts_iter) accepted a Request instance as an
arg and the class then variably referred to that or the Request
instance passed to the constructor. Both instances must be the same
and it is therefore safer to only allow the Request to be passed to
the constructor.
The 'newest' keyword arg is dropped from the GetOrHeadHandler
constructor because it is never used.
This refactoring patch makes no intentional behavioral changes, apart
from the text of some error log messages which have been changed to
differentiate replicated object GETs from EC fragment GETs.
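In outline, the shape of the refactor (a sketch using the names
above; bodies elided, not Swift's actual code):

    class GetterBase(object):
        def __init__(self, app, req):
            # The only Request the getter will ever use is bound at
            # construction; entry methods no longer take a Request arg.
            self.app = app
            self.req = req

        def _get_next_response_part(self):
            """Single shared implementation, formerly duplicated."""

    class GetOrHeadHandler(GetterBase):  # replicated policy GETs
        pass

    class ECFragGetter(GetterBase):  # EC policy GETs
        pass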
Change-Id: I148e158ab046929d188289796abfbbce97dc8d90
... in document_iters_to_http_response_body.
We seemed to be relying a little too heavily upon prompt garbage
collection to log client disconnects, leading to failures in
test_base.py::TestGetOrHeadHandler::test_disconnected_logging
under python 3.12.
Closes-Bug: #2046352
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Change-Id: I4479d2690f708312270eb92759789ddce7f7f930
This has been available since py32 and was backported to py27; there
is no point in us continuing to carry the old idiom forward.
Change-Id: I21f64b8b2970e2dd5f56836f7f513e7895a5dc88
Last time we did this was nearly 4 years ago; drag ourselves into
something approaching the present. Address a few new pycodestyle
issues that seem reasonable to enforce (examples below):
E275 missing whitespace after keyword
E231 missing whitespace after ','
E721 do not compare types, for exact checks use `is` / `is not`,
for instance checks use `isinstance()`
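For instance, the newly flagged patterns look like:

    done = False
    if not(done):      # E275: missing whitespace after keyword
        pass

    x = (1,2)          # E231: missing whitespace after ','

    if type(x) == tuple:   # E721: use `is` for exact type checks, or
        pass               # isinstance(x, tuple) for instance checks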
Main motivator is that the old hacking kept us on an old version
of flake8 et al., which no longer work with newer Pythons.
Change-Id: I54b46349fabb9776dcadc6def1cfb961c123aaa0
We've observed a root container suddenly thinking it's unsharded when
its own_shard_range is reset. This patch blocks a remote OSR with an
epoch of None from overwriting a local epoched OSR.
The only way we've observed this happen is when a new replica or
handoff node creates a container whose new own_shard_range is created
without an epoch and is then replicated to older primaries.
However, if a bad node with a non-epoched OSR is on a primary, its
newer timestamp would prevent pulling the good OSR from its peers, so
it'll be left stuck with its bad one.
When this happens, expect to see a bunch of:
Ignoring remote osr w/o epoch: x, from: y
When an OSR comes in from a replica without an epoch when it should
have one, we do a pre-flight check to see whether merging it would
actually remove the local epoch before emitting the error above. We
do this because when sharding is first initiated it's perfectly valid
to get OSRs without epochs from replicas. This is expected and
harmless.
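In pseudocode terms, the guard looks roughly like this (illustrative
names, not Swift's actual merge code):

    def should_ignore_remote_osr(local_osr, remote_osr, logger):
        # Pre-flight check: only reject (and warn) if merging the remote
        # OSR would actually clear the epoch we already have.
        if remote_osr.epoch is None and local_osr.epoch is not None:
            logger.warning('Ignoring remote osr w/o epoch: %r, from: %r',
                           remote_osr, remote_osr.container)
            return True
        return False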
Closes-bug: #1980451
Change-Id: I069bdbeb430e89074605e40525d955b3a704a44f
This is a more intuitive name for what's going on, and it's been
working well for us in the reconstructor.
Change-Id: Id935de4ca9eb6f38b0d587eaed8d13c54bd89d60
Note that there's a bit of a privilege escalation as prefix-based
tempurls can now be used to perform listings -- but only on containers
with staticweb enabled. Since having staticweb enabled was previously
pretty useless unless the container was both public and
publicly-listable, I think it's probably fine.
This also allows tempurls to be used at the container level, but only
for staticweb responses.
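For reference, a prefix-based tempurl is signed over a 'prefix:' path
(standard tempurl signing recipe; the account, container, and key
below are made up, and an empty prefix scopes the signature to the
whole container):

    import hmac
    from hashlib import sha256
    from time import time

    key = b'container-tempurl-key'  # hypothetical Temp-URL-Key value
    method, prefix = 'GET', ''
    expires = int(time()) + 300
    hmac_body = '%s\n%d\nprefix:/v1/AUTH_test/c/%s' % (
        method, expires, prefix)
    sig = hmac.new(key, hmac_body.encode('utf8'), sha256).hexdigest()

    # With staticweb enabled on the container, this URL can now return
    # a listing:
    url = ('/v1/AUTH_test/c/?temp_url_sig=%s&temp_url_expires=%d'
           '&temp_url_prefix=%s' % (sig, expires, prefix))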
Change-Id: I7949185fdd3b64b882df01d54a8bc158ce2d7032
From https://docs.python.org/3/whatsnew/3.12.html :
sum() now uses Neumaier summation to improve accuracy and
commutativity when summing floats or mixed ints and floats.
At least, I *think* that's what was causing the ring builder failures.
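The classic demonstration, where a naive running sum loses the 1.0
entirely:

    vals = [1e16, 1.0, -1e16]
    # Naive left-to-right float addition gives 0.0 (the 1.0 is absorbed
    # by 1e16); with Neumaier compensation, 3.12's sum() reportedly
    # returns 1.0.
    print(sum(vals))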
Partial-Bug: #2046352
Change-Id: Icae2f1e3e95f216d214636bd5a6d1f40aacab20d
If the proxy timed out while reading a replicated policy multi-part
response body, it would transform the ChunkReadTimeout to a
StopIteration. This masks the fact that the backend read has
terminated unexpectedly. The document_iters_to_multipart_byteranges
would complete iterating over parts and send a multipart terminator
line, even though no parts may have been sent.
This patch removes the conversion of ChunkReadTimeout to StopIteration.
The ChunkReadTimeout that is now raised prevents the
document_iters_to_multipart_byteranges 'for' loop completing and
therefore stops the multi-part terminator line being sent. It is
raised from the GetOrHeadHandler similar to other scenarios that raise
ChunkReadTimeouts while the resp body is being read.
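The masking pattern being removed looked conceptually like this
(paraphrased; the exception class here stands in for
swift.common.exceptions.ChunkReadTimeout):

    class ChunkReadTimeout(Exception):
        pass

    def old_next_part(parts_iter):
        try:
            return next(parts_iter)
        except ChunkReadTimeout:
            raise StopIteration  # old: caller saw a clean end of parts

    def new_next_part(parts_iter):
        # new: a mid-body timeout propagates and halts the multipart
        # 'for' loop before any terminator line is emitted
        return next(parts_iter)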
A ChunkReadTimeout exception handler is removed in the
_iter_parts_from_response method. This handler was previously never
reached (because StopIteration rather than ChunkReadTimeout was raised
from _get_next_response_part), but if it were reached (i.e. with this
change) then it would repeat logging of the error and repeat
incrementing the node's error counter.
This change in the GetOrHeadHandler mimics a similar change in the
ECFragGetter [1].
[1] Related-Change: I0654815543be3df059eb2875d9b3669dbd97f5b4
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Change-Id: I6dd53e239f5e7eefcf1c74229a19b1df1c989b4a
The proxy should NOT read or write to memcache when handling a
container GET that explicitly requests 'shard' or 'object' record
type. A request for 'shard' record type may specify 'namespace'
format, but this request is unrelated to container listings or object
updates and passes directly to the backend.
This patch also removes unnecessary JSON serialisation and
de-serialisation of namespaces within the proxy GET path when a
sharded object listing is being built. The final response body will
contain a list of objects so there is no need to write intermediate
response bodies with a list of namespaces.
Requests that explicitly specify record type of 'shard' will of
course still have the response body with serialised shard dicts that
is returned from the backend.
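Such a backend-targeted request looks like this (X-Backend-Record-Type
is a real Swift internal header; URL and token are made up):

    import requests

    resp = requests.get(
        'http://saio:8080/v1/AUTH_test/sharded-container',
        headers={'X-Auth-Token': 'AUTH_tk...',
                 'X-Backend-Record-Type': 'shard'})
    # With this patch the proxy neither reads nor writes memcache for
    # such requests; the serialised shard dicts come straight from the
    # backend.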
Change-Id: Id79c156432350c11c52a4004d69b85e9eb904ca6
Unit test changes only:
- Add tests for some resuming replicated GET scenarios.
- Add test to cover resuming GET fast_forward "failing" when range
read is complete.
- Add test to verify different node_timeout for account and container
vs object controller getters.
- Refactor proxy.test_server.py tests to split out different
scenarios.
Drive-by: remove some ring device manipulation setup that's not needed.
Change-Id: I38c7fa648492c9bd2173ecf92f89e423bee4abf3
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
It seems that since somewhere around sqlite 3.40+, the malformed
sqlite db we use in tests isn't considered malformed anymore. I don't
actually know how it was malformed, but looking in a hex editor it
seems to have a bunch of nulls truncating the middle of the file,
which maybe isn't an issue anymore?
Instead I've gone and messed up what looks to be the marker before
the test table data defined at the end of the file, going from:
00001FF0 00 00 00 00 00 00 00 00 00 00 00 03 01 02 0F 31 ...............1
                                          ^^
To:
00001FF0 00 00 00 00 00 00 00 00 00 00 00 FF 01 02 0F 31 ...............1
                                          ^^
Basically FF'ed the start of the data marker (at least what I'm calling
it).
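Equivalently, in Python (the offset 0x1FFB comes from the dump above;
the filename is made up):

    with open('malformed_example.db', 'r+b') as fp:
        fp.seek(0x1FFB)
        assert fp.read(1) == b'\x03'  # sanity-check the old marker byte
        fp.seek(0x1FFB)
        fp.write(b'\xff')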
Closes-Bug: #2051067
Change-Id: I2a10adffa39abbf7e97718b7228de298209140f8
Recently, upper-constraints updated eventlet. Unfortunately, there
was a bug that broke our unit tests but was not discovered during
cross-project testing, because the affected unit tests require an
XFS temp dir. The requirements change has since been reverted, but we
ought to have tests covering the problematic behavior that will
actually run as part of cross-project testing.
See https://github.com/eventlet/eventlet/pull/826 for the eventlet
change that introduced the bug; it has since been fixed on master in
https://github.com/eventlet/eventlet/pull/890 (though we still need
https://review.opendev.org/c/openstack/swift/+/905796 to be able to
work with eventlet master).
Change-Id: I4a6d79317b65f746ee29d2d25073b8c3859cd6a0