During cleaving, if the sharder finds that zero object rows are copied
from the parent retiring DB to a cleaved shard DB, and if that shard
DB appears to have been freshly created by the cleaving process, then the
sharder skips replicating that shard DB and does not count the shard
range as part of the batch (see Related-Change).
Previously, any shard range treated in this way would not have its
state moved to CLEAVED but would remain in the CREATED state. However,
cleaving of subsequent shard ranges does continue, leading to anomalous
sets of shard range states, including all other shard ranges moving to
ACTIVE but the skipped range remaining in CREATED (until another
sharder visitation finds object rows and actually replicates the
cleaved shard DB).
These anomalies can be avoided by moving the skipped shard range to
the CLEAVED state. This is exactly what would happen anyway if the
cleaved DB had only one object row copied to it, or if the cleaved DB
had zero object rows copied to it but happened to already exist on
disk.
Related-Change: Id338f6c3187f93454bcdf025a32a073284a4a159
Change-Id: I1ca7bf42ee03a169261d8c6feffc38b53226c97f
Previously all sharder probe tests were skipped if auto-sharding was
not configured for the test cluster. This is now fixed so that only
tests that require auto-sharding are skipped.
Change-Id: Icefc6441f8d48a0d418a4ab70b778b48ffdcbb59
There is a tight coupling between a root container and its shards: the
shards hold the object metadata for the root container, so are really
an extension of the root. When we PUT objects into a root container,
it redirects them, with the root's policy, to the shards, and the
shards are happy to take them even if the shard's policy is different
from the root's. But when it comes to GETs, the root redirects the GET
onto its shards, which currently won't respond with objects (which they
probably hold) because they are of a different policy. Currently, when
getting objects from the container server, the policy used is always
the broker's policy.
This patch corrects this behaviour by allowing the policy index to be
overridden: if the request to the container server contains an
'X-Backend-Storage-Policy-Index' header, that value is used
instead of the policy index stored in the broker.
This patch adds the root container's policy as this header in the
proxy container controller's `_get_from_shards` method which is used
by the proxy to redirect a GET to a root to its shards.
Further, a new backend response header has been added. If the
container response contains an `X-Backend-Record-Type: object` header,
then the response contains object records. In this case this patch
also adds an `X-Backend-Record-Storage-Policy-Index` header so that
the policy index of the returned objects is known, because
X-Backend-Storage-Policy-Index in the response _always_ represents the
policy index of the container itself.
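For illustration, the header flow might look like this (a sketch with
hypothetical policy index values, not a verbatim trace):

    # hypothetical policy indexes for illustration only
    root_policy_index, shard_policy_index = 1, 0

    # proxy -> shard container server: list objects using the root's
    # policy rather than the shard broker's own policy
    request_headers = {
        'X-Backend-Storage-Policy-Index': root_policy_index,
    }

    # shard container server -> proxy: the response carries object records
    response_headers = {
        'X-Backend-Record-Type': 'object',
        # always the policy index of the container (broker) itself
        'X-Backend-Storage-Policy-Index': shard_policy_index,
        # the policy index of the object records actually returned
        'X-Backend-Record-Storage-Policy-Index': root_policy_index,
    }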
As a plus, this new container policy API gives us a way to check
containers for object listings in other policies, which might come in
handy for ops/SREs.
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Change-Id: I026b699fc5f0fba619cf524093632d67ca38d32f
A container is typically sharded when it has grown to have an object
count of shard_container_threshold + N, where N <<
shard_container_threshold. If sharded using the default
rows_per_shard of shard_container_threshold / 2 then this would
previously result in 3 shards: the tail shard would typically be
small, having only N rows. This behaviour caused more shards to be
generated than desirable.
This patch adds a minimum-shard-size option to
swift-manage-shard-ranges, and a corresponding option in the sharder
config, which can be used to avoid small tail shards. If set to
greater than one then the final shard range may be extended to more
than rows_per_shard in order to avoid a further shard range with less
than minimum-shard-size rows. In the example given, if
minimum-shard-size is set to M > N then the container would shard into
two shards having rows_per_shard and rows_per_shard + N rows
respectively.
The default value for minimum-shard-size is rows_per_shard // 5. If
all options have their default values this results in
minimum-shard-size being 100000.
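As a rough sketch of the tail handling (illustrative code, not the
actual swift-manage-shard-ranges implementation; assumes the default
values above):

    def split_rows(total_rows, rows_per_shard, minimum_shard_size):
        # return the row count of each shard range, extending the final
        # range rather than leaving a tail smaller than minimum_shard_size
        sizes = []
        remaining = total_rows
        while remaining > 0:
            if 0 < remaining - rows_per_shard < minimum_shard_size:
                sizes.append(remaining)
                remaining = 0
            else:
                size = min(rows_per_shard, remaining)
                sizes.append(size)
                remaining -= size
        return sizes

    # e.g. with default values and N = 50000 rows over the threshold:
    # two shards rather than three
    print(split_rows(1050000, 500000, 100000))  # [500000, 550000]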
Closes-Bug: #1928370
Co-Authored-By: Matthew Oliver <matt@oliver.net.au>
Change-Id: I3baa278c6eaf488e3f390a936eebbec13f2c3e55
If the reconstructor finds a fragment that appears to be stale then it
will now quarantine the fragment. Fragments are considered stale if
insufficient fragments at the same timestamp can be found to rebuild
missing fragments, and the number found is less than or equal to a new
reconstructor 'quarantine_threshold' config option.
Before quarantining a fragment the reconstructor will attempt to fetch
fragments from handoff nodes in addition to the usual primary nodes.
The handoff requests are limited by a new 'request_node_count'
config option.
'quarantine_threshold' defaults to zero, i.e. no fragments will be
quarantined. 'request_node_count' defaults to '2 * replicas'.
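A minimal sketch of that decision (illustrative names, not the
reconstructor's actual code):

    def should_quarantine(frags_found, ec_ndata, quarantine_threshold):
        # frags_found: unique fragments found at the same timestamp,
        # including any fetched from handoff nodes;
        # ec_ndata: fragments needed to rebuild the missing fragment
        insufficient = frags_found < ec_ndata
        return insufficient and frags_found <= quarantine_threshold

    # with the default quarantine_threshold of 0, nothing is quarantined
    assert not should_quarantine(3, ec_ndata=4, quarantine_threshold=0)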
Closes-Bug: 1655608
Change-Id: I08e1200291833dea3deba32cdb364baa99dc2816
Previously a shard might be shrunk if its object_count fell below
the shrink_threshold. However, it is possible that a shard with few
objects has a large number of tombstones, which would result in a
larger than anticipated replication of rows to the acceptor shard.
With this patch, a shard's row count (i.e. the sum of tombstones and
objects) must be below the shrink_threshold before the shard will be
considered for shrinking.
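A minimal sketch of the new check (illustrative, not the sharder's
exact code):

    def is_shrink_candidate(object_count, tombstones, shrink_threshold):
        # ShardRange.row_count is the sum of object and tombstone rows
        row_count = object_count + tombstones
        return row_count < shrink_threshold

    # a shard with few objects but many tombstones is no longer shrunk
    assert not is_shrink_candidate(10, 200000, shrink_threshold=100000)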
A number of changes are made to enable tombstone count to be used in
shrinking decisions:
- DatabaseBroker reclaim is enhanced to count remaining tombstones
after rows have been reclaimed. A new TombstoneReclaimer class is
added to encapsulate the reclaim process and tombstone count.
- ShardRange has new 'tombstones' and 'row_count' attributes.
- A 'tombstones' column is added to the ContainerBroker shard_range
table.
- The sharder performs a reclaim prior to reporting shard container
stats to the root container so that the tombstone count can be
included.
- The sharder uses 'row_count' rather than 'object_count' when
evaluating if a shard range is a shrink candidate.
Change-Id: I41b86c19c243220b7f1c01c6ecee52835de972b6
During sharding a shard range is moved to CLEAVED state when cleaved
from its parent. However, during shrinking an acceptor shard should
not be moved to CLEAVED state when the shrinking shard cleaves to it,
because the shrinking shard is not the acceptor's parent and does not
know if the acceptor has yet been cleaved from its parent.
The existing attempt to prevent a shrinking shard updating its
acceptor state relied on comparing the acceptor namespace to the
shrinking shard namespace: if the acceptor namespace fully enclosed
the shrinking shard then it was inferred that shrinking was taking
place. That check is sufficient for normal shrinking of one shard into
an expanding acceptor, but is not sufficient when shrinking in order
to fix overlaps, when a shard might shrink into more than one
acceptor, none of which completely encloses the shrinking shard.
Fortunately, since [1], it is possible to determine that a shard is
shrinking from its own shard range state being either SHRINKING or
SHRUNK.
It is still advantageous to delete and merge the shrinking shard range
into the acceptor when the acceptor fully encloses the shrinking shard
because that increases the likelihood of the root being updated with
the deleted shard range in a timely manner.
[1] Related-Change: I9034a5715406b310c7282f1bec9625fe7acd57b6
Change-Id: I91110bc747323e757d8b63003ad3d38f915c1f35
Use the recently added assert_subprocess_success [1] helper function
more widely.
Add run_custom_sharder helper.
Add container-sharder key to the ProbeTest.configs dict.
[1] Related-Change: I9ec411462e4aaf9f21aba6c5fd7698ff75a07de3
Change-Id: Ic2bc4efeba5ae5bc8881f0deaf4fd9e10213d3b7
Adds a repair command that reads shard ranges from the container DB,
identifies overlapping shard ranges and recommends how overlaps can be
repaired by shrinking into a single chosen path. Prompted user input
or a '-y' option will then cause the appropriately modified shard
ranges to be merged into the container DB.
The repair command does not fix gaps in shard range paths and can
therefore only succeed if there is at least one unbroken path of
shards through the entire namespace.
Also adds an analyze command that loads shard data from a file and
reports overlapping shard ranges, but takes no action. The analyze
command is similar to the repair command but reads shard data from a
file rather than a container DB and makes no changes to the DB.
e.g.:
swift-manage-shard-ranges <db-file-name> repair
swift-manage-shard-ranges <shard-data.json> analyze
to see more detail:
swift-manage-shard-ranges -v <shard-data.json> analyze
For consistency with the new repair command, and to be more cautious,
this patch changes the user input required to apply the compact
command changes to the DB from 'y' to 'yes'.
Change-Id: I9ec411462e4aaf9f21aba6c5fd7698ff75a07de3
On every sharder cycle we update the in-progress recon stats for each
sharding container. However, this update does not run one final time
once sharding is complete, because the DB state has been changed to
SHARDED and therefore the in_progress stats never get their final
update.
For anyone collecting this data for monitoring, sharding/cleaving of
shards therefore appears never to complete.
This patch adds a new option, `recon_sharded_timeout`, which allows
sharded containers to continue to be processed by
`_record_sharding_progress()` for a period of time after they have
finished sharding.
Change-Id: I5fa39d41f9cd3b211e45d2012fd709f4135f595e
This patch adds a 'compact' command to swift-manage-shard-ranges that
enables sequences of contiguous shards with low object counts to be
compacted into another existing shard, or into the root container.
Change-Id: Ia8f3297d610b5a5cf5598d076fdaf30211832366
Shard containers learn about their own shard range by fetching shard
ranges from the root container during the sharder audit phase. Since
[1], if the shard is shrinking, it may also learn about acceptor
shards in the shard ranges fetched from the root. However, the
fetched shard ranges do not currently include the root's own shard
range, even when the root is to be the acceptor for a shrinking shard.
This prevents the mechanism being used to perform shrinking to root.
This patch modifies the root container behaviour to include its own
shard range in responses to shard containers when the container GET
request param 'states' has value 'auditing'. This parameter is used to
indicate that a particular GET request is from the sharder during
shard audit; the root does not otherwise include its own shard range
in GET responses.
When the 'states=auditing' parameter is used with a container GET
request the response includes all shard ranges except those in the
FOUND state. The shard ranges of relevance to a shard are its own
shard range and any overlapping shard ranges that may be acceptors if
the shard is shrinking. None of these relevant shard ranges should be
in state FOUND: the shard itself cannot be in FOUND state since it has
been created; acceptor ranges should not be in FOUND state. The FOUND
state is therefore excluded from the 'auditing' states to prevent an
unintended overlapping FOUND shard range that has not yet been
resolved at the root container being fetched by a shrinking shard,
which might then proceed to create and cleave to it.
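For illustration, the shape of such an audit request might be (a
sketch; the sharder's internal client call differs in detail):

    # the 'auditing' alias resolves to every shard range state except
    # FOUND (state names as described above)
    ALL_STATES = ['FOUND', 'CREATED', 'CLEAVED', 'ACTIVE',
                  'SHARDING', 'SHARDED', 'SHRINKING', 'SHRUNK']
    AUDITING_STATES = [s for s in ALL_STATES if s != 'FOUND']

    # GET /v1/<root_account>/<root_container>?states=auditing&format=json
    params = {'states': 'auditing', 'format': 'json'}
    headers = {'X-Backend-Record-Type': 'shard'}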
The shard only merges the root's shard range (and any other shard
ranges) when the shard is shrinking. If the root shard range is ACTIVE
then it is the acceptor and will be used when the shard cleaves. If
the root shard range is in any other state then it will be ignored
when the shard cleaves to other acceptors.
The sharder cleave loop is modified to break as soon as cleaving is
done i.e. cleaving has been completed up to the shard's upper bound.
This prevents misleading logging that cleaving has stopped when
in fact cleaving to a non-root acceptor has completed but the shard
range list still contains an irrelevant root shard range in SHARDED
state. This also prevents cleaving to more than one acceptor in the
unexpected case that multiple active acceptors overlap the shrinking
shard - cleaving will now complete once the first acceptor has
cleaved.
[1] Related-Change: I9034a5715406b310c7282f1bec9625fe7acd57b6
Change-Id: I5d48b67217f705ac30bb427ef8d969a90eaad2e5
When a shard is deleted, all its sysmeta is deleted along with it. But
this is problematic because we support dealing with misplaced objects
in deleted shards, which is required.
Currently, when we deal with misplaced objects in a shard broker
marked as deleted we push the objects into a handoff broker and set
this broker's root_path attribute from the deleted shard.
Unfortunately, root_path is itself stored in sysmeta, which has been
removed. So the deleted broker falls back to using its own account and
container in the root_path. This in turn gets pushed to the shard
responsible for the misplaced objects and because these objects are
pushed by replication the root_path meta has a newer timestamp,
replacing the shard's pointer to the root.
As a consequence, listings still all work because they are root driven,
but the shard will never pull the latest shard range details from the
_real_ root container during audits.
This patch contains a probe test that demonstrates this issue and also
fixes it by making 'X-Container-Sysmeta-Shard-Quoted-Root' and
'X-Container-Sysmeta-Shard-Root' whitelisted so that they are not
cleared on delete, meaning a deleted shard retains its knowledge of
the root and can correctly deal with misplaced objects.
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Change-Id: I3315f349e5b965cecadc78af9e1c66c3f3bcfe83
We don't normally issue any DELETEs to shards when an empty root accepts
a DELETE from the client. If we allow root dbs to reclaim while they
still have shards we risk letting undeleted shards get orphaned.
Partial-Bug: 1911232
Change-Id: I4f591e393a526bb74675874ba81bf743936633c1
This patch makes four significant changes to the handling of GET
requests for sharding or sharded containers:
- container server GET requests may now result in the entire list of
shard ranges being returned for the 'listing' state regardless of
any request parameter constraints.
- the proxy server may cache that list of shard ranges in memcache
and the requests environ infocache dict, and subsequently use the
cached shard ranges when handling GET requests for the same
container.
- the proxy now caches more container metadata so that it can
synthesize a complete set of container GET response headers from
cache.
- the proxy server now enforces more container GET request validity
checks that were previously only enforced by the backend server,
e.g. checks for valid request parameter values.
With this change, when the proxy learns from container metadata
that the container is sharded then it will cache shard
ranges fetched from the backend during a container GET in memcache.
On subsequent container GETs the proxy will use the cached shard
ranges to gather object listings from shard containers, avoiding
further GET requests to the root container until the cached shard
ranges expire from cache.
Cached shard ranges are most useful if they cover the entire object
name space in the container. The proxy therefore uses a new
X-Backend-Override-Shard-Name-Filter header to instruct the container
server to ignore any request parameters that would constrain the
returned shard range listing i.e. 'marker', 'end_marker', 'includes'
and 'reverse' parameters. Having obtained the entire shard range
listing (either from the server or from cache) the proxy now applies
those request parameter constraints itself when constructing the
client response.
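A hedged sketch of the lookup order described above; the cache key
format and parameter names are illustrative rather than the proxy's
exact implementation:

    def get_listing_shard_ranges(account, container, infocache, memcache,
                                 fetch_from_backend, cache_time):
        cache_key = 'shard-listing/%s/%s' % (account, container)
        shard_ranges = infocache.get(cache_key)      # per-request cache
        if shard_ranges is None and memcache is not None:
            shard_ranges = memcache.get(cache_key)   # cross-request cache
        if shard_ranges is None:
            # ask the backend for the complete, unfiltered listing so the
            # cached value is useful regardless of marker/end_marker etc.
            # (header value shown here is illustrative)
            shard_ranges = fetch_from_backend(
                {'X-Backend-Override-Shard-Name-Filter': 'true'})
            if memcache is not None and shard_ranges:
                memcache.set(cache_key, shard_ranges, time=cache_time)
        infocache[cache_key] = shard_ranges
        return shard_ranges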
When using cached shard ranges the proxy will synthesize response
headers from the container metadata that is also in cache. To enable
the full set of container GET response headers to be synthesized in
this way, the set of metadata that the proxy caches when handling a
backend container GET response is expanded to include various
timestamps.
The X-Newest header may be used to disable looking up shard ranges
in cache.
Change-Id: I5fc696625d69d1ee9218ee2a508a1b9be6cf9685
Swift operators may find it useful to operate on each object in their
cluster in some way. This commit provides them a way to hook into the
object auditor with a simple, clearly-defined boundary so that they
can iterate over their objects without additional disk IO.
For example, a cluster operator may want to ensure a semantic
consistency with all SLO segments accounted in their manifests,
or locate objects that aren't in container listings. Now that Swift
has encryption support, this could be used to locate unencrypted
objects. The list goes on.
This commit makes the auditor locate, via entry points, the watchers
named in its config file.
A watcher is a class with at least these four methods:
__init__(self, conf, logger, **kwargs)
start(self, audit_type, **kwargs)
see_object(self, object_metadata, data_file_path, **kwargs)
end(self, **kwargs)
The auditor will call watcher.start(audit_type) at the start of an
audit pass, watcher.see_object(...) for each object audited, and
watcher.end() at the end of an audit pass. All method arguments are
passed as keyword args.
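For example, a minimal (hypothetical) watcher that just counts objects
might look like:

    class ObjectCounterWatcher(object):
        # illustrative only; real watchers are located via entry points
        # named in the auditor's config file
        def __init__(self, conf, logger, **kwargs):
            self.logger = logger
            self.count = 0

        def start(self, audit_type, **kwargs):
            self.count = 0

        def see_object(self, object_metadata, data_file_path, **kwargs):
            # object_metadata holds the object's metadata; data_file_path
            # locates its .data file, so no extra disk IO is required
            self.count += 1

        def end(self, **kwargs):
            self.logger.info('saw %d objects this audit pass', self.count)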
This version of the API is implemented in the context of the
auditor itself, without spawning any additional processes.
If the plugins are not working well -- hanging, crashing, or leaking --
it's easier to debug them when there's no additional complication
of processes that run by themselves.
In addition, we include a reference implementation of a plugin for
the watcher API, as a help to plugin writers.
Change-Id: I1be1faec53b2cdfaabf927598f1460e23c206b0a
Shard shrinking can be instigated by a third party modifying shard
ranges, moving one shard to shrinking state and expanding the
namespace of one or more other shard(s) to act as acceptors. These
state and namespace changes must propagate to the shrinking and
acceptor shards. The shrinking shard must also discover the acceptor
shard(s) into which it will shard itself.
The sharder audit function already updates shards with their own state
and namespace changes from the root. However, there is currently no
mechanism for the shrinking shard to learn about the acceptor(s) other
than by a PUT request being made to the shrinking shard container.
This patch modifies the shard container audit function so that other
overlapping shards discovered from the root are merged into the
audited shard's db. In this way, the audited shard will have acceptor
shards to cleave to if shrinking.
This new behavior is restricted to when the shard is shrinking. In
general, a shard is responsible for processing its own sub-shard
ranges (if any) and reporting them to root. Replicas of a shard
container synchronise their sub-shard ranges via replication, and do
not rely on the root to propagate sub-shard ranges between shard
replicas. The exception to this is when a third party (or
auto-sharding) wishes to instigate shrinking by modifying the shard
and other acceptor shards in the root container. In other
circumstances, merging overlapping shard ranges discovered from the
root is undesirable because it risks shards inheriting other unrelated
shard ranges. For example, if the root has become polluted by
split-brain shard range management, a sharding shard may have its
sub-shards polluted by an undesired shard from the root.
During the shrinking process a shard's own shard range state may
be either shrinking or, prior to this patch, sharded. The sharded
state could occur when one replica of a shrinking shard completed
shrinking and moved the own shard range state to sharded before other
replica(s) had completed shrinking. This makes it impossible to
distinguish a shrinking shard (with sharded state), which we do want
to inherit shard ranges, from a sharding shard (with sharded state),
which we do not want to inherit shard ranges.
This patch therefore introduces a new shard range state, 'SHRUNK', and
applies this state to shard ranges that have completed shrinking.
Shards are now restricted to inherit shard ranges from the root only
when their own shard range state is either SHRINKING or SHRUNK.
This patch also:
- Stops overlapping shrinking shards from generating audit warnings:
overlaps are cured by shrinking and we therefore expect shrinking
shards to sometimes overlap.
- Extends an existing probe test to verify that overlapping shard
ranges may be resolved by shrinking a subset of the shard ranges.
- Adds a --no-auto-shard option to swift-container-sharder to enable the
probe tests to disable auto-sharding.
- Improves sharder logging, in particular by decrementing ranges_todo
when a shrinking shard is skipped during cleaving.
- Adds a ShardRange.sort_key class method to provide a single definition
of ShardRange sort ordering.
- Improves unit test coverage for sharder shard auditing.
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Change-Id: I9034a5715406b310c7282f1bec9625fe7acd57b6
md5 is not an approved algorithm in FIPS mode, and trying to
instantiate a hashlib.md5() will fail when the system is running in
FIPS mode.
md5 is allowed when in a non-security context. There is a plan to
add a keyword parameter (usedforsecurity) to hashlib.md5() to annotate
whether or not the instance is being used in a security context.
In the case where it is not, the instantiation of md5 will be allowed.
See https://bugs.python.org/issue9216 for more details.
Some downstream python versions already support this parameter. To
support these versions, a new encapsulation of md5() is added to
swift/common/utils.py. This encapsulation is identical to the one being
added to oslo.utils, but is recreated here to avoid adding a dependency.
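A sketch of how the encapsulation is used (assuming it mirrors
hashlib.md5 plus a usedforsecurity keyword, as described above):

    from swift.common.utils import md5

    # ETag calculation is not a security use of md5, so it is annotated
    # as such; FIPS-enabled systems will then permit the call
    etag = md5(b'object body', usedforsecurity=False).hexdigest()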
This patch is to replace the instances of hashlib.md5() with this new
encapsulation, adding an annotation indicating whether the usage is
a security context or not.
While this patch seems large, it is really just the same change over and over
again. Reviewers need to pay particular attention as to whether the
keyword parameter (usedforsecurity) is set correctly. Right now, all
of them appear to be not used in a security context.
Now that all the instances have been converted, we can update the bandit
run to look for these instances and ensure that new invocations do not
creep in.
With this latest patch, the functional and unit tests all pass
on a FIPS enabled system.
Co-Authored-By: Pete Zaitcev
Change-Id: Ibb4917da4c083e1e094156d748708b87387f2d87
* Add a new config option, proxy_base_url
* Support HTTPS as well as HTTP connections
* Monkey-patch eventlet early so we never import an unpatched version
from swiftclient
Change-Id: I4945d512966d3666f2738058f15a916c65ad4a6b
This patch adds a new object versioning mode. This new mode provides
a new set of APIs for users to interact with older versions of an
object. It also changes the naming scheme of older versions and adds
a version-id to each object.
This new mode is not backwards compatible or interchangeable with the
other two modes (i.e., stack and history), especially due to the changes
in the naming scheme of older versions. This new mode will also serve
as a foundation for adding S3 versioning compatibility in the s3api
middleware.
Note that this does not (yet) support using a versioned container as
a source in container-sync. Container sync should be enhanced to sync
previous versions of objects.
Change-Id: Ic7d39ba425ca324eeb4543a2ce8d03428e2225a1
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Co-Authored-By: Thiago da Silva <thiagodasilva@gmail.com>
Previously, if you were on Python 2.7.10+ [0], a newline in a container name
would cause the
sharder to fail, complaining about invalid header values when trying to create
the shard containers. On older versions of Python, it would most likely cause a
parsing error in the container-server that was trying to handle the PUT.
Now, quote all places that we pass around container paths. This includes:
* The X-Container-Sysmeta-Shard-(Quoted-)Root sent when creating the (empty)
remote shards
* The X-Container-Sysmeta-Shard-(Quoted-)Root included when initializing the
local handoff for cleaving
* The X-Backend-(Quoted-)Container-Path the proxy sends to the object-server
for container updates
* The Location header the container-server sends to the object-updater
Note that a new header was required in requests so that servers would
know whether the value should be unquoted or not. We can get away with
reusing Location in responses by having clients opt-in to quoting with
a new X-Backend-Accept-Quoted-Location header.
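A hedged sketch of the quoting convention (illustrative; Swift's own
helpers differ in detail):

    from urllib.parse import quote, unquote

    container_path = '.shards_AUTH_test/con\ntainer'   # name with a newline
    headers = {'X-Backend-Quoted-Container-Path': quote(container_path)}
    # the quoted form is safe to send in an HTTP header; a server that
    # understands the new header unquotes it before use
    assert unquote(headers['X-Backend-Quoted-Container-Path']) == container_path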
During a rolling upgrade,
* old object-servers servicing requests from new proxy-servers will
not know about the container path override and so will try to update
the root container,
* in general, object updates are more likely to land in the root
container; the sharder will deal with them as misplaced objects, and
* shard containers created by new code on servers running old code
will think they are root containers until the server is running new
code, too; during this time they'll fail the sharder audit and report
stats to their account, but both of these should get cleared up upon
upgrade.
Drive-by: fix a "conainer_name" typo that prevented us from testing that
we can shard a container with unicode in its name. Also, add more UTF8
probe tests.
[0] See https://bugs.python.org/issue22928
Change-Id: Ie08f36e31a448a547468dd85911c3a3bc30e89f1
Closes-Bug: 1856894
AWS seems to support this, so let's allow s3api to do it, too.
Previously, S3 clients trying to use multi-character delimiters would
get 500s back, because s3api didn't know how to handle the 412s that the
container server would send.
As long as we're adding support for container listings, may as well do
it for accounts, too.
Change-Id: I62032ddd50a3493b8b99a40fb48d840ac763d0e7
Co-Authored-By: Thiago da Silva <thiagodasilva@gmail.com>
Closes-Bug: #1797305
There's still one problem, though: since swiftclient on py3 doesn't
support non-ASCII characters in metadata names, none of the tests in
TestReconstructorRebuildUTF8 will pass.
Change-Id: I4ec879ade534e09c3a625414d8aa1f16fd600fa4
Previously, we issued a GET to the root container for every object PUT,
POST, and DELETE. This puts load on the container server, potentially
leading to timeouts, error limiting, and erroneous 404s (!).
Now, cache the complete set of 'updating' shards, and find the shard for
this particular update in the proxy. Add a new config option,
recheck_updating_shard_ranges, to control the cache time; it defaults to
one hour. Set to 0 to fall back to previous behavior.
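A rough sketch of how the shard for a particular update might be picked
from the cached set (illustrative names and bounds handling, not the
proxy's exact code):

    def find_shard_for_update(obj_name, shard_ranges):
        # shard_ranges: (lower, upper, shard_container) tuples sorted by
        # upper bound; lower is exclusive, upper inclusive, '' = unbounded
        for lower, upper, shard_container in shard_ranges:
            if lower < obj_name and (upper == '' or obj_name <= upper):
                return shard_container
        return None  # fall back to updating the root container

    ranges = [('', 'm', '.shards_AUTH_test/c-1'),
              ('m', '', '.shards_AUTH_test/c-2')]
    assert find_shard_for_update('kitten', ranges) == '.shards_AUTH_test/c-1'
    assert find_shard_for_update('zebra', ranges) == '.shards_AUTH_test/c-2'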
Note that we should be able to tolerate stale shard data just fine; we
already have to worry about async pendings that got written down with
one shard but may not get processed until that shard has itself sharded
or shrunk into another shard.
Also note that memcache has a default value limit of 1MiB, which may be
exceeded if a container has thousands of shards. In that case, set()
will act like a delete(), causing increased memcache churn but otherwise
preserving existing behavior. In the future, we may want to add support
for gzipping the cached shard ranges as they should compress well.
Change-Id: Ic7a732146ea19a47669114ad5dbee0bacbe66919
Closes-Bug: 1781291
When we abort the replication process because we've got shard ranges and
the sharder is now responsible for ensuring object-row durability, we
log a warning like "refusing to replicate objects" which sounds scary.
That's because it *is*, of course -- if the sharder isn't running,
whatever rows that DB has may only exist in that DB, meaning we're one
drive failure away from losing track of them entirely.
However, when the sharder *is* running and everything's happy, we reach
a steady-state where the root containers are all sharded and none of
them have any object rows to lose. At that point, the warning does more
harm than good.
Only print the scary "refusing to replicate" warning if we're still
responsible for some object rows, whether deleted or not.
Change-Id: I35de08d6c1617b2e446e969a54b79b42e8cfafef
Otherwise, a sharded container AUTH_test/sharded will have its stats
included in the totals for both AUTH_test *and* .shards_AUTH_test
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Change-Id: I7fa74e13347601c5f44fd7e6cf65656cc3ebc2c5
Resolve outstanding TODOs. One TODO is removed because there isn't an
easy way to arrange for an async pending to be targeted at a shard
container.
Change-Id: I0b003904f73461ddb995b2e6a01e92f14283278d
Previously test_misplaced_object_movement() deleted objects from both
shards and then relied on override-partitions option to selectively
run the sharder on root or shard containers and thereby control when
each shard range was identified for shrinking. This approach is flawed
when the second shard container lands in the same partition as the
root: running the sharder on the empty second shard's partition would
also cause the sharder to process the root and identify the second
shard for shrinking, resulting in premature shrinking of the second
shard.
Now, objects are only deleted from each shard range when that shard is
intended to shrink.
Change-Id: I9f51621e8414e446e4d3f3b5027f6c40e01192c3
Drive-by: use the run_sharders() helper more often.
Before, merge_objects() always used storage policy index of 0 when
inserting a fake misplaced object into a shard container. If the shard
broker had a different policy index then the misplaced object would
not show up in listings, causing test_misplaced_object_movement() to
fail. This test bug might be exposed by having policy index 0 be an EC
policy, since the probe test requires a replication policy and would
therefore choose a non-zero policy index.
The fix is simply to specify the shard's policy index when inserting
the fake object.
Change-Id: Iec3f8ec29950220bb1b2ead9abfdfb1a261517d6
The sharder daemon visits container dbs and when necessary executes
the sharding workflow on the db.
The workflow is, in overview:
- perform an audit of the container for sharding purposes.
- move any misplaced objects that do not belong in the container
to their correct shard.
- move shard ranges from FOUND state to CREATED state by creating
shard containers.
- move shard ranges from CREATED to CLEAVED state by cleaving objects
to shard dbs and replicating those dbs. By default this is done in
batches of 2 shard ranges per visit.
Additionally, when the auto_shard option is True (NOT yet recommended
in production), the sharder will identify shard ranges for containers
that have exceeded the threshold for sharding, and will also manage
the sharding and shrinking of shard containers.
The manage_shard_ranges tool provides a means to manually identify
shard ranges and merge them into a container in order to trigger
sharding. This is currently the recommended way to shard a container.
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Change-Id: I7f192209d4d5580f5a0aa6838f9f04e436cf6b1f