We've observed a root container suddenly think it's unsharded when its
own_shard_range is reset. This patch blocks a remote OSR with an epoch
of None from overwriting a local epoched OSR.
The only way we've observed this happen is when a new replica or handoff
node creates a container and its new own_shard_range is created without
an epoch and is then replicated to older primaries.
However, if a bad node with a non-epoched OSR is on a primary, its
newer timestamp would prevent pulling the good OSR from its peers, so
it'll be left stuck with its bad one.
When this happens, expect to see a bunch of:
Ignoring remote osr w/o epoch: x, from: y
When an OSR that lacks an epoch when it should have one comes in from a
replica, we do a pre-flight check to see whether merging it would
remove the local epoch before emitting the error above. We do this
because when sharding is first initiated it's perfectly valid to get
OSRs without epochs from replicas; this is expected and harmless.
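A minimal sketch of the kind of pre-flight check described above; the
names are illustrative rather than the actual broker/sharder code:

def would_lose_epoch(local_osr, remote_osr):
    # a remote own shard range without an epoch must not overwrite a
    # local own shard range that already has one
    return (local_osr is not None and local_osr.epoch is not None
            and remote_osr.epoch is None)

When this returns True the remote OSR is not merged and the warning
above is logged instead.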
Closes-bug: #1980451
Change-Id: I069bdbeb430e89074605e40525d955b3a704a44f
The proxy should NOT read or write to memcache when handling a
container GET that explicitly requests 'shard' or 'object' record
type. A request for 'shard' record type may specify 'namespace'
format, but this request is unrelated to container listings or object
updates and passes directly to the backend.
This patch also removes unnecessary JSON serialisation and
de-serialisation of namespaces within the proxy GET path when a
sharded object listing is being built. The final response body will
contain a list of objects so there is no need to write intermediate
response bodies with a list of namespaces.
Requests that explicitly specify a record type of 'shard' will of
course still receive a response body of serialised shard dicts as
returned from the backend.
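A hedged sketch of the proxy-side check; the header name and values are
from the description above, the function is illustrative:

def may_use_listing_cache(req):
    # only the default, recursive listing path goes via memcache; an
    # explicit 'shard' or 'object' record type goes straight to the
    # backend
    record_type = req.headers.get('X-Backend-Record-Type', '').lower()
    return record_type not in ('shard', 'object')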
Change-Id: Id79c156432350c11c52a4004d69b85e9eb904ca6
With the Related-Change, container servers can return a list of
Namespace objects in response to a GET request. This patch modifies the proxy
to take advantage of this when fetching namespaces. Specifically,
the proxy only needs Namespaces when caching 'updating' or 'listing'
shard range metadata.
In order to allow rolling upgrades, we can't just send
'X-Backend-Record-Type = namespace', as old container servers won't
know how to respond. Instead, proxies send a new header
'X-Backend-Record-Shard-Format = namespace' along with the existing
'X-Backend-Record-Type = shard' header. Newer container servers will
return namespaces, while old container servers continue to return full
shard ranges, which the new proxy parses as Namespaces.
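A hedged sketch of the request headers involved (header names are from
the description above; the fallback parsing is illustrative):

backend_headers = {
    'X-Backend-Record-Type': 'shard',
    'X-Backend-Record-Shard-Format': 'namespace',
}

# new container servers return namespace dicts, old ones return full
# shard range dicts; either way the proxy only needs 'name' and
# 'lower' to build Namespace objects
def to_namespace_params(response_dicts):
    return [(d['name'], d['lower']) for d in response_dicts]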
This patch refactors _get_from_shards to clarify that it does not
require ShardRange objects. The method is now passed a list of
namespaces, which is parsed from the response body before the method
is called. Some unit tests are also refactored to be more realistic
when mocking _get_from_shards.
Also refactor the test_container tests to better test shard-range and
namespace responses from legacy and modern container servers.
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Co-Authored-By: Jianjian Huo <jhuo@nvidia.com>
Related-Change: If152942c168d127de13e11e8da00a5760de5ae0d
Change-Id: I7169fb767525753554a40e28b8c8c2e265d08ecd
The proxy-server makes GET requests to the container server to fetch
full lists of shard ranges when handling object PUT/POST/DELETE and
container GETs, then it only stores the Namespace attributes (lower
and name) of the shard ranges into Memcache and reconstructs the list
of Namespaces based on those attributes. Thus, a namespaces GET
interface can be added into the backend container-server to only
return a list of those Namespace attributes.
On a container server setup which serves a container with ~12000
shard ranges, benchmarking results show that the request rate of the
HTTP GET all namespaces (states=updating) is ~12 op/s, while the
HTTP GET all shard ranges (states=updating) is ~3.2 op/s.
The new namespace GET interface supports most of the headers and
parameters supported by the shard range GET interface, for example
marker, end_marker, include and reverse. Two exceptions are:
'x-backend-include-deleted' cannot be supported because there is no way
for a Namespace to indicate the deleted state; and the 'auditing' state
query parameter is not supported because it is specific to the sharder,
which only requests full shard ranges.
Co-Authored-By: Matthew Oliver <matt@oliver.net.au>
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Change-Id: If152942c168d127de13e11e8da00a5760de5ae0d
When constructing an object listing from container shards, the proxy
would previously return the X-Backend-Record-Type header with the
value 'shard' that is returned with the initial GET response from the
root container. It didn't break anything but was plainly wrong.
This patch removes the header from object listing responses to requests
that did not have the header. The header value is not set to 'object'
because in a request that value specifically means 'do not recurse
into shards'.
Change-Id: I94c68e5d5625bc8b3d9cd9baa17a33bb35a7f82f
Probe test to produce a scenario where a parent container is stuck
at sharding because of a gap in shard ranges. The gap is caused by a
deleted child shard range which finishes sharding before its
parent does.
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Change-Id: I73918776ed91b19ba3fd6deda2fe4ca2820f4dbf
Once a shard container has been created as part of the sharder cycle it
pulls the shard's own_shard_range, updates the object_count and
bytes_used and pushes this to the root container. The root container can
use these to display the current container stats.
However, it is not until a shard gets to the CLEAVED state that it
holds enough information for its namespace, so before then the numbers
it returns are incorrect. Further, when we find and create a shard, it
starts out with the number of objects, at the time, that are expected to
go into it. This is a better answer than, say, nothing.
So it's better for the shard to send its current own_shard_range but
not update the stats until it can be authoritative for that answer.
This patch adds a new SHARD_UPDATE_STAT_STATES that tracks which
ShardRange states a shard needs to be in in order to be responsible;
the current definition is:
SHARD_UPDATE_STAT_STATES = [ShardRange.CLEAVED, ShardRange.ACTIVE,
ShardRange.SHARDING, ShardRange.SHARDED,
ShardRange.SHRINKING, ShardRange.SHRUNK]
As we don't want to update the OSR stats or the meta_timestamp outside
these states, tombstone updates are also moved to only happen when the
shard is in a SHARD_UPDATE_STAT_STATES state.
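A hedged sketch of the gating (only SHARD_UPDATE_STAT_STATES above is
real; the surrounding names are illustrative):

def should_update_osr_stats(own_shard_range):
    # stats, meta_timestamp and tombstones are only refreshed once the
    # shard is responsible for its namespace
    return own_shard_range.state in SHARD_UPDATE_STAT_STATES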
Change-Id: I838dbba3c791fffa6a36ffdcf73eceeaff718373
nose has not seen active development for many years now. With py310, we
can no longer use it due to import errors.
Also update lower constraints
Closes-Bug: #1993531
Change-Id: I215ba0d4654c9c637c3b97953d8659ac80892db8
Previously, when fetching a shard range from a container DB using
ContainerBroker.get_own_shard_range(), the stats of the returned shard
range were updated as a side effect. However, the stats were not
persisted in the own shard range row in the DB. Often the extra DB
queries to get the stats are unnecessary because we don't need
up-to-date stats in the returned shard range. The behavior also leads
to potential confusion because the object stats of the returned shard
range object are not necessarily consistent with the object stats of
the same shard range in the DB.
This patch therefore removes the stats updating behavior from
get_own_shard_range() and makes the stats updating happen as an
explicit separate operation, when needed. This is also more consistent
with how the tombstone count is updated.
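A hedged sketch of the new calling pattern; get_own_shard_range() is
named above, while get_info() and update_meta() are assumptions used
here for illustration:

def refresh_own_shard_range_stats(broker):
    # get_own_shard_range() no longer updates stats as a side effect;
    # when fresh stats are needed they are applied explicitly
    own_sr = broker.get_own_shard_range()
    info = broker.get_info()  # assumed stat source, for illustration
    own_sr.update_meta(info['object_count'], info['bytes_used'])
    return own_sr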
Up-to-date own shard range stats are persisted when a container is
first enabled for sharding, and then each time a shard container
reports its stats to the root container.
Change-Id: Ib10ef918c8983ca006a3740db8cfd07df2dfecf7
We've seen shards become stuck while sharding because they had
incomplete or stale deleted shard ranges. The root container had more
complete and useful shard ranges into which objects could have been
cleaved, but the shard never merged the root's shard ranges.
While the sharder is auditing shard container DBs it would previously
only merge shard ranges fetched from root into the shard DB if the
shard was shrinking or the shard ranges were known to be children of
the shard. With this patch the sharder will now merge other shard
ranges from root during sharding as well as shrinking.
Shard ranges from root are only merged if they would not result in
overlaps or gaps in the set of shard ranges in the shard DB. Shard
ranges that are known to be ancestors of the shard are never merged,
except the root shard range which may be merged into a shrinking
shard. These checks were not previously applied when merging
shard ranges into a shrinking shard.
The two substantive changes with this patch are therefore:
- shard ranges from root are now merged during sharding,
subject to checks.
- shard ranges from root are still merged during shrinking,
but are now subjected to checks.
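A hedged, self-contained illustration of the kind of check applied
before merging, using (lower, upper) tuples in place of real
ShardRange objects:

def contiguous_without_gaps_or_overlaps(bounds):
    # bounds: list of (lower, upper) namespace bounds
    bounds = sorted(bounds)
    return all(upper == next_lower
               for (_, upper), (next_lower, _) in zip(bounds, bounds[1:]))

Candidate shard ranges from root would only be merged if the resulting
set of shard ranges in the shard DB still passes a check like this.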
Change-Id: I066cfbd9062c43cd9638710882ae9bd85a5b4c37
While the sharder is auditing shard container DBs it would previously
only merge shard ranges fetched from root into the shard DB if the
shard was shrinking; shrinking is the only time when a shard normally
*must* receive sub-shards from the root. With this patch the sharder
will also merge shard ranges fetched from the root if they are known
to be the children of the shard, regardless of the state of the shard.
Children shard ranges would previously only have been merged during
replication with peers of the shard; merging shard-ranges from the
root during audit potentially speeds their propagation to peers that
have yet to replicate.
Change-Id: I57aafc537ff94b081d0e1ea70e7fb7dd3598c61e
Stuck shard ranges have been seen in production; the root cause has
been traced back to s-m-s-r failing to detect parent-child
relationships in overlaps, so that it either shrank child shard ranges
into parents or the other way around. A patch has been added to check
minimum age before s-m-s-r performs repairs, which will most likely
prevent this from happening again, but we also need to check for
parent-child relationships in overlaps explicitly during repairs. This
patch does that: it removes parent or child shard ranges from donors,
preventing s-m-s-r from shrinking them into acceptor shard ranges.
Drive-by 1: fixup gap repair probe test.
The probe test is no longer appropriate because we're no longer
allowed to repair parent-child overlaps, so replace the test with a
manually created gap.
Drive-by 2: address probe test TODOs.
The commented assertion would fail because the node filtering
comparison failed to account for the same node having different indexes
when generated for the root versus the shard. Adding a new iterable
function filter_nodes makes the node filtering behave as expected.
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Change-Id: Iaa89e94a2746ba939fb62449e24bdab9666d7bab
Add an option for the swift-manage-shard-ranges repair tool to ignore
overlaps where the overlapping shards appear to be recently created,
since this may indicate that they are shards of the parent with
which they overlap.
The new option is --min-shard-age with a default of 14400 seconds.
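e.g. (an illustrative invocation; the value shown simply overrides the
default):

swift-manage-shard-ranges <db-file-name> repair --min-shard-age 7200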
Change-Id: Ib82219a506732303a1157c2c9e1c452b4a56061b
It is possible for some replicas of a container DB to be sharded, and
for the container to then be emptied and deleted, before another
replica of the DB has started sharding. Previously, the unsharded
replica would remain in the unsharded state because the sharder would
not process deleted DBs.
This patch modifies the sharder to always process deleted DBs; this
will result in the sharder making some additional DB queries for shard
ranges in order to determine whether any processing is required for
deleted DBs.
Auto-sharding will not act on deleted DBs.
Change-Id: Ia6ad92042aa9a51e3ddefec0b9b8acae44e9e9d7
During cleaving, if the sharder finds that zero object rows are copied
from the parent retiring DB to a cleaved shard DB, and if that shard
DB appears to have been freshly created by the cleaving process, then the
sharder skips replicating that shard DB and does not count the shard
range as part of the batch (see Related-Change).
Previously, any shard range treated in this way would not have its
state moved to CLEAVED but would remain in the CREATED state. However,
cleaving of following shard ranges does continue, leading to anomalous
sets of shard range states, including all other shard ranges moving to
ACTIVE but the skipped range remaining in CREATED (until another
sharder visitation finds object rows and actually replicates the
cleaved shard DB).
These anomalies can be avoided by moving the skipped shard range to
the CLEAVED state. This is exactly what would happen anyway if the
cleaved DB had only one object row copied to it, or if the cleaved DB
had zero object rows copied to it but happened to already exist on
disk.
Related-Change: Id338f6c3187f93454bcdf025a32a073284a4a159
Change-Id: I1ca7bf42ee03a169261d8c6feffc38b53226c97f
Previously all sharder probe tests were skipped if auto-sharding was
not configured for the test cluster. This is now fixed so that only
tests that require auto-sharding are skipped.
Change-Id: Icefc6441f8d48a0d418a4ab70b778b48ffdcbb59
There is a tight coupling between a root container and its shards: the
shards hold the object metadata for the root container, so are really
an extension of the root. When we PUT objects into a root container,
it'll redirect them, with the root's policy, to the shards. And the
shards are happy to take them, even if the shard's policy is different
to the root's. But when it comes to GETs, the root redirects the GET
onto its shards, which currently won't respond with objects (which they
probably took) because they are of a different policy. Currently, when
getting objects from the container server, the policy used is always
the broker's policy.
This patch corrects this behaviour by allowing the policy index used
to be overridden. If the request to the container server
contains an 'X-Backend-Storage-Policy-Index' header it'll be used
instead of the policy index stored in the broker.
This patch adds the root container's policy as this header in the
proxy container controller's `_get_from_shards` method which is used
by the proxy to redirect a GET to a root to its shards.
Further, a new backend response header has been added. If the
container response contains an `X-Backend-Record-Type: object` header,
then the response contains objects. In this case this patch also adds
an `X-Backend-Record-Storage-Policy-Index` header so that the policy
index of the returned objects is known, since
X-Backend-Storage-Policy-Index in the response _always_ represents the
policy index of the container itself.
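A hedged sketch of the headers involved, per the description above
(values and variable names are illustrative):

# proxy -> shard container, in _get_from_shards():
headers = {'X-Backend-Storage-Policy-Index': root_policy_index}

# shard container -> proxy, when the response contains objects:
#   X-Backend-Record-Type: object
#   X-Backend-Record-Storage-Policy-Index: <policy of the listed objects>
#   X-Backend-Storage-Policy-Index: <policy of the container itself>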
On the plus side, this new container policy API gives us a way to
check containers for object listings in other policies, so it might
come in handy for ops/SREs.
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Change-Id: I026b699fc5f0fba619cf524093632d67ca38d32f
A container is typically sharded when it has grown to have an object
count of shard_container_threshold + N, where N <<
shard_container_threshold. If sharded using the default
rows_per_shard of shard_container_threshold / 2 then this would
previously result in 3 shards: the tail shard would typically be
small, having only N rows. This behaviour caused more shards to be
generated than desirable.
This patch adds a minimum-shard-size option to
swift-manage-shard-ranges, and a corresponding option in the sharder
config, which can be used to avoid small tail shards. If set to
greater than one then the final shard range may be extended to more
than rows_per_shard in order to avoid a further shard range with less
than minimum-shard-size rows. In the example given, if
minimum-shard-size is set to M > N then the container would shard into
two shards having rows_per_shard rows and rows_per_shard + N
respectively.
The default value for minimum-shard-size is rows_per_shard // 5. If
all options have their default values this results in
minimum-shard-size being 100000.
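A hedged sketch of the corresponding sharder config with all defaults
(option names assumed from the description above):

[container-sharder]
shard_container_threshold = 1000000
rows_per_shard = 500000
minimum_shard_size = 100000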
Closes-Bug: #1928370
Co-Authored-By: Matthew Oliver <matt@oliver.net.au>
Change-Id: I3baa278c6eaf488e3f390a936eebbec13f2c3e55
If the reconstructor finds a fragment that appears to be stale then it
will now quarantine the fragment. Fragments are considered stale if
insufficient fragments at the same timestamp can be found to rebuild
missing fragments, and the number found is less than or equal to a new
reconstructor 'quarantine_threshold' config option.
Before quarantining a fragment the reconstructor will attempt to fetch
fragments from handoff nodes in addition to the usual primary nodes.
The handoff requests are limited by a new 'request_node_count'
config option.
'quarantine_threshold' defaults to zero, i.e. no fragments will be
quarantined. 'request_node_count' defaults to '2 * replicas'.
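A hedged config example enabling the new behaviour (section name and
values are illustrative):

[object-reconstructor]
quarantine_threshold = 1
request_node_count = 2 * replicas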
Closes-Bug: 1655608
Change-Id: I08e1200291833dea3deba32cdb364baa99dc2816
Previously a shard might be shrunk if its object_count fell below
the shrink_threshold. However, it is possible that a shard with few
objects has a large number of tombstones, which would result in a
larger than anticipated replication of rows to the acceptor shard.
With this patch, a shard's row count (i.e. the sum of tombstones and
objects) must be below the shrink_threshold before the shard will be
considered for shrinking.
A number of changes are made to enable tombstone count to be used in
shrinking decisions:
- DatabaseBroker reclaim is enhanced to count remaining tombstones
after rows have been reclaimed. A new TombstoneReclaimer class is
added to encapsulate the reclaim process and tombstone count.
- ShardRange has new 'tombstones' and 'row_count' attributes.
- A 'tombstones' column is added to the ContainerBroker shard_range
table.
- The sharder performs a reclaim prior to reporting shard container
stats to the root container so that the tombstone count can be
included.
- The sharder uses 'row_count' rather than 'object_count' when
evaluating if a shard range is a shrink candidate.
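A hedged sketch of the new shrink-candidate test ('row_count' is the
new attribute described above; the rest is illustrative):

def is_shrink_candidate(shard_range, shrink_threshold):
    # row_count = object_count + tombstones
    return shard_range.row_count < shrink_threshold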
Change-Id: I41b86c19c243220b7f1c01c6ecee52835de972b6
During sharding a shard range is moved to CLEAVED state when cleaved
from its parent. However, during shrinking an acceptor shard should
not be moved to CLEAVED state when the shrinking shard cleaves to it,
because the shrinking shard is not the acceptor's parent and does not
know if the acceptor has yet been cleaved from its parent.
The existing attempt to prevent a shrinking shard updating its
acceptor state relied on comparing the acceptor namespace to the
shrinking shard namespace: if the acceptor namespace fully enclosed
the shrinking shard then it was inferred that shrinking was taking
place. That check is sufficient for normal shrinking of one shard into
an expanding acceptor, but is not sufficient when shrinking in order
to fix overlaps, when a shard might shrink into more than one
acceptor, none of which completely encloses the shrinking shard.
Fortunately, since [1], it is possible to determine that a shard is
shrinking from its own shard range state being either SHRINKING or
SHRUNK.
It is still advantageous to delete and merge the shrinking shard range
into the acceptor when the acceptor fully encloses the shrinking shard
because that increases the likelihood of the root being updated with
the deleted shard range in a timely manner.
[1] Related-Change: I9034a5715406b310c7282f1bec9625fe7acd57b6
Change-Id: I91110bc747323e757d8b63003ad3d38f915c1f35
Use the recently added assert_subprocess_success [1] helper function
more widely.
Add run_custom_sharder helper.
Add container-sharder key to the ProbeTest.configs dict.
[1] Related-Change: I9ec411462e4aaf9f21aba6c5fd7698ff75a07de3
Change-Id: Ic2bc4efeba5ae5bc8881f0deaf4fd9e10213d3b7
Adds a repair command that reads shard ranges from the container DB,
identifies overlapping shard ranges and recommends how overlaps can be
repaired by shrinking into a single chosen path. Prompted user input
or a '-y' option will then cause the appropriately modified shard
ranges to be merged into the container DB.
The repair command does not fix gaps in shard range paths and can
therefore only succeed if there is at least one unbroken path of
shards through the entire namespace.
Also adds an analyze command that loads shard data from a file and
reports overlapping shard ranges, but takes no action. The analyze
command is similar to the repair command but reads shard data from
file rather than a container db and makes no changes to the db.
e.g.:
swift-manage-shard-ranges <db-file-name> repair
swift-manage-shard-ranges <shard-data.json> analyze
to see more detail:
swift-manage-shard-ranges -v <shard-data.json> analyze
For consistency with the new repair command, and to be more cautious,
this patch changes the user input required to apply the compact
command changes to the DB from 'y' to 'yes'.
Change-Id: I9ec411462e4aaf9f21aba6c5fd7698ff75a07de3
On every sharder cycle we update the in-progress recon stats for each
sharding container. However, we tend not to run this one final time
once sharding is complete because the DB state has changed to SHARDED,
and therefore the in_progress stats never get their final update.
For those collecting this data for monitoring, this makes
sharding/cleaving appear never to complete.
This patch adds a new option `recon_sharded_timeout` which allows
sharded containers to continue to be processed by
`_record_sharding_progress()` for an amount of time after they've
finished sharding.
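A hedged config example (the value shown is illustrative and in
seconds):

[container-sharder]
recon_sharded_timeout = 43200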
Change-Id: I5fa39d41f9cd3b211e45d2012fd709f4135f595e
This patch adds a 'compact' command to swift-manage-shard-ranges that
enables sequences of contiguous shards with low object counts to be
compacted into another existing shard, or into the root container.
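e.g.:

swift-manage-shard-ranges <db-file-name> compact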
Change-Id: Ia8f3297d610b5a5cf5598d076fdaf30211832366
Shard containers learn about their own shard range by fetching shard
ranges from the root container during the sharder audit phase. Since
[1], if the shard is shrinking, it may also learn about acceptor
shards in the shard ranges fetched from the root. However, the
fetched shard ranges do not currently include the root's own shard
range, even when the root is to be the acceptor for a shrinking shard.
This prevents the mechanism being used to perform shrinking to root.
This patch modifies the root container behaviour to include its own
shard range in responses to shard containers when the container GET
request param 'states' has value 'auditing'. This parameter is used to
indicate that a particular GET request is from the sharder during
shard audit; the root does not otherwise include its own shard range
in GET responses.
When the 'states=auditing' parameter is used with a container GET
request the response includes all shard ranges except those in the
FOUND state. The shard ranges of relevance to a shard are its own
shard range and any overlapping shard ranges that may be acceptors if
the shard is shrinking. None of these relevant shard ranges should be
in state FOUND: the shard itself cannot be in FOUND state since it has
been created; acceptor ranges should not be in FOUND state. The FOUND
state is therefore excluded from the 'auditing' states to prevent an
unintended overlapping FOUND shard range that has not yet been
resolved at the root container being fetched by a shrinking shard,
which might then proceed to create and cleave to it.
The shard only merges the root's shard range (and any other shard
ranges) when the shard is shrinking. If the root shard range is ACTIVE
then it is the acceptor and will be used when the shard cleaves. If
the root shard range is in any other state then it will be ignored
when the shard cleaves to other acceptors.
The sharder cleave loop is modified to break as soon as cleaving is
done i.e. cleaving has been completed up to the shard's upper bound.
This prevents misleading logging that cleaving has stopped when
in fact cleaving to a non-root acceptor has completed but the shard
range list still contains an irrelevant root shard range in SHARDED
state. This also prevents cleaving to more than one acceptor in the
unexpected case that multiple active acceptors overlap the shrinking
shard - cleaving will now complete once the first acceptor has
cleaved.
[1] Related-Change: I9034a5715406b310c7282f1bec9625fe7acd57b6
Change-Id: I5d48b67217f705ac30bb427ef8d969a90eaad2e5
When a shard is deleted, all its sysmeta is deleted along with it. But
this is problematic because we need to support dealing with misplaced
objects in deleted shards.
Currently, when we deal with misplaced objects in a shard broker
marked as deleted we push the objects into a handoff broker and set
this broker's root_path attribute from the deleted shard.
Unfortunately, root_path is itself stored in sysmeta, which has been
removed. So the deleted broker falls back to using its own account and
container in the root_path. This in turn gets pushed to the shard
responsible for the misplaced objects and because these objects are
pushed by replication the root_path meta has a newer timestamp,
replacing the shard's pointer to the root.
As a consequence, listings all still work because they are root
driven, but the shard will never pull the latest shard range details
from the _real_ root container during audits.
This patch contains a probe test that demonstrates this issue and also
fixes it by making 'X-Container-Sysmeta-Shard-Quoted-Root' and
'X-Container-Sysmeta-Shard-Root' whitelisted from being cleared on
delete, meaning a deleted shard retains its knowledge of the root so
it can correctly deal with misplaced objects.
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Change-Id: I3315f349e5b965cecadc78af9e1c66c3f3bcfe83
We don't normally issue any DELETEs to shards when an empty root accepts
a DELETE from the client. If we allow root dbs to reclaim while they
still have shards we risk letting undeleted shards get orphaned.
Partial-Bug: 1911232
Change-Id: I4f591e393a526bb74675874ba81bf743936633c1
This patch makes four significant changes to the handling of GET
requests for sharding or sharded containers:
- container server GET requests may now result in the entire list of
shard ranges being returned for the 'listing' state regardless of
any request parameter constraints.
- the proxy server may cache that list of shard ranges in memcache
and the requests environ infocache dict, and subsequently use the
cached shard ranges when handling GET requests for the same
container.
- the proxy now caches more container metadata so that it can
synthesize a complete set of container GET response headers from
cache.
- the proxy server now enforces more container GET request validity
checks that were previously only enforced by the backend server,
e.g. checks for valid request parameter values
With this change, when the proxy learns from container metadata
that the container is sharded then it will cache shard
ranges fetched from the backend during a container GET in memcache.
On subsequent container GETs the proxy will use the cached shard
ranges to gather object listings from shard containers, avoiding
further GET requests to the root container until the cached shard
ranges expire from cache.
Cached shard ranges are most useful if they cover the entire object
name space in the container. The proxy therefore uses a new
X-Backend-Override-Shard-Name-Filter header to instruct the container
server to ignore any request parameters that would constrain the
returned shard range listing i.e. 'marker', 'end_marker', 'includes'
and 'reverse' parameters. Having obtained the entire shard range
listing (either from the server or from cache) the proxy now applies
those request parameter constraints itself when constructing the
client response.
When using cached shard ranges the proxy will synthesize response
headers from the container metadata that is also in cache. To enable
the full set of container GET response headers to be synthesized in
this way, the set of metadata that the proxy caches when handling a
backend container GET response is expanded to include various
timestamps.
The X-Newest header may be used to disable looking up shard ranges
in cache.
Change-Id: I5fc696625d69d1ee9218ee2a508a1b9be6cf9685
Swift operators may find it useful to operate on each object in their
cluster in some way. This commit provides them a way to hook into the
object auditor with a simple, clearly-defined boundary so that they
can iterate over their objects without additional disk IO.
For example, a cluster operator may want to ensure a semantic
consistency with all SLO segments accounted in their manifests,
or locate objects that aren't in container listings. Now that Swift
has encryption support, this could be used to locate unencrypted
objects. The list goes on.
This commit makes the auditor locate, via entry points, the watchers
named in its config file.
A watcher is a class with at least these four methods:
__init__(self, conf, logger, **kwargs)
start(self, audit_type, **kwargs)
see_object(self, object_metadata, data_file_path, **kwargs)
end(self, **kwargs)
The auditor will call watcher.start(audit_type) at the start of an
audit pass, watcher.see_object(...) for each object audited, and
watcher.end() at the end of an audit pass. All method arguments are
passed as keyword args.
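A minimal sketch of a watcher built only from the method signatures
above (not the reference implementation):

class ObjectCountingWatcher(object):
    def __init__(self, conf, logger, **kwargs):
        self.logger = logger
        self.count = 0

    def start(self, audit_type, **kwargs):
        self.count = 0

    def see_object(self, object_metadata, data_file_path, **kwargs):
        self.count += 1

    def end(self, **kwargs):
        self.logger.info('audited %d objects', self.count)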
This version of the API is implemented on the context of the
auditor itself, without spawning any additional processes.
If the plugins are not working well -- hang, crash, or leak --
it's easier to debug them when there's no additional complication
of processes that run by themselves.
In addition, we include a reference implementation of a plugin for
the watcher API, as a help to plugin writers.
Change-Id: I1be1faec53b2cdfaabf927598f1460e23c206b0a
Shard shrinking can be instigated by a third party modifying shard
ranges, moving one shard to shrinking state and expanding the
namespace of one or more other shard(s) to act as acceptors. These
state and namespace changes must propagate to the shrinking and
acceptor shards. The shrinking shard must also discover the acceptor
shard(s) into which it will shard itself.
The sharder audit function already updates shards with their own state
and namespace changes from the root. However, there is currently no
mechanism for the shrinking shard to learn about the acceptor(s) other
than by a PUT request being made to the shrinking shard container.
This patch modifies the shard container audit function so that other
overlapping shards discovered from the root are merged into the
audited shard's db. In this way, the audited shard will have acceptor
shards to cleave to if shrinking.
This new behavior is restricted to when the shard is shrinking. In
general, a shard is responsible for processing its own sub-shard
ranges (if any) and reporting them to root. Replicas of a shard
container synchronise their sub-shard ranges via replication, and do
not rely on the root to propagate sub-shard ranges between shard
replicas. The exception to this is when a third party (or
auto-sharding) wishes to instigate shrinking by modifying the shard
and other acceptor shards in the root container. In other
circumstances, merging overlapping shard ranges discovered from the
root is undesirable because it risks shards inheriting other unrelated
shard ranges. For example, if the root has become polluted by
split-brain shard range management, a sharding shard may have its
sub-shards polluted by an undesired shard from the root.
During the shrinking process a shard's own shard range state may
be either shrinking or, prior to this patch, sharded. The sharded
state could occur when one replica of a shrinking shard completed
shrinking and moved the own shard range state to sharded before other
replica(s) had completed shrinking. This makes it impossible to
distinguish a shrinking shard (with sharded state), which we do want
to inherit shard ranges, from a sharding shard (with sharded state),
which we do not want to inherit shard ranges.
This patch therefore introduces a new shard range state, 'SHRUNK', and
applies this state to shard ranges that have completed shrinking.
Shards are now restricted to inherit shard ranges from the root only
when their own shard range state is either SHRINKING or SHRUNK.
This patch also:
- Stops overlapping shrinking shards from generating audit warnings:
overlaps are cured by shrinking and we therefore expect shrinking
shards to sometimes overlap.
- Extends an existing probe test to verify that overlapping shard
ranges may be resolved by shrinking a subset of the shard ranges.
- Adds a --no-auto-shard option to swift-container-sharder to enable the
probe tests to disable auto-sharding.
- Improves sharder logging, in particular by decrementing ranges_todo
when a shrinking shard is skipped during cleaving.
- Adds a ShardRange.sort_key class method to provide a single definition
of ShardRange sort ordering.
- Improves unit test coverage for sharder shard auditing.
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Change-Id: I9034a5715406b310c7282f1bec9625fe7acd57b6
md5 is not an approved algorithm in FIPS mode, and trying to
instantiate a hashlib.md5() will fail when the system is running in
FIPS mode.
md5 is allowed when in a non-security context. There is a plan to
add a keyword parameter (usedforsecurity) to hashlib.md5() to annotate
whether or not the instance is being used in a security context.
In the case where it is not, the instantiation of md5 will be allowed.
See https://bugs.python.org/issue9216 for more details.
Some downstream python versions already support this parameter. To
support these versions, a new encapsulation of md5() is added to
swift/common/utils.py. This encapsulation is identical to the one being
added to oslo.utils, but is recreated here to avoid adding a dependency.
This patch is to replace the instances of hashlib.md5() with this new
encapsulation, adding an annotation indicating whether the usage is
a security context or not.
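A hedged usage sketch (the wrapper is assumed to mirror hashlib.md5()
apart from the extra keyword):

from swift.common.utils import md5

# not a security context, e.g. computing an ETag
etag = md5(b'object body', usedforsecurity=False).hexdigest()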
While this patch seems large, it is really just the same change over
and over again. Reviewers need to pay particular attention to whether the
keyword parameter (usedforsecurity) is set correctly. Right now, all
of them appear to be not used in a security context.
Now that all the instances have been converted, we can update the bandit
run to look for these instances and ensure that new invocations do not
creep in.
With this latest patch, the functional and unit tests all pass
on a FIPS enabled system.
Co-Authored-By: Pete Zaitcev
Change-Id: Ibb4917da4c083e1e094156d748708b87387f2d87
* Add a new config option, proxy_base_url
* Support HTTPS as well as HTTP connections
* Monkey-patch eventlet early so we never import an unpatched version
from swiftclient
Change-Id: I4945d512966d3666f2738058f15a916c65ad4a6b
This patch adds a new object versioning mode. This new mode provides
a new set of APIs for users to interact with older versions of an
object. It also changes the naming scheme of older versions and adds
a version-id to each object.
This new mode is not backwards compatible or interchangeable with the
other two modes (i.e., stack and history), especially due to the changes
in the naming scheme of older versions. This new mode will also serve
as a foundation for adding S3 versioning compatibility in the s3api
middleware.
Note that this does not (yet) support using a versioned container as
a source in container-sync. Container sync should be enhanced to sync
previous versions of objects.
Change-Id: Ic7d39ba425ca324eeb4543a2ce8d03428e2225a1
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Co-Authored-By: Thiago da Silva <thiagodasilva@gmail.com>
Previously, if you were on Python 2.7.10+ [0], a newline in a container
name would cause the sharder to fail, complaining about invalid header
values when trying to create the shard containers. On older versions of
Python, it would most likely cause a parsing error in the
container-server that was trying to handle the PUT.
Now, quote all places that we pass around container paths. This includes:
* The X-Container-Sysmeta-Shard-(Quoted-)Root sent when creating the (empty)
remote shards
* The X-Container-Sysmeta-Shard-(Quoted-)Root included when initializing the
local handoff for cleaving
* The X-Backend-(Quoted-)Container-Path the proxy sends to the object-server
for container updates
* The Location header the container-server sends to the object-updater
Note that a new header was required in requests so that servers would
know whether the value should be unquoted or not. We can get away with
reusing Location in responses by having clients opt-in to quoting with
a new X-Backend-Accept-Quoted-Location header.
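A hedged illustration of the quoting itself (the path is made up;
header names are as listed above):

from urllib.parse import quote

path = 'AUTH_test/con\ntainer'  # a raw newline would break the header
print(quote(path))              # -> AUTH_test/con%0Atainer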
During a rolling upgrade,
* old object-servers servicing requests from new proxy-servers will
not know about the container path override and so will try to update
the root container,
* in general, object updates are more likely to land in the root
container; the sharder will deal with them as misplaced objects, and
* shard containers created by new code on servers running old code
will think they are root containers until the server is running new
code, too; during this time they'll fail the sharder audit and report
stats to their account, but both of these should get cleared up upon
upgrade.
Drive-by: fix a "conainer_name" typo that prevented us from testing that
we can shard a container with unicode in its name. Also, add more UTF8
probe tests.
[0] See https://bugs.python.org/issue22928
Change-Id: Ie08f36e31a448a547468dd85911c3a3bc30e89f1
Closes-Bug: 1856894