28 Commits

Author SHA1 Message Date
Zuul
569525a937 Merge "tests: Get test_handoff_non_durable passing with encryption enabled" 2024-01-18 08:47:36 +00:00
Tim Burke
cb468840b9 Add non-ascii meta values to ssync probe test
Change-Id: I61c7780a84a1f0cee6975da67d08417cf6aa4ea2
2023-08-03 12:33:56 -07:00
Tim Burke
91a6b52a4f tests: Get test_handoff_non_durable passing with encryption enabled
Can't make assertions on object-meta values when we're making direct
requests; those have all been shifted to transient sysmeta. Use
Content-Type instead, since we don't encrypt it.

Change-Id: If2bb773cf428dfa3c23c634b3c046ba282053c02
2022-04-21 17:21:08 -07:00
Pete Zaitcev
85d0211279 Get rid of port to node assumptions and their modulo kludges
We had get_server_number for many years now, there's no reason
to continue using the node_id=port%100//10 thing.

Change-Id: I5357a095110e8e4889c0468c154611209c6e8c07
2021-09-30 00:42:24 -05:00
Alistair Coles
2696a79f09 reconstructor: retire nondurable_purge_delay option
The nondurable_purge_delay option was introduced in [1] to prevent the
reconstructor removing non-durable data files on handoffs that were
about to be made durable. The DiskFileManager commit_window option has
since been introduced [2] which specifies a similar time window during
which non-durable data files should not be removed. The commit_window
option can be re-used by the reconstructor, making the
nondurable_purge_delay option redundant.

The nondurable_purge_delay option has not been available in any tagged
release and is therefore removed with no backwards compatibility.

[1] Related-Change: I0d519ebaaade35249fb7b17bd5f419ffdaa616c0
[2] Related-Change: I5f3318a44af64b77a63713e6ff8d0fd3b6144f13
Change-Id: I1589a7517b7375fcc21472e2d514f26986bf5079
2021-07-19 21:18:06 +01:00
Alistair Coles
2934818d60 reconstructor: Delay purging reverted non-durable datafiles
The reconstructor may revert a non-durable datafile on a handoff
concurrently with an object server PUT that is about to make the
datafile durable.  This could previously lead to the reconstructor
deleting the recently written datafile before the object-server
attempts to rename it to a durable datafile, and consequently a
traceback in the object server.

The reconstructor will now only remove reverted nondurable datafiles
that are older (according to mtime) than a period set by a new
nondurable_purge_delay option (defaults to 60 seconds). More recent
nondurable datafiles may be made durable or will remain on the handoff
until a subsequent reconstructor cycle.

Change-Id: I0d519ebaaade35249fb7b17bd5f419ffdaa616c0
2021-06-24 09:33:06 +01:00
Alistair Coles
1dceafa7d5 ssync: sync non-durable fragments from handoffs
Previously, ssync would not sync nor cleanup non-durable data
fragments on handoffs. When the reconstructor is syncing objects from
a handoff node (a 'revert' reconstructor job) it may be useful, and is
not harmful, to also send non-durable fragments if the receiver has
older or no fragment data.

Several changes are made to enable this. On the sending side:

  - For handoff (revert) jobs, the reconstructor instantiates
    SsyncSender with a new 'include_non_durable' option.
  - If configured with the include_non_durable option, the SsyncSender
    calls the diskfile yield_hashes function with options that allow
    non-durable fragments to be yielded.
  - The diskfile yield_hashes function is enhanced to include a
    'durable' flag in the data structure yielded for each object.
  - The SsyncSender includes the 'durable' flag in the metadata sent
    during the missing_check exchange with the receiver.
  - If the receiver requests the non-durable object, the SsyncSender
    includes a new 'X-Backend-No-Commit' header when sending the PUT
    subrequest for the object.
  - The SsyncSender includes the non-durable object in the collection
    of synced objects returned to the reconstructor so that the
    non-durable fragment is removed from the handoff node.

On the receiving side:

  - The object server includes a new 'X-Backend-Accept-No-Commit'
    header in its response to SSYNC requests. This indicates to the
    sender that the receiver has been upgraded to understand the
    'X-Backend-No-Commit' header.
  - The SsyncReceiver is enhanced to consider non-durable data when
    determining if the sender's data is wanted or not.
  - The object server PUT method is enhanced to check for and
    'X-Backend-No-Commit' header before committing a diskfile.

If a handoff sender has both a durable and newer non-durable fragment
for the same object and frag-index, only the newer non-durable
fragment will be synced and removed on the first reconstructor
pass. The durable fragment will be synced and removed on the next
reconstructor pass.

Change-Id: I1d47b865e0a621f35d323bbed472a6cfd2a5971b
Closes-Bug: 1778002
2021-01-20 12:00:10 +00:00
Alistair Coles
128f199508 Refactor reconstructor probe tests
Refactor the reconstructor probe test to share common setup and helper
methods.

Change-Id: If75803648169f85b854c3d5d8784aaebbd93805b
2021-01-11 13:57:55 +00:00
Ade Lee
5320ecbaf2 replace md5 with swift utils version
md5 is not an approved algorithm in FIPS mode, and trying to
instantiate a hashlib.md5() will fail when the system is running in
FIPS mode.

md5 is allowed when in a non-security context.  There is a plan to
add a keyword parameter (usedforsecurity) to hashlib.md5() to annotate
whether or not the instance is being used in a security context.

In the case where it is not, the instantiation of md5 will be allowed.
See https://bugs.python.org/issue9216 for more details.

Some downstream python versions already support this parameter.  To
support these versions, a new encapsulation of md5() is added to
swift/common/utils.py.  This encapsulation is identical to the one being
added to oslo.utils, but is recreated here to avoid adding a dependency.

This patch is to replace the instances of hashlib.md5() with this new
encapsulation, adding an annotation indicating whether the usage is
a security context or not.

While this patch seems large, it is really just the same change over and
again.  Reviewers need to pay particular attention as to whether the
keyword parameter (usedforsecurity) is set correctly.   Right now, all
of them appear to be not used in a security context.

Now that all the instances have been converted, we can update the bandit
run to look for these instances and ensure that new invocations do not
creep in.

With this latest patch, the functional and unit tests all pass
on a FIPS enabled system.

Co-Authored-By: Pete Zaitcev
Change-Id: Ibb4917da4c083e1e094156d748708b87387f2d87
2020-12-15 09:52:55 -05:00
Zuul
50800aba37 Merge "Update SAIO & docker image to use 62xx ports" 2020-08-01 02:39:00 +00:00
Tim Burke
314347a3cb Update SAIO & docker image to use 62xx ports
Note that existing SAIOs with 60xx ports should still work fine.

Change-Id: If5dd79f926fa51a58b3a732b212b484a7e9f00db
Related-Change: Ie1c778b159792c8e259e2a54cb86051686ac9d18
2020-07-20 15:17:12 -07:00
Tim Burke
5bd95cf2b7 probe tests: Get rid of server arg for device_dir() and storage_dir()
It's not actually *used* anywhere.

Change-Id: I8f9b5cf7f5749481ef391a2029b0c4263443a89b
2020-07-16 13:50:58 -07:00
Tim Burke
1d7e1558b3 py3: (mostly) port probe tests
There's still one problem, though: since swiftclient on py3 doesn't
support non-ASCII characters in metadata names, none of the tests in
TestReconstructorRebuildUTF8 will pass.

Change-Id: I4ec879ade534e09c3a625414d8aa1f16fd600fa4
2019-09-04 10:17:45 -07:00
Tim Burke
00ca1ce6fe Tolerate swiftclient *not* mutatinng args
Change-Id: If82fe9e1d2da8c5122881f34dfbaaa7944c66265
Related-Change: Ia1638c216eff9db6fbe416bc0570c27cfdcfe730
2017-08-25 12:27:41 -07:00
Mahati Chamarthy
188c07e12a Limit number of revert tombstone SSYNC requests
Revert tombstone only parts try to talk to all primary nodes - this
fixes it to randomize selection within part_nodes. Corresponding probe
test is modified to reflect this change.

The primary improvement of this patch is the reconstuctor at a handoff
node is being able to delete local tombstones when it succeeds to sync
to less than all primary nodes. (Before this patch, it requires all
nodes are responsible for the REVERT requests)

The number of primary nodes to communicate with the reconstructor can be
in dicsussion more but, right now with this patch, it's (replicas - k + 1)
that is able to prevent stale read.

*BONUS*

- Fix mis-testsetting (was setting less replicas than ec_k + ec_m)
  for reconstructor ring in the unit test

Co-Authored-By: Kota Tsuyuzaki <tsuyuzaki.kota@lab.ntt.co.jp>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>

Change-Id: I05ce8fe75f1c4a7971cc8995b003df818b69b3c1
Closes-Bug: #1668857
2017-06-08 07:07:42 +00:00
Janie Richling
9681a833db Fix test_delete_propagate probe test
Fixes the failure in TestReconstructorRevert.test_delete_propagate
introduced by Related-Change.

Related-Change-Id: Ie351d8342fc8e589b143f981e95ce74e70e52784

Change-Id: I1657c1eecc9b62320e2cf184050e0db122821139
2017-02-10 10:07:39 -06:00
Gábor Antal
300d388825 Use more specific asserts in test/probe tests
I changed asserts with more specific assert methods.

e.g.: from assertTrue(sth == None) to assertIsNone(*) or
assertTrue(isinstance(inst, type)) to assertIsInstace(inst, type) or
assertTrue(not sth) to assertFalse(sth).

The code gets more readable, and a better description will be shown on fail.

Change-Id: I3768faa568e3964e726ecc48ac8cb133cb088284
2016-11-02 18:13:22 +00:00
Ondřej Nový
33c18c579e Remove executable flag from some test modules
Change-Id: I36560c2b54c43d1674b007b8105200869b5f7987
2016-10-31 21:22:10 +00:00
Samuel Merritt
99305b9300 Fix probe tests from commit cf48e75
Commit cf48e75 changed the default account/container/object ports in a
lot of places, including the probetests. However, it didn't change
them in doc/saio/bin/remakerings, and since the probe tests must match
the rings, they started failing.

This commit just backs out the changes to the test/probe directory so
that remakerings and the probe tests match again.

Change-Id: I316a09e6ee1a911f37ce9df3d641644739f88eeb
2016-05-02 17:29:32 -07:00
Shashirekha Gundur
cf48e75c25 change default ports for servers
Changing the recommended ports for Swift services
from ports 6000-6002 to unused ports 6200-6202;
so they do not conflict with X-Windows or other services.

Updated SAIO docs.

DocImpact
Closes-Bug: #1521339
Change-Id: Ie1c778b159792c8e259e2a54cb86051686ac9d18
2016-04-29 14:47:38 -04:00
Clay Gerrard
369447ec47 Fix purge for tombstone only REVERT job
When we revert a partition we normally push it off to the specific
primary node for the index of the data files in the partition.  However,
when a partition is devoid of any data files (only tombstones) we build
a REVERT job with a frag_index of None.

This change updates the ECDiskFile's purge method to be robust to
purging tombstones when the frag_index is None.

Add probetest to validate tombstone only revert jobs will clean
themselves up if they can validate they're in-sync with part-replica
count nodes - even if one of the primaries is down (in which case they
sync tombstones with other handoffs to fill in for the primaries)

Change-Id: Ib9a42f412fb90d51959efce886c0f8952aba8d85
2015-09-10 11:07:04 +01:00
paul luse
893f30c61d EC GET path: require fragments to be of same set
And if they are not, exhaust the node iter to go get more.  The
problem without this implementation is a simple overwrite where
a GET follows before the handoff has put the newer obj back on
the 'alive again' node such that the proxy gets n-1 fragments
of the newest set and 1 of the older.

This patch bucketizes the fragments by etag and if it doesn't
have enough continues to exhaust the node iterator until it
has a large enough matching set.

Change-Id: Ib710a133ce1be278365067fd0d6610d80f1f7372
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Alistair Coles <alistair.coles@hp.com>
Closes-Bug: 1457691
2015-08-27 21:09:41 -07:00
Kai Zhang
fa35e38c9f Fix some minor typos
Fixed some typos in function name and comments.

Change-Id: Ida76ab4b331a51b71e57650702acc136e66ba4b2
2015-08-14 16:49:41 -07:00
Darrell Bishop
df134df901 Allow 1+ object-servers-per-disk deployment
Enabled by a new > 0 integer config value, "servers_per_port" in the
[DEFAULT] config section for object-server and/or replication server
configs.  The setting's integer value determines how many different
object-server workers handle requests for any single unique local port
in the ring.  In this mode, the parent swift-object-server process
continues to run as the original user (i.e. root if low-port binding
is required), binds to all ports as defined in the ring, and forks off
the specified number of workers per listen socket.  The child, per-port
servers drop privileges and behave pretty much how object-server workers
always have, except that because the ring has unique ports per disk, the
object-servers will only be handling requests for a single disk.  The
parent process detects dead servers and restarts them (with the correct
listen socket), starts missing servers when an updated ring file is
found with a device on the server with a new port, and kills extraneous
servers when their port is found to no longer be in the ring.  The ring
files are stat'ed at most every "ring_check_interval" seconds, as
configured in the object-server config (same default of 15s).

Immediately stopping all swift-object-worker processes still works by
sending the parent a SIGTERM.  Likewise, a SIGHUP to the parent process
still causes the parent process to close all listen sockets and exit,
allowing existing children to finish serving their existing requests.
The drop_privileges helper function now has an optional param to
suppress the setsid() call, which otherwise screws up the child workers'
process management.

The class method RingData.load() can be told to only load the ring
metadata (i.e. everything except replica2part2dev_id) with the optional
kwarg, header_only=True.  This is used to keep the parent and all
forked off workers from unnecessarily having full copies of all storage
policy rings in memory.

A new helper class, swift.common.storage_policy.BindPortsCache,
provides a method to return a set of all device ports in all rings for
the server on which it is instantiated (identified by its set of IP
addresses).  The BindPortsCache instance will track mtimes of ring
files, so they are not opened more frequently than necessary.

This patch includes enhancements to the probe tests and
object-replicator/object-reconstructor config plumbing to allow the
probe tests to work correctly both in the "normal" config (same IP but
unique ports for each SAIO "server") and a server-per-port setup where
each SAIO "server" must have a unique IP address and unique port per
disk within each "server".  The main probe tests only work with 4
servers and 4 disks, but you can see the difference in the rings for the
EC probe tests where there are 2 disks per server for a total of 8
disks.  Specifically, swift.common.ring.utils.is_local_device() will
ignore the ports when the "my_port" argument is None.  Then,
object-replicator and object-reconstructor both set self.bind_port to
None if server_per_port is enabled.  Bonus improvement for IPv6
addresses in is_local_device().

This PR for vagrant-swift-all-in-one will aid in testing this patch:
https://github.com/swiftstack/vagrant-swift-all-in-one/pull/16/

Also allow SAIO to answer is_local_device() better; common SAIO setups
have multiple "servers" all on the same host with different ports for
the different "servers" (which happen to match the IPs specified in the
rings for the devices on each of those "servers").

However, you can configure the SAIO to have different localhost IP
addresses (e.g. 127.0.0.1, 127.0.0.2, etc.) in the ring and in the
servers' config files' bind_ip setting.

This new whataremyips() implementation combined with a little plumbing
allows is_local_device() to accurately answer, even on an SAIO.

In the default case (an unspecified bind_ip defaults to '0.0.0.0') as
well as an explict "bind to everything" like '0.0.0.0' or '::',
whataremyips() behaves as it always has, returning all IP addresses for
the server.

Also updated probe tests to handle each "server" in the SAIO having a
unique IP address.

For some (noisy) benchmarks that show servers_per_port=X is at least as
good as the same number of "normal" workers:
https://gist.github.com/dbishop/c214f89ca708a6b1624a#file-summary-md

Benchmarks showing the benefits of I/O isolation with a small number of
slow disks:
https://gist.github.com/dbishop/fd0ab067babdecfb07ca#file-results-md

If you were wondering what the overhead of threads_per_disk looks like:
https://gist.github.com/dbishop/1d14755fedc86a161718#file-tabular_results-md

DocImpact

Change-Id: I2239a4000b41a7e7cc53465ce794af49d44796c6
2015-06-18 12:43:50 -07:00
janonymous
09e7477a39 Replace it.next() with next(it) for py3 compat
The Python 2 next() method of iterators was renamed to __next__() on
Python 3. Use the builtin next() function instead which works on Python
2 and Python 3.

Change-Id: Ic948bc574b58f1d28c5c58e3985906dee17fa51d
2015-06-15 22:10:45 +05:30
Clay Gerrard
a3559edc23 Exclude local_dev from sync partners on failure
If the primary left or right hand partners are down, the next best thing
is to validate the rest of the primary nodes.  Where the rest should
exclude not just the left and right hand partners - but ourself as well.

This fixes a accidental noop when partner node is unavailable and
another node is missing data.

Validation:

Add probetests to cover ssync failures for the primary sync_to nodes for
sync jobs.

Drive-by:

Make additional plumbing for the check_mount and check_dir constraints into
the remaining daemons.

Change-Id: I4d1c047106c242bca85c94b569d98fd59bb255f4
2015-05-26 12:50:31 -07:00
Clay Gerrard
52b102163e Don't apply the wrong Etag validation to rebuilt fragments
Because of the object-server's interaction with ssync sender's
X-Backend-Replication-Headers when a object (or fragment archive) is
pushed unmodified to another node it's ETag value is duped into the
recieving ends metadata as Etag.  This interacts poorly with the
reconstructor's RebuildingECDiskFileStream which can not know ahead of
time the ETag of the fragment archive being rebuilt.

Don't send the Etag from the local source fragment archive being used as
the basis for the rebuilt fragent archive's metadata along to ssync.

Change-Id: Ie59ad93a67a7f439c9a84cd9cff31540f97f334a
2015-04-15 23:33:32 +01:00
paul luse
647b66a2ce Erasure Code Reconstructor
This patch adds the erasure code reconstructor. It follows the
design of the replicator but:
  - There is no notion of update() or update_deleted().
  - There is a single job processor
  - Jobs are processed partition by partition.
  - At the end of processing a rebalanced or handoff partition, the
    reconstructor will remove successfully reverted objects if any.

And various ssync changes such as the addition of reconstruct_fa()
function called from ssync_sender which performs the actual
reconstruction while sending the object to the receiver

Co-Authored-By: Alistair Coles <alistair.coles@hp.com>
Co-Authored-By: Thiago da Silva <thiago@redhat.com>
Co-Authored-By: John Dickinson <me@not.mn>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Tushar Gohad <tushar.gohad@intel.com>
Co-Authored-By: Samuel Merritt <sam@swiftstack.com>
Co-Authored-By: Christian Schwede <christian.schwede@enovance.com>
Co-Authored-By: Yuan Zhou <yuan.zhou@intel.com>
blueprint ec-reconstructor
Change-Id: I7d15620dc66ee646b223bb9fff700796cd6bef51
2015-04-14 00:52:17 -07:00