This has been available since py32 and was backported to py27; there
is no point in us continuing to carry the old idiom forward.
Change-Id: I21f64b8b2970e2dd5f56836f7f513e7895a5dc88
pytest still complains about some 20k warnings, but the vast majority
are actually because of eventlet, and a lot of those will get cleaned up
when upper-constraints picks up v0.33.2.
Change-Id: If48cda4ae206266bb41a4065cd90c17cbac84b7f
This patch makes the reconciler aware of partition power increases
(PPI). It does this by adding a helper method `can_reconcile_policy`
that is used to check that the policies used for the source and
destination aren't in the middle of a PPI (their ring doesn't have
next_part_power set).
To accomplish this, the reconciler now includes the POLICIES singleton
and has grown swift_dir and ring_check_interval config options.
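A minimal sketch of the helper, based on the description above (the
surrounding reconciler plumbing is assumed):

    from swift.common.storage_policy import POLICIES

    def can_reconcile_policy(self, policy_index):
        # a policy is safe to reconcile only when its ring can be
        # loaded and is not in the middle of a partition power increase
        policy = POLICIES.get_by_index(policy_index)
        if policy:
            policy.load_ring(self.swift_dir)
            return policy.object_ring.next_part_power is None
        return False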
Closes-Bug: #1934314
Change-Id: I78a94dd1be90913a7a75d90850ec5ef4a85be4db
"self.assertTrue(policies[1].is_deprecated, True)" and
"self.assertTrue(crashy_calls[0], 1)" are not correct, this is
to fix them.
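The intended comparisons look like:

    self.assertEqual(policies[1].is_deprecated, True)
    self.assertEqual(crashy_calls[0], 1)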
Change-Id: I7b07f0833d675d2939c910f679b54da2b8cda482
With this commit, each storage policy can define the diskfile to use to
access objects. Selection of the diskfile is done in swift.conf.
Example:
[storage-policy:0]
name = gold
policy_type = replication
default = yes
diskfile = egg:swift#replication.fs
The diskfile configuration item accepts the same format as middleware
declarations: [[scheme:]egg_name#]entry_point
The egg_name is optional and defaults to "swift". The scheme is
optional and defaults to "egg", the only valid value. The upstream
entry points are "replication.fs" and "erasure_coding.fs".
Co-Authored-By: Alexandre Lécuyer <alexandre.lecuyer@corp.ovh.com>
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Change-Id: I070c21bc1eaf1c71ac0652cec9e813cadcc14851
If swift-recon/swift-get-nodes/swift-object-info are used with the
swiftdir option, they will read rings from the given directory;
however, they still use /etc/swift/swift.conf to find the policies on
the current node.
This makes it impossible to maintain a local swift.conf copy (if you
don't have write access to /etc/swift) or check multiple clusters from
the same node.
Until now, swift-recon was also not usable with storage policy aliases;
this patch fixes that as well.
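A sketch of the approach, reusing parse_storage_policies from
swift.common.storage_policy to honor an alternate directory:

    import os
    from six.moves.configparser import ConfigParser
    from swift.common.storage_policy import parse_storage_policies

    def policies_from(swift_dir):
        # read swift.conf from the given directory, not /etc/swift
        conf = ConfigParser()
        conf.read(os.path.join(swift_dir, 'swift.conf'))
        return parse_storage_policies(conf)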
Closes-Bug: 1577582
Closes-Bug: 1604707
Closes-Bug: 1617951
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Co-Authored-By: Thiago da Silva <thiago@redhat.com>
Change-Id: I13188d42ec19e32e4420739eacd1e5b454af2ae3
Use assertIsNone() instead of assertEqual() when a value is expected to
be None; assertIsNone() tests identity ("is None") rather than
equality, and states the intent directly.
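For example (attribute name illustrative):

    self.assertIsNone(policy.object_ring)
    # rather than: self.assertEqual(policy.object_ring, None)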
Change-Id: Ic52c319e3e55135df834fdf857982e1721bc44bb
We said we were going to do it, we've had two releases saying we'd do
it, we've even backported our saying it to Newton -- let's actually do
it.
Upgrade Consideration
=====================
Erasure-coded storage policies using isa_l_rs_vand and nparity >= 5 must
be configured as deprecated, preventing any new containers from being
created with such a policy. This configuration is known to harm data
durability. Any data in such policies should be migrated to a new
policy. See https://bugs.launchpad.net/swift/+bug/1639691 for more
information.
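For example, such a policy would be marked deprecated in swift.conf
like this (name and parameter values illustrative):

    [storage-policy:2]
    name = ec-bad-durability
    policy_type = erasure_coding
    ec_type = isa_l_rs_vand
    ec_num_data_fragments = 10
    ec_num_parity_fragments = 5
    deprecated = yes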
UpgradeImpact
Related-Change: I50159c9d19f2385d5f60112e9aaefa1a68098313
Change-Id: I8f9de0bec01032d9d9b58848e2a76ac92e65ab09
Closes-Bug: 1639691
From the docs for LogRecord.message [1],
> This is set when Formatter.format() is invoked.
Apparently we may find ourselves in a situation [2] where that never
happens? Really weird that it failed *midway* through the test though;
maybe some concurrent test removed all formatters?
ERROR: test_known_bad_ec_config
(test.unit.common.test_storage_policy.TestStoragePolicies)
----------------------------------------------------------------------
Traceback (most recent call last):
File ".../mock/mock.py", line 1305, in patched
return func(*args, **keywargs)
File ".../test/unit/common/test_storage_policy.py", line 688, in test_known_bad_ec_config
self.assertIn(msg, records[0].message)
AttributeError: 'LogRecord' object has no attribute 'message'
[1] https://docs.python.org/2/library/logging.html#logrecord-attributes
[2] http://logs.openstack.org/59/460359/1/check/gate-swift-tox-xfs-tmp-py27-ubuntu-xenial/5ecc2cb/console.html#_2017-04-27_01_06_43_346096
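One way to avoid depending on a Formatter having run is
LogRecord.getMessage(), which formats msg % args directly, e.g.:

    # records captured by the test's log handler; getMessage() does
    # not require Formatter.format() to have been invoked first
    self.assertIn(msg, records[0].getMessage())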
Change-Id: I8f5ac0ec1195a233f14edc0126de1d1cea7a6e2f
This patch enables efficient PUT/GET for a globally distributed
cluster[1].
Problem:
Erasure coding can decrease the amount of actually stored data compared
to the replicated model. For example, ec_k=6, ec_m=3 can store data at
1.5x the original size, which is smaller than 3x replicated. However,
unlike replication, erasure coding requires at least ec_k fragments of
the total ec_k + ec_m fragments to be available to service a read
(e.g. 6 of 9 in the case above). As such, if we stored an EC object in
a swift cluster spanning 2 geographically distributed data centers with
the same volume of disks, the fragments would likely be stored evenly
(about 4 and 5), so we would still need to access a faraway data center
to decode the original object. In addition, if one of the data centers
were lost in a disaster, the stored objects would be lost forever, and
we would have to cry a lot. To ensure highly durable storage, you might
think of making *more* parity fragments (e.g. ec_k=6, ec_m=10);
unfortunately, this causes *significant* performance degradation due to
the cost of the mathematical calculation for erasure coding
encode/decode.
How this resolves the problem:
EC Fragment Duplication extends the initial solution to add *more*
fragments from which to rebuild an object, similar to the solution
described above. The difference is making *copies* of the encoded
fragments. Experimental results[1][2] show that employing small ec_k
and ec_m yields enough performance to store/retrieve objects.
On PUT:
- Encode the incoming object with small ec_k and ec_m <- faster!
- Make duplicated copies of the encoded fragments. The number of copies
  is determined by 'ec_duplication_factor' in swift.conf
- Store all fragments in the Swift Global EC Cluster
The duplicated fragments increase pressure on existing requirements
when decoding objects in service to a read request. All fragments are
stored with their X-Object-Sysmeta-Ec-Frag-Index. In this change, the
X-Object-Sysmeta-Ec-Frag-Index represents the actual fragment index
encoded by PyECLib, so there *will* be duplicates. Any time we must
decode the original object data, we may only consider ec_k fragments
that are unique according to their X-Object-Sysmeta-Ec-Frag-Index: no
duplicate X-Object-Sysmeta-Ec-Frag-Index may be used when decoding an
object, so duplicates should be expected and avoided if possible.
On GET:
This patch includes the following changes:
- Change the GET path to sort primary nodes into grouped subsets, so
  that each subset includes unique fragments
- Change the Reconstructor to be more aware of possibly duplicated
  fragments
For example, with this change, a policy could be configured in
swift.conf such that:
ec_num_data_fragments = 2
ec_num_parity_fragments = 1
ec_duplication_factor = 2
(object ring must have 6 replicas)
At Object-Server:
  node index (from object ring):  0 1 2 3 4 5 <- keep node index for
                                                 reconstruct decisions
  X-Object-Sysmeta-Ec-Frag-Index: 0 1 2 0 1 2 <- each object keeps the
                                                 actual fragment index
                                                 for the backend
                                                 (PyECLib)
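In other words, under duplication the actual fragment index wraps
around the node index; a sketch of the mapping (parameters from the
example above):

    def actual_frag_index(node_index, ec_k=2, ec_m=1):
        # node indexes 0..5 map to fragment indexes 0 1 2 0 1 2
        return node_index % (ec_k + ec_m)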
Additional improvements to Global EC Cluster Support will require
features such as Composite Rings, and more efficient fragment
rebalance/reconstruction.
1: http://goo.gl/IYiNPk (Swift Design Spec Repository)
2: http://goo.gl/frgj6w (Slide Share for OpenStack Summit Tokyo)
Doc-Impact
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Change-Id: Idd155401982a2c48110c30b480966a863f6bd305
We *know* there are combinations that will prevent decoding, and we're
going to be increasingly aggressive about getting it out of clusters.
Partial-Bug: 1639691
Change-Id: I50159c9d19f2385d5f60112e9aaefa1a68098313
Swift EC has a strong constraint that the ring must have a number of
replicas matching ec_k + ec_m. That is validated when servers wake up.
However, Swift has more chances to load such an invalid ring while a
request is coming in, via node iteration calls like get_nodes,
get_part_nodes or so, and no ring validation happens there.
This patch moves ring validation from the policy's validate_ring into
the ring instance as a validation_hook that runs at ring reload. With
this patch, the ring instance will keep using the old ring if the
reload is not forced.
Note that the exception raised for an invalid ring was changed from
RingValidationError to RingLoadError, because RingValidationError is a
child of RingBuilderError and ring reload is clearly outside the scope
of the "builder".
Closes-Bug: #1534572
Change-Id: I6428fbfb04e0c79679b917d5e57bd2a34f2a0875
ECStoragePolicy.fragment_size never changes on a running Swift because
it derives from ec_segment_size and ec_type, which are defined
statically in swift.conf, so let's cache the value after retrieving it
from the pyeclib driver.
Moreover, pyeclib <= 1.2.1 (the current newest) has a bug [1] that
leaks the reference count of the items in the returned dict (i.e.
causes a memory leak), so this caching also mitigates the leak by
keeping the call count as low as possible.
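The caching can be as simple as (a sketch; assumes _fragment_size is
initialized to None in __init__):

    @property
    def fragment_size(self):
        # compute once via the pyeclib driver, then reuse; the value
        # is static since it derives from swift.conf settings
        if self._fragment_size is None:
            self._fragment_size = self.pyeclib_driver.get_segment_info(
                self.ec_segment_size,
                self.ec_segment_size)['fragment_size']
        return self._fragment_size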
Note that the complete fix for the memory leak for pyeclib is proposed at
https://review.openstack.org/#/c/344066/
1: https://bugs.launchpad.net/pyeclib/+bug/1604335
Related-Bug: #1604335
Change-Id: I6bbaa4063dc462383c949764b6567b2bee233689
Requiring 2/2 backends for PUT requests means that the cluster can't
tolerate a single failure. Likewise, if you have 4 replicas in 2
regions, requiring 3/4 on a POST request means you cannot POST with
your inter-region link down or congested.
This changes the (replication) quorum size in the proxy to be at least
half the nodes instead of a majority of the nodes.
Daemons that were looking for a majority remain unchanged. The
container reconciler, replicator, and updater still require majorities
so their functioning is unchanged.
Odd numbers of replicas are unaffected by this commit.
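Concretely, the replication quorum becomes at least half the nodes
rather than a strict majority:

    def quorum_size(n):
        # 1 of 2, 2 of 4; odd replica counts are unchanged (2 of 3)
        return (n // 2) + (n % 2)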
Change-Id: I3b07ff0222aba6293ad7d60afe1747acafbe6ce4
_validate_policy_name always either returns True or raises an exception.
Simplify it to just being a callable that may raise an exception.
Also, move the check for blank/None names into _validate_policy_name, so
it will be applied in more cases.
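Roughly, the resulting shape (a sketch):

    def _validate_policy_name(self, name):
        # no boolean return value any more; blank/None or otherwise
        # invalid names simply raise PolicyError
        if not name:
            raise PolicyError('Invalid name %r' % name)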
Change-Id: I7832a0c9c895cd75ba4c6d0e8b5568a3c8a0ea25
This patch alters storage_policy.py to allow storage policies
to have multiple names. Users can now add a number of
human-readable aliases for storage policies. Policies now have
a .name (the default name), .aliases (a string of comma-separated
aliases), and .aliases_list (a list of all human-readable names).
Policies will always have an .aliases value; if no aliases are
set, it will contain the default name.
The policy docs and tests have been updated to reflect the changes,
and policy.get_policy_info has been altered to display the
name and aliases.
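For example (values illustrative):

    p = StoragePolicy(0, name='gold', aliases='brass, bronze')
    p.name          # 'gold' (the default name)
    p.aliases       # 'gold, brass, bronze' -- always includes .name
    p.aliases_list  # ['gold', 'brass', 'bronze']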
Change-Id: I02967ca8d7c790595e5ee551581196aa64552eea
To minimize external library dependencies for Swift unit
tests and SAIO, PyECLib 1.1.1 introduces a native backend
'liberasurecode_rs_vand.' This patch is to migrate over
the unit tests to the new ec_type when available.
This change will work with current pyeclib requirements
(==1.0.7) and also future requirements (>=1.0.7).
When we're able to raise *our* requirements to >=1.1.1 we
should remove jerasure from the list of preferred backends.
Related SAIO doc and example config changes should be
included with that patch.
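A sketch of the selection, using pyeclib's VALID_EC_TYPES:

    from pyeclib.ec_iface import VALID_EC_TYPES

    # prefer the native backend when available (PyECLib >= 1.1.1),
    # otherwise fall back to jerasure
    for ec_type in ('liberasurecode_rs_vand', 'jerasure_rs_vand'):
        if ec_type in VALID_EC_TYPES:
            break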
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Change-Id: Idf657f0acf0479bc8158972e568a29dbc08eaf3b
assertEquals is deprecated in py3, so replace it with assertEqual.
Change-Id: Ida206abbb13c320095bb9e3b25a2b66cc31bfba8
Co-Authored-By: Ondřej Nový <ondrej.novy@firma.seznam.cz>
Get configparser, queue, http_client modules from six.moves.
Patch generated by the six_moves operation of the sixer tool:
https://pypi.python.org/pypi/sixer
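That is, imports like:

    # py2-only stdlib names -> six.moves equivalents
    from six.moves import configparser   # was: import ConfigParser
    from six.moves import queue          # was: import Queue
    from six.moves import http_client    # was: import httplib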
Change-Id: I666241ab50101b8cc6f992dd80134ce27327bd7d
The assert_() method is deprecated and can be safely replaced by assertTrue().
This patch makes sure that running the tests does not create undesired
warnings.
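For example (assertion argument illustrative):

    # deprecated, emits a DeprecationWarning:
    self.assert_(policy.is_default)
    # replacement:
    self.assertTrue(policy.is_default)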
Change-Id: I0602ba39ef93263386644ee68088d5f65fcb4a71
* replace "from cStringIO import StringIO"
with "from six.moves import cStringIO as StringIO"
* replace "from StringIO import StringIO"
with "from six import StringIO"
* replace "import cStringIO" and "cStringIO.StringIO()"
with "from six import moves" and "moves.cStringIO()"
* replace "import StringIO" and "StringIO.StringIO()"
with "import six" and "six.StringIO()"
This patch was generated by the stringio operation of the sixer tool:
https://pypi.python.org/pypi/sixer
Change-Id: Iacba77fec3045f96773d1090c0bd48613729a561
Enabled by a new > 0 integer config value, "servers_per_port" in the
[DEFAULT] config section for object-server and/or replication server
configs. The setting's integer value determines how many different
object-server workers handle requests for any single unique local port
in the ring. In this mode, the parent swift-object-server process
continues to run as the original user (i.e. root if low-port binding
is required), binds to all ports as defined in the ring, and forks off
the specified number of workers per listen socket. The child, per-port
servers drop privileges and behave pretty much how object-server workers
always have, except that because the ring has unique ports per disk, the
object-servers will only be handling requests for a single disk. The
parent process detects dead servers and restarts them (with the correct
listen socket), starts missing servers when an updated ring file is
found with a device on the server with a new port, and kills extraneous
servers when their port is found to no longer be in the ring. The ring
files are stat'ed at most every "ring_check_interval" seconds, as
configured in the object-server config (same default of 15s).
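For example (value illustrative):

    [DEFAULT]
    servers_per_port = 4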
Immediately stopping all swift-object-worker processes still works by
sending the parent a SIGTERM. Likewise, a SIGHUP to the parent process
still causes the parent process to close all listen sockets and exit,
allowing existing children to finish serving their existing requests.
The drop_privileges helper function now has an optional param to
suppress the setsid() call, which otherwise screws up the child workers'
process management.
The class method RingData.load() can be told to only load the ring
metadata (i.e. everything except replica2part2dev_id) with the optional
kwarg, header_only=True. This is used to keep the parent and all
forked off workers from unnecessarily having full copies of all storage
policy rings in memory.
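e.g. (path illustrative):

    from swift.common.ring import RingData

    # loads everything except replica2part2dev_id, per the new kwarg
    ring_meta = RingData.load('/etc/swift/object.ring.gz',
                              header_only=True)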
A new helper class, swift.common.storage_policy.BindPortsCache,
provides a method to return a set of all device ports in all rings for
the server on which it is instantiated (identified by its set of IP
addresses). The BindPortsCache instance will track mtimes of ring
files, so they are not opened more frequently than necessary.
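Usage is roughly (arguments illustrative):

    from swift.common.storage_policy import BindPortsCache

    cache = BindPortsCache('/etc/swift', '1.2.3.4')
    # set of all ports for this node's devices across all ring files;
    # rings are only re-read when their mtimes change
    ports = cache.all_bind_ports_for_node()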
This patch includes enhancements to the probe tests and
object-replicator/object-reconstructor config plumbing to allow the
probe tests to work correctly both in the "normal" config (same IP but
unique ports for each SAIO "server") and a server-per-port setup where
each SAIO "server" must have a unique IP address and unique port per
disk within each "server". The main probe tests only work with 4
servers and 4 disks, but you can see the difference in the rings for the
EC probe tests where there are 2 disks per server for a total of 8
disks. Specifically, swift.common.ring.utils.is_local_device() will
ignore the ports when the "my_port" argument is None. Then,
object-replicator and object-reconstructor both set self.bind_port to
None if server_per_port is enabled. Bonus improvement for IPv6
addresses in is_local_device().
This PR for vagrant-swift-all-in-one will aid in testing this patch:
https://github.com/swiftstack/vagrant-swift-all-in-one/pull/16/
Also allow SAIO to answer is_local_device() better; common SAIO setups
have multiple "servers" all on the same host with different ports for
the different "servers" (which happen to match the IPs specified in the
rings for the devices on each of those "servers").
However, you can configure the SAIO to have different localhost IP
addresses (e.g. 127.0.0.1, 127.0.0.2, etc.) in the ring and in the
servers' config files' bind_ip setting.
This new whataremyips() implementation combined with a little plumbing
allows is_local_device() to accurately answer, even on an SAIO.
In the default case (an unspecified bind_ip defaults to '0.0.0.0') as
well as an explicit "bind to everything" like '0.0.0.0' or '::',
whataremyips() behaves as it always has, returning all IP addresses for
the server.
Also updated probe tests to handle each "server" in the SAIO having a
unique IP address.
For some (noisy) benchmarks that show servers_per_port=X is at least as
good as the same number of "normal" workers:
https://gist.github.com/dbishop/c214f89ca708a6b1624a#file-summary-md
Benchmarks showing the benefits of I/O isolation with a small number of
slow disks:
https://gist.github.com/dbishop/fd0ab067babdecfb07ca#file-results-md
If you were wondering what the overhead of threads_per_disk looks like:
https://gist.github.com/dbishop/1d14755fedc86a161718#file-tabular_results-md
DocImpact
Change-Id: I2239a4000b41a7e7cc53465ce794af49d44796c6
This patch extends the StoragePolicy class for non-replication storage
policies, the first one being "erasure coding".
Changes:
- Add 'policy_type' support to BaseStoragePolicy class
- Disallow direct instantiation of BaseStoragePolicy class
- Subclass BaseStoragePolicy
- "StoragePolicy":
. Replication policy, default
. policy_type = 'replication'
- "ECStoragePolicy":
. Erasure Coding policy
. policy_type = 'erasure_coding'
. Private member variables
ec_type (EC backend),
ec_num_data_fragments (number of fragments original
data split into after erasure coding operation),
ec_num_parity_fragments (number of parity fragments
generated during erasure coding)
. Private methods
EC specific attributes and ring validator methods.
- Swift will use PyECLib, a Python Erasure Coding library, for
erasure coding operations. PyECLib is already an approved
OpenStack core requirement.
(https://bitbucket.org/kmgreen2/pyeclib/)
- Add test cases for
- 'policy_type' StoragePolicy member
- policy_type == 'erasure_coding'
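An example swift.conf section for the new policy type (parameter
values illustrative):

    [storage-policy:1]
    name = ec104
    policy_type = erasure_coding
    ec_type = jerasure_rs_vand
    ec_num_data_fragments = 10
    ec_num_parity_fragments = 4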
DocImpact
Co-Authored-By: Alistair Coles <alistair.coles@hp.com>
Co-Authored-By: Thiago da Silva <thiago@redhat.com>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Paul Luse <paul.e.luse@intel.com>
Co-Authored-By: Samuel Merritt <sam@swiftstack.com>
Co-Authored-By: Christian Schwede <christian.schwede@enovance.com>
Co-Authored-By: Yuan Zhou <yuan.zhou@intel.com>
Change-Id: Ie0e09796e3ec45d3e656fb7540d0e5a5709b8386
Implements: blueprint ec-proxy-work
Currently the in-process tests build a 2 replica, 4 partition,
2 device object ring. This patch allows an alternative policy
and ring to be specified for testing via environment variables
that may optionally be set.
SWIFT_TEST_IN_PROCESS_CONF_DIR - This points the test setup to a
directory which may have a swift.conf file and ring file. The
test setup will then prefer these conf files over the samples
in '/etc'.
SWIFT_TEST_POLICY - This causes the in-process test to
use the specified policy from the swift.conf file and its
associated ring for testing (first copying the conf and ring file
and modifying device parameters to suit in-process testing). If
not set, the tests will use the default policy.
The in-process tests now start sufficient object servers for the
ring file being tested against.
This should allow in-process functional testing of various policies
and rings (e.g. EC policies) without needing to reconfigure an SAIO
for each test scenario.
The refactoring of the in_process test setup code should also
allow easier addition of other 'hard-coded' test policies/rings
in the future.
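A sketch of how the setup consumes these variables (helper plumbing
omitted):

    import os

    conf_dir = os.environ.get('SWIFT_TEST_IN_PROCESS_CONF_DIR')
    # prefer swift.conf (and ring files) from conf_dir over the
    # samples in /etc
    swift_conf = os.path.join(conf_dir or '/etc/swift', 'swift.conf')
    # unset means: test against the default policy
    policy_name = os.environ.get('SWIFT_TEST_POLICY')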
Change-Id: I24f5a13de3d296b400da1691dcb53423a9f8a463
The basic idea here is to replace the use of a single object ring in
the Application class with a collection of object rings. The
collection includes not only the Ring object itself but the policy
name associated with it, the filename for the .gz and any other
metadata associated with the policy that may be needed. When
containers are created, a policy (thus a specific obj ring) is
selected allowing apps to specify policy at container creation time
and leverage policies simply by using different containers for object
operations.
The policy collection is based off of info in the swift.conf file.
The format of the sections in the .conf file is as follows:
swift.conf format:
[storage-policy:0]
name = chicken
[storage-policy:1]
name = turkey
default = yes
With the above format:
- Policy 0 will always be used for access to existing containers
without the policy specified. The ring name for policy 0 is always
'object', assuring backwards compatibility. The parser will always
create a policy 0 even if not specified
- The policy with 'default=yes' is the one used for new container
creation. This allows the admin to specify which policy is used without
forcing the application to add the metadata.
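With the example above, the loaded policy collection behaves roughly
like:

    from swift.common.storage_policy import POLICIES

    POLICIES.default.name                 # 'turkey' (default = yes)
    POLICIES.get_by_index(0).ring_name    # always 'object'
    POLICIES.get_by_name('chicken').idx   # 0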
This commit simply introduces storage policies and the loading
thereof; nobody's using it yet. That will follow in subsequent
commits.
Expose storage policies in /info
DocImpact
Implements: blueprint storage-policies
Change-Id: Ica05f41ecf3adb3648cc9182f11f1c8c5c678985