Doesn't work for anything other than policy 0. Updated to allow the
user to specify a policy name on the command line (as with object-info),
which then makes populate/report work with 3x, 2x, or EC style policies.
Change-Id: Ib7c298f0f6d666b1ecca25315b88539f45cf9f95
Closes-Bug: 1458688
This patch adds the erasure code reconstructor. It follows the
design of the replicator but:
- There is no notion of update() or update_deleted().
- There is a single job processor.
- Jobs are processed partition by partition.
- At the end of processing a rebalanced or handoff partition, the
reconstructor will remove successfully reverted objects if any.
It also includes various ssync changes, such as the addition of a
reconstruct_fa() function, called from ssync_sender, which performs the
actual reconstruction while sending the object to the receiver.
Co-Authored-By: Alistair Coles <alistair.coles@hp.com>
Co-Authored-By: Thiago da Silva <thiago@redhat.com>
Co-Authored-By: John Dickinson <me@not.mn>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Tushar Gohad <tushar.gohad@intel.com>
Co-Authored-By: Samuel Merritt <sam@swiftstack.com>
Co-Authored-By: Christian Schwede <christian.schwede@enovance.com>
Co-Authored-By: Yuan Zhou <yuan.zhou@intel.com>
blueprint ec-reconstructor
Change-Id: I7d15620dc66ee646b223bb9fff700796cd6bef51
This is a follow-on from a previous commit which added recon info
for swift-drive-audit (https://review.openstack.org/#/c/122468/).
Here, the "--driveaudit" option is added to the swift-recon tool. This
feature gives the statistics for the system-wide drive errors flagged
by swift-drive-audit. An example of the output is as follows:
(verbose mode)
swift-recon --driveaudit -v
===============================================================================
--> Starting reconnaissance on 5 hosts
===============================================================================
[2015-03-11 17:13:39] Checking drive-audit errors
-> http://1.2.3.4:6000/recon/driveaudit: {'drive_audit_errors': 14}
-> http://1.2.3.5:6000/recon/driveaudit: {'drive_audit_errors': 0}
-> http://1.2.3.6:6000/recon/driveaudit: {'drive_audit_errors': 37}
-> http://1.2.3.7:6000/recon/driveaudit: {'drive_audit_errors': 101}
-> http://1.2.3.8:6000/recon/driveaudit: {'drive_audit_errors': 0}
[drive_audit_errors] low: 0, high: 101, avg: 30.4, total: 152, Failed: 0.0%, no_result: 0, reported: 5
===============================================================================
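The summary line above can be reproduced from the per-host values. A
minimal sketch, assuming a plain dict of host to error count; the
summarize() helper and its field names are illustrative and are not
taken from swift-recon itself:

```python
def summarize(results):
    """Compute low/high/avg/total over hosts that returned a number."""
    values = [v for v in results.values() if v is not None]
    if not values:
        return None
    total = sum(values)
    return {
        "low": min(values),
        "high": max(values),
        "avg": total / len(values),
        "total": total,
        "no_result": len(results) - len(values),
        "reported": len(values),
    }


# Values copied from the example output above.
hosts = {
    "1.2.3.4": 14,
    "1.2.3.5": 0,
    "1.2.3.6": 37,
    "1.2.3.7": 101,
    "1.2.3.8": 0,
}
summary = summarize(hosts)
print(summary)
```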
Change-Id: Ia16c52a9d613eeb3de1a5a428d88dd1233631912
drive-audit detects error log entries about a device and comments the
device out in /etc/fstab. When errors are logged several times, it
comments out the same line again each time.
This patch makes drive-audit check whether the device is already
commented out, which prevents redundant commenting out.
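The idea can be sketched as follows. This is an illustration under
stated assumptions, not the code from the patch: comment_out_device()
is a hypothetical helper, and it assumes the device is the first
whitespace-separated field of an fstab line.

```python
def comment_out_device(fstab_lines, device):
    """Prefix the fstab entry for `device` with '#', but only once."""
    result = []
    for line in fstab_lines:
        fields = line.split()
        # An already commented line starts with "#<device>", so its first
        # field no longer equals the device and the line is left alone.
        if fields and fields[0] == device:
            result.append("#" + line)
        else:
            result.append(line)
    return result


fstab = [
    "/dev/sda1 / ext4 defaults 0 1",
    "/dev/sdb1 /srv/node/sdb1 xfs noatime 0 0",
]
once = comment_out_device(fstab, "/dev/sdb1")
twice = comment_out_device(once, "/dev/sdb1")
```

Running the helper a second time leaves the file content unchanged,
which is the behaviour the patch is after.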
Change-Id: Ia542d35b58552dde0f324bb9c42531f98c9058fa
The replicator already supports --devices and --partitions to restrict
its operation to a subset of devices and partitions. However,
operators don't always want to replicate a partition in all policies
since different policies (usually) have different rings.
For example, if I know that policy 0's partition 1234 has no
replicas on primary nodes due to over-aggressive rebalancing, I really
want to find a node where the partition is and make the replicator
push it onto the primaries. However, if I haven't been messing with
policy 1's ring, its partition 1234 is fine. With the existing
replicator args, I get both or neither; this commit lets me get just
the useful one.
Change-Id: Ib1d58fdd228a6ee7865321e65d7c04a891fa5c49
This patch adds console logging ability to swift-drive-audit.
There are cases where logging to console is necessary when drive-audit
is done. This can be consumed for flagging errors in monitoring tools
such as icinga.
DocImpact
Change-Id: Ia1e1effcbd89bd2cf6d5b8c64019f1647c736a3a
After the release of Swift ver. 2.0.0, some recon responses do not
yet show each policy's information. To make things worse, some recon
results only count policy-0's score, so the total is not shown in the
recon results.
With this patch, the async_pending count in recon results becomes
policy-aware. Suppose there are 2 async_pending files for policy-0 and
3 for policy-1; recon then sums every policy's count as follows.
$ curl http://<host>:<port>/recon/async
{"async_pending": 5} # It showed 2 before this commit
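A rough sketch of the policy-aware count, for illustration only. It
assumes that policy 0 keeps its files under "async_pending" and policy N
under "async_pending-N" inside each device directory; the helper name
and the file names in the demo are made up.

```python
import os
import tempfile


def count_async_pendings(device_dir):
    """Count async pending files for every policy on one device."""
    total = 0
    for entry in os.listdir(device_dir):
        # Assumed layout: "async_pending" (policy 0), "async_pending-N".
        if entry == "async_pending" or entry.startswith("async_pending-"):
            for _root, _dirs, files in os.walk(os.path.join(device_dir, entry)):
                total += len(files)
    return total


# Demo: 2 files for policy 0 and 3 files for policy 1.
with tempfile.TemporaryDirectory() as device_dir:
    for dirname, count in (("async_pending", 2), ("async_pending-1", 3)):
        suffix_dir = os.path.join(device_dir, dirname, "abc")
        os.makedirs(suffix_dir)
        for i in range(count):
            open(os.path.join(suffix_dir, "pending-%d" % i), "w").close()
    total = count_async_pendings(device_dir)

print({"async_pending": total})
```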
Related-Bug: 1375332
Change-Id: Ifc88b8c9e06b9f022a926a87ed807e938e1e0412
This patch adds two new features to swift-drive-audit. The first
is an option in the drive-audit.conf file that allows the operator
to prevent the drives ever being unmounted automatically,
regardless of the number of errors present. This could be of
benefit in very small systems consisting of only one or two drives
where the operator would like to manually unmount/fix the
particular drive(s) and minimise any potential downtime.
The second is another option in drive-audit.conf that allows the
operator to select a recon directory. This directory will then
have a drive.recon file which will keep an up-to-date record of
the swift drives and any errors associated with them. An example
of the output would be as follows:
{"/srv/node/disk2": "0", "/srv/node/disk3": "25", "/srv/node/disk0": "0",
"/srv/node/disk1": "0", "/srv/node/disk10": "0", "/srv/node/disk7": "0",
"/srv/node/disk4": "137", "/srv/node/disk5": "0", "/srv/node/disk8": "0",
"/srv/node/disk9": "0", "/srv/node/disk6": "0", "/srv/node/disk11": "60"}
This would allow the operator to monitor the errors on the swift
drives without having to spend time searching through logs. Also, if
this is accepted, it should be possible to add an option to
swift-recon that would keep track of this at a system level.
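As a sketch of how a monitoring script might consume such a file (the
variable names are illustrative; the JSON is the example shown above,
where error counts are stored as strings):

```python
import json

# Example content of drive.recon, copied from the output above.
recon_text = """
{"/srv/node/disk2": "0", "/srv/node/disk3": "25", "/srv/node/disk0": "0",
"/srv/node/disk1": "0", "/srv/node/disk10": "0", "/srv/node/disk7": "0",
"/srv/node/disk4": "137", "/srv/node/disk5": "0", "/srv/node/disk8": "0",
"/srv/node/disk9": "0", "/srv/node/disk6": "0", "/srv/node/disk11": "60"}
"""

errors = json.loads(recon_text)
# Keep only the drives that reported at least one error.
failing = {drive: int(n) for drive, n in errors.items() if int(n) > 0}
for drive, count in sorted(failing.items(), key=lambda item: -item[1]):
    print("%s: %d errors" % (drive, count))
```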
Change-Id: Ib5dacf8622b7363e070c274c7c30c8ead448a055
This change allows the user to use a "--no-overlap" parameter when
running the tool multiple times. It will increase the coverage by
whatever is specified in the dispersion_coverage field of the conf
file in a manner where existing containers/objects are left in place
and no partition is populated more than once.
Related-Bug: #1233045
Change-Id: I139fed2f4c967ba18d073b7ecd1e946ed4da1271
In a long-term effort to change the recommended ports for Swift,
the first step is to require the bind_port in config files. Later,
we can change the recommended setting.
Anyone currently explicitly setting the ports will not be affected.
Anyone not setting the ports will need to specify them to match their
rings.
DocImpact
Change-Id: Icca83a263acdd0afc9016424a3e9f8c15e944789
Moved the body of bin/swift-form-signature into
swift/cli/form_signature.py, as was done with swift-ring-builder and
others. Added a couple basic tests; there's not 100% coverage, but
it's better than the 0% coverage we had before.
It's almost a straight forklift, but I changed exit() calls to return
statements.
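The exit()-to-return change is what makes the code testable. A minimal
sketch of the pattern with a hypothetical main(); it is not the
form_signature code:

```python
import sys


def main(argv):
    """Hypothetical CLI body that returns a status instead of calling exit()."""
    if len(argv) != 2:
        print("usage: %s <value>" % argv[0], file=sys.stderr)
        return 1
    print("got %s" % argv[1])
    return 0


# A thin bin/ script would then call: sys.exit(main(sys.argv))
# A test can call main() directly and check the return value.
status_bad = main(["prog"])
status_ok = main(["prog", "x"])
```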
Change-Id: Ie2f702c070da24d9cdface83b9e838e9e2965085
swift-container-info:
Print policy container info
swift-object-info:
Allow specifying a storage policy name when looking for object info
Notify if there is a mismatch between the ring location and the actual
object path in the filesystem
swift-get-nodes:
Allow specifying a storage policy name when looking for account/
container/object ring location
Notify if there is a mismatch between the ring and the policy
Lookup policy name in swift.conf; 'Legacy' container will use
policy-0's name; 'Unknown' is shown if policy not found in swift.conf
DocImpact
Implements: blueprint storage-policies
Change-Id: I450d40dc6e2d8f759187dff36d658e52737ae2a5
This daemon will take objects that are in the wrong storage policy and
move them to the right ones, or delete requests that went to the wrong
storage policy and apply them to the right ones. It operates on a
queue similar to the object-expirer's queue.
Discovering that the object is in the wrong policy will be done in
subsequent commits by the container replicator; this is the daemon
that handles them once they happen.
Like the object expirer, you only need to run one of these per cluster;
see etc/container-reconciler.conf.
DocImpact
Implements: blueprint storage-policies
Change-Id: I5ea62eb77ddcbc7cfebf903429f2ee4c098771c9
This allows an easier and more explicit way to tell swift-init to run on
specific servers. For example with an SAIO, this allows you to do
something like:
swift-init object-server.1 reload
to reload just the 1st object server. A more real world example is when
you are running separate servers for replication. In this example you
might have an object-server/public.conf and
object-server/replication.conf. With this change you can do something
like:
swift-init object-server.replication reload
to just reload the replication server.
DocImpact
Change-Id: I5c6046b5ee28e17dadfc5fc53d1d872d9bb8fe48
If you have a path with special characters, it may be easier to hand it to
swift-temp-url pre-quoted than to try to escape the characters on the command
line. By the time common.middleware.tempurl gets ahold of the path it's
unquoted, so we do the same before calculating the HMAC but still use the
pre-quoted path in the output to the command line.
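A sketch of the behaviour described above, under assumptions: the
signature is taken to be HMAC-SHA1 over "METHOD\nEXPIRES\nPATH", and
temp_url() is an illustrative helper rather than the tool's actual
function. The key and path are made up.

```python
import hmac
from hashlib import sha1
from urllib.parse import unquote


def temp_url(method, expires, quoted_path, key):
    """Sign the unquoted path, but emit the pre-quoted path in the URL."""
    hmac_body = "%s\n%s\n%s" % (method, expires, unquote(quoted_path))
    sig = hmac.new(key.encode(), hmac_body.encode(), sha1).hexdigest()
    return "%s?temp_url_sig=%s&temp_url_expires=%s" % (quoted_path, sig, expires)


url = temp_url("GET", 1400000000, "/v1/AUTH_test/c/my%20object", "secret")
print(url)
```

The signature matches what the middleware would compute from the
unquoted path, while the printed URL stays safe to paste into a shell.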
Change-Id: Ia1a9666e487b1e70e4db7cd597bc6a1027e3e918
This is a very simple swift tool to retrieve information
about an account that is located on the storage node.
One can call the tool with a given account db file
as it is stored on the storage node system.
It will then return information about that account.
Change-Id: Ibfeee790adc000fc177b4b3c03d22ff785fda325
This is a very simple swift tool to retrieve information
about a container that is located on the storage node.
One can call the tool with a given container db file
as it is stored on the storage node system.
It will then return information about that container.
Change-Id: Ifebaed6c51a9ed5fbc0e7572bb43ef05d7dd254b
In object audit "once" mode we are allowing the user to specify
a sub-set of devices to audit using the "--devices" command-line
option. The sub-set is specified as a comma-separated list. This
patch is taken from a larger patch to enable parallel processing
in the object auditor.
We've had to modify recon so that it will work properly with this
change to "once" mode. We've modified dump_recon_cache()
so that it will store nested dictionaries, in other words it will
store a recon cache entry such as {'key1': {'key2': {...}}}. When
the object auditor is run in "once" mode with "--devices" set the
object_auditor_stats_ALL and ZBF entries look like:
{'object_auditor_stats_ALL': {'disk1disk2..diskn': {...}}}. When
swift-recon is run, it hunts through the nested dicts to find the
appropriate entries. The object auditor recon cache entries are set
to {} at the beginning of each audit cycle, and individual disk
entries are cleared from cache at the end of each disk's audit cycle.
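The nested storage and lookup can be sketched like this. Both helpers
are hypothetical and only illustrate the idea; the "passes" field is a
placeholder, and the clearing of entries at the start and end of an
audit cycle is not modelled.

```python
def merge_nested(cache, update):
    """Merge `update` into `cache`, descending into nested dicts."""
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(cache.get(key), dict):
            merge_nested(cache[key], value)
        else:
            cache[key] = value
    return cache


def find_stats(entry, wanted):
    """Depth-first search for dicts that contain the key `wanted`."""
    found = []
    if isinstance(entry, dict):
        if wanted in entry:
            found.append(entry)
        for value in entry.values():
            found.extend(find_stats(value, wanted))
    return found


cache = {}
merge_nested(cache, {"object_auditor_stats_ALL": {"disk1disk2": {"passes": 3}}})
merge_nested(cache, {"object_auditor_stats_ALL": {"disk3": {"passes": 7}}})
stats = find_stats(cache["object_auditor_stats_ALL"], "passes")
```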
DocImpact
Change-Id: Icc53dac0a8136f1b2f61d5e08baf7b4fd87c8123
Just use import to make scripts available in bin/ instead of
creating these during setup.py install.
Change-Id: I7318bbb77f6564ed58736887e711e1c497873471
Removes the requirement for swiftclient in swift-dispersion-report
and swift-dispersion-populate. To prevent a dependency on
keystoneclient and to avoid reinventing the wheel with an internal
keystoneclient, authentication with keystone is only supported if
swiftclient is available. If not, only auth v1 is supported.
The dependency in swift/container/sync.py has also been removed.
Implements: blueprint remove-swiftclient-dependency
Change-Id: I6ec3b3c85a67b9ab6eb04b90ffc16daf1600e8a7
Add some tests for essential methods in swift-ring-builder.
Tests for removing or changing device settings are executed
with different search values to cover many possible command
line arguments.
Currently tested methods:
- create ring
- add device
- remove device
- set weight
- set info
- set min_part_hours
- set replicas
Tests use swift.common.ring.RingBuilder to verify actions.
Output from print statements is not captured or tested, because
this requires redirecting sys.stdout during tests and that might
have some side effects for testing tools.
bin/swift-ring-builder has been moved to swift/cli/ringbuilder.py
and slightly modified to work as before (mainly because there are no
longer any global variables, since that part of the code has been
moved inside a main() function).
Change-Id: Ia63f59a8faca1fad990784f27532ca07a2125454
Also fix a minor bug in zone filtering when the zone is set to 0.
Moved bin/swift-recon to swift/cli/recon.py, which makes
it possible to import it without using some scary hacks.
bin/swift-recon is now created by setup.py install.
Closes-Bug: #1261692
Change-Id: Id0729991c8ece73604467480dbf93fec7d8eb196
Fixes a bug when swift-recon --sockstat is used on hosts without
IPv6 support.
Tested by disabling IPv6 on Ubuntu 12.04 LTS:
Add "ipv6.disable=1" to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub
sudo update-grub
sudo /sbin/reboot
Closes-Bug: 1270711
Change-Id: Ib31a059e412ac68ca0a3faef4201fec7560d1178
swift-dispersion-report tries to avoid checking a partition more than
once, so it keeps track of partitions already queried and skips
duplicates.
swift-dispersion-report also keeps track of the number of successful
responses; it counts the number of expected replicas to find, and also
counts the number of replicas actually found, and tells the operator
if the numbers differ.
However, in the case that a partition was duplicated, the
expected-responses counter was incremented, but the actual check was
skipped, so it looked as though some copies were missing. Now we only
increment the expected-responses counter if we're actually going to
perform the check.
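The fix amounts to moving the counter increment after the duplicate
check. A minimal sketch with made-up names (check_partitions and the
query callback are not the tool's real interface):

```python
def check_partitions(partitions, replica_count, query):
    """Return (expected, found) replica counts, skipping duplicates."""
    seen = set()
    expected = 0
    found = 0
    for part in partitions:
        if part in seen:
            continue  # skipped partitions must not raise `expected`
        seen.add(part)
        expected += replica_count
        found += query(part)
    return expected, found


# Partition 2 appears twice; every queried partition has all 3 replicas.
expected, found = check_partitions([1, 2, 2, 3], 3, lambda part: 3)
print(expected, found)
```

With the increment placed before the duplicate check, the same input
would give an expected count of 12 against 9 found, which is the false
"missing copies" report described above.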
Change-Id: I22ac2b8066b62ca7c8ebf099c9f602118bb1a298
Now swift-object-info has a "-n" option that will cause the etag
verification to be skipped; on large objects, the etag verification
takes the vast majority of the runtime, and sometimes you just want to
know which account owns the 5 GB monstrosity without waiting around while
its checksum is verified.
Change-Id: Id284570633eb7b98046cdb948d7c37a152de1195
If we encounter an exception trying to gather async pendings, 'async'
doesn't exist, and the cronjob ends up erroring out and leaving behind a
stale lock file.
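One general way to avoid a stale lock is to release it in a finally
block. The sketch below illustrates that pattern only; it assumes a
directory-based lock, and run_with_lock(), the lock name, and the
failing job are all hypothetical rather than the cron script's code.

```python
import os
import tempfile


def run_with_lock(lock_dir, job):
    """Run `job` while holding a directory lock; always release the lock."""
    os.mkdir(lock_dir)  # raises FileExistsError if another run holds the lock
    try:
        return job()
    finally:
        os.rmdir(lock_dir)


def failing_job():
    raise RuntimeError("could not gather async pendings")


with tempfile.TemporaryDirectory() as tmp:
    lock_dir = os.path.join(tmp, "recon-cron.lock")
    try:
        run_with_lock(lock_dir, failing_job)
    except RuntimeError:
        pass
    lock_left_behind = os.path.exists(lock_dir)

print(lock_left_behind)
```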
Change-Id: I70a6d3f00bd2c9ce742e6d16af93804280707040