This patch adds console logging ability to swift-drive-audit.
There are cases where logging to console is necessary when drive-audit
is done. This can be consumed for flagging errors in monitoring tools
such as icinga.
DocImpact
Change-Id: Ia1e1effcbd89bd2cf6d5b8c64019f1647c736a3a
This patch adds two new features to swift-drive-audit. The first
is an option in the drive-audit.conf file that allows the operator
to prevent the drives ever being unmounted automatically,
regardless of the amount of errors present. This could be of
benefit in very small systems consisting of only one or two drives
where the operator would like to manually unmount/fix the
particular drive(s) and minimise any potential downtime.
The second is another option in drive-audit.conf that allows the
operator to select a recon directory. This directory will then
have a drive.recon file which will keep an up-to-date record of
the swift drives and any errors associated with them. An example
of the output would be as follows:
{"/srv/node/disk2": "0", "/srv/node/disk3": "25", "/srv/node/disk0": "0",
"/srv/node/disk1": "0", "/srv/node/disk10": "0", "/srv/node/disk7": "0",
"/srv/node/disk4": "137", "/srv/node/disk5": "0", "/srv/node/disk8": "0",
"/srv/node/disk9": "0", "/srv/node/disk6": "0", "/srv/node/disk11": "60"}
This would allow the operator to monitor the errors on the swift
drives without having to spend time searching through logs. Also, if
this is accepted, it should be possible to add an option to
swift-recon that would keep track of this at a system level.
Change-Id: Ib5dacf8622b7363e070c274c7c30c8ead448a055
This change is the result of an audit through the config parameters
provided by swift and how/if they are addressed in the swift
documentation. The documentation being the sample config files in
the /etc directory or the documentation.
This change is only concerned with the config files in etc/ next
I will look at the documentation in the doc/ folder.
This change makes the following assumptions:
- Unless stated otherwise, the commented out parameter in the
sample configuration is the default for swift.
- When the default in the code differs from that of the sample
configuration, the default in the code is correct.
Container reconciler:
Parameter: interval
- code: 30
- config: 300
Result: config = 30
Object Expirer:
Parameter: recon_cache_path
- code: /var/cache/swift
- config: Parameter missing
Result: Add parameter
swift-dispersion-populate && swift-dispersion-report
Parameter: auth_version
- code: 1.0
- config: 2.0 (due to being a confusing example of how to setup
version 2.0).
Result: Added 'auth_version = 1.0' to the right section (showing
default and make the sample configuration for auth version
2.0 easier to understand.
swift-drive-audit:
Parameter: log_file_pattern
- code: /var/log/kern.*[!.][!g][!z]
- config: /var/log/kern*
Result: config = /var/log/kern.*[!.][!g][!z]
NOTE: swift-drive-audit uses a parameter called device_dir which
defaults to '/srv/node'. In all other swift binaries/services
there is a similar parameter called devices which stores the
same thing. This is an inconsistency which I haven't fixed
as this could break existing swift clusters out in the wild.
Proxy Server:
Parameter: object_chunk_size
- code: 65536
- config: 8192
Result: config = 65536
Parameter: client_chunk_size
- code: 65536
- config: 8192
Result: config = 65536
Parameter: strict_cors_mode
- code: True
- config: No parameter
Result: config = True
Account and Container replicator configuration confusion:
NOTES:
The account and container replicators have parameters:
- interval
- run_pause
Both of these are loaded into the same variable in code:
self.interval = int(conf.get('interval') or
conf.get('run_pause') or 30)
If a user sets both to different values then interval is used.
Result: Update the configuration to make this more clear.
DocImpact
Change-Id: Iaadbb1a6284f8b3e0801bc343b29772f70f4bf6e
Log lines can get quite large, as we previously noticed with rsync error
log lines. We added a setting to cap those, but it really looks like we
should have just done this overall limit. We noticed the issue when we
switched to UDP syslogging and it would occasionally blow past the 16436
lo MTU! This causes Python's logging code to get an error and hilarity
ensues.
Change-Id: I44bdbe68babd58da58c14360379e8fef8a6b75f7
Making it possible for one to overwrite the default set of regexes
used to search for device block errors in the log file. Also making
the log file naming pattern configurable by setting them in the
drive-audit.conf file.
Updating "Detecting Failed Drives" section on the admin guide as well.
Change-Id: I7bd3acffed196da3e09db4c9dcbb48a20bdd1cf0