7 Commits

Author SHA1 Message Date
Keshava Bharadwaj
1ffe6b3953 Adds console logging to swift-drive-audit
This patch adds console logging ability to swift-drive-audit.
There are cases where logging to console is necessary when drive-audit
is done. This can be consumed for flagging errors in monitoring tools
such as icinga.

DocImpact
Change-Id: Ia1e1effcbd89bd2cf6d5b8c64019f1647c736a3a
2014-12-18 14:19:12 +05:30
Lorcan
cb20763893 Add new features to swift-drive-audit
This patch adds two new features to swift-drive-audit. The first
is an option in the drive-audit.conf file that allows the operator
to prevent the drives ever being unmounted automatically,
regardless of the amount of errors present. This could be of
benefit in very small systems consisting of only one or two drives
where the operator would like to manually unmount/fix the
particular drive(s) and minimise any potential downtime.

The second is another option in drive-audit.conf that allows the
operator to select a recon directory. This directory will then
have a drive.recon file which will keep an up-to-date record of
the swift drives and any errors associated with them. An example
of the output would be as follows:

{"/srv/node/disk2": "0", "/srv/node/disk3": "25", "/srv/node/disk0": "0",
"/srv/node/disk1": "0", "/srv/node/disk10": "0", "/srv/node/disk7": "0",
"/srv/node/disk4": "137", "/srv/node/disk5": "0", "/srv/node/disk8": "0",
"/srv/node/disk9": "0", "/srv/node/disk6": "0", "/srv/node/disk11": "60"}

This would allow the operator to monitor the errors on the swift
drives without having to spend time searching through logs. Also, if
this is accepted, it should be possible to add an option to
swift-recon that would keep track of this at a system level.

Change-Id: Ib5dacf8622b7363e070c274c7c30c8ead448a055
2014-09-25 16:27:25 +01:00
Matthew Oliver
090baa1fa9 Swift configuration parameter audit
This change is the result of an audit through the config parameters
provided by swift and how/if they are addressed in the swift
documentation. The documentation being the sample config files in
the /etc directory or the documentation.

This change is only concerned with the config files in etc/ next
I will look at the documentation in the doc/ folder.

This change makes the following assumptions:
  - Unless stated otherwise, the commented out parameter in the
    sample configuration is the default for swift.

  - When the default in the code differs from that of the sample
    configuration, the default in the code is correct.

Container reconciler:
  Parameter: interval
    - code: 30
    - config: 300
  Result: config = 30

Object Expirer:
  Parameter: recon_cache_path
    - code: /var/cache/swift
    - config: Parameter missing
  Result: Add parameter

swift-dispersion-populate && swift-dispersion-report
  Parameter: auth_version
    - code: 1.0
    - config: 2.0 (due to being a confusing example of how to setup
                   version 2.0).
  Result: Added 'auth_version = 1.0' to the right section (showing
          default and make the sample configuration for auth version
          2.0 easier to understand.

swift-drive-audit:
  Parameter: log_file_pattern
    - code: /var/log/kern.*[!.][!g][!z]
    - config: /var/log/kern*
  Result: config = /var/log/kern.*[!.][!g][!z]

  NOTE: swift-drive-audit uses a parameter called device_dir which
        defaults to '/srv/node'. In all other swift binaries/services
        there is a similar parameter called devices which stores the
        same thing. This is an inconsistency which I haven't fixed
        as this could break existing swift clusters out in the wild.

Proxy Server:
  Parameter: object_chunk_size
    - code: 65536
    - config: 8192
  Result: config = 65536

  Parameter: client_chunk_size
    - code: 65536
    - config: 8192
  Result: config = 65536

  Parameter: strict_cors_mode
    - code: True
    - config: No parameter
  Result: config = True

Account and Container replicator configuration confusion:
  NOTES:
    The account and container replicators have parameters:
      - interval
      - run_pause

    Both of these are loaded into the same variable in code:
      self.interval = int(conf.get('interval') or
                          conf.get('run_pause') or 30)

    If a user sets both to different values then interval is used.
  Result: Update the configuration to make this more clear.

DocImpact
Change-Id: Iaadbb1a6284f8b3e0801bc343b29772f70f4bf6e
2014-08-06 11:12:14 +10:00
gholt
2d00f7b7ba New log_max_line_length option.
Log lines can get quite large, as we previously noticed with rsync error
log lines. We added a setting to cap those, but it really looks like we
should have just done this overall limit. We noticed the issue when we
switched to UDP syslogging and it would occasionally blow past the 16436
lo MTU! This causes Python's logging code to get an error and hilarity
ensues.

Change-Id: I44bdbe68babd58da58c14360379e8fef8a6b75f7
2014-05-22 20:30:34 +00:00
Marcelo Martins
d2dd3e5488 Configuration options for error regex and log file in the config now
Making it possible for one to overwrite the default set of regexes
used to search for device block errors in the log file. Also making
the log file naming pattern configurable by setting them in the
drive-audit.conf file.

Updating "Detecting Failed Drives" section on the admin guide as well.

Change-Id: I7bd3acffed196da3e09db4c9dcbb48a20bdd1cf0
2013-07-23 07:24:29 -05:00
Victor Rodionov
13e4de1899 Patch for Swift Solaris (Illumos) compability.
* Add new configuration option log_address.

Change-Id: I636bd4116687629c997b70a0d804b7ed4bc46032
2012-06-19 15:38:56 +04:00
Chuck Thier
001407b969 Initial commit of Swift code 2010-07-12 17:03:45 -05:00