swift/doc/source/overview_stats.rst

Swift stats system

The swift stats system is composed of three parts: log creation, log uploading, and log processing. The system handles two types of logs (access and account stats), but it can be extended to handle other types of logs.

Log Types

Access logs

Access logs are the proxy server logs. Rackspace uses syslog-ng to redirect the proxy log output to an hourly log file. For example, a proxy request that is made on August 4, 2010 at 12:37 gets logged in a file named 2010080412. This allows easy log rotation and easy per-hour log processing.
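
As an illustration of the hourly naming scheme, the following minimal sketch derives the file name from a request timestamp. The helper name `hourly_log_name` is hypothetical; in the setup described here the actual file naming is done by syslog-ng, not by Python code.

```python
from datetime import datetime


def hourly_log_name(timestamp):
    """Return the hourly log file name (YYYYMMDDHH) for a request time."""
    # Same layout as the syslog-ng macros $YEAR$MONTH$DAY$HOUR.
    return timestamp.strftime("%Y%m%d%H")


# A request made on August 4, 2010 at 12:37 lands in file 2010080412.
print(hourly_log_name(datetime(2010, 8, 4, 12, 37)))
```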

Account stats logs

Account stats logs are generated by a stats system process. swift-account-stats-logger runs on each account server (via cron) and walks the filesystem looking for account databases. When an account database is found, the logger selects the account hash, bytes_used, container_count, and object_count. These values are then written out as one line in a csv file. One csv file is produced for every run of swift-account-stats-logger. This means that, system wide, one csv file is produced for every storage node on each run. Rackspace runs the account stats logger every hour. Therefore, in a cluster of ten account servers, ten csv files are produced every hour. Every account also has one entry for every replica in the system. With three replicas, each account therefore appears about three times in the combined account stats csv files created in one system-wide run.
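
The one-line-per-account output can be sketched as follows. This is only an illustration: the function name `write_account_rows`, the sample values, and the column order (taken from the order the fields are listed above) are assumptions, and the real logger reads these values from the account databases rather than from a list.

```python
import csv
import io


def write_account_rows(rows, out):
    """Write one csv line per account database found.

    Each row is (account_hash, bytes_used, container_count, object_count).
    The column order is an assumption for this sketch.
    """
    writer = csv.writer(out, lineterminator="\n")
    for account_hash, bytes_used, container_count, object_count in rows:
        writer.writerow([account_hash, bytes_used, container_count, object_count])


# Hypothetical values standing in for what one run finds on one server.
buf = io.StringIO()
write_account_rows([("a1b2", 1024, 2, 10)], buf)
print(buf.getvalue(), end="")
```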

Log Processing plugins

The swift stats system is written to allow a plugin to be defined for every log type. Swift includes plugins for both access logs and account stats logs. Each plugin is responsible for defining, in a config section, where the logs are stored on disk, where the logs will be stored in swift (account and container), the filename format of the logs on disk, the location of the plugin class definition, and any plugin-specific config values.

The plugin class must define three methods. The constructor must accept one argument (the dict representation of the plugin's config section). The process method must accept an iterator over the log data, plus the account, container, and object name of the log. The keylist_mapping method accepts no parameters.
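
A minimal sketch of a class with this shape is shown below. Everything specific in it is hypothetical: the class name `LineCountProcessor`, the three-field log line format (`YYYYMMDDHH account bytes`), and the return value of keylist_mapping (here assumed to map an output column name to a key in the per-hour data). Only the three method signatures come from the description above.

```python
class LineCountProcessor(object):
    """Hypothetical plugin: counts requests and bytes per account-hour."""

    def __init__(self, conf):
        # conf is the dict representation of the plugin's config section.
        self.conf = conf

    def process(self, obj_stream, account, container, object_name):
        """Return data keyed on (account, year, month, day, hour)."""
        totals = {}
        for line in obj_stream:
            fields = line.split()
            if len(fields) != 3:
                continue  # skip lines that do not match the assumed format
            stamp, acct, nbytes = fields
            key = (acct, stamp[0:4], stamp[4:6], stamp[6:8], stamp[8:10])
            entry = totals.setdefault(key, {"requests": 0, "bytes": 0})
            entry["requests"] += 1
            entry["bytes"] += int(nbytes)
        return totals

    def keylist_mapping(self):
        # Assumed shape: output csv column -> key produced by process().
        return {"request_count": "requests", "bytes_total": "bytes"}
```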

Log Uploading

swift-log-uploader accepts a config file and a plugin name. It finds the log files on disk according to the plugin config section and uploads them to the swift cluster. This means one uploader process will run on each proxy server node and each account server node. To avoid uploading partially written log files, the uploader skips any file whose mtime is less than two hours old. Rackspace runs this process once an hour via cron.
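
The mtime check can be sketched roughly as below. This is not the uploader's actual code: the function name `files_ready_for_upload` and its parameters are hypothetical, and the default of 7200 seconds simply encodes the two-hour cutoff described above.

```python
import os
import time


def files_ready_for_upload(log_dir, cutoff_seconds=7200, now=None):
    """Return paths in log_dir last modified at least cutoff_seconds ago."""
    now = time.time() if now is None else now
    ready = []
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        if not os.path.isfile(path):
            continue
        # Files modified too recently may still be receiving log lines.
        if now - os.stat(path).st_mtime >= cutoff_seconds:
            ready.append(path)
    return ready
```

Calling `files_ready_for_upload("/var/log/swift/hourly/")` would then list only the hourly files that are old enough to be considered complete.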

Log Processing

swift-log-stats-collector accepts a config file and generates a csv that is uploaded to swift. It loads all plugins defined in the config file, generates a list of all log files in swift that need to be processed, and passes an iterable of the log file data to the appropriate plugin's process method. The process method returns a dictionary of data from the log file keyed on (account, year, month, day, hour). The log-stats-collector process then combines the dictionaries from all calls to a process method into one dictionary. When the same item appears more than once under one (account, year, month, day, hour) key, the values are summed. Finally, the summed dictionary is mapped to the final csv values with each plugin's keylist_mapping method.

The resulting csv file has one line per (account, year, month, day, hour) for all log files processed in that run of swift-log-stats-collector.
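
The combine-and-sum step can be sketched as follows. The function names `merge_hourly` and `to_csv_rows` are hypothetical, as is the simplified mapping shape (output column name to a single source key); the real keylist_mapping return value may be richer than this.

```python
def merge_hourly(results):
    """Combine per-file dictionaries, summing values on key collisions."""
    combined = {}
    for result in results:
        for key, values in result.items():
            bucket = combined.setdefault(key, {})
            for name, value in values.items():
                bucket[name] = bucket.get(name, 0) + value
    return combined


def to_csv_rows(combined, keylist_mapping):
    """Build one row per (account, year, month, day, hour) key."""
    columns = sorted(keylist_mapping)
    rows = []
    for key in sorted(combined):
        values = combined[key]
        # Missing items default to 0 so every row has the same width.
        rows.append(list(key) + [values.get(keylist_mapping[c], 0) for c in columns])
    return columns, rows
```

Sorting the keys and columns is a choice made here only to keep the output deterministic.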

Running the stats system on SAIO

  1. Create a swift account to use for storing stats information, and note the account hash. The hash will be used in config files.

  2. Install syslog-ng:

    sudo apt-get install syslog-ng
  3. Add the following to the end of `/etc/syslog-ng/syslog-ng.conf`:

    # Added for swift logging
    destination df_local1 { file("/var/log/swift/proxy.log" owner(<username>) group(<groupname>)); };
    destination df_local1_err { file("/var/log/swift/proxy.error" owner(<username>) group(<groupname>)); };
    destination df_local1_hourly { file("/var/log/swift/hourly/$YEAR$MONTH$DAY$HOUR" owner(<username>) group(<groupname>)); };
    filter f_local1 { facility(local1) and level(info); };
    
    filter f_local1_err { facility(local1) and not level(info); };
    
    # local1.info                        -/var/log/swift/proxy.log
    # write to local file and to remote log server
    log {
            source(s_all);
            filter(f_local1);
            destination(df_local1);
            destination(df_local1_hourly);
    };
    
    # local1.error                        -/var/log/swift/proxy.error
    # write to local file and to remote log server
    log {
            source(s_all);
            filter(f_local1_err);
            destination(df_local1_err);
    };
  4. Restart syslog-ng

  5. Create the log directories:

    sudo mkdir -p /var/log/swift/hourly
    sudo mkdir -p /var/log/swift/stats
    sudo chown -R <username>:<groupname> /var/log/swift
  6. Create `/etc/swift/log-processor.conf`:

    [log-processor]
    swift_account = <your-stats-account-hash>
    user = <your-user-name>
    
    [log-processor-access]
    swift_account = <your-stats-account-hash>
    container_name = log_data
    log_dir = /var/log/swift/hourly/
    source_filename_format = %Y%m%d%H
    class_path = swift.stats.access_processor.AccessLogProcessor
    user = <your-user-name>
    
    [log-processor-stats]
    swift_account = <your-stats-account-hash>
    container_name = account_stats
    log_dir = /var/log/swift/stats/
    source_filename_format = %Y%m%d%H_*
    class_path = swift.stats.stats_processor.StatsLogProcessor
    account_server_conf = /etc/swift/account-server/1.conf
    user = <your-user-name>
  7. Add the following under [app:proxy-server] in `/etc/swift/proxy-server.conf`:

    log_facility = LOG_LOCAL1
  8. Create a cron job to run once per hour to create the stats logs. In `/etc/cron.d/swift-stats-log-creator`:

    0 * * * * <your-user-name> swift-account-stats-logger /etc/swift/log-processor.conf
  9. Create a cron job to run once per hour to upload the stats logs. In `/etc/cron.d/swift-stats-log-uploader`:

    10 * * * * <your-user-name> swift-log-uploader /etc/swift/log-processor.conf stats
  10. Create a cron job to run once per hour to upload the access logs. In `/etc/cron.d/swift-access-log-uploader`:

    5 * * * * <your-user-name> swift-log-uploader /etc/swift/log-processor.conf access
  11. Create a cron job to run once per hour to process the logs. In `/etc/cron.d/swift-stats-processor`:

    30 * * * * <your-user-name> swift-log-stats-collector /etc/swift/log-processor.conf

After the system has run for a few hours, .csv files should start to appear in the log_processing_data container in the swift stats account that was created earlier. One .csv file should be produced per hour, and each file has one entry per account per hour for each account with activity in that hour. Note that the stats will be delayed by at least two hours by default. This can be changed with the new_log_cutoff variable in the config file. See log-processing.conf-sample for more details.