This change introduces grafana into the stack, which gives us a
way to visualize the collected data. The grafana role from cloudalchemy
is used for the bulk of the deployment.
Because the grafana deployment playbook is now standalone, the mentions
of grafana in the other ops directories have been removed.
Change-Id: I23e1c96cd1fda7ece9b86a69f9f0326913de714d
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
When playbook-influx-telegraf.yml runs, it uses roles from the mgrzybek
openstack-ansible-telegraf repo. Playbooks from that repo load
search scripts from different directories and read a source path.
Change-Id: Ib1ca9f60ad5e686790b56e1c66ab53ed9cc490b7
InfluxDB is usually already installed at this point
(by running the playbook-influx-db.yml playbook), so in most cases using
that host as the default output avoids having to specify additional
information.
Change-Id: Iac5e16c3d24a74119ea2179ecc3e5273de20676e
The non_negative_derivative function applied to the timing_counter
based graphs is replaced by the mean function, since these
timing_counter metrics do not behave exactly like traditional counters.
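The difference between the two functions can be sketched in Python. This
is only an illustration: the sample values and the helper below are
hypothetical, and the helper merely mimics what InfluxQL's
non_negative_derivative does (rate of change between consecutive points,
with negative rates dropped).

```python
from statistics import mean


def non_negative_derivative(points, unit=1.0):
    """Rate of change between consecutive (time, value) points per `unit`
    seconds. Negative rates are dropped."""
    rates = []
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        rate = (v1 - v0) / (t1 - t0) * unit
        if rate >= 0:
            rates.append(rate)
    return rates


# Hypothetical timing samples in milliseconds, one every 10 seconds.
# They go up and down, so they are not a monotonic counter.
timings = [(0, 120.0), (10, 80.0), (20, 150.0), (30, 100.0)]

# Only the single rise (80 -> 150) survives, which says little about timing.
derived = non_negative_derivative(timings)

# The mean summarizes the observed timings directly.
averaged = mean(value for _, value in timings)
```

For values that rise and fall like this, the derivative discards most
points, while the mean keeps the quantity the graph is meant to show.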
Change-Id: I5e2e5cdd2d04f469853f59f96da68839830bb359
Currently the only output plugin used is influxdb. This adds the
prometheus_client output plugin and its directives as options.
The telegraf.conf.j2 template has been updated to check for the
outputs_prometheus_client variable and two other related variables.
New variable names and default values are shown in vars.yml.
Change-Id: I8d9380a4cc2ea58b4ad98b9fc964d45ff82090ed
Signed-off-by: Melvin Hillsman <mrhillsman@gmail.com>
The run_int interval is being set to True. This patch ensures that
the interval is returned as an integer.
Change-Id: I1faf616b16e9da3dac45bec9e1ec3ca098563552
Signed-off-by: mrhillsman <mrhillsman@gmail.com>
The vm_quota.py telegraf plugin is missing but is still
referenced inside the playbook-influx-telegraf.yml
playbook.
Additionally, my.cnf does not need to be present
on the telegraf hosts/containers for them to function.
The influxdb_protocol override exposes the protocol to
be used for communicating with influxdb, usually HTTP.
Change-Id: I90226d02e82d2516be4a4d84baff22e46ce709fb
All counter based graphs inside the Swift dashboard are fixed
and now correctly show the timings per second rather than total
values per telegraf flush interval.
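As a rough sketch of the conversion behind this fix, assuming a counter
total accumulated over one flush interval (the function name and the
numbers are hypothetical and are not taken from the dashboard JSON):

```python
def per_second(total_in_interval, flush_interval_seconds):
    """Convert a total accumulated over one flush interval into a
    per-second rate."""
    if flush_interval_seconds <= 0:
        raise ValueError("flush interval must be positive")
    return total_in_interval / flush_interval_seconds


# Hypothetical example: 300 events reported for a 10 second flush interval.
rate = per_second(300, 10)
```

Dividing by the interval keeps the graphs comparable when the telegraf
flush interval changes.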
Additionally, the High Response Time graphs now use the timing_upper
metrics.
Minor issues inside the playbook-influx-telegraf.yml and telegraf.conf.j2
are fixed to support deployments without optional components like ironic,
cinder etc.
Change-Id: I0ac0d2004416cae7a6d137d98ab685b7abc22d3f
Provide an initial version of a grafana OpenStack Swift dashboard for the
Swift Proxy Server. The metrics are gathered by the built-in statsd
functionality of Swift and are forwarded via local telegraf daemons
to InfluxDB.
Change-Id: Ieb7df97fbc7534e34ebde5a5fe365ff479de81fe
This creates a specific slice which all OpenStack services will operate
from. By creating an independent slice, these components will be governed
separately from the system slice, allowing us to better optimise resource
consumption.
See the following for more information on slices:
* https://www.freedesktop.org/software/systemd/man/systemd.slice.html
See the following for more information on resource controls:
* https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html
Tools like ``systemd-cgtop`` and ``systemd-cgls`` will now give us
insight into specific processes, process groups, and resource consumption
in ways that we've not had access to before. To enable some of this
reporting, the accounting options have been added to the [Service] section
of the unit file.
Change-Id: Ife2e28ce6b3e0d0219b8a5ec2ca8d9dbe513d5a7
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
This change adds the cinder storage pools data to the influx metric
collection system as a plugin.
Change-Id: I632b53aa09d69c6df28b86988629242a26ab9b50
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
Added kapacitor tickscripts to trigger alerts based on certain
thresholds.
Change-Id: I66d1b1e58d279405637d9a2f06b3aae19fa29cc3
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
The KVM virsh plugin already existed; however, the setup was not using the
new playbook plugin system. This change moves the KVM virsh plugin into
that system and updates the plugin to use the influxdb line format
instead of the json format, which was recently deprecated.
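For illustration, a minimal sketch of emitting a point in the influxdb
line format from Python. This is not the plugin's actual code; the
measurement, tag, and field names are hypothetical, and escaping covers
only commas, equals signs, spaces, and quotes.

```python
def _escape(text, chars):
    """Backslash-escape each of `chars` where it occurs in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value):
    """Render a field value: booleans, integers (i suffix), floats, strings."""
    if isinstance(value, bool):  # check bool first, it is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measurement, tags, fields, timestamp_ns=None):
    """Build one line: measurement,tag=value field=value [timestamp]."""
    if not fields:
        raise ValueError("at least one field is required")
    line = _escape(measurement, ", ")
    for key in sorted(tags):
        line += "," + _escape(key, ",= ") + "=" + _escape(str(tags[key]), ",= ")
    line += " " + ",".join(
        _escape(key, ",= ") + "=" + _format_field(value)
        for key, value in sorted(fields.items())
    )
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


# Hypothetical KVM metrics; the real plugin's names may differ.
line = to_line("kvm", {"host": "compute 1"}, {"kvm_vcpus": 8, "kvm_host_id": "abc"})
print(line)
```

Printing one such line per point to stdout is the usual way an external
script hands data to telegraf when the line format is selected.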
Change-Id: Ib23a0a231044389aab5669dc0c467175cd220423
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
This change adds a second plugin to the telegraf setup. A change is
being made to the telegraf config file to allow for more than one
external plugin to be executed and to allow for full plugin execution
between telegraf reporting intervals.
Each plugin will potentially account for up to 8 seconds of runtime, with
the telegraf agent now using a dynamic reporting interval based on the
number of plugins a given agent needs to execute.
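One way to picture the dynamic interval is the sketch below. The exact
formula used in the template is not stated here, so the linear scaling
and the minimum of one plugin's budget are assumptions, and the function
name is hypothetical.

```python
SECONDS_PER_PLUGIN = 8  # runtime budget per plugin, as described above


def reporting_interval(plugin_count):
    """Assumed rule: allow each plugin its full budget, and never go below
    the budget of a single plugin."""
    if plugin_count < 0:
        raise ValueError("plugin_count must not be negative")
    return max(SECONDS_PER_PLUGIN, plugin_count * SECONDS_PER_PLUGIN)


# An agent running two plugins would report every 16 seconds under
# this assumption.
interval = reporting_interval(2)
```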
Change-Id: I652e8e2f13bd4fb9135280b76f2344177a14eaf7
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
With ansible 2.2, the apt_repository update_cache feature has
been fixed. When a new repo is added, apt-get update
is run after the addition if update_cache is set to yes.
Combined with the apt module now properly checking the
cache validity, this gives us proper updating of the cache
with registering variables.
Change-Id: Ic9788156a88223dc0d27fafa2a798f396135f990
This changes 'ansible_ssh_host' to 'ansible_host'. The 'ansible_ssh_host'
variable has been deprecated as noted here: [0].
[0] - http://docs.ansible.com/ansible/intro_inventory.html#hosts-and-groups
Change-Id: Ie34bb924b55d4e1c7b4568c2eadd2a7a1a60a821
Related-Bug: #1636606
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
Added a playbook to deploy Kapacitor, an alerting tool that works
with influxdb. Updated the readme to demonstrate how to deploy Kapacitor.
Kapacitor can be used to trigger alerts based on some uncertain
events. It subscribes to influxdb to collect data.
General Flow:
Telegraf -> InfluxDB -> Grafana
Telegraf -> InfluxDB -> Kapacitor
Change-Id: I5c400cf9efbda43bb5cb7a9bbd890435e74127f3
This change implements a metric collection system using influxdata
(influxdb and telegraf) with visualization using grafana. No
dashboard automation is provided at this time; however, a template
dashboard can be used by importing the JSON files from the
dashboards directory.
Change-Id: I5445b01170054393a31afc2a20ffb3ea4eda1209
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>