22 Commits

Author SHA1 Message Date
Kevin Carter
9c9efd9eb5 Change the q_mem and h_mem to lower and upper limits
This change removes the {h,q}_mem options in favor of a new variable which
clearly states the upper and lower limits for a given deployment. This change
also makes these options a lot more conservative by default which will allow
the deployment to better run on shared infra.

Change-Id: I169f457198c11edc4881a04df65312f6c4f67feb
Signed-off-by: Kevin Carter <kevin@cloudnull.com>
2019-02-14 08:50:29 -06:00
cloudnull
03d25dce3d Add logstash ingestion for collectd
This change will allow logstash to ingest metrics from collectd. New
options have been added to enable the deployment and configure it.

Change-Id: I995c0db69fc68d5f5bcae27ce16956876368e2a8
Signed-off-by: cloudnull <kevin@cloudnull.com>
2019-02-11 00:13:37 -06:00
Kevin Carter
6017fc0e89 Add the ability to set the JVM heap size
This change makes it possible for users to set the `elastic_heap_size_default`
value. Before this change, the option was unreachable due to a series of
fact-generated template values. The options `elastic_heap_size` and
`logstash_heap_size` have also been exposed, giving deployers the ability to
define service-specific heap sizes as needed.

Change-Id: Ida3a57fdcff388f8e4bb3f325b787205a6183970
Signed-off-by: Kevin Carter <kevin@cloudnull.com>
2019-01-30 09:53:20 -06:00
Kevin Carter
82cc72e166 Read the path for the logstash queue path
The queue path within logstash may be a symlink, which will fail to mount
as tmpfs. To ensure the queue path can be tmpfs, a readlink command is
used to fetch the true path, which will be used in a mount when necessary.
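As a rough Python analogue of the readlink step, `os.path.realpath` resolves every symlink component so the result can serve as a mount point. The `resolve_queue_path` name is hypothetical; the commit only states that a readlink command is used, not which flags.

```python
import os
import tempfile


def resolve_queue_path(queue_path: str) -> str:
    """Return the canonical path behind a possibly-symlinked queue path.

    Every symlink in the path is resolved, roughly what `readlink -f`
    does, so the result is a real directory that tmpfs can be mounted on.
    """
    return os.path.realpath(queue_path)


# Demonstration with a throwaway directory and a symlink pointing at it.
with tempfile.TemporaryDirectory() as base:
    real_dir = os.path.join(base, "queue-data")
    os.mkdir(real_dir)
    link = os.path.join(base, "queue")
    os.symlink(real_dir, link)
    print(resolve_queue_path(link) == os.path.realpath(real_dir))  # True
```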

Change-Id: I5fe6bf311e0621c98766ae458371b5f11f89a61f
Signed-off-by: Kevin Carter <kevin@cloudnull.com>
2019-01-25 13:53:41 -06:00
Kevin Carter
7491b6df8e Update the embedded-ansible-setup process to be configurable
This change allows the embedded ansible process to be configurable by
the end user.
  * Python requirements and ansible roles will all now be user
    configurable.
  * Setup is now a local only playbook. This playbook replaces the bash
    commands we were rerunning when the `bootstrap-embedded-ansible.sh`
    script was executed.
  * Embedded ansible version is now 2.7.5 as default.
  * Deprecation warnings have been resolved.
  * Tests impacted by this change have been updated.

Change-Id: I4303c44e249cda31457a4f05a681e298d225a8b7
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2019-01-04 11:46:19 -06:00
Michael Vollman
b56dbc6b4f Fix missing logstash device error
The findmnt command prints the fsroot as [/dir] at the end of the
source device, which causes a parsing error when it is assumed that only
the device string is returned, leading to an error looking up an invalid
device.
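A minimal sketch of one way to recover the bare device string, assuming the SOURCE value looks like `/dev/sdb1[/var/lib/logstash]`. The `source_device` helper is hypothetical and the commit does not say how the fix was implemented; findmnt itself also offers a `--nofsroot` (`-v`) option that suppresses the `[/dir]` suffix.

```python
import re

# findmnt may report a bind mount's SOURCE as "device[/fsroot]".
_FSROOT_SUFFIX = re.compile(r"\[[^\]]*\]$")


def source_device(findmnt_source: str) -> str:
    """Strip a trailing "[/fsroot]" so only the device string remains."""
    return _FSROOT_SUFFIX.sub("", findmnt_source.strip())


print(source_device("/dev/sdb1[/var/lib/logstash]"))  # /dev/sdb1
print(source_device("/dev/sdb1"))                     # /dev/sdb1
```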

Change-Id: If95f8e0ed8154ad0277972159afac9f967b79c8f
2018-10-15 14:28:31 -04:00
Kevin Carter
4c86cb9be2 Add changes to the sysconfig defaults file
These changes mirror systemd tunables for elasticsearch and are needed
to ensure any OS without systemd (like Ubuntu 14.04) has the same
capabilities as OSes with systemd. This also adds a specific sysctl
file to use when making sysctl changes. This will ensure we're not
subjecting our deployment to other changes from other sources, like an
OSA playbook run.

Change-Id: Ic0e0bc0f93a12298c1e2f634cf5a1b4c6be2995e
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-09-27 09:44:09 -05:00
Kevin Carter
814622cc6c Improve logstash and elasticsearch performance
The logstash and elasticsearch performance can be improved by using
async index options, pulling back the refresh interval, and by not
fingerprinting every document.

* Async translog allows elasticsearch to run fsync in the
  background instead of blocking.
* The refresh interval will now be 5x the number of replicas with a cap
  of 30. This integer represents the number of seconds between index
  refresh calls, which greatly lowers the load generated across the
  cluster.
* All documents were fingerprinted before writing to the cluster. This
  was a costly operation, as elasticsearch will do a forward lookup on all
  documents with a preset ID, resulting in hundreds, if not thousands, of
  extra reads. The purpose of the fingerprint function is to limit repeated
  writes, so to keep some of this functionality the fingerprint function is
  now only added to documents with messages.
* G1 garbage collection is now enabled by default when the heap size is
  > 6GiB. Early versions of elasticsearch did not recommend this setting;
  however, it has since stabilized in recent releases.
* JVM options have been moved into the elasticsearch and logstash roles
  allowing these tasks to trigger service restarts when changes are made.
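The refresh-interval rule and the G1 threshold above can be sketched as plain functions. The function names are hypothetical; `index.translog.durability`, `index.refresh_interval`, and `index.number_of_replicas` are real Elasticsearch index settings, but how the roles actually template them is an assumption here.

```python
GIB = 1024 ** 3


def refresh_interval_seconds(replicas: int) -> int:
    """5 seconds per replica, capped at 30 seconds.

    The message does not say how a replica count of 0 is handled;
    this sketch simply returns 0 in that case.
    """
    return min(5 * replicas, 30)


def use_g1_gc(heap_bytes: int) -> bool:
    """G1 is enabled by default only when the heap is larger than 6 GiB."""
    return heap_bytes > 6 * GIB


def index_settings(replicas: int) -> dict:
    """Flat index settings combining async translog and the refresh interval."""
    return {
        "index.number_of_replicas": replicas,
        "index.translog.durability": "async",  # translog fsync happens in the background
        "index.refresh_interval": f"{refresh_interval_seconds(replicas)}s",
    }


for n in (1, 3, 6, 10):
    print(n, refresh_interval_seconds(n))  # one pair per line: 1 5 / 3 15 / 6 30 / 10 30
print(use_g1_gc(8 * GIB), use_g1_gc(6 * GIB))  # True False
print(index_settings(2)["index.refresh_interval"])  # 10s
```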

Change-Id: I805129b207ad4db182ae6e59b6ec78eb3e246b54
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-09-21 21:47:07 -05:00
Dave Wilde
e4bd1fdaed MNAIO ELK Updates
* We don't need to create the containers as they are created during the
  initial run.

* Remove quoting in favor of {% raw %} blocks

Change-Id: Ied696ad0882169d523a60a900788e7c2ba1d3fa3
2018-09-19 10:52:32 -05:00
Kevin Carter
0b0efcb841 Add capability to set node role
Presently, node role assignment is only automatic. Auto selection
assumes every node is identical; however, in many deployments a deployer
may want to assign node roles to specific hardware, thereby optimizing
resources and improving general performance. This change adds and
documents the ability to set the node roles within an ansible
inventory.

Change-Id: I22a2b636cb1441f17e575439b55ca64f9c7b0336
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-09-18 12:35:06 -05:00
Kevin Carter
0d4a4a92c7 Converge the logstash pipelines and enhance memory-backed queues
The multi-logstash pipeline setup, while amazingly fast, was crashing
and causing index errors when under high load for a long period of time.
Because of the crashing behavior, and because the folks from Elastic
describe multi-pipeline queues as "beta" at this time, the logstash
pipelines have been converted back into a single pipeline.

The memory-backed queue options are now limited by a RAM disk (tmpfs),
which ensures that a burst within the queue does not cause OOM issues
while keeping the deployment highly performant and limiting memory
usage at the same time. Memory-backed queues will be enabled when the
underlying system is using "rotational" media as detected by ansible
facts. This will ensure a fast and consistent experience across all
deployment types.
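A minimal sketch of the rotational-media check, assuming the input has the shape of the `ansible_devices` fact, where each device entry carries `"rotational"` as the string `"1"` or `"0"`. The helper name is hypothetical, and treating "any rotational device" as the trigger is an assumption; the message does not say which device is inspected.

```python
from typing import Mapping


def use_memory_backed_queue(devices: Mapping[str, Mapping[str, str]]) -> bool:
    """Return True if any block device reports rotational media.

    `devices` is assumed to look like the `ansible_devices` fact.
    Using "any device" rather than the queue's own device is an assumption.
    """
    return any(info.get("rotational") == "1" for info in devices.values())


print(use_memory_backed_queue({"sda": {"rotational": "1"}}))      # True
print(use_memory_backed_queue({"nvme0n1": {"rotational": "0"}}))  # False
```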

Pipeline/ml/template/dashboard setup has been added to the beat
configurations which will ensure beats are properly configured even
when running in an isolated deployment and outside of normal operations
where beats are generally configured on the first data node.

Change-Id: Ie3c775f98b14f71bcbed05db9cb1c5aa46d9c436
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-09-16 23:44:58 -05:00
Kevin Carter
a98035e177 Correct elasticsearch list entropy
The list of elasticsearch hosts was being randomized too much, which
resulted in a performance issue. This change reduces the entropy and
ensures that the list of hosts is correctly ordered such that localhost
is always used first and other nodes in the cluster are used as a
fallback.
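A minimal sketch of the ordering rule, assuming the local node should always come first and the remaining nodes keep their inventory order instead of being shuffled. The `order_hosts` name and the host strings are hypothetical.

```python
from typing import List, Sequence


def order_hosts(hosts: Sequence[str], local: str) -> List[str]:
    """Return the local node first, then every other node as a fallback.

    The remaining hosts keep their original (inventory) order rather
    than being randomized, so the list is deterministic for a given node.
    """
    return [local] + [h for h in hosts if h != local]


cluster = ["es1:9200", "es2:9200", "es3:9200"]
print(order_hosts(cluster, "es2:9200"))
# ['es2:9200', 'es1:9200', 'es3:9200']
```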

Change-Id: Ifb551a6e01b5c0e1f62c1466a3d5b344a3c5da97
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-09-12 13:13:13 -05:00
Kevin Carter
1c56b7f034 Add option block to ensure apache2 is enabled correctly
The apache2 monitoring process requires a couple of interactions to deploy
successfully. This change will ensure that if the apache2 monitoring
fails in any way, it does not block the deployment.

Change-Id: Ibe35197a1c65f4abe9e4870c07ee15f37f9a58ab
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-08-29 15:39:08 -05:00
Kevin Carter
ce9007cda5 Thread pools should be based on processor counts
The current setup was using processor cores from ansible facts, which on
a multi-core, single-socket system could result in a value of 1. Using the
processor count returns the logical processor count, giving us a more
performant setup when the compute power is present.

Change-Id: Ia5b63d45691f58e848d05cc4a4e5f353b993a347
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-08-25 01:51:16 -05:00
Kevin Carter
e4c84aa28d Add Redhat to the ELK deployment capabilities
Change-Id: Id34e046a546f8d0878843596f53e400165e37c6e
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-08-13 18:59:57 -05:00
Kevin Carter
bf6a8d85e7 Add SUSE support
This change adds SUSE 42.3 support to the elastic telemetry solutions.

Change-Id: Ibe93ea0d1ead9e7fe6da16d89989cfe5ade0f43e
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-08-12 23:53:23 -05:00
Kevin Carter
8db0238749 Move most of the variables into the roles
Change-Id: I82a48c554c164c7166c1a0d4e3192332af5024fb
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-08-13 03:20:33 +00:00
Kevin Carter
45df59ed7e Move the bulk of the templates into the new roles
This change will help with organization throughout the stack.

Change-Id: I2ad865db534ae1d377bbdecd4b421ee0fc802536
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-08-12 22:19:19 -05:00
Kevin Carter
b9fa34d42e Add syslog input into logstash
The new option logstash_syslog_input_enabled has been added which will
allow users to enable a direct syslog input. When enabled, messages will
be processed via logstash and sent directly to elasticsearch.

Change-Id: Icb7712ecb8aae3d7f99df80ae1c5cd647a15ce83
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-08-11 03:08:32 -05:00
Kevin Carter
79c3a3cf93 Add trusty support to the project
This change adds Ubuntu 14.04 support to the project.

Change-Id: I20695e19409b63c6e1def4ccf8929c6d52be647e
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-08-11 00:00:03 -05:00
Kevin Carter
6aa88dd7b7 Add bionic job to elk_metrics_6x testing
Change-Id: I67bbfa116c45a82eb8b5bc191d19d203493f0b00
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-08-09 00:44:32 -05:00
Kevin Carter
3a0b3d2cde Convert playbooks into roles
This change adds the scaffolding required to get multi-distro support
running in the roles. The change breaks up our playbooks, converting all
of the tasks into various roles with internal dependencies. While this
will improve execution time, the change is being done to reduce
boilerplate and to allow us to build on the pattern used in OSA to provide
multi-distro capabilities.

A side effect of this change is a major improvement in idempotency. The
playbooks should now be 100% idempotent.

All of the templates have been left in the main playbook directory. This
was done to help ease the transition. In a future PR the template
structure will be moved into the roles where it needs to be.

The main variables file has been left intact. This file will be carved
up into role defaults in a future PR.

Change-Id: I938a10564128ce4078fa12edcf614dcdbd684b25
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
2018-08-09 00:41:05 -05:00