Logstash and elasticsearch performance can be improved by using async
index options, lengthening the refresh interval, and by not
fingerprinting every document.
* Async translog allows elasticsearch to run fsync in the background
instead of blocking.
* The refresh interval will now be 5x the number of replicas with a cap
of 30. This integer represents the seconds between index refresh calls;
the longer interval greatly lowers the load generated across the
cluster.
* All documents were fingerprinted before writing to the cluster. This
was a costly operation as elasticsearch will do a forward lookup on all
documents with a preset ID, resulting in 100s, if not 1000s, of extra
reads. The purpose of the fingerprint function is to limit repeated
writes, so to keep some of this functionality the fingerprint function
is now only added to documents with messages.
* G1 garbage collection is now enabled by default when the heap size is
greater than 6GiB. Early versions of elasticsearch did not recommend
this setting, however it has since stabilized in recent releases.
* JVM options have been moved into the elasticsearch and logstash roles
allowing these tasks to trigger service restarts when changes are made.
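
The sizing rules in the list above can be sketched in a few lines of
Python. This is only an illustration of the stated arithmetic, not the
role's actual templates: the function names are hypothetical, and
hashing the message with SHA-1 is an assumption made for the sketch.

```python
import hashlib
from typing import Optional

REFRESH_CAP_SECONDS = 30  # cap stated in the commit message


def refresh_interval_seconds(replica_count: int) -> int:
    """Five seconds per replica, capped at 30 seconds."""
    return min(5 * replica_count, REFRESH_CAP_SECONDS)


def use_g1gc(heap_mib: int) -> bool:
    """G1 is enabled only when the heap is larger than 6 GiB."""
    return heap_mib > 6 * 1024


def document_id(doc: dict) -> Optional[str]:
    """Return a fingerprint ID only for documents that carry a message.

    The hash algorithm and the "message" key are assumptions.
    """
    message = doc.get("message")
    if not message:
        return None  # no preset ID, so no forward lookup is needed
    return hashlib.sha1(message.encode("utf-8")).hexdigest()
```

With these rules a single replica yields a 5 second refresh interval,
while six or more replicas hit the 30 second cap.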
Change-Id: I805129b207ad4db182ae6e59b6ec78eb3e246b54
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
Presently the node role assignment is only automatic. Auto selection
assumes every node is identical; however, in many deployments a deployer
may want to assign node roles to specific hardware, thereby optimizing
resources and improving general performance. This change adds and
documents the ability to set the node roles within an ansible
inventory.
Change-Id: I22a2b636cb1441f17e575439b55ca64f9c7b0336
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
The multi-logstash pipeline setup, while amazingly fast, was crashing
and causing index errors when under high load for a long period of time.
Because of the crashing behavior and the fact that the folks from
Elastic describe multi-pipeline queues as "beta" at this time, the
logstash pipelines have been converted back into a single pipeline.
The memory backed queue options are now limited by a ram disk (tmpfs)
which ensures that a burst within the queue does not cause OOM issues.
This keeps the deployment highly performant while limiting memory usage
at the same time. Memory backed queues will be enabled when the
underlying system is using "rotational" media as detected by ansible
facts. This will ensure a fast and consistent experience across all
deployment types.
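
A rough Python sketch of the rotational media check follows. It assumes
the device facts are a mapping of device name to a dict with a
"rotational" field of "1" or "0", and that a single rotational device is
enough to enable the memory backed queue; the function name is
hypothetical and the role itself may decide differently.

```python
from typing import Any, Dict


def use_memory_backed_queue(devices: Dict[str, Dict[str, Any]]) -> bool:
    """Return True when any block device reports rotational media.

    Assumes each entry mirrors an ansible device fact where the
    "rotational" field is "1" for spinning disks and "0" otherwise.
    """
    return any(
        str(dev.get("rotational", "0")) == "1" for dev in devices.values()
    )
```

A host with only solid state devices would keep the regular on-disk
queue under this sketch.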
Pipeline/ml/template/dashboard setup has been added to the beat
configurations which will ensure beats are properly configured even
when running in an isolated deployment and outside of normal operations
where beats are generally configured on the first data node.
Change-Id: Ie3c775f98b14f71bcbed05db9cb1c5aa46d9c436
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
The list of elasticsearch hosts was being randomized too much, which
resulted in a performance issue. This change reduces the entropy and
ensures that the list of hosts is correctly ordered such that localhost
is always used first and other nodes in the cluster will be used as a
fallback.
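
One way to get this ordering is sketched below in Python. It is an
illustration only: the function name is hypothetical, and seeding the
shuffle with the local host name so the order stays stable between runs
is an assumption, not necessarily how the templates do it.

```python
import random
from typing import List


def ordered_hosts(local_host: str, cluster_hosts: List[str]) -> List[str]:
    """Put the local host first, followed by the remaining cluster nodes.

    The remaining nodes are shuffled with a seed derived from the local
    host name, so each node gets a different but repeatable fallback
    order instead of a fresh random list on every run.
    """
    others = sorted(h for h in cluster_hosts if h != local_host)
    random.Random(local_host).shuffle(others)
    return [local_host] + others
```

Because the input is sorted before the seeded shuffle, the result does
not depend on the order in which the hosts were supplied.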
Change-Id: Ifb551a6e01b5c0e1f62c1466a3d5b344a3c5da97
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
The current setup was using processor cores from ansible facts which, in
a multi-core, single socket system, could result in a value of 1. Using
the processor count will return the logical processor count, giving us a
more performant setup when the compute power is present.
Change-Id: Ia5b63d45691f58e848d05cc4a4e5f353b993a347
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
This change will help with organization throughout the stack.
Change-Id: I2ad865db534ae1d377bbdecd4b421ee0fc802536
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>