clean up readme
Change-Id: I6c52fd6a75ca1b6bcbb97624afd6c62eb074f03c
parent
bff0cbd899
commit
eb9126decd
89
README.rst
@@ -1,6 +1,6 @@
|
|||||||
===============================
|
===============
|
||||||
elastic-recheck
|
elastic-recheck
|
||||||
===============================
|
===============
|
||||||
|
|
||||||
"Use ElasticSearch to classify OpenStack gate failures"
|
"Use ElasticSearch to classify OpenStack gate failures"
|
||||||
|
|
||||||
@@ -8,27 +8,28 @@ elastic-recheck
|
|||||||
|
|
||||||
Idea
|
Idea
|
||||||
----
|
----
|
||||||
Identifying the specific bug that is causing a transient error in the gate
|
|
||||||
is very hard. Just identifying which tempest test failed is not enough
|
Identifying the specific bug that is causing a transient error in the gate is
|
||||||
because a single bug can potentially cause multiple tempest tests to fail.
|
difficult. Just identifying which tempest test failed is not enough because a
|
||||||
If we can find a fingerprint for a specific bug using logs, then we can use
|
single tempest test can fail due to any number of underlying bugs. If we can
|
||||||
ElasticSearch to automatically detect any occurrences of the bug.
|
find a fingerprint for a specific bug using logs, then we can use ElasticSearch
|
||||||
|
to automatically detect any occurrences of the bug.
|
||||||
|
|
||||||
Using these fingerprints elastic-recheck can:
|
Using these fingerprints elastic-recheck can:
|
||||||
|
|
||||||
* Search ElasticSearch for all occurrences of a bug.
|
* Search ElasticSearch for all occurrences of a bug.
|
||||||
* Identify bug trends such as: when it started, is the bug fixed, is it
|
* Identify bug trends such as: when it started, is the bug fixed, is it getting
|
||||||
getting worse, etc.
|
worse, etc.
|
||||||
* Classify bug failures in real time and report back to gerrit if we find a
|
* Classify bug failures in real time and report back to gerrit if we find a
|
||||||
match, so a patch author knows why the test failed.
|
match, so a patch author knows why the test failed.
|
||||||
|
|
||||||
queries/
|
queries/
|
||||||
--------
|
--------
|
||||||
|
|
||||||
All queries are stored in separate yaml files in a queries directory
|
All queries are stored in separate yaml files in a queries directory at the top
|
||||||
at the top of the elastic-recheck code base. The format of these files
|
of the elastic-recheck code base. The format of these files is ######.yaml
|
||||||
is ######.yaml (where ###### is the launchpad bug number). The yaml should have
|
(where ###### is the launchpad bug number). The yaml should have a ``query``
|
||||||
a ``query`` keyword which is the query text for ElasticSearch.
|
keyword which is the query text for ElasticSearch.
|
||||||
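As a rough illustration of the file layout described above, the following Python sketch renders the path and body of a new query file. The function name, the bug number ``1234567``, and the ``message`` field in the sample query are hypothetical and only serve the example; it also assumes a single-line query written as a single-quoted YAML scalar.

```python
import re


def render_query_file(bug_number, query):
    """Return (relative path, file body) for a new query file.

    Hypothetical helper for illustration; not part of elastic-recheck.
    """
    bug = str(bug_number)
    # The file name is the launchpad bug number, so it must be numeric.
    if not re.fullmatch(r"[0-9]+", bug):
        raise ValueError("expected a numeric launchpad bug number, got %r" % bug)
    if "\n" in query:
        raise ValueError("this sketch only handles single-line queries")
    # In a single-quoted YAML scalar, a literal single quote is written as ''.
    quoted = "'" + query.replace("'", "''") + "'"
    return "queries/%s.yaml" % bug, "query: %s\n" % quoted


# Assumed sample values: the bug number and message text are made up.
path, body = render_query_file(
    1234567, 'message:"example failure text" AND tags:"screen-n-cpu.txt"'
)
print(path)
print(body, end="")
```

Single quotes are used for the YAML value so that the double quotes inside the Lucene query need no escaping.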
|
|
||||||
Guidelines for good queries:
|
Guidelines for good queries:
|
||||||
|
|
||||||
@ -36,60 +37,60 @@ Guidelines for good queries:
|
|||||||
filename query is typically better than a console one, as that's matching a
|
filename query is typically better than a console one, as that's matching a
|
||||||
deep failure versus a surface symptom.
|
deep failure versus a surface symptom.
|
||||||
|
|
||||||
- Queries should not return any hits for successful jobs; such hits are a
|
- Queries should not return any hits for successful jobs; such hits are a sign
|
||||||
sign the query isn't specific enough. As a rule of thumb, more than 10% success
|
the query isn't specific enough. As a rule of thumb, more than 10% success hits
|
||||||
hits probably means the query isn't good enough.
|
probably means the query isn't good enough.
|
||||||
|
|
||||||
- If it's impossible to build a query to target a bug, consider patching the
|
- If it's impossible to build a query to target a bug, consider patching the
|
||||||
upstream program to be explicit when it fails in a particular way.
|
upstream program to be explicit when it fails in a particular way.
|
||||||
|
|
||||||
- Use the 'tags' field rather than the 'filename' field for filtering. This is
|
- Use the 'tags' field rather than the 'filename' field for filtering. This is
|
||||||
primarily because of grenade jobs where the same log file shows up in the
|
primarily because of grenade jobs where the same log file shows up in the
|
||||||
'old' and 'new' side of the grenade job. For example, tags:"screen-n-cpu.txt"
|
'old' and 'new' side of the grenade job. For example,
|
||||||
will query in logs/old/screen-n-cpu.txt and logs/new/screen-n-cpu.txt. The
|
``tags:"screen-n-cpu.txt"`` will query in ``logs/old/screen-n-cpu.txt`` and
|
||||||
tags:"console" filter is also used to query in console.html as well as
|
``logs/new/screen-n-cpu.txt``. The ``tags:"console"`` filter is also used to
|
||||||
tempest and devstack logs.
|
query in ``console.html`` as well as tempest and devstack logs.
|
||||||
|
|
||||||
- Avoid the use of wildcards in queries since they can put an undue burden on
|
- Avoid the use of wildcards in queries since they can put an undue burden on
|
||||||
the query engine. A common case where wildcards are used and shouldn't be is
|
the query engine. A common case where wildcards are used and shouldn't be is
|
||||||
in querying against a specific set of build_name fields,
|
in querying against a specific set of ``build_name`` fields, e.g.
|
||||||
e.g. gate-nova-python26 and gate-nova-python27.
|
``gate-nova-python26`` and ``gate-nova-python27``. Rather than use
|
||||||
Rather than use build_name:gate-nova-python*, list the jobs with an OR, e.g.:
|
``build_name:gate-nova-python*``, list the jobs with an ``OR``. For example::
|
||||||
|
|
||||||
::
|
|
||||||
|
|
||||||
(build_name:"gate-nova-python26" OR build_name:"gate-nova-python27")
|
(build_name:"gate-nova-python26" OR build_name:"gate-nova-python27")
|
||||||
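When the list of jobs is long, the ``OR`` clause from the last guideline can be generated rather than typed by hand. This is a minimal Python sketch under that assumption; the helper name ``build_name_filter`` is hypothetical and not part of elastic-recheck.

```python
def build_name_filter(build_names):
    """Join explicit job names into one OR clause instead of using a wildcard.

    Hypothetical helper for illustration only.
    """
    if not build_names:
        raise ValueError("at least one build name is required")
    for name in build_names:
        # Reject wildcard characters, since avoiding them is the point.
        if "*" in name or "?" in name:
            raise ValueError("wildcards are not allowed: %r" % name)
    clauses = ['build_name:"%s"' % name for name in build_names]
    return "(" + " OR ".join(clauses) + ")"


clause = build_name_filter(["gate-nova-python26", "gate-nova-python27"])
print(clause)
```

For the two job names used above, this yields the same clause as the literal example in the guideline.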
|
|
||||||
In order to support rapidly added queries, it's considered socially
|
In order to support rapidly added queries, it's considered socially acceptable
|
||||||
acceptable to +A changes that only add 1 new bug query, and to even
|
to approve changes that only add 1 new bug query, and to even self approve
|
||||||
self approve those changes by core reviewers.
|
those changes by core reviewers.
|
||||||
|
|
||||||
|
|
||||||
Adding Bug Signatures
|
Adding Bug Signatures
|
||||||
---------------------
|
---------------------
|
||||||
|
|
||||||
Most transient bugs seen in gate are not bugs in tempest associated
|
Most transient bugs seen in gate are not bugs in tempest associated with a
|
||||||
with a specific tempest test failure, but rather some sort of issue
|
specific tempest test failure, but rather some sort of issue further down the
|
||||||
further down the stack that can cause many tempest tests to fail.
|
stack that can cause many tempest tests to fail.
|
||||||
|
|
||||||
#. Given a transient bug that is seen during the gate, go through the
|
#. Given a transient bug that is seen during the gate, go through `the logs
|
||||||
logs (logs.openstack.org) and try to find a log that is associated
|
<http://logs.openstack.org/>`_ and try to find a log that is associated with
|
||||||
with the failure. The closer to the root cause the better.
|
the failure. The closer to the root cause the better.
|
||||||
|
|
||||||
Note that queries can only be written against INFO level and higher log
|
Note that queries can only be written against INFO level and higher log
|
||||||
messages. This is by design to not overwhelm the search cluster.
|
messages. This is by design to not overwhelm the search cluster.
|
||||||
|
|
||||||
#. Go to logstash.openstack.org and create an elastic search query to
|
#. Go to `logstash.openstack.org <http://logstash.openstack.org/>`_ and create
|
||||||
find the log message from step 1. To see the possible fields to
|
an elastic search query to find the log message from step 1. To see the
|
||||||
search on, click on an entry. Lucene query syntax is available at
|
possible fields to search on, click on an entry. Lucene query syntax is
|
||||||
http://lucene.apache.org/core/4_0_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#package_description
|
available at `lucene.apache.org
|
||||||
|
<http://lucene.apache.org/core/4_0_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#package_description>`_.
|
||||||
|
|
||||||
#. Add a comment to the bug with the query you identified and a link to
|
#. Tag your commit with a ``Related-Bug`` tag in the footer, or add a comment
|
||||||
the logstash url for that query search.
|
to the bug with the query you identified and a link to the logstash URL for
|
||||||
#. Add the query to ``elastic-recheck/queries/BUGNUMBER.yaml`` and push
|
that query search.
|
||||||
the patch up for review.
|
|
||||||
https://git.openstack.org/cgit/openstack-infra/elastic-recheck/tree/queries
|
|
||||||
|
|
||||||
|
#. Add the query to ``elastic-recheck/queries/BUGNUMBER.yaml``
|
||||||
|
(All queries can be found on `git.openstack.org
|
||||||
|
<https://git.openstack.org/cgit/openstack-infra/elastic-recheck/tree/queries>`_)
|
||||||
|
and push the patch up for review.
|
||||||
|
|
||||||
Future Work
|
Future Work
|
||||||
------------
|
------------
|
||||||
|