===============================
elastic-recheck
===============================

"Classify tempest-devstack failures using ElasticSearch"

* Free software: Apache license
* Documentation: http://docs.openstack.org/developer/elastic-recheck

Idea
----
When a tempest job failure is detected by monitoring gerrit (using
gerritlib), a collection of logstash queries will be run on the failed
job to detect what the bug was.

Eventually this can be tied into the rechecker tool and Launchpad.


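The flow described above can be sketched roughly as follows. This is a
minimal illustration under stated assumptions, not the project's actual
code: ``classify``, ``search``, ``fake_search`` and the sample bug numbers
are hypothetical, and the real tool learns about failures from gerrit and
runs its queries against ElasticSearch rather than against in-memory
strings.

```python
from typing import Callable, Dict, List


def classify(build_id: str,
             queries: Dict[str, str],
             search: Callable[[str, str], int]) -> List[str]:
    """Return the bug numbers whose query matches the failed job.

    ``queries`` maps a bug number to its query text. ``search`` is a
    hypothetical stand-in for the search backend: it takes a query and a
    build identifier and returns the number of hits for that build.
    """
    return sorted(bug for bug, query in queries.items()
                  if search(query, build_id) > 0)


# Hypothetical stand-in backend: the "logs" of each build are plain strings.
FAKE_LOGS = {"build-1": "ERROR: timeout waiting for server to become ACTIVE"}


def fake_search(query: str, build_id: str) -> int:
    """Count a hit when the query text appears in the build's fake log."""
    return 1 if query in FAKE_LOGS.get(build_id, "") else 0


if __name__ == "__main__":
    # Made-up bug numbers and query texts, for illustration only.
    sample_queries = {"1000001": "timeout waiting for server",
                      "1000002": "connection refused"}
    print(classify("build-1", sample_queries, fake_search))
```

Passing the backend in as a callable keeps the matching logic testable
without a running search cluster.
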
queries/
------------
All queries are stored in separate YAML files in a queries directory
at the top of the elastic_recheck code base. The files are named
######.yaml (where ###### is the bug number). Each file should have
a ``query`` keyword whose value is the query text for ElasticSearch.

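As a rough illustration of the naming convention, the sketch below scans a
directory and maps each bug number to its query file. The helper name
``find_query_files`` and the digits-only pattern are assumptions made for
this example; it does not parse the YAML or read the ``query`` keyword, and
it is not the loader the project ships.

```python
import re
from pathlib import Path
from typing import Dict

# Assumed naming rule: the file stem is the bug number, digits only.
QUERY_FILE = re.compile(r"^(\d+)\.yaml$")


def find_query_files(queries_dir: str) -> Dict[str, Path]:
    """Map each bug number to its query file, skipping other files."""
    found = {}
    for path in sorted(Path(queries_dir).iterdir()):
        match = QUERY_FILE.match(path.name)
        if match and path.is_file():
            found[match.group(1)] = path
    return found


if __name__ == "__main__":
    # "queries" is the directory name used in this README.
    for bug, path in find_query_files("queries").items():
        print(bug, path)
```

Files that do not follow the ######.yaml pattern are ignored, so stray
notes or editor backups in the directory would not be treated as queries.
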
Guidelines for good queries

- After a bug is resolved and has no more hits in ElasticSearch, we
  should flag it with a ``resolved_at`` keyword. This will let us keep
  some memory of past bugs, and see if they come back. (Note: this is
  a forward-looking statement; sorting out ``resolved_at`` will come in
  the future.)
- Queries should get as close as possible to fingerprinting the root cause.
- Queries should not return any hits for successful jobs; such hits are a
  sign the query isn't specific enough.

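The last guideline can be checked mechanically. The sketch below assumes
each hit is a dict with a ``build_status`` field set to ``"SUCCESS"`` or
``"FAILURE"``; that field name and both helper functions are hypothetical,
chosen only to illustrate the idea.

```python
from typing import Dict, Iterable, List


def hits_on_successful_jobs(hits: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the hits that came from jobs which succeeded.

    ``build_status`` is an assumed field name, not a documented one.
    """
    return [hit for hit in hits if hit.get("build_status") == "SUCCESS"]


def is_specific_enough(hits: Iterable[Dict[str, str]]) -> bool:
    """A query passes when none of its hits belong to a successful job."""
    return not hits_on_successful_jobs(hits)


if __name__ == "__main__":
    sample_hits = [{"build_status": "FAILURE"}, {"build_status": "SUCCESS"}]
    print(is_specific_enough(sample_hits))
```

A query that fails this check probably matches a log line that also
appears in healthy runs and needs to be narrowed.
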
To support adding queries rapidly, it's considered socially
acceptable to +A changes that only add one new bug query, and even
for core reviewers to self-approve those changes.


Future Work
------------
- Move config files into a separate directory
- Make unit tests robust
- Add debug mode flag
- Expand gating testing
- Clean up and document code better
- Sort out ``resolved_at`` stamping to remove active bugs
- Move away from polling ElasticSearch to discover whether it is ready
- Add a nightly job to propose a patch that removes bug queries which
  return no hits (the bug hasn't been seen in 2 weeks and must be closed)
- Implement ``resolved_at`` in the loader


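The selection step of the proposed nightly job might look like the sketch
below. It is one possible reading of the item above (no hits for 2 weeks
and the bug is closed), not existing code; ``removable_bugs`` and both of
its inputs are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Set

# The two-week window comes from the Future Work item above.
STALE_AFTER = timedelta(weeks=2)


def removable_bugs(last_seen: Dict[str, Optional[datetime]],
                   closed: Set[str],
                   now: datetime) -> List[str]:
    """Return bug numbers whose query could be proposed for removal.

    ``last_seen`` maps a bug number to the time of its most recent hit
    (``None`` if it never had one); ``closed`` holds the bug numbers that
    are closed in the bug tracker. Both inputs are hypothetical.
    """
    removable = []
    for bug, seen in sorted(last_seen.items()):
        quiet = seen is None or now - seen >= STALE_AFTER
        if quiet and bug in closed:
            removable.append(bug)
    return removable
```

Requiring both conditions keeps a query alive for a bug that has merely
gone quiet but is still open.
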
Main Dependencies
------------------
- gerritlib
- pyelasticsearch