elastic-recheck
"Classify tempest-devstack failures using ElasticSearch"
- Free software: Apache license
- Documentation: http://docs.openstack.org/developer/elastic-recheck
Idea
When a tempest job failure is detected by monitoring gerrit (using gerritlib), a collection of logstash queries is run against the failed job to identify the bug that caused it.
Eventually this can be tied into the rechecker tool and Launchpad.
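The flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `classify_failure`, the `search` callable, and the `build_uuid` field used to scope each query to one job are hypothetical names assumed here. A real version would issue the search through an ElasticSearch client.

```python
from typing import Callable, Dict, List


def classify_failure(
    build_uuid: str,
    queries: Dict[str, str],
    search: Callable[[str], int],
) -> List[str]:
    """Return the bug numbers whose query matches the failed job.

    `queries` maps bug number -> query text. `search` is a hypothetical
    callable that runs a query string and returns the number of hits.
    """
    matched = []
    for bug, query in sorted(queries.items()):
        # Restrict each bug query to the logs of this one failed job.
        # The `build_uuid` field name is an assumption for illustration.
        scoped = '(%s) AND build_uuid:"%s"' % (query, build_uuid)
        if search(scoped) > 0:
            matched.append(bug)
    return matched
```

Passing `search` in as a parameter keeps the sketch independent of any particular client library and makes it easy to exercise with a stub.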
queries/
All queries are stored in separate yaml files in a queries directory
at the top of the elastic_recheck code base. The format of these files
is ######.yaml (where ###### is the bug number). The yaml should have a
query keyword, which is the query text for ElasticSearch.
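As an illustration of the naming convention, the sketch below indexes a queries directory by bug number using only the file names. `index_query_files` is a hypothetical helper, not a function from elastic_recheck. Each file it returns would then be parsed with a YAML library to read its query keyword.

```python
import re
from pathlib import Path
from typing import Dict

# File names are expected to be "<bug number>.yaml".
_QUERY_FILE = re.compile(r"^(\d+)\.yaml$")


def index_query_files(queries_dir: str) -> Dict[str, Path]:
    """Map each bug number to its query file, skipping other files."""
    index = {}
    for path in sorted(Path(queries_dir).iterdir()):
        match = _QUERY_FILE.match(path.name)
        if match and path.is_file():
            index[match.group(1)] = path
    return index
```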
Guidelines for good queries
- After a bug is resolved and has no more hits in ElasticSearch, we should flag it with a resolved_at keyword. This will let us keep some memory of past bugs and see whether they come back. (Note: this is a forward-looking statement; sorting out resolved_at will come in the future.)
- Queries should get as close as possible to fingerprinting the root cause
- Queries should not return any hits for successful jobs; such hits are a sign the query isn't specific enough
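The last guideline can be checked mechanically once a query's hits are available. A minimal sketch, assuming each hit is a dict carrying a `build_status` field with values such as `SUCCESS` or `FAILURE`. That field name and both helper functions are assumptions made for illustration, not part of the documented query format.

```python
from typing import Dict, Iterable, List


def hits_on_successful_jobs(hits: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the hits that came from jobs which succeeded."""
    return [hit for hit in hits if hit.get("build_status") == "SUCCESS"]


def is_specific_enough(hits: Iterable[Dict[str, str]]) -> bool:
    """A query is too broad if it matches any successful job."""
    return not hits_on_successful_jobs(hits)
```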
To support adding queries quickly, it's considered socially acceptable to +A changes that only add one new bug query, and even for core reviewers to self-approve those changes.
Future Work
- Move config files into a separate directory
- Make unit tests robust
- Add debug mode flag
- Expand gating testing
- Clean up and document code better
- Sort out resolved_at stamping to remove active bugs
- Move away from polling ElasticSearch to discover whether it's ready
- Add nightly job to propose a patch to remove bug queries that return no hits -- Bug hasn't been seen in 2 weeks and must be closed
- Implement resolved_at in loader
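The nightly cleanup item above could start from a check like the following. This is only a sketch under stated assumptions: `stale_bugs` is a hypothetical helper, and the `last_seen` mapping (bug number to the time of its most recent hit, or `None` if it has none) is assumed to be gathered elsewhere. The two-week default comes from the list item.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def stale_bugs(
    last_seen: Dict[str, Optional[datetime]],
    now: datetime,
    max_age: timedelta = timedelta(weeks=2),
) -> List[str]:
    """Return bug numbers with no hit within `max_age` of `now`."""
    stale = []
    for bug, seen in sorted(last_seen.items()):
        # No hit at all, or the latest hit is older than the window.
        if seen is None or now - seen > max_age:
            stale.append(bug)
    return stale
```

The returned bug numbers are candidates only; whether to remove a query or stamp it with resolved_at is left to the caller.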
Main Dependencies
- gerritlib
- pyelasticsearch