
Implement the spec for multi-tenancy support for metrics. This adds a new 'Aetos' datasource very similar to the current Prometheus datasource. Because of that, the original PrometheusHelper class was split into two classes and the base class is used for PrometheusHelper and for AetosHelper. Except for the split, there is one more change to the original PrometheusHelper class code, which is the addition and use of the _get_fqdn_label() and _get_instance_uuid_label() methods.

As part of the change, I refactored the current prometheus datasource unit tests. Most of them are now used to test the PrometheusBase class with minimal changes. Changes I've made to the original tests:
- the ones that can be used to test the base class are moved into the TestPrometheusBase class
- the _setup_prometheus_client, _get_instance_uuid_label and _get_fqdn_label functions are mocked in the base class tests. Their concrete implementations are tested in each datasource's tests separately.
- a self._create_helper() is used to instantiate the helper class with correct mocking.
- all config value modification in the original tests got moved out; instead of modifying the config values, the _get_* methods are mocked to return the wanted values
- to keep similar test coverage, config retrieval is tested for each concrete class by testing the _get_* methods.

New watcher-aetos-integration and watcher-aetos-integration-realdata zuul jobs are added to test the new datasource. These use the same set of tempest tests as the current watcher-prometheus-integration jobs. The only difference is the environment setup and the Watcher config, so that the job deploys Aetos and Watcher uses it instead of accessing Prometheus directly.

At first this was generated by asking cursor to implement the linked spec with some additional prompts for some smaller changes. Afterwards I manually went through the code doing some cleanups, ensuring it complies with PEP8 and hacking and so on. Later on I manually adjusted the code to use the latest observabilityclient changes. The zuul job was also mostly generated by cursor.

Implements: https://blueprints.launchpad.net/watcher/+spec/prometheus-multitenancy-support
Generated-By: Cursor with claude-4-sonnet model
Change-Id: I72c2171f72819bbde6c9cbbf565ee895e5d2bd53
Signed-off-by: Jaromir Wysoglad <jwysogla@redhat.com>
Aetos datasource
Synopsis
The Aetos datasource allows Watcher to use an Aetos reverse proxy server as the source for collected metrics used by the Watcher decision engine. Aetos is a multi-tenant aware reverse proxy that sits in front of a Prometheus server and provides Keystone authentication and role-based access control. The Aetos datasource uses Keystone service discovery to locate the Aetos endpoint and requires authentication via Keystone tokens.
Requirements
The Aetos datasource has the following requirements:
- An Aetos reverse proxy server deployed in front of Prometheus
- Aetos service registered in Keystone with service type 'metric-storage'
- Valid Keystone credentials for Watcher with admin or service role
- Prometheus metrics with appropriate labels (same as direct Prometheus access)
Like the Prometheus datasource, it is required that Prometheus
metrics contain a label to identify the hostname of the exporter from
which the metric was collected. This is used to match against the
Watcher cluster model ComputeNode.hostname. The default for this
label is fqdn and in the Prometheus scrape configs it would look
like:
scrape_configs:
  - job_name: node
    static_configs:
      - targets: ['10.1.2.3:9100']
        labels:
          fqdn: "testbox.controlplane.domain"
This default can be overridden when a deployer uses a different label
to identify the exporter host (for example hostname or host, or any
other label, as long as it identifies the host).
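As an illustration, assuming the exporter host label is named hostname (an example value, not a recommendation), the override is set through the fqdn_label option described in the Configuration section below:

[aetos_client]
fqdn_label = hostname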
Internally this label is used in creating fqdn_instance_labels,
containing the list of values assigned to the label in the Prometheus
targets. The elements of the resulting fqdn_instance_labels are
expected to match the ComputeNode.hostname used in the Watcher
decision engine cluster model. An example fqdn_instance_labels is the
following:
[
    'ena.controlplane.domain',
    'dio.controlplane.domain',
    'tria.controlplane.domain',
]
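For reference, the values behind such a list can be inspected with the standard Prometheus label-values HTTP API; the host and port below are placeholders, and when going through Aetos the equivalent request would additionally carry a Keystone token:

curl http://10.1.2.3:9090/api/v1/label/fqdn/values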
For instance metrics, it is required that Prometheus contains a label
with the uuid of the OpenStack instance in each relevant metric. By
default, the datasource will look for the label resource. The
instance_uuid_label config option in watcher.conf allows deployers to
override this default to any other label name that stores the uuid.
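For example, assuming the instance uuid is stored in a label named instance_id (a hypothetical label name used only for illustration), the override would be:

[aetos_client]
instance_uuid_label = instance_id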
Limitations
The Aetos datasource shares the same limitations as the Prometheus datasource:
The current implementation doesn't support the statistic_series
function of the Watcher class DataSourceBase. It is expected that the
statistic_aggregation function (which is implemented) is sufficient in
providing the current state of the managed resources in the cluster.
The statistic_aggregation function defaults to querying back 300
seconds, starting from the present time (the time period is a function
parameter and can be set to a value as required). Implementing the
statistic_series can always be re-visited if the requisite interest
and work cycles are volunteered by the interested parties.
One further note about a limitation in the implemented
statistic_aggregation function. This function is defined with a
granularity parameter, to be used when querying whichever of the
Watcher DataSourceBase metrics providers is in use. In the case of
Aetos (like Prometheus), we do not fetch and then process individual
metrics across the specified time period. Instead we use the PromQL
querying operators and functions, so that the server itself will
process the request across the specified parameters and then return
the result. So the granularity parameter is redundant and remains
unused for the Aetos implementation of statistic_aggregation. The
granularity of the data fetched by the Prometheus server is specified
in configuration as the server scrape_interval (current default 15
seconds).
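As a hedged illustration of that server-side processing, a 300 second aggregation for a single host can be expressed as one PromQL query which the Prometheus (or Aetos-proxied) server evaluates itself; the metric name and label value here are example placeholders, not the exact query Watcher constructs:

avg_over_time(node_memory_MemFree_bytes{fqdn="ena.controlplane.domain"}[300s])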
Additionally, there is a slight performance impact compared to direct Prometheus access. Since Aetos acts as a reverse proxy in front of Prometheus, there is an additional step for each request, resulting in slightly longer delays.
Configuration
A deployer must set the datasources parameter to include aetos under
the [watcher_datasources] section of watcher.conf (or add aetos in
datasources for a specific strategy if preferred, e.g. under the
[watcher_strategies.workload_stabilization] section, as in the example
below).
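A minimal sketch of the strategy-specific form, assuming the workload_stabilization strategy is the one being configured:

[watcher_strategies.workload_stabilization]
datasources = aetos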
Note
Having both Prometheus and Aetos datasources configured at the same time is not supported and will result in a configuration error. Allowing this can be investigated in the future if a need or a proper use case is identified.
The watcher.conf configuration file is also used to set the parameter
values required by the Watcher Aetos data source. The configuration
can be added under the [aetos_client] section and the available
options are duplicated below from the code as they are
self-documenting:
cfg.StrOpt('interface',
           default='public',
           choices=['internal', 'public', 'admin'],
           help="Type of endpoint to use in keystoneclient."),
cfg.StrOpt('region_name',
           help="Region in Identity service catalog to use for "
                "communication with the OpenStack service."),
cfg.StrOpt('fqdn_label',
           default='fqdn',
           help="The label that Prometheus uses to store the fqdn of "
                "exporters. Defaults to 'fqdn'."),
cfg.StrOpt('instance_uuid_label',
           default='resource',
           help="The label that Prometheus uses to store the uuid of "
                "OpenStack instances. Defaults to 'resource'."),
Authentication and Service Discovery
Unlike the Prometheus datasource which requires explicit host and port configuration, the Aetos datasource uses Keystone service discovery to automatically locate the Aetos endpoint. The datasource:
- Uses the configured Keystone credentials to authenticate
- Searches the service catalog for a service with type 'metric-storage'
- Uses the discovered endpoint URL to connect to Aetos
- Attaches a Keystone token to each request for authentication
If the Aetos service is not registered in Keystone, the datasource will fail to initialize and prevent the decision engine from starting.
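For reference, a Keystone registration for Aetos typically looks like the following; the service name aetos, the region and the endpoint URL are illustrative assumptions, while the service type metric-storage is the one the datasource searches for:

openstack service create --name aetos metric-storage
openstack endpoint create --region RegionOne metric-storage public http://<aetos-host>:<port>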
A sample watcher.conf configured to use the Aetos datasource looks like the following:
[watcher_datasources]
datasources = aetos
[aetos_client]
interface = public
region_name = RegionOne
fqdn_label = fqdn