99 Commits

Author SHA1 Message Date
Arnaud Morin
989dbb8aad Enable use of quorum queues for transient messages
Add a new flag rabbit_transient_quorum_queue to enable the use of quorum
queues for transient queues (reply_ and _fanout_).

This greatly helps OpenStack services to not fail on (and to recover from)
a rabbit node issue.

Related-bug: #2031497

Signed-off-by: Arnaud Morin <arnaud.morin@ovhcloud.com>
Change-Id: Icee5ee6938ca7c9651f281fb835708fc88b8464f
2023-11-12 00:08:20 +01:00
Arnaud Morin
8e3c523fd7 Auto-delete the failed quorum rabbit queues
When rabbit is failing for a specific quorum queue, the only thing to
do is to delete the queue (as per rabbit doc, see [1]).

So, to avoid the RPC service being broken until an operator eventually
fixes it manually, catch any INTERNAL ERROR (code 541) and trigger
the deletion of the failed queues under those conditions.
On the next queue declare (triggered from various retries), the queue
will then be created again and the service will recover by itself.
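
The recovery flow described above can be sketched roughly as follows. This
is a self-contained illustration, not the oslo.messaging implementation:
`InternalError`, `FakeBroker` and `declare_with_recovery` are hypothetical
names, and only the error code 541 comes from this commit message.

```python
class InternalError(Exception):
    """Hypothetical stand-in for an AMQP error carrying a reply code."""

    def __init__(self, code):
        super().__init__(f"AMQP error {code}")
        self.code = code


class FakeBroker:
    """Toy broker: a queue listed in `failed` raises 541 until deleted."""

    def __init__(self, failed):
        self.failed = set(failed)
        self.queues = set()

    def declare(self, name):
        if name in self.failed:
            raise InternalError(541)
        self.queues.add(name)

    def delete(self, name):
        self.failed.discard(name)
        self.queues.discard(name)


def declare_with_recovery(broker, name, max_retries=3):
    """Declare a queue; on INTERNAL ERROR (541), delete it and retry."""
    for attempt in range(1, max_retries + 1):
        try:
            broker.declare(name)
            return attempt
        except InternalError as exc:
            if exc.code != 541:
                raise  # any other error is not handled by this sketch
            broker.delete(name)
    raise RuntimeError(f"could not declare {name!r} after {max_retries} tries")


broker = FakeBroker(failed={"reply_abc"})
attempts = declare_with_recovery(broker, "reply_abc")
```

In this toy run the first declare fails with 541, the queue is deleted, and
the second declare succeeds without operator intervention.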

Closes-Bug: #2028384
Related-bug: #2031497

[1] https://www.rabbitmq.com/quorum-queues.html#availability

Signed-off-by: Arnaud Morin <arnaud.morin@ovhcloud.com>
Change-Id: Ib8dba833542973091a4e0bf23bb593aca89c5905
2023-11-12 00:08:20 +01:00
Arnaud Morin
f23f3276c4 Allow creating transient queues with no expire
When an operator relies on rabbitmq policies, there is no point in setting
the queue TTL in config.
Moreover, using policies is much simpler, as you don't need to
delete/recreate the queues to apply the new parameter (see [1]).
So, adding the possibility to set the transient queue TTL to 0 will
allow the creation of the queue without the x-expires parameter, and only
the policy will apply.
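
A minimal sketch of that behaviour, assuming the TTL is configured in
seconds and passed to RabbitMQ as the `x-expires` queue argument in
milliseconds. The function name `transient_queue_arguments` is
hypothetical and not taken from the patch.

```python
def transient_queue_arguments(ttl_seconds):
    """Build declare arguments for a transient queue.

    A TTL of 0 omits `x-expires` entirely, so only a broker-side policy
    decides when the queue expires.
    """
    arguments = {}
    if ttl_seconds > 0:
        arguments["x-expires"] = int(ttl_seconds * 1000)
    return arguments


with_ttl = transient_queue_arguments(1800)
policy_only = transient_queue_arguments(0)
```

With a TTL of 0 the returned dictionary is empty, which leaves expiry to
whatever policy the operator has defined on the broker.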

[1] https://www.rabbitmq.com/parameters.html#policies

Related-bug: #2031497

Signed-off-by: Arnaud Morin <arnaud.morin@ovhcloud.com>
Change-Id: I34bad0f6d8ace475c48839adc68a023dd0c380de
2023-11-12 00:08:20 +01:00
Zuul
38c86a93ad Merge "Set default heartbeat_rate to 3" 2023-10-11 13:29:33 +00:00
OpenStack Proposal Bot
8759cd7d9a Imported Translations from Zanata
For more information about this automatic import see:
https://docs.openstack.org/i18n/latest/reviewing-translation-import.html

Change-Id: I59074f7789476cdfb861be97c60b720bc3ccba54
2023-09-30 04:10:18 +00:00
7705b4f302 Update master for stable/2023.2
Add file to the reno documentation build to show release notes for
stable/2023.2.

Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/2023.2.

Sem-Ver: feature
Change-Id: I8e9c35ebe41e0283309d64db97a4d9ffebcf9626
2023-09-07 09:37:14 +00:00
Zuul
fa15630041 Merge "Deprecate the amqp1 driver and Remove qpid functional tests" 2023-08-13 10:36:35 +00:00
Arnaud Morin
36fb5bceab Set default heartbeat_rate to 3
Kombu recommends running heartbeat_check every second, but we use a lock
around the kombu connection. To avoid taking this lock too often when,
most of the time, there is nothing to do except wait for the events to
drain, we start heartbeat_check and retrieve the server heartbeat packet
only twice as often as the minimum required for the heartbeat to work:
    heartbeat_timeout / heartbeat_rate / 2.0

Because of this, we are not sending the heartbeat frames at correct
intervals. For example, if heartbeat_timeout=60 and rate=2, the AMQP
protocol expects a frame to be sent every 30 seconds.

With the current heartbeat_check implementation, heartbeat_check will be
called every:
    heartbeat_timeout / heartbeat_rate / 2.0 = 60 / 2 / 2.0 = 15
Which will result in the following frame flow:
    T+0  --> do nothing (60/2 > 0)
    T+15 --> do nothing (60/2 > 15)
    T+30 --> do nothing (60/2 > 30)
    T+45 --> send a frame (60/2 < 45)
    ...

With heartbeat_rate=3, the heartbeat_check will be executed more often:
    heartbeat_timeout / heartbeat_rate / 2.0 = 60 / 3 / 2.0 = 10
Frame flow:
    T+0  --> do nothing (60/3 > 0)
    T+10 --> do nothing (60/3 > 10)
    T+20 --> do nothing (60/3 > 20)
    T+30 --> send a frame (60/3 < 30)
    ...

Now we are sending the frames at correct intervals.
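
The arithmetic above can be reproduced with a small simulation. It is a
simplified model rather than the kombu logic: it assumes a frame is sent at
the first check that happens strictly later than
heartbeat_timeout / heartbeat_rate after T+0. The function names are
hypothetical.

```python
def check_period(heartbeat_timeout, heartbeat_rate):
    """Delay between two heartbeat_check calls, as in the formula above."""
    return heartbeat_timeout / heartbeat_rate / 2.0


def first_frame_time(heartbeat_timeout, heartbeat_rate):
    """Time of the first check at which a frame is sent (simplified model)."""
    interval = heartbeat_timeout / heartbeat_rate
    period = check_period(heartbeat_timeout, heartbeat_rate)
    elapsed = 0.0
    while not elapsed > interval:
        elapsed += period
    return elapsed


rate_2 = first_frame_time(60, 2)
rate_3 = first_frame_time(60, 3)
```

Under this model the first frame goes out at T+45 with heartbeat_rate=2 and
at T+30 with heartbeat_rate=3, matching the two frame flows listed above.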

Closes-bug: #2008734

Signed-off-by: Arnaud Morin <arnaud.morin@ovhcloud.com>
Change-Id: Ie646d254faf5e45ba46948212f4c9baf1ba7a1a8
2023-08-08 15:23:59 +02:00
OpenStack Proposal Bot
ec1c99d562 Imported Translations from Zanata
For more information about this automatic import see:
https://docs.openstack.org/i18n/latest/reviewing-translation-import.html

Change-Id: If124c09bfa878585b89dee7578a97c12c42497f8
2023-06-22 03:45:12 +00:00
Andrew Bogott
0602d1a10a Increase ACK_REQUEUE_EVERY_SECONDS_MAX to exceed default kombu_reconnect_delay
Previously the two values were the same; this caused us
to always exceed the timeout limit ACK_REQUEUE_EVERY_SECONDS_MAX
which results in various code paths never being traversed
due to premature timeout exceptions.

Also apply min/max values to kombu_reconnect_delay so it doesn't
exceed ACK_REQUEUE_EVERY_SECONDS_MAX and break things again.

Closes-Bug: #1993149
Change-Id: I103d2aa79b4bd2c331810583aeca53e22ee27a49
2023-04-20 15:27:58 -05:00
9b1e2dc48e Update master for stable/2023.1
Add file to the reno documentation build to show release notes for
stable/2023.1.

Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/2023.1.

Sem-Ver: feature
Change-Id: I80f227a59c36693c83bb94890536745610ba2393
2023-02-24 15:20:06 +00:00
Dmitriy Rabotyagov
115cfb5b7c Fix typo in quorum-related variables for RabbitMQ
In [1] there was a typo made in variable names. To prevent even further
awkwardness regarding variable naming, we fix the typo and publish a
release note for those already using the variables in their deployments.

[1] https://review.opendev.org/c/openstack/oslo.messaging/+/831058

Change-Id: Icc438397c11521f3e5e9721f85aba9095e0831c2
2023-02-14 15:20:00 +00:00
Tobias Urdin
687dea2e65 Support overriding class for get_rpc_* helper functions
We currently do not support overriding the class being
instantiated in the RPC helper functions, this adds that
support so that projects that define their own classes
that inherit from oslo.messaging can use the helpers.

For example, neutron utilizes code from neutron-lib that
has its own RPCClient implementation that inherits from
oslo.messaging. In order for them to use, for example,
the get_rpc_client helper, they need support to override
the class being returned. The alternative would be to
modify the internal _manual_load variable which seems
counter-productive to extending the API provided to
consumers.
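
The idea can be sketched as below. The classes are stand-ins defined only
for this example, and the keyword name `client_cls` is an assumption made
for illustration; check the library documentation for the real signature.

```python
class RPCClient:
    """Stand-in for the library's client class (not the real one)."""

    def __init__(self, transport, target, **kwargs):
        self.transport = transport
        self.target = target
        self.options = kwargs


class ProjectRPCClient(RPCClient):
    """A project-specific subclass, as in the neutron-lib case above."""


def get_rpc_client(transport, target, client_cls=RPCClient, **kwargs):
    """Instantiate `client_cls` so callers never build the class directly."""
    return client_cls(transport, target, **kwargs)


default_client = get_rpc_client("fake-transport", "fake-target")
custom_client = get_rpc_client(
    "fake-transport", "fake-target", client_cls=ProjectRPCClient, timeout=30
)
```

Letting the helper accept the class keeps construction in one place while
still allowing projects to return their own subclass.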

Change-Id: Ie22f2ee47a4ca3f28a71272ee1ffdb88aaeb7758
2023-01-23 08:40:37 +00:00
Zuul
9f710ce6cd Merge "Remove logging from ProducerConnection._produce_message" 2022-12-21 07:46:22 +00:00
Zuul
2e81fac973 Merge "Implement get_rpc_client function" 2022-12-01 18:45:46 +00:00
Zuul
b3c666ff34 Merge "Force creating non durable control exchange when a precondition failed" 2022-11-16 09:27:05 +00:00
Tobias Urdin
4ead7cb2dc Implement get_rpc_client function
We already expose functions to handle the instantiation
of classes such as RPCServer and RPCTransport but the
same was never done for RPCClient so the API is
inconsistent in its enforcement.

This adds a get_rpc_client function that should be used
instead of instantiating the RPCClient class directly, to
be more consistent.

This also allows handling more logic inside the function
in the future, for example if an async client
is implemented, as the investigation in [1] has shown.

[1] https://review.opendev.org/c/openstack/oslo.messaging/+/858936

Change-Id: Ia4d1f0497b9e2728bde02f4ff05fdc175ddffe66
2022-10-25 11:42:40 +00:00
Hervé Beraud
0f63c227f5 Deprecate the amqp1 driver and Remove qpid functional tests
A recent oslo.messaging patch [1], not yet merged, which aims to update the
test runtime for antelope, led us to the following error:

```
qdrouterd: Python: ModuleNotFoundError: No module named 'qpid_dispatch'
```

Neither debian nor ubuntu in the latest releases have any binary
built for the qpid backend, not even 3rd party. Only qpid proton,
the client lib, is available.

To solve this issue, these changes propose to deprecate the AMQP1 driver,
which is the one based on qpid and proton, and to remove the
related functional tests.

The AMQP1 driver doesn't seem to be widely used.

[1] https://review.opendev.org/c/openstack/oslo.messaging/+/856643

Closes-Bug: 1992587
Change-Id: Id2ca9cd9ee8b8dbdd14dcd00ebd8188d20ea18dc
2022-10-18 11:27:46 +02:00
e5e70a5d89 Update master for stable/zed
Add file to the reno documentation build to show release notes for
stable/zed.

Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/zed.

Sem-Ver: feature
Change-Id: Ic1020b39172981abcc9fc3d66fc6ec58f440a456
2022-09-09 09:17:04 +00:00
Slawek Kaplonski
e44f286ebc Change default value of "heartbeat_in_pthread" to False
As was reported in the related bug some time ago, setting that
option to True for nova-compute can break it, as it's a non-wsgi service.
We also noticed the same problems with randomly stuck non-wsgi services
such as neutron agents, and the same issue can probably happen with
any other non-wsgi service.

To avoid that, this patch changes the default value of that config option
to False.
Together with [1], it effectively reverts the change made in [2] some time
ago.

[1] https://review.opendev.org/c/openstack/oslo.messaging/+/800621
[2] https://review.opendev.org/c/openstack/oslo.messaging/+/747395

Related-Bug: #1934937
Closes-Bug: #1961402

Change-Id: I85f5b9d1b5d15ad61a9fcd6e25925b7eeb8bf6e7
2022-08-16 14:14:29 +00:00
Guillaume Espanel
43f2224aac Remove logging from ProducerConnection._produce_message
In impl_kafka, _produce_message is run in a tpool.execute
context but it was also calling logging functions.
This could cause subsequent calls to logging functions to
deadlock.

This patch moves the logging calls out of the tpool.execute scope.

Change-Id: I81167eea0a6b1a43a88baa3bc383af684f4b1345
Closes-bug: #1981093
2022-08-03 17:35:16 +02:00
Zuul
4186386748 Merge "Add quorum queue control configurations" 2022-06-13 17:14:16 +00:00
Zuul
64888bd05a Merge "Add a new option to enforce the OpenSSL FIPS mode" 2022-04-26 14:15:36 +00:00
hamza alqtaishat
8932ad237b Add quorum queue control configurations
The quorum queue type adds features that did not exist before or were not
handled in rabbitmq. The following link shows some of them:
https://blog.rabbitmq.com/posts/2020/04/rabbitmq-gets-an-ha-upgrade/

The options below control the quorum queue and ensure the stability of
the quorum system:
x-max-in-memory-length
x-max-in-memory-bytes
x-delivery-limit

They control the memory usage and handle message poisoning.
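
As a rough illustration, the three arguments could be assembled like this.
The function and parameter names are hypothetical, and treating 0 as
"not set" is an assumption of this sketch; only the `x-...` argument names
come from the list above.

```python
def quorum_control_arguments(delivery_limit=0, max_memory_length=0,
                             max_memory_bytes=0):
    """Return the quorum queue arguments for every limit that is set (> 0)."""
    candidates = {
        "x-delivery-limit": delivery_limit,
        "x-max-in-memory-length": max_memory_length,
        "x-max-in-memory-bytes": max_memory_bytes,
    }
    return {key: value for key, value in candidates.items() if value > 0}


no_limits = quorum_control_arguments()
poison_guard = quorum_control_arguments(delivery_limit=5)
```

Here `x-delivery-limit` bounds how many times a message can be redelivered,
which is what guards against a poison message being requeued forever.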

Closes-Bug: #1962348
Change-Id: I570227d6102681f4f9d8813ed0d7693a1160c21d
2022-04-06 19:46:40 +00:00
f1d691b9f3 Update master for stable/yoga
Add file to the reno documentation build to show release notes for
stable/yoga.

Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/yoga.

Sem-Ver: feature
Change-Id: I3d2b041769c5c14a7d391c223dc499218a937e76
2022-03-04 17:18:28 +00:00
Zuul
2d090b5d6b Merge "Adding support for rabbitmq quorum queues" 2022-02-08 14:57:32 +00:00
Hervé Beraud
7e8acbf870 Adding support for rabbitmq quorum queues
https://www.rabbitmq.com/quorum-queues.html

The quorum queue is a modern queue type for RabbitMQ implementing a
durable, replicated FIFO queue based on the Raft consensus algorithm. It
is available as of RabbitMQ 3.8.0.

Quorum queues cannot be set by policy, so this should be done when
declaring the queue.

To declare a quorum queue set the x-queue-type queue argument to quorum
(the default is classic). This argument must be provided by a client at
queue declaration time; it cannot be set or changed using a policy. This
is because policy definition or applicable policy can be changed
dynamically but queue type cannot. It must be specified at the time of
declaration.

It is good for oslo.messaging to add support for this type of queue,
which has multiple advantages over mirroring.

If quorum queues are set, mirrored queues will be ignored.
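
A small sketch of that precedence rule, under stated assumptions: the
function name and its two boolean parameters are hypothetical, and the
mirroring argument shown (`x-ha-policy`) is only illustrative. The
`x-queue-type` argument and its `quorum` value come from the text above.

```python
def queue_declare_arguments(use_quorum, use_ha_mirroring):
    """Pick queue arguments; quorum takes precedence over mirroring."""
    arguments = {}
    if use_quorum:
        # Must be given at declaration time; it cannot be set by policy.
        arguments["x-queue-type"] = "quorum"
    elif use_ha_mirroring:
        arguments["x-ha-policy"] = "all"  # illustrative mirroring argument
    return arguments


both_requested = queue_declare_arguments(use_quorum=True, use_ha_mirroring=True)
classic = queue_declare_arguments(use_quorum=False, use_ha_mirroring=False)
```

When both are requested, only the quorum argument is emitted, and a queue
declared without any argument stays of the default classic type.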

Closes-Bug: #1942933
Change-Id: Id573e04c287e034e50626daf6e18a34735d45251
2022-02-05 07:12:49 +00:00
Balazs Gibizer
7b3968d9b0 [rabbit] use retry parameters during notification sending
The rabbit backend now applies the [oslo_messaging_notifications]retry,
[oslo_messaging_rabbit]rabbit_retry_interval, rabbit_retry_backoff and
rabbit_interval_max configuration parameters when it tries to establish the
connection to the message bus during notification sending.
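
How these four parameters might combine can be sketched as follows. This
assumes a linear backoff (the interval grows by the backoff value after
each attempt and is capped at the maximum) and a finite, positive retry
count; the function name is hypothetical.

```python
def retry_delays(retry, retry_interval, retry_backoff, interval_max):
    """Return the sleep (in seconds) before each reconnection attempt."""
    delays = []
    interval = retry_interval
    for _ in range(retry):
        delays.append(min(interval, interval_max))
        interval += retry_backoff
    return delays


delays = retry_delays(retry=5, retry_interval=1, retry_backoff=2, interval_max=6)
```

With these example values the waits grow as 1, 3, 5 and then stay at the
cap of 6 seconds for the remaining attempts.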

This patch also clarifies the differences between the behavior
of the kafka and the rabbit drivers in this regard.

Closes-Bug: #1917645
Change-Id: Id4ccafc95314c86ae918336e42cca64a6acd4d94
2022-01-12 12:22:55 +01:00
Hervé Beraud
1fd461647f Force creating non durable control exchange when a precondition failed
A precondition failed exception related to the durable exchange
config may be triggered when a control exchange is shared
between services and the services try to create it with
configs that differ from each other. RabbitMQ will reject
the services that try to create it with a configuration
that differs from the one used first.

This kind of exception is not handled for now, and services
can fail without handling this kind of issue.

These changes catch this kind of exception and check whether it is
related to the durable config. In that case we try to re-declare
the failing exchange/queue as non durable.

This problem can be easily reproduced by running a local RabbitMQ
server.

By setting the config below (sample.conf):

```
[DEFAULT]
transport_url = rabbit://localhost/
[OSLO_MESSAGING_RABBIT]
amqp_durable_queues = true
```

And by running our simulator twice:

```
$ tox -e venv --  python tools/simulator.py -d rpc-server -w 40
$ tox -e venv --  python tools/simulator.py --config-file ./sample.conf -d rpc-server -w 40
```

The first one will create a default non durable control exchange.
The second one will create the same default control exchange but as
durable.

Closes-Bug: #1953351
Change-Id: I27625b468c428cde6609730c8ab429c2c112d010
2021-12-15 13:56:12 +01:00
Hervé Beraud
384738a92d Add a new option to enforce the OpenSSL FIPS mode
The ``ssl_enforce_fips_mode`` option allows us to enforce the FIPS mode
if it is supported by the version of python in use.

https://en.wikipedia.org/wiki/Federal_Information_Processing_Standards

Change-Id: I50c7de71bfd38137eb83d23e910298946507ce9f
2021-11-08 15:05:30 +01:00
Zuul
feb72de7b8 Merge "Remove deprecation of heartbeat_in_pthread" 2021-10-21 12:33:29 +00:00
Hervé Beraud
d24edef117 Remove deprecation of heartbeat_in_pthread
In some circumstances services can be executed outside of mod_wsgi and
in a monkey patched environment. In this context we need to leave users
the possibility of executing the heartbeat in a green thread.

The heartbeat_in_pthread option was tagged as deprecated a few months ago
and planned for future removal. These changes drop this deprecation to
allow enabling green threads if needed.

Closes-Bug: #1934937
Change-Id: Iee2e5a6f7d71acba70bbc857f0bd7d83e32a7b8c
2021-10-14 15:20:42 +02:00
2a052499dc Update master for stable/xena
Add file to the reno documentation build to show release notes for
stable/xena.

Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/xena.

Sem-Ver: feature
Change-Id: Ia40ac2ccee4fe230605f3183b0b432b0e31bff04
2021-09-10 14:35:07 +00:00
Ching Kuo
bdbb6d62ee Add Support For oslo.metrics
This commit adds support for sending rpc metrics to oslo.metrics.

Changes include:
- Adding a client wrapper for oslo.metrics to process metrics information
  and send it to the oslo.metrics socket
- Modifying the rpc client to send metrics when certain rpc events happen

For more information on oslo.metrics
https://opendev.org/openstack/oslo.metrics

Change-Id: Idf8cc0e52ced1f697ac4048655eff4c956fd5c79
2021-06-08 22:22:37 +08:00
Pierre Riteau
a5ad998b12 Fix formatting of release list
Change-Id: I1f859a964de7f96e5decdec0977faa355b6a2a60
2021-04-16 11:29:44 +01:00
bddf53109e Update master for stable/wallaby
Add file to the reno documentation build to show release notes for
stable/wallaby.

Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/wallaby.

Sem-Ver: feature
Change-Id: I2deb189cfdd7420e69060cd89c45b03b43c211af
2021-03-18 11:20:30 +00:00
Zuul
11a49a0a3e Merge "Correctly handle missing RabbitMQ queues" 2021-02-09 18:01:29 +00:00
Hervé Beraud
4937949dff Correctly handle missing RabbitMQ queues
Currently, setting the '[oslo_messaging] direct_mandatory_flag' config
option to 'True' (the default) will result in a 'MessageUndeliverable'
exception being raised when sending a reply if a RabbitMQ queue is
missing [1]. It was the responsibility of the application to handle
this exception, however, many applications are not doing so. This has
resulted in a number of bug reports.

Start handling this error condition, using a retry loop to attempt to
resend the message and work around any temporary glitches. Since
attempting to send a reply will no longer raise an exception,
there is little benefit in retaining the '[oslo_messaging]
direct_mandatory_flag' config option: users setting this to False will
simply not benefit from the retry logic and improved logging added
here. This option is already deprecated though and will be fully
removed in a future release.

[1] https://www.rabbitmq.com/channels.html

Change-Id: Id5cddbefbe24ef100f1cc522f44430df77d217cb
Closes-Bug: #1905965
2021-02-04 09:47:08 +00:00
Zuul
11e13abf9b Merge "remove unicode from code" 2021-02-03 15:04:07 +00:00
Hervé Beraud
2b89d97888 Deprecate the mandatory flag
It will not be possible to deactivate this functionality anymore.

Change-Id: I1cbafff03349f7da9224de46285707fbf2a81a68
2021-02-01 10:31:16 +01:00
xuanyandong
642367cdfd remove unicode from code
Change-Id: Ib2b816728307166450a4cea2ccdb3c4b550a0713
2021-01-03 16:11:46 +08:00
Zuul
2cc35f6b1a Merge "Run rabbitmq heartbeat in python thread by default" 2020-10-15 17:53:23 +00:00
633383babb Update master for stable/victoria
Add file to the reno documentation build to show release notes for
stable/victoria.

Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/victoria.

Change-Id: I0e5758da8c95474b1ddda5419f80ae94da71d147
Sem-Ver: feature
2020-09-11 20:58:48 +00:00
Zuul
32943fd5a5 Merge "Add a ping endpoint to RPC dispatcher" 2020-08-28 19:55:30 +00:00
Hervé Beraud
add5ab4ece Run rabbitmq heartbeat in python thread by default
Removing the experimental nature of this feature and activating it by default.
Now to run heartbeat in a green thread users should set this option to False.

Also deprecating this option to prepare future removal and force to always run
heartbeat in a python thread whatever the context.

Change-Id: I32a6c4ad0a456282ec02b5e4c8309489b3c17553
2020-08-27 12:22:41 +02:00
Arnaud Morin
82492442f3 Add a ping endpoint to RPC dispatcher
The purpose of this patch is to add an endpoint directly in the RPC
dispatcher, so this endpoint will always be available, in a cross
project manner, without the need for projects to manage it themselves.

This endpoint stays disabled by default, so this change is harmless
without a specific configuration option.

To enable this ping endpoint, an operator just has to add a new
parameter in the [DEFAULT] section, alongside rpc_response_timeout:
    [DEFAULT]
    rpc_ping_enabled=true  # default is false

The purpose of this new endpoint is to help operators do an RPC call (a
ping) toward a specific RPC callback (e.g. a nova-compute, or a
neutron-agent).
This helps a lot when monitoring agents (for example, if agents are
deployed in a kubernetes pod).

The endpoint is named oslo_rpc_server_ping.
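
Conceptually, the dispatcher side could look like the sketch below. The
`Dispatcher` and `PingEndpoint` classes are simplified stand-ins written
for this example, and the "pong" return value is an assumption; only the
method name oslo_rpc_server_ping and the rpc_ping_enabled flag come from
this commit message.

```python
class PingEndpoint:
    """Built-in endpoint exposing the ping method named above."""

    def oslo_rpc_server_ping(self, ctxt, **kwargs):
        return "pong"  # assumed reply value for this sketch


class Dispatcher:
    """Toy dispatcher that adds the ping endpoint only when enabled."""

    def __init__(self, endpoints, rpc_ping_enabled=False):
        self.endpoints = list(endpoints)
        if rpc_ping_enabled:
            self.endpoints.append(PingEndpoint())

    def dispatch(self, ctxt, method, **kwargs):
        for endpoint in self.endpoints:
            handler = getattr(endpoint, method, None)
            if callable(handler):
                return handler(ctxt, **kwargs)
        raise LookupError(f"no endpoint implements {method!r}")


enabled = Dispatcher([], rpc_ping_enabled=True)
reply = enabled.dispatch({}, "oslo_rpc_server_ping")
```

Because the endpoint lives in the dispatcher itself, a monitoring probe can
call the same method on any service without that service defining it.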

Change-Id: I51cf67e060f240e6eb82260e70a057fe599f9063
Signed-off-by: Arnaud Morin <arnaud.morin@corp.ovh.com>
2020-08-18 15:09:29 +02:00
shenjiatong
196fa877a9 Cancel consumer if queue down
Previously, we switched to using default exchanges
to avoid excessive amounts of exchange not found messages.
But that does not actually solve the problem, because the
reply_* queue is already gone and the agent will not receive callbacks.

After some debugging, I found that under some circumstances
the rabbitmq consumer does not seem to receive the basic cancel
signal when the queue is already gone. This might be due to
rabbitmq trying to restart the consumer when the queue is down
(for example during a split brain). In such cases,
it might be better to fail early.

From reading the code, it seems that x-cancel-on-ha-failover
is not dedicated to mirrored queues only, see
https://github.com/rabbitmq/rabbitmq-server/blob/master/src/rabbit_channel.erl#L1894
and
https://github.com/rabbitmq/rabbitmq-server/blob/master/src/rabbit_channel.erl#L1926

By failing early, in my own test setup,
I could solve a certain case of the exchange not found problem.

Change-Id: I2ae53340783e4044dab58035bc0992dc08145b53
Related-bug: #1789177
2020-07-31 06:05:16 +08:00
Andreas Jaeger
9cc3f52ac3 Switch to newer openstackdocstheme and reno versions
Switch to openstackdocstheme 2.2.0 and reno 3.1.0 versions. Using
these versions will notably allow:
* Linking from HTML to PDF document
* Parallel building of documents
* Fixing some rendering issues

Update Sphinx version as well.

Remove docs requirements from lower-constraints, they are not needed
during install or test but only for docs building.

openstackdocstheme renames some variables, so follow the renames
before the next release removes them. A couple of variables are also
not needed anymore, remove them.

Depends-On: https://review.opendev.org/728938
Change-Id: I70c7edf8b95cde890e6263195be1de6bb826e700
2020-05-18 20:55:51 +02:00
OpenStack Proposal Bot
d011c7f262 Imported Translations from Zanata
For more information about this automatic import see:
https://docs.openstack.org/i18n/latest/reviewing-translation-import.html

Change-Id: I792154ba794512046b265718c2f01db624413f30
2020-04-16 08:13:17 +00:00
434ec937f9 Update master for stable/ussuri
Add file to the reno documentation build to show release notes for
stable/ussuri.

Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/ussuri.

Change-Id: I74595d2712ed45ce5b9336e17152f50caab838a0
Sem-Ver: feature
2020-04-14 10:15:30 +00:00