We already expose functions to handle the instantiation of classes
such as RPCServer and RPCTransport, but the same was never done for
RPCClient, so the API is inconsistent in its enforcement.
This adds a get_rpc_client function that should be used instead of
instantiating the RPCClient class directly, for consistency.
This also allows more logic to be handled inside the function in the
future, for example if an async client implementation is added, as the
investigation in [1] has shown.
[1] https://review.opendev.org/c/openstack/oslo.messaging/+/858936
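A minimal usage sketch, assuming the helper mirrors the existing
get_rpc_transport/get_rpc_server conventions (topic and version values
are illustrative):
```
from oslo_config import cfg
import oslo_messaging

transport = oslo_messaging.get_rpc_transport(cfg.CONF)
target = oslo_messaging.Target(topic='demo-topic', version='1.0')
# preferred over instantiating RPCClient directly
client = oslo_messaging.get_rpc_client(transport, target)
```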
Change-Id: Ia4d1f0497b9e2728bde02f4ff05fdc175ddffe66
A recent oslo.messaging patch [1], not yet merged, which aims to update
the test runtime for Antelope, led us to the following error:
```
qdrouterd: Python: ModuleNotFoundError: No module named 'qpid_dispatch'
```
Neither Debian nor Ubuntu in their latest releases have any binary
built for the qpid backend, not even third-party ones. Only Qpid
Proton, the client library, is available.
To solve this issue, these changes propose to deprecate the AMQP1
driver, which is the one based on Qpid and Proton, and to remove the
related functional tests.
The AMQP1 driver doesn't seem to be widely used.
[1] https://review.opendev.org/c/openstack/oslo.messaging/+/856643
Closes-Bug: 1992587
Change-Id: Id2ca9cd9ee8b8dbdd14dcd00ebd8188d20ea18dc
Add a file to the reno documentation build to show release notes for
stable/zed.
Use the pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/zed.
Sem-Ver: feature
Change-Id: Ic1020b39172981abcc9fc3d66fc6ec58f440a456
As was reported in the related bug some time ago, setting the
heartbeat_in_pthread option to True for nova-compute can break it, as
it is a non-wsgi service.
We also noticed the same problem with randomly stuck non-wsgi services,
e.g. neutron agents, and the same issue can probably happen with any
other non-wsgi service.
To avoid that, this patch changes the default value of that config
option to False.
Together with [1] it effectively reverts change done in [2] some time
ago.
[1] https://review.opendev.org/c/openstack/oslo.messaging/+/800621
[2] https://review.opendev.org/c/openstack/oslo.messaging/+/747395
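A minimal sketch of the option with its new default, following
oslo.config conventions (help text abridged; the real option lives in
the rabbit driver's option list):
```
from oslo_config import cfg

heartbeat_in_pthread = cfg.BoolOpt(
    'heartbeat_in_pthread',
    default=False,  # was True; reverted together with [1] and [2]
    help='Run the health check heartbeat thread through a native '
         'python thread by default.')
```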
Related-Bug: #1934937
Closes-Bug: #1961402
Change-Id: I85f5b9d1b5d15ad61a9fcd6e25925b7eeb8bf6e7
In impl_kafka, _produce_message is run in a tpool.execute
context but it was also calling logging functions.
This could cause subsequent calls to logging functions to
deadlock.
This patch moves the logging calls out of the tpool.execute scope.
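An illustrative sketch of the pattern (not the actual driver code;
names are hypothetical): log before and after the native-thread call,
never inside it.
```
import logging

from eventlet import tpool

LOG = logging.getLogger(__name__)

def produce(producer, topic, message):
    # log in the green thread, never inside tpool.execute, so a
    # native thread can't block while holding the logging lock
    LOG.debug("producing message for topic %s", topic)
    tpool.execute(producer.send, topic, message)
    LOG.debug("message produced")
```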
Change-Id: I81167eea0a6b1a43a88baa3bc383af684f4b1345
Closes-bug: #1981093
This change updates the max version of hacking to 4.1.0 to allow
pre-commit to work with the flake8 3.8.3 release, and corrects one new
error that was raised as a result.
Change-Id: I3a0242208f411b430db0e7429e2c773f45b3d301
In the Zed cycle testing runtime, we are targeting to drop Python
3.6/3.7 support; projects have started adding Python 3.8 as the
minimum version, for example nova:
- 56b5aed08c/setup.cfg (L13)
Change-Id: Id23d3845db716d26175d71280dbedf93736d19de
The quorum queue type adds features that did not exist before or were
not handled in RabbitMQ; the following link shows some of them:
https://blog.rabbitmq.com/posts/2020/04/rabbitmq-gets-an-ha-upgrade/
The options below control quorum queues and ensure the stability of
the quorum system (see the sketch after this list):
- x-max-in-memory-length
- x-max-in-memory-bytes
- x-delivery-limit
These control memory usage and handle message poisoning.
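An illustrative kombu-level sketch of how these arguments end up on
the queue declaration (values are examples, not recommended defaults):
```
from kombu import Queue

queue = Queue(
    'rpc-queue',
    queue_arguments={
        'x-queue-type': 'quorum',
        # cap what the queue keeps in memory
        'x-max-in-memory-length': 128000,
        'x-max-in-memory-bytes': 128 * 1024 * 1024,
        # redeliveries allowed before a message is dropped or
        # dead-lettered (message poisoning)
        'x-delivery-limit': 10,
    })
```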
Closes-Bug: #1962348
Change-Id: I570227d6102681f4f9d8813ed0d7693a1160c21d
kombu 5.2.4 fixed an off-by-one issue that meant we were attempting
retries more than once [1]. We need to handle this to unblock the gate.
This was discovered by examining the call stack and comparing this with
recent changes in openstack/requirements.
[1] 5bed2a8f98
Change-Id: I476e3c573523d5991c56b31ad4df1172196aa7f1
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
Add a file to the reno documentation build to show release notes for
stable/yoga.
Use the pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/yoga.
Sem-Ver: feature
Change-Id: I3d2b041769c5c14a7d391c223dc499218a937e76
https://www.rabbitmq.com/quorum-queues.html
The quorum queue is a modern queue type for RabbitMQ implementing a
durable, replicated FIFO queue based on the Raft consensus algorithm. It
is available as of RabbitMQ 3.8.0.
Quorum queues cannot be enabled by policy, so this must be done when
declaring the queue.
To declare a quorum queue set the x-queue-type queue argument to quorum
(the default is classic). This argument must be provided by a client at
queue declaration time; it cannot be set or changed using a policy. This
is because policy definition or applicable policy can be changed
dynamically but queue type cannot. It must be specified at the time of
declaration.
It is good for oslo.messaging to add support for this type of queue,
which has multiple advantages over mirroring.
If quorum queues are enabled, mirrored queue settings will be ignored.
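A minimal configuration sketch, assuming the new toggle is exposed as
rabbit_quorum_queue in the rabbit options group:
```
[oslo_messaging_rabbit]
rabbit_quorum_queue = true
```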
Closes-Bug: #1942933
Change-Id: Id573e04c287e034e50626daf6e18a34735d45251
The rabbit backend now applies the [oslo_messaging_notifications]retry,
[oslo_messaging_rabbit]rabbit_retry_interval, rabbit_retry_backoff and
rabbit_interval_max configuration parameters when it tries to establish
the connection to the message bus during notification sending.
This patch also clarifies the differences between the behavior
of the kafka and the rabbit drivers in this regard.
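For example, with a configuration like the following (values are
illustrative), notification sending now gives up after the configured
number of retries instead of blocking forever:
```
[oslo_messaging_notifications]
retry = 5

[oslo_messaging_rabbit]
rabbit_retry_interval = 1
rabbit_retry_backoff = 2
rabbit_interval_max = 30
```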
Closes-Bug: #1917645
Change-Id: Id4ccafc95314c86ae918336e42cca64a6acd4d94
A precondition failed exception related to the durable exchange config
may be triggered when a control exchange is shared between services
and the services try to create it with configs that differ from each
other. RabbitMQ will reject the services that try to create it with a
configuration that differs from the one used first.
This kind of exception is not managed for now, and services can fail
without handling this kind of issue.
These changes catch this kind of exception and analyze whether it is
related to the durable config. In that case we try to re-declare the
failing exchange/queue as non-durable, as sketched below.
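An illustrative sketch of the fallback idea (not the verbatim driver
code; the driver works at the kombu/py-amqp level and the exchange
type here is an example):
```
import amqp
from kombu import Exchange

def declare_with_fallback(channel, name, durable=True):
    try:
        Exchange(name, type='topic', durable=durable)(channel).declare()
    except amqp.exceptions.PreconditionFailed:
        # the exchange already exists with a different durable
        # setting; re-declare it with the opposite one
        Exchange(name, type='topic',
                 durable=not durable)(channel).declare()
```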
This problem can be easily reproduced by running a local RabbitMQ
server.
By setting the config below (sample.conf):
```
[DEFAULT]
transport_url = rabbit://localhost/
[oslo_messaging_rabbit]
amqp_durable_queues = true
```
And by running our simulator twice:
```
$ tox -e venv -- python tools/simulator.py -d rpc-server -w 40
$ tox -e venv -- python tools/simulator.py --config-file ./sample.conf -d rpc-server -w 40
```
The first one will create a default non durable control exchange.
The second one will create the same default control exchange but as
durable.
Closes-Bug: #1953351
Change-Id: I27625b468c428cde6609730c8ab429c2c112d010
The [oslo_messaging_notifications]retry parameter is not applied while
connecting to the message bus, although the documentation implies it
should be [1][2].
The two possible drivers, rabbit and kafka, behave differently:
1) The rabbit driver will retry the connection forever, blocking the
caller process.
2) The kafka driver also ignores the retry configuration, but the
notifier call returns immediately even if the notification is not (and
cannot be) delivered.
This patch adds test cases to show the wrong behavior.
[1] https://docs.openstack.org/oslo.messaging/latest/configuration/opts.html#oslo_messaging_notifications.retry
[2] feb72de7b8/oslo_messaging/notify/messaging.py (L31-L36)
Related-Bug: #1917645
Change-Id: Id8557050157aecd3abd75c9114d3fcaecdfc5dc9
Currently this is how reconnect works:
- pyngus detects failure and invokes callback
Controller.connection_failed() which in turn calls
Controller._handle_connection_loss()
- The first thing that _handle_connection_loss does is to set
self.addresser to None (important later)
- Then it defers _do_reconnect after a delay (normally 1 second)
- (1 second passes)
- _do_reconnect calls _hard_reset which resets the controller state
However, there is a race here. This can happen:
- The above, up until it defers and waits for 1 second
- Controller.send() is invoked on a task
- A new Sender is created, and critically because self.reply_link
still exists and is active, we call sender.attach and pass in
self.addresser. Remember _handle_connection_loss sets
self.addresser to None.
- Eventually Sender.attach throws an AttributeError because it
attempts to call addresser.resolve() but addresser is None
The reason this happens is that although the connection is dead, the
controller state is still half-alive, because _hard_reset hasn't been
called yet: it's deferred one second in _do_reconnect.
The fix here is to move _hard_reset out of _do_reconnect and directly
into _handle_connection_loss. The eventloop is woken up immediately
to process _hard_reset but _do_reconnect is still deferred as before
so as to retain the desired reconnect backoff behavior.
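Conceptually, the fix looks like this (a sketch only; the method names
follow the description above and the delay attribute is hypothetical):
```
def _handle_connection_loss(self, reason):
    self.addresser = None
    # reset the half-alive controller state immediately so no new
    # Sender can attach against a stale reply link / None addresser
    self.processor.wakeup(self._hard_reset)
    # the reconnect itself keeps its backoff delay
    self.processor.defer(self._do_reconnect, self._delay)
```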
Closes-Bug: #1941652
Change-Id: Ife62a7d76022908f0dc6a77f1ad607cb2fbd3e8f
In some circumstances services can be executed outside of mod_wsgi and
in a monkey-patched environment. In this context we need to leave users
the possibility to execute the heartbeat in a green thread.
The heartbeat_in_pthread option was tagged as deprecated a few months
ago and planned for future removal. These changes drop this
deprecation to allow enabling the heartbeat in a green thread if
needed.
Closes-Bug: #1934937
Change-Id: Iee2e5a6f7d71acba70bbc857f0bd7d83e32a7b8c
Add a file to the reno documentation build to show release notes for
stable/xena.
Use the pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/xena.
Sem-Ver: feature
Change-Id: Ia40ac2ccee4fe230605f3183b0b432b0e31bff04
Bring back the message id cache feature to the RPC listener; it was
removed during refactoring in I708c3d6676b974d8daac6817c15f596cdf35817b.
See attached bug for more info.
We should not raise DuplicateMessageError to avoid rejecting the
previously ACK'ed message.
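A minimal sketch of the idea (illustrative, not the listener code):
remember recently seen message ids and silently skip duplicates rather
than raising.
```
import collections

class MessageIdCache(object):
    def __init__(self, size=16):
        self._seen = collections.deque(maxlen=size)

    def duplicate(self, msg_id):
        if msg_id in self._seen:
            return True  # skip: the message was already ACK'ed
        self._seen.append(msg_id)
        return False
```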
Closes-Bug: #1935883
Change-Id: Ie237e9e3fdc3fc27b3deb18b94751cdc3afd190e
Each _SocketConnection object is unique per-peer. For example, the
properties attribute may contain keys such as 'x-ssl-peer-name'.
Reusing the existing _socket_connection during failover will cause the
TLS handshake to fail since the peer name will not match. There is
potential for other similar-yet-unexplored bad things to happen as
well.
Instead, reconnect by waking up the eventloop via the _do_reconnect
method, which reconstructs the connection properties to reflect the
new (failed-over-to) host and ultimately creates a new
_SocketConnection (or re-uses a *valid* old one) in
eventloop.Thread.connect().
Closes-Bug: #1938945
Change-Id: I0c8dc447f4dc8d0d08c312a1f3e6fa1745fb69fd
This breaks out the generation of brokers and transport_url into
separate methods. These methods are used in the next patch in this
series, where TestSSL is updated to inherit from TestFailover, and
TestSSL overrides the _gen_brokers and _gen_transport_url methods to
supply the necessary SSL-aware options.
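An illustrative sketch of the resulting hook structure (bodies are
hypothetical; the real tests live in the AMQP1 functional test suite):
```
import unittest

class TestFailover(unittest.TestCase):
    def _gen_brokers(self):
        # default, non-SSL broker list
        return ['localhost:5672', 'localhost:5673']

    def _gen_transport_url(self, brokers):
        return 'amqp://%s' % ','.join(brokers)

class TestSSL(TestFailover):
    def _gen_brokers(self):
        # override to supply SSL-aware broker options
        return ['localhost:5671']
```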
Change-Id: Ia2f977795abc2e81a996e299867e05d41057f33f
This reverts commit 8f5cfda6642ea7f75206d3183c2507e2e83c5693.
Reason for revert: This was supposed to be temporary to unblock the gate. Whatever broke SSL cert generation in the first place appears to be fixed because I can run SSL tests now.
Change-Id: I4f286cf3af0d578f472b84fe355c812910c7a121
We should properly limit the maximum timeout with a 'min' to avoid
long delays before message processing. Such delays may happen if
the connection to a RabbitMQ server is re-established at the same
time when the message arrives (see attached bug for more info).
Moreover, this change is in line with the original intent to
actually have an upper limit on maximum possible timeout (see
comments in code and in the original review).
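Illustratively, the pattern is just a clamp (names hypothetical):
```
def next_timeout(current, maximum):
    # grow the backoff, but never past the configured upper bound
    return min(current * 2, maximum)
```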
Closes-Bug: #1935864
Change-Id: Iebc8a96e868d938a5d250bf9d66d20746c63d3d5
This commit adds support for sending RPC metrics to oslo.metrics.
The changes include:
- Adding a client wrapper for oslo.metrics to process metrics
  information and send it to the oslo.metrics socket
- Modifying the RPC client to send metrics when certain RPC events
  happen
For more information on oslo.metrics, see
https://opendev.org/openstack/oslo.metrics
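A purely illustrative sketch of the transport idea (this is not the
oslo.metrics API; names and payload shape are assumptions):
```
import json
import socket

def send_metric(socket_path, name, labels):
    # hypothetical payload shape; oslo.metrics defines the real one
    payload = json.dumps({'name': name, 'labels': labels}).encode('utf-8')
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload, socket_path)
    finally:
        sock.close()
```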
Change-Id: Idf8cc0e52ced1f697ac4048655eff4c956fd5c79