Add auto eviction for rook-ceph app

When there is a buggy cephfs client, the ceph health detail output
will show a message like the following:

HEALTH_WARN 1 clients failing to respond to capability release; \
1 MDSs report slow requests
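
As a rough illustration, a monitoring script could detect this condition
by looking for the warning text in the health output. The helper below is
hypothetical: its name and the substring-matching approach are assumptions
made for this sketch and are not part of this change.

```python
# Hypothetical helper: detect the capability-release warning in the text
# printed by 'ceph health detail'. Matching on the message substring is an
# assumption made for illustration only.
CAP_RELEASE_WARNING = "failing to respond to capability release"


def has_cap_release_warning(health_detail: str) -> bool:
    """Return True if the health output contains the capability-release warning."""
    return CAP_RELEASE_WARNING in health_detail


# Sample text taken from the warning quoted in this commit message.
sample = (
    "HEALTH_WARN 1 clients failing to respond to capability release; "
    "1 MDSs report slow requests"
)
print(has_cap_release_warning(sample))
```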

When this happens, the cephfs client cannot read from or write to
the volume. To restore communication, the client must be forced to
reconnect.

To force this reconnection, Ceph must evict the client.
The mds_cap_revoke_eviction_timeout parameter sets how long, in seconds,
the MDS waits for the client to respond to a capability revoke before
evicting it. Setting mds_session_blocklist_on_evict to false keeps the
evicted client off the blocklist, so it can reconnect after eviction.
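
For manual experimentation outside the app, the same two options could
presumably be applied with 'ceph config set'. The sketch below only builds
the command lines and does not run them. The values are the ones used in
this change; the function name and the choice of the 'mds' section as the
target are assumptions.

```python
# Sketch: build 'ceph config set' argument lists for the two MDS options.
# Nothing is executed here; the commands are only printed for review.
MDS_SETTINGS = {
    "mds_cap_revoke_eviction_timeout": "30",
    "mds_session_blocklist_on_evict": "false",
}


def build_config_commands(settings: dict[str, str], who: str = "mds") -> list[list[str]]:
    """Return one 'ceph config set <who> <name> <value>' argv list per option."""
    return [["ceph", "config", "set", who, name, value]
            for name, value in settings.items()]


for cmd in build_config_commands(MDS_SETTINGS):
    print(" ".join(cmd))
```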

Test Plan:
- PASS: Start a pod that reads and writes to a cephfs PVC in an
        infinite loop
- PASS: Verify that the MDS automatically evicts the client when the
        message appears in the 'ceph health detail' output
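
The workload in the first test step could look roughly like the sketch
below. It is not the script used for the test: the function names, the
file name, and the use of a temporary directory in place of the PVC mount
point are all assumptions. Passing iterations=None gives the infinite loop
described above.

```python
import os
import tempfile
import time


def read_write_cycle(path: str, payload: bytes) -> bool:
    """Write payload to path, read it back, and report whether it matches."""
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())
    with open(path, "rb") as f:
        return f.read() == payload


def run_workload(directory: str, iterations=None, delay: float = 0.0) -> int:
    """Repeat read/write cycles; iterations=None loops forever."""
    path = os.path.join(directory, "io-test.bin")  # hypothetical file name
    count = 0
    while iterations is None or count < iterations:
        if not read_write_cycle(path, f"cycle {count}\n".encode()):
            raise RuntimeError(f"read-back mismatch on cycle {count}")
        count += 1
        time.sleep(delay)
    return count


if __name__ == "__main__":
    # A temporary directory stands in for the cephfs PVC mount point.
    with tempfile.TemporaryDirectory() as mount_dir:
        print(run_workload(mount_dir, iterations=3))
```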

Closes-Bug: 2095024

Change-Id: I0b71d9b01d114d2fc27625ae6ac4ae5055f2d9db
Signed-off-by: Gustavo Ornaghi Antunes <gustavo.ornaghiantunes@windriver.com>
Gustavo Ornaghi Antunes 2025-01-15 09:19:52 -03:00
parent 3d6aacdb9b
commit 8aaecf0af2

@@ -1,5 +1,5 @@
 #
-# Copyright (c) 2024 Wind River Systems, Inc.
+# Copyright (c) 2024-2025 Wind River Systems, Inc.
 #
 # SPDX-License-Identifier: Apache-2.0
 #
@@ -28,6 +28,10 @@ configOverride: |
 operatorNamespace: rook-ceph
 cephClusterSpec:
+  cephConfig:
+    mds:
+      mds_cap_revoke_eviction_timeout: "30"
+      mds_session_blocklist_on_evict: "false"
   dataDirHostPath: /var/lib/ceph/data
   cephVersion:
     image: quay.io/ceph/ceph:v18.2.2