Allow Rook Ceph to auto-estimate PGs per pool

This change allows Rook Ceph to auto-estimate the number of PGs per
pool.

Placement groups (PGs) are subsets of each logical Ceph pool.
Placement groups perform the function of placing objects (as a group)
into OSDs. Ceph manages data internally at placement-group
granularity: this scales better than managing individual RADOS
objects would.
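
For reference, the per-pool PG layout can be inspected with the
standard Ceph CLI (the pool name is taken from the tables below; the
pgs_brief dump is the same command behind the host ceph output further
down):

  ceph osd pool ls detail             # pg_num of every pool
  ceph osd pool get kube-rbd pg_num   # a single pool's pg_num
  ceph pg dump pgs_brief              # placement of each PG onto OSDs
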
This change fixes an issue on simplex environments with a single OSD,
where a warning alarm is raised because there are too many PGs per
OSD: the recommended limit is 250, and the previously applied pg_num
values exceeded it.
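
The warning comes from Ceph's own health checks: mon_max_pg_per_osd
defaults to 250, and exceeding it raises the TOO_MANY_PGS health
warning, which is what surfaces as the alarm mentioned above. Assuming
default settings, it can be checked with:

  ceph health detail                      # reports TOO_MANY_PGS
  ceph config get mon mon_max_pg_per_osd  # the limit, 250 by default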

Rook Ceph uses the Reef release, which does not require specifying
pg_num at pool creation [1], while the Nautilus release requires
pg_num to be specified at pool creation [2].
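
The practical difference at pool creation time (the pool name here is
hypothetical):

  # Reef: pg_num may be omitted; the autoscaler picks a value [1]
  ceph osd pool create example-pool

  # Nautilus: pg_num has to be passed explicitly [2]
  ceph osd pool create example-pool 64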

Given the different approaches of these two releases, backward
compatibility is also implemented so that pool creation works on both:
on Reef the pg_num argument is omitted and auto-estimated, while on
Nautilus (host Ceph) the existing pg_num self-estimation is preserved.
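
A minimal sketch of this retro-compatible creation path, as used by
the ensure_pool helpers in the patch below (a chunk size of 0 means
"let Ceph estimate pg_num"):

  ensure_pool () {
    if [ "$2" -eq 0 ]; then
      # Rook Ceph (Reef): omit pg_num so Ceph estimates it
      ceph osd pool stats "$1" || ceph osd pool create "$1"
    else
      # host Ceph (Nautilus): keep the pre-calculated pg_num
      ceph osd pool stats "$1" || ceph osd pool create "$1" "$2"
    fi
  }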

Comparison:

simplex rook ceph without the patch:
POOL                   PG_NUM
.mgr                   32
kube-cephfs-metadata   16
kube-rbd               32
kube-cephfs-data       32
images                 64
cinder.backups         64
cinder-volumes         64

simplex rook ceph with the patch:
POOL                   PG_NUM
kube-rbd               32
.mgr                    1
kube-cephfs-metadata   16
kube-cephfs-data       32
images                 32
cinder.backups         32
cinder-volumes         32

simplex host ceph before and after the patch (ceph pg dump pgs_brief):
kube-rbd (1): 64 PGs
kube-cephfs-data (2): 64 PGs
kube-cephfs-metadata (3): 64 PGs
images (4): 64 PGs
cinder.backups (5): 64 PGs
cinder-volumes (6): 64 PGs

standard rook ceph without the patch:
POOL                   PG_NUM
kube-rbd               32
.mgr                    1
kube-cephfs-metadata   16
kube-cephfs-data       32
images                 32
cinder.backups         32
cinder-volumes         32

standard rook ceph with the patch:
POOL                   PG_NUM
kube-rbd               32
.mgr                    1
kube-cephfs-metadata   16
kube-cephfs-data       32
images                 32
cinder.backups         32
cinder-volumes         32

Test plan:
simplex:
  rook-ceph:
    PASS - build openstack
    PASS - apply openstack
    PASS - create VMs
    PASS - ping between VMs
    PASS - volume creation/backup creation
    PASS - validate alarm for total PGs
  host-ceph:
    PASS - build openstack
    PASS - apply openstack
    PASS - create VMs
    PASS - ping between VMs
    PASS - volume creation/backup creation
    PASS - validate alarm for total PGs
standard:
  rook-ceph:
    PASS - build openstack
    PASS - apply openstack
    PASS - create VMs
    PASS - ping between VMs
    PASS - volume creation/backup creation
    PASS - validate alarm for total PGs
  host-ceph:
    PASS - build openstack
    PASS - apply openstack
    PASS - create VMs
    PASS - ping between VMs
    PASS - volume creation/backup creation
    PASS - validate alarm for total PGs
miscellaneous:
    PASS - change pool pg_num through user-overrides

References:
[1] - https://docs.ceph.com/en/reef/rados/operations/placement-groups/#preselecting-pg-num
[2] - https://docs.ceph.com/en/nautilus/rados/operations/placement-groups/#a-preselection-of-pg-num

Closes-Bug: 2122620

Change-Id: I018f7302328c3789864d7f7875fe7d2b4b31f7ee
Signed-off-by: kgoncalv <kayo.goncalvesdacosta@windriver.com>


@@ -0,0 +1,94 @@
From 534797f4235187f5f3d0a4a558ba487379855464 Mon Sep 17 00:00:00 2001
From: kgoncalv <kayo.goncalvesdacosta@windriver.com>
Date: Fri, 12 Sep 2025 11:12:11 -0300
Subject: [PATCH] Allow rook ceph auto estimation
This patch allows Rook Ceph to use its built-in feature to
auto-estimate pg_num per pool, avoiding a health warning on simplex
systems with a single OSD where there were too many PGs per OSD.
This issue only occurs on a simplex with a single OSD.
To enable Rook Ceph's auto-estimation, the pg_num argument must be
omitted at pool creation so that Ceph can estimate it automatically.
This patch also keeps backward compatibility with Ceph versions that
do not support this auto-estimation.
Signed-off-by: kgoncalv <kayo.goncalvesdacosta@windriver.com>
---
cinder/templates/bin/_backup-storage-init.sh.tpl | 6 +++++-
cinder/templates/bin/_storage-init.sh.tpl | 6 +++++-
glance/templates/bin/_storage-init.sh.tpl | 6 +++++-
nova/templates/bin/_storage-init.sh.tpl | 6 +++++-
4 files changed, 20 insertions(+), 4 deletions(-)
diff --git a/cinder/templates/bin/_backup-storage-init.sh.tpl b/cinder/templates/bin/_backup-storage-init.sh.tpl
index 1601172a1..a9752e854 100644
--- a/cinder/templates/bin/_backup-storage-init.sh.tpl
+++ b/cinder/templates/bin/_backup-storage-init.sh.tpl
@@ -31,7 +31,11 @@ if [[ $STORAGE_BACKEND =~ 'cinder.backup.drivers.swift' ]] || \
elif [[ $STORAGE_BACKEND =~ 'cinder.backup.drivers.ceph' ]]; then
ceph -s
function ensure_pool () {
- ceph osd pool stats $1 || ceph osd pool create $1 $2
+ if [ "$2" -eq 0 ]; then
+ ceph osd pool stats $1 || ceph osd pool create $1
+ else
+ ceph osd pool stats $1 || ceph osd pool create $1 $2
+ fi
# As of the Luminous release, it is mandatory to enable applications on pools.
# To find out if the release is greater than (or equal to) Luminous, just check
diff --git a/cinder/templates/bin/_storage-init.sh.tpl b/cinder/templates/bin/_storage-init.sh.tpl
index 4d1c28e4e..53b0b071c 100644
--- a/cinder/templates/bin/_storage-init.sh.tpl
+++ b/cinder/templates/bin/_storage-init.sh.tpl
@@ -28,7 +28,11 @@ set -ex
if [ "x$STORAGE_BACKEND" == "xcinder.volume.drivers.rbd.RBDDriver" ]; then
ceph -s
function ensure_pool () {
- ceph osd pool stats $1 || ceph osd pool create $1 $2
+ if [ "$2" -eq 0 ]; then
+ ceph osd pool stats $1 || ceph osd pool create $1
+ else
+ ceph osd pool stats $1 || ceph osd pool create $1 $2
+ fi
# As of the Luminous release, it is mandatory to enable applications on pools.
# To find out if the release is greater than (or equal to) Luminous, just check
diff --git a/glance/templates/bin/_storage-init.sh.tpl b/glance/templates/bin/_storage-init.sh.tpl
index 241c62e96..f780ff9e8 100644
--- a/glance/templates/bin/_storage-init.sh.tpl
+++ b/glance/templates/bin/_storage-init.sh.tpl
@@ -44,7 +44,11 @@ elif [ "x$STORAGE_BACKEND" == "xswift" ]; then
elif [ "x$STORAGE_BACKEND" == "xrbd" ]; then
ceph -s
function ensure_pool () {
- ceph osd pool stats "$1" || ceph osd pool create "$1" "$2"
+ if [ "$2" -eq 0 ]; then
+ ceph osd pool stats "$1" || ceph osd pool create "$1"
+ else
+ ceph osd pool stats "$1" || ceph osd pool create "$1" "$2"
+ fi
# As of the Luminous release, it is mandatory to enable applications on pools.
# To find out if the release is greater than (or equal to) Luminous, just check
diff --git a/nova/templates/bin/_storage-init.sh.tpl b/nova/templates/bin/_storage-init.sh.tpl
index 70c71a9f4..ed56e27b9 100644
--- a/nova/templates/bin/_storage-init.sh.tpl
+++ b/nova/templates/bin/_storage-init.sh.tpl
@@ -28,7 +28,11 @@ set -ex
if [ "x$STORAGE_BACKEND" == "xrbd" ]; then
ceph -s
function ensure_pool () {
- ceph osd pool stats $1 || ceph osd pool create $1 $2
+ if [ "$2" -eq 0 ]; then
+ ceph osd pool stats $1 || ceph osd pool create $1
+ else
+ ceph osd pool stats $1 || ceph osd pool create $1 $2
+ fi
if [[ $(ceph mgr versions | awk '/version/{print $3}' | cut -d. -f1) -ge 12 ]]; then
ceph osd pool application enable $1 $3
fi
--
2.34.1


@@ -24,4 +24,5 @@
0024-Enable-ceph-pool-creation-for-AIO-systems.patch
0025-Add-IPv6-compatibility-to-neutron-openvswitch-agent.patch
0026-Copy-host-UUID-into-Nova-s-config-dir.patch
0027-Add-retry-to-hostname-reading-by-neutron-agents.patch
0027-Add-retry-to-hostname-reading-by-neutron-agents.patch
0028-Allow-rook-ceph-auto-estimation.patch


@@ -129,6 +129,12 @@ CEPH_POOL_BACKUP_NAME = 'backup'
CEPH_POOL_BACKUP_APP_NAME = 'cinder-backup'
CEPH_POOL_BACKUP_CHUNK_SIZE = 256
# Rook ceph constants
ROOK_CEPH_POOL_CINDER_VOLUME_CHUNK_SIZE = 0
ROOK_CEPH_POOL_CINDER_BACKUP_CHUNK_SIZE = 0
ROOK_CEPH_POOL_GLANCE_CHUNK_SIZE = 0
ROOK_CEPH_POOL_NOVA_RBD_CHUNK_SIZE = 0
# Cinder version used as the default value when getting service name and type
CINDER_CURRENT_VERSION = 'v3'


@@ -432,21 +432,16 @@ class CinderHelm(openstack.OpenstackBaseHelm):
replication, _ = storage_backend_conf\
.StorageBackendConfig\
.get_ceph_pool_replication(self.dbapi, ceph_backend=rook_backend)
chunk_size = self._estimate_ceph_pool_pg_num(self.dbapi.istor_get_all())
pools = {
f'{app_constants.CEPH_POOL_VOLUMES_NAME}': {
'app_name': app_constants.CEPH_POOL_VOLUMES_APP_NAME,
'chunk_size': min(chunk_size,
app_constants.CEPH_POOL_VOLUMES_CHUNK_SIZE),
'chunk_size': app_constants.ROOK_CEPH_POOL_CINDER_VOLUME_CHUNK_SIZE,
'crush_rule': app_constants.CEPH_ROOK_POLL_CRUSH_RULE,
'replication': replication,
},
f'{app_constants.CEPH_POOL_BACKUP_NAME}': {
'app_name': app_constants.CEPH_POOL_BACKUP_APP_NAME,
'chunk_size': min(chunk_size,
app_constants.CEPH_POOL_BACKUP_CHUNK_SIZE),
'chunk_size': app_constants.ROOK_CEPH_POOL_CINDER_BACKUP_CHUNK_SIZE,
'crush_rule': app_constants.CEPH_ROOK_POLL_CRUSH_RULE,
'replication': replication,
},


@@ -153,7 +153,10 @@ class GlanceHelm(openstack.OpenstackBaseHelm):
else:
rule_name = app_constants.CEPH_ROOK_POLL_CRUSH_RULE
chunk_size = self._estimate_ceph_pool_pg_num(self.dbapi.istor_get_all())
if self._rook_ceph:
chunk_size = app_constants.ROOK_CEPH_POOL_GLANCE_CHUNK_SIZE
else:
chunk_size = self._estimate_ceph_pool_pg_num(self.dbapi.istor_get_all())
conf = {
'glance': {


@@ -740,7 +740,10 @@ class NovaHelm(openstack.OpenstackBaseHelm):
rook_ceph_rule_name = app_constants.CEPH_ROOK_POLL_CRUSH_RULE
rule_name = rook_ceph_rule_name if self._rook_ceph else ceph_rule_name
chunk_size = self._estimate_ceph_pool_pg_num(self.dbapi.istor_get_all())
if self._rook_ceph:
chunk_size = app_constants.ROOK_CEPH_POOL_NOVA_RBD_CHUNK_SIZE
else:
chunk_size = self._estimate_ceph_pool_pg_num(self.dbapi.istor_get_all())
# Form the dictionary with the info for the ephemeral pool.
# If needed, multiple pools can be specified.