When performing the B&R procedure with the wipe_ceph_osds
flag and the rook-ceph backend configured, an error was
raised when removing the app.
This happened because the app's lifecycle performed a
restore-in-progress check against the DB, which always
returned false because the corresponding record had not yet
been inserted at that point.
To fix this, the database query has been replaced by a
check for the '/etc/platform/.restore_in_progress' flag file.
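For reference, the equivalent condition can be checked manually
from a shell on the controller (a sketch of the flag-file check
the lifecycle now performs):
$ test -f /etc/platform/.restore_in_progress && echo "restore in progress"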
Test Plan:
- PASS: Build rook-ceph app
- PASS: optimized AIO-SX B&R with wipe_ceph_osds flag
- PASS: legacy STD + DX B&R with wipe_ceph_osds flag
Partial-Bug: 2086473
Change-Id: Ica3befe51ff08a53eb1b33af12e96fa4358e6c0f
Signed-off-by: Erickson Silva de Oliveira <Erickson.SilvadeOliveira@windriver.com>
This change fixes the rook-ceph app upload when
service_config is None and the rook-ceph app update when
conductor_obj is None.
Both issues occur because data is read from these objects
without first checking that they are not None.
The first issue, related to uploading the rook-ceph application,
occurs in subcloud environments because region_config is set to
true in the system: if service_config is None, the application
cannot be uploaded. The issue is solved by adding a new check
in rook_ceph_provisioner.py
The second issue, related to the rook-ceph app update, occurs
because a recently added check for the --force argument uses
conductor_obj to read the metadata, and in the update operation
the lifecycle does not have access to conductor_obj. The issue
is solved by adding new checks in lifecycle_rook_ceph.py
Test Plan:
- PASS: Upload/Apply Rook-ceph application
- PASS: Update Rook-ceph application
Closes-Bug: 2086182
Change-Id: I4c2e51d3a79de1d5a37461f9f3c0d1da4bb244a5
Signed-off-by: Gustavo Ornaghi Antunes <gustavo.ornaghiantunes@windriver.com>
Marks ecblock's 'general' storage class as the default storage class.
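As a rough way to confirm the result (the "(default)" marker is
standard kubectl output; shown here only as an illustration):
$ kubectl get storageclass
The 'general' storage class should be listed with the "(default)"
marker.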
Test Plan:
- PASS: Test installing Rook Ceph with ecblock present on the
service list and check the default storage class
- PASS: Test installing Rook Ceph with block present on the
service list and check for problems during the installation
Closes-bug: 2085652
Change-Id: Ib2bec10988294161891bd4ce8e2a6639486284b1
Signed-off-by: Caio Correa <caio.correa@windriver.com>
Fixes propagation of min_replication from storage-backend to every pool
of every service.
Fixes a problem that prevents reapply when a replication parameter
is changed while using the ecblock service.
Adds support for the storage-backend-modify reapply trigger.
Fixes RBD storageClass and pools from values.yaml being present when
using ecblock.
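For context, this covers runtime changes of the kind below (the
backend name and values are illustrative and may differ per system);
with the new trigger, modifying the backend should lead to a reapply
that propagates the values to every pool:
$ system storage-backend-modify ceph-rook-store replication=3 min_replication=2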
Test Plan:
- PASS: Test changing replication and min_replication on SX, DX
and STD environments and check for correct propagation
on pools
- PASS: Test storage-backend-modify reapply trigger
- PASS: Test reapply when a replication parameter is changed using
ecblock service
- PASS: Check for unwanted pools or storageClasses in all variations
of services
Story: 2011066
Task: 51217
Change-Id: I174e44a71f5ed515feb32c7e5909dfedac85e684
Signed-off-by: Caio Correa <caio.correa@windriver.com>
These fixes prevent the rook-ceph application from being removed
without the force argument while keeping the auto-apply feature
working.
The issue occurred because the functions did not return and super
was called after the rook-ceph semantic checks; when the auto-apply
hook executed, super raised an exception which prevented auto-apply
from working.
Test Plan:
- PASS: Check if an exception message is shown when trying to remove
the rook-ceph app with no force argument
- PASS: Check if the rook-ceph application removal normally occurs
when using the force argument
- PASS: Check if the rook-ceph app auto-applies when all requirements
are met
Closes-Bug: 2084681
Change-Id: I7bcb75f08b376d7c8a38dbc1c7df52e061fefd03
Signed-off-by: Gustavo Ornaghi Antunes <gustavo.ornaghiantunes@windriver.com>
This fix prevents the rook-ceph application from being removed with
no force argument.
The issue occurred because the pre_remove_semantic_checks method was
invoked with a return statement, which stopped the code execution and
prevented the removal-blocking logic within the super call from
being executed.
Test Plan:
- PASS: Check if an exception message is shown when trying to remove
the rook-ceph app with no force argument
- PASS: Check if the rook-ceph application removal normally occurs
when using the force argument
Closes-Bug: 2084681
Change-Id: I4ad11044659eed659c06f540a557b5f56c60c46a
Signed-off-by: Gustavo Ornaghi Antunes <gustavo.ornaghiantunes@windriver.com>
Fixes a bug that prevents rook ceph removal when the system
has non-Ceph offline hosts.
Test Plan:
- PASS: Test removal with STD 2c+2w with all variations of
deployment models and topology of monitors.
- PASS: Test removal with DX+ with all variations of deployment
models and topology of monitors
Closes-Bug: 2084681
Change-Id: If2bd3bdd1b4e7199aa5547e6936ec8ed4ef81d21
Signed-off-by: Caio Correa <caio.correa@windriver.com>
This change synchronizes the OSDs deployed in the Ceph cluster with
the OSDs in the inventory so that they use the same OSD IDs.
The osdid in host-stor-list is updated to use the same ID used by
the same OSD in the Ceph cluster.
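One way to verify the synchronization (a sketch; 'deploy/rook-ceph-tools'
assumes the default Rook toolbox deployment name):
$ system host-stor-list controller-0
$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd tree
The osdid column from host-stor-list should match the OSD IDs
reported by the Ceph cluster.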
Test Plan:
- PASS: Upload and apply the Rook-Ceph application and verify that the
osdid in host-stor-list has been updated using the ceph cluster
OSD IDs
- PASS: Reapply the Rook-Ceph application after adding more osds
and verify that the osdid in host-stor-list has been updated
using the ceph cluster OSD IDs
- PASS: Remove and apply the Rook-Ceph application and verify that the
osdid in host-stor-list has been updated using the ceph cluster
OSD IDs
- PASS: Check the script using shellcheck
Note: All tests were provisioned in SX IPv6, STD IPv6, DX+ IPv4,
STD IPv4
Depends-On: https://review.opendev.org/c/starlingx/config/+/931988
Closes-bug: 2083332
Change-Id: I1d48f634dcaf1ca4ebd5db375c5bd9c3d36b3967
Signed-off-by: Gustavo Ornaghi Antunes <gustavo.ornaghiantunes@windriver.com>
When analyzing the lifecycle, we observed an unusual way of
checking whether the system is AIO-SX, using
'is_host_simplex_controller'. To standardize and avoid
problems, it was replaced by 'is_aio_simplex_system'.
Test Plan:
- PASS: Build app package
- PASS: STD fresh install
- PASS: Configure rook-ceph with the wrong number
of osds and mons
- PASS: Check the alarms
- PASS: Configure the missing osds and mons and
reapply app
- PASS: Check if the alarms are gone
Closes-Bug: 2084202
Change-Id: I3fbc02f0973dce7b9318898b2f13775e3d1a7950
Signed-off-by: Erickson Silva de Oliveira <Erickson.SilvadeOliveira@windriver.com>
When a monitor is moved from one host to another, the
monitor name is incremented and the monitor data
directory in /var/lib/ceph/data is not deleted.
To resolve the monitor data directory issue, the
AgentAPI function "execute_command" was used to
execute the "rm -rf" command on the old host.
To resolve the monitor increment, the mon to be moved
was removed from the rook-ceph-mon-endpoints configmap
mapping. With this, the operator will not have host
information and will create it on the new host.
In case the mon is removed instead of moved, the configmap
is patched, removing the mon name from data. As a result,
the operator stops using that monitor and the monitor is
then removed from the cluster. So that the next monitor
comes with the same name, the 'maxMonId' is adjusted.
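For illustration, the configmap adjustment roughly corresponds to the
following (keys and values are examples; the app performs the patch
itself, this is not a manual step):
$ kubectl -n rook-ceph get configmap rook-ceph-mon-endpoints -o yaml
$ kubectl -n rook-ceph patch configmap rook-ceph-mon-endpoints \
  --type merge -p '{"data":{"maxMonId":"2"}}'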
Example for moving a monitor from compute-0 to compute-1:
$ system host-fs-modify --functions= compute-0 ceph
$ system host-fs-modify --functions=monitor compute-1 ceph
$ system application-apply rook-ceph
Test Plan:
- PASS: Build app package
- PASS: Update app package on STD
- PASS: Remove one monitor only
- PASS: Remove two monitors at a time
- PASS: Move one monitor only
- PASS: Move two monitors at a time
- PASS: AIO-DX -> AIO-DX+
- PASS: For each apply after moving or removing a mon,
check whether the "/var/lib/ceph/data/mon-X"
directory has been removed.
Closes-Bug: 2082658
Change-Id: If3f2a27f9244ceff13e30fb3c25f3fa46432a1c8
Signed-off-by: Erickson Silva de Oliveira <Erickson.SilvadeOliveira@windriver.com>
Improvements to the cleanup to ensure that the native cleanup
jobs run correctly.
Added failsafe actions to the cleanup for error cases.
Removed some pool checks in the post-install hook that were
preventing a correct installation with ecblock.
Test Plan:
- PASS: Remove successful Rook Ceph installations with all
services on SX/DX/STD
- PASS: Reapply the application after removal on all deployments
without any issues
- PASS: Test backup&restore with OSD wipe flag on SX/DX/STD
Story: 2011066
Task: 51030
Change-Id: Id2d49deaece07e6b314d3fab823f207f13b0da31
Signed-off-by: Caio Correa <caio.correa@windriver.com>
Since the introduction of the capability, monitor availability has
been based on the monitor function assignment to the ceph host
filesystem and the ceph-float controller filesystem.
Lifecycle plugin changes:
- Add a new method to update controller filesystem status in case of
post-apply and post-remove.
- The update_controller_fs method is responsible for updating the
filesystem's state on the app transition status (uploaded/applied
and apply-failed).
- Rework update_host_fs to verify the capabilities to determine based
on the functions whether the filesystem should be set to Ready or
In-Use state. Also, required_state accepts more than one input now.
- Now, in the semantic checking, monitor availability is based on
the monitor capability function instead of the host-fs count, so
both local monitors (ceph host-fs) and the floating monitor
(ceph-float controller-fs) are considered. At least one monitor
is required.
- Rework apply_topology_labels to consider the monitor function to
set the 'ceph-mon-placement' and 'ceph-mgr-placement' labels to
the host.
- Add a new method cleanup_mon_mgr in case of post-apply and
post-remove.
- The cleanup_mon_mgr is responsible for checking the
monitor function, removing specific labels from the host when
the capability is missing, and deleting deployments.
The labels are removed from database and also from the node using
the Kubernetes client.
- Now, the is_floating_monitor_assigned method references the
sysinv common function that checks the monitor capability from the
ceph-float controller-fs.
- Add the new function is_local_monitor_assigned to verify if there is
at least one ceph host-fs using the monitor function.
- Add the new function is_monitor_assigned to verify if there is a
local or floating monitor assigned.
- Semantic Check to limit the number of local monitors assigned to 2
while the floating monitor is assigned.
- Add Semantic Check rejecting apply operation when the setup is an
AIO-DX with worker nodes and the floating monitor is assigned.
- Add a new method delete_deployment to delete the
local monitor and manager deployments where the label was removed
from the host, ensuring that the monitor has been removed from the
monitor quorum.
Helm Overrides:
- Enabling floating monitor when the function is assigned.
- Update the desired mon count to use the monitor capability function.
Test Plan:
PASS: Setup 2+2, add a monitor and check the mon and mgr labels.
PASS: Setup 2+2, remove a monitor and check the mon and mgr labels.
PASS: Setup 2+2, move a monitor from one worker to another (and back).
PASS: Setup 2+2, delete the worker nodes, and add floating mon.
PASS: Setup AIO-DX, with floating mon enabled, install worker,
move floating mon to fixed mon on worker.
Depends-on: https://review.opendev.org/c/starlingx/config/+/926098
Story: 2011066
Task: 50827
Change-Id: I2d0073e7f8c8c76c8505f3ad1abb7ebd4f09d4e3
Signed-off-by: Hediberto C Silva <hediberto.cavalcantedasilva@windriver.com>
Improvements to the cleanup to ensure that the native cleanup
jobs run correctly.
Fixed floating monitor jobs. They are now deleted when the
application is removed.
Test Plan:
- PASS: Remove successful Rook Ceph installations on SX/DX/STD
- PASS: Reapply the application after removal on all deployments
without any issues
Story: 2011066
Task: 50976
Change-Id: I050df09a01a9f3869cac8544e8e6da512828d6b5
Signed-off-by: Caio Correa <caio.correa@windriver.com>
This change adds the ability to remove an OSD: the user can remove
an OSD from the Ceph cluster using host-stor-delete, then run
application-apply and wait for the OSD removal.
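An illustrative sequence (the host name is an example and the stor
UUID placeholder must be replaced with the value from host-stor-list):
$ system host-stor-list worker-0
$ system host-stor-delete <stor_uuid>
$ system application-apply rook-ceph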
Test Plan:
- PASS: Upload/Apply rook-ceph app with the dedicated deployment model,
3 workers with 1 OSD in 2 workers, and 2 OSDs in 1 worker and
remove OSD from inventory using the host-stor-delete command,
reapply the app, and wait for OSD removal. After OSD removal,
add another OSD in the same worker using the host-stor-add
command, reapply the app, and wait for OSD to be recreated.
Story: 2011066
Task: 50938
Change-Id: I945222181f04b297b9e79ccd323e9316c1f3f230
Signed-off-by: Gustavo Ornaghi Antunes <gustavo.ornaghiantunes@windriver.com>
This change includes a new service template named
rook-ceph-mgr-restful in the rook-ceph-provisioner chart
to enable access to ceph-mgr via the restful API, using
an endpoint with default port 7999, to support STX integration.
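A sketch of checking the new service (the cluster IP must be taken
from the service listing; -k is used assuming the restful module's
self-signed certificate):
$ kubectl -n rook-ceph get svc rook-ceph-mgr-restful
$ curl -k https://<service-cluster-ip>:7999/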
Test Plan:
- PASS: Upload/Apply rook-ceph app and check if the service and
endpoint have been created.
- PASS: Upload/Apply rook-ceph app and use cURL to access the
restful service
Story: 2011066
Task: 50967
Change-Id: Ibb8deb69cd55122c2b6a2a08d2f8d9306ebcc06e
Signed-off-by: Gustavo Ornaghi Antunes <gustavo.ornaghiantunes@windriver.com>
This updates the misspelled pycryptodomex requirement in
requirements.txt.
Change-Id: I858bb40f027470d5384309415bc7521bb09c9421
Story: 2011066
Task: 50873
Signed-off-by: Robert Church <robert.church@windriver.com>
This will enable integration of the floating monitor chart into the
application with:
- SM service monitor changes:
- Add and remove floating monitor placement labels in the start/stop
functions. This will ensure that when SM is transitioning activity
labels will align on the active controller.
- The stop function will delete the pod to force a reschedule.
- The status function will detect the presence of the DRBD mounted
filesystem and adjust the labeling accordingly in case start/stop
functions did not label as desired.
- application plugin changes:
- Add constants support for 'rook-ceph-floating-monitor' helmrelease
- Provide initial utility functions to detect if the DRBD controller
filesystem is enabled and if the floating monitor is assigned (via a
helm user override)
- Add a new function to get the IP family from the cluster-pod network
to set overrides and determine the IPv4/IPv6 static address
- Update the ceph cluster plugin to use a new utility function for
detecting the IP family
- Add the floating monitor helm plugin to generate the ip_family and
static ip_address based on that family. Initial support provided for
the cluster-pod network
- Update the lifecycle plugin to optionally remove the floating
monitor helm release on application remove
- application metadata
- disable the 'rook-ceph-floating-monitor' chart by default
- FluxCD manifest changes
- Change helmrepository API to v1 to clean up an error
- Add manifests for the 'rook-ceph-floating-monitor' helm release
- Temporarily set deletionPropagation in the rook-ceph-cluster, the
rook-ceph-provisioner and rook-ceph-floating-monitor helmreleases to
provide more predictive delete behavior
- Update rook-ceph-cluster-static-overrides.yaml to add network
defaults and disable the host network as the default provider. This
was done to avoid port conflicts with the floating monitor. The
cluster-pod network will now be the network used for the ceph
cluster and its pods
Enable monitor at runtime:
- system helm-override-list rook-ceph -l
- system helm-override-show rook-ceph rook-ceph-floating-monitor \
rook-ceph
- system helm-override-update rook-ceph rook-ceph-floating-monitor \
rook-ceph --set assigned="true"
- system helm-override-show rook-ceph rook-ceph-floating-monitor \
rook-ceph
- system application-apply rook-ceph
Disable monitor at runtime:
- system helm-override-list rook-ceph -l
- system helm-override-show rook-ceph rook-ceph-floating-monitor \
rook-ceph
- system helm-override-update rook-ceph rook-ceph-floating-monitor \
rook-ceph --set assigned="false"
- system helm-override-show rook-ceph rook-ceph-floating-monitor \
rook-ceph
- system application-apply rook-ceph
Future Improvements:
- Pick up the desired network from the storage backend (cluster-pod,
cluster-host, etc) and
- update _get_ip_family() to use this value
- update _get_static_floating_mon_ip() to get address pool range and
calculate an appropriate static IP address for the monitor
Test Plan:
PASS - Pkg build + ISO generation
PASS - Successful AIO-DX Installation
PASS - Initial Rook deployment without floating monitor.
PASS - Initial Rook deployment with floating monitor.
PASS - Runtime override enable of Rook floating monitor + reapply
PASS - Runtime override disable of Rook floating monitor + reapply
Change-Id: Ie1ff75481b6c2f0d9d34eb228d3019465e36bc1e
Depends-On: https://review.opendev.org/c/starlingx/config/+/926374
Story: 2011066
Task: 50838
Signed-off-by: Robert Church <robert.church@windriver.com>
When applying the app on some systems, the app was
stuck at 67%.
Analyzing the pod logs, it was possible to
observe that stx-ceph-manager was running with python2
instead of python3. Additionally, restful module certificates
were not being generated for all mgrs.
Finally, the rook-ceph-provision job pod was also observed
to have an IPv6 formatting error.
Test Plan:
- PASS: Build stx-ceph-manager image with the changes
from the review in 'Depends-On' below.
- PASS: Change the stx-ceph-manager deployment in
the rook-ceph app to use this image
- PASS: Build rook-ceph app
- PASS: Apply the app and check if it was applied successfully.
Story: 2011066
Task: 50703
Depends-On: https://review.opendev.org/c/starlingx/utilities/+/924883
Change-Id: Ic33dce418c11279462420c7f515fb443cbfe2379
Signed-off-by: Erickson Silva de Oliveira <Erickson.SilvadeOliveira@windriver.com>
During application-remove, the app sometimes gets stuck in the
removing state because k8s resources were not removed
successfully.
This happens because some resources did not have their
"finalizers" changed; in addition, the directories
in /var/lib/ceph/data were not cleaned.
To resolve this, "cleanupPolicy" was defined in ceph-cluster,
this way the operator itself will do a complete cleanup
before removing the app, including wiping OSDs.
Additionally, it was also identified that the jobs.batch
resources were not being removed due to a permission failure
in the ClusterRole.
Finally, the versioning of the rook-ceph-provisioner chart
was fixed, which was always 2.0.0.
Before the change: rook-ceph-provisioner-2.0.0.tgz
After the change: rook-ceph-provisioner-2.0.6.tgz
NOTE: Removing and deleting the application is now forbidden by
default and is only possible with the "--force" argument.
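For example (a sketch, assuming the --force flag of the
application-remove command):
$ system application-remove --force rook-ceph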
Test Plan:
- PASS: Remove rook-ceph app
- PASS: Check that the state is not stuck on removing
- PASS: Check if all resources in the rook-ceph
namespace are deleted
Story: 2011066
Task: 50570
Change-Id: I007fcdf63ec9611c8839a6e7c0e2bff8d38e6086
Signed-off-by: Erickson Silva de Oliveira <Erickson.SilvadeOliveira@windriver.com>
This change includes some improvements to
support the auto-apply functionality of the app.
The improvements are:
- Update host-stor or host-fs states only if
ceph-rook backend exists
- Update desired_state to applied
- Add new locks on semantic checking to only
allow auto_apply when minimum criteria match
Test Plan:
- PASS: Check that the sysinv log does not show messages about
the host-stor or host-fs state being updated.
- PASS: Check if the app is auto-applying only
when the minimum criteria match.
Story: 2011066
Task: 50567
Change-Id: Ifd324e3eddbd3c1d2d3a22ad8ca0967e3f07be55
Signed-off-by: Gustavo Ornaghi Antunes <gustavo.ornaghiantunes@windriver.com>
Changes include:
- Add common/utils.py for functionality common to lifecycle and helm
plugins
- Disable floating monitor support until properly integrated with the
optional controllerfs
- Determine the Ceph services count (MONs, MDSs, MGRs) based on
hostfs assignments, not labels, as the lifecycle plugins will label
AFTER the counts are needed for semantic checks
- Disable ecblock and rgw until configuration can be validated
- Disable the mon/osd audits temporarily
- Disable host provisioning of /etc/ceph/ceph.conf so that the
rook-ceph-tools pod is used for client access
- Rename _get_hosts() to _get_hosts_by_deployment_model()
- Rename _get_nodes_osds() to _get_osds_by_node()
- Add support to the kustomize plugin to break out and enable/disable
specific static overrides manifests based on whether the service is
enabled in the backend
- Rename pre_apply_check_ceph_rook() to pre_apply_semantic_checks()
- Add handle_incomplete_config_alarm() to centralize alarming
information
- Update various messages for content and readability
- Update the cephClusterSpec to use the ceph host-fs filesystem for
dataDirHostPath
- Disable logs when command succeeds
- Update the remove custom resource finalizers command to run
successfully
- Enable host lock/unlock or removal of the app when the app is in
the uploaded state and the ceph-rook backend has not yet been created.
- Remove all resources from rook-ceph namespace
- Enable the triggering of alarms, if necessary, in the lifecycle
pre-apply hook
Test Plan:
PASS - AIO-SX install and deployment of Rook via
system storage-backend-add ceph-rook --confirmed
system host-fs-add controller-0 ceph=20
system host-disk-wipe -s --confirm controller-0 /dev/sdb
system host-disk-list controller-0 | \
awk '/\/dev\/sdb/{print $2}' | \
xargs -i system host-stor-add controller-0 {}
system host-stor-list controller-0
system application-apply rook-ceph
ROOK_TOOLS_POD=$(kubectl -n rook-ceph get pod \
-l "app=rook-ceph-tools" \
-o jsonpath='{.items[0].metadata.name}')
kubectl -n rook-ceph exec -it $ROOK_TOOLS_POD -- ceph -s
Change-Id: Ib5964bbc3eaae173d6a47da3d44c71db9b35ee55
Depends-On: https://review.opendev.org/c/starlingx/config/+/922365
Story: 2011066
Task: 50391
Signed-off-by: Robert Church <robert.church@windriver.com>
Signed-off-by: Gustavo Ornaghi Antunes <gustavo.ornaghiantunes@windriver.com>
This change implements new checks in the rook-ceph app. The checks are:
- Raise an alarm based on replication factor and OSDs
- Block the app when the k8s version is not the latest
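The resulting alarm can be inspected with (a sketch):
$ fm alarm-list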
Test Plan:
- PASS: Check whether the alarm is raised when the replication factor
is greater than OSD quantity in SX.
- PASS: Check whether the alarm is raised when the replication factor
is greater than the number of hosts with OSDs.
- PASS: Check if the app is being blocked when the k8s version is not
the latest supported version.
Story: 2011066
Task: 50370
Change-Id: I71df6bb5816cc5e8271c8e5b77e702db772a9a72
Signed-off-by: Gustavo Ornaghi Antunes <gustavo.ornaghiantunes@windriver.com>
This commit adds state transitions for each host's host-fs 'ceph'
to the rook-ceph app lifecycle.
Added state transitions:
- constants.HOST_FS_STATUS_READY -> constants.HOST_FS_STATUS_IN_USE
(During and after applying the app)
- constants.HOST_FS_STATUS_IN_USE -> constants.HOST_FS_STATUS_READY
(When the app is uploaded and the old state of host-fs was in use)
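The transition can be observed with (a sketch):
$ system host-fs-list controller-0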
Test Plan:
PASS: AIO-SX -> Check that the state of host-fs 'ceph' of
controller-0 is ready before applying and that it changes
to in-use upon completion of the application-apply.
PASS: AIO-SX -> With the app applied, check that host-fs 'ceph' is
in use, remove the app and when returning to uploaded check
that the state goes to ready.
Story: 2011117
Task: 50343
Depends-On: https://review.opendev.org/c/starlingx/config/+/921446
Change-Id: I63af4325c6386879794d7e09ca4de99d3ca0c37d
Signed-off-by: Gabriel de Araújo Cabral <gabriel.cabral@windriver.com>
This change adds new dynamic overrides and enables/disables services
based on the storage-backend.
Dynamic overrides added:
Overrides based on how many hosts have host-fs ceph:
- mds replicas size
- mon count
- mgr count
Overrides based on host-stor:
- nodes
- devices (osds)
Services that can be enabled:
- CephFS (filesystem)
- RBD (block or ecblock)
- RGW (object)
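The computed overrides can be inspected per chart, e.g. (chart and
namespace names follow the pattern used elsewhere in this history):
$ system helm-override-show rook-ceph rook-ceph-cluster rook-ceph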
Test Plan:
- PASS: Load the rook-ceph app and check system-overrides for each
chart
- PASS: Apply the rook-ceph app and check if system-overrides have
changed, only if something has changed before applying the app
- PASS: Check if the services are enabled correctly based on the
storage-backend services column
- PASS: Check if Ceph is in HEALTH_OK status
Depends-On: https://review.opendev.org/c/starlingx/config/+/921801
Story: 2011066
Task: 50298
Change-Id: Ib245b0f1195d4c6437ed45346fe00cf16a69f67f
Signed-off-by: Gustavo Ornaghi Antunes <gustavo.ornaghiantunes@windriver.com>
This change implements the lifecycle features in the pre-apply and
post-apply actions. The features added are automatic checks and
additions.
Post-upload:
- Change storage-backend task to app status
Pre-apply:
- Add Topology labels in each host on the system
- Block if there is no ceph-rook backend in configuring state
- Block if there is another backend in storage-backend list.
- Block if the cephmon-label was not correctly added (host-fs)
- Block if the OSDs were not correctly added (host-stor)
Post-apply:
- Change storage-backend state to configured when the app applies
successfully
- Change storage-backend state to configuration-failed when the app
fails on apply
- Change host-stor state to configured when the app applies
successfully
- Change host-stor state to configuration-failed when the app fails
on apply
- Change storage-backend task to app status
Post-remove:
- Change storage-backend task to app status
- Change host-stor state to configuring
Post-delete:
- Change storage-backend task to app status
Test Plan:
- PASS: Check if the Topology Labels are added in each host
- PASS: Check if the rook-ceph app can't be applied when there is
no storage-backend
- PASS: Check if the rook-ceph app can't be applied when cephmon-label
is not correctly added (host-fs)
- PASS: Check if the rook-ceph app can't be applied when OSDs are
not correctly added (host-stor)
- PASS: Check if the storage-backend and host-stor states change to
configured when the rook-ceph app applies successfully
- PASS: Check if the storage-backend and host-stor states change to
configuration-failed when the rook-ceph app apply fails
- PASS: Check if the storage-backend task is being updated on each
lifecycle hook
- PASS: Check if storage-backend and host-stor states are being reset
to configuring on app removal
Story: 2011066
Task: 50126
Change-Id: Ic77db9176b53411635ad0fc87b0fc57a12620679
Signed-off-by: Gustavo Ornaghi Antunes <gustavo.ornaghiantunes@windriver.com>
Fixed mds pod scheduling:
- Adds control-plane and master tolerations.
- Adds nodeAffinity for ceph-mon-placement.
- Adds the starlingx.io/component label without the need for a
post-install helm hook.
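A quick way to observe the resulting scheduling (the app=rook-ceph-mds
selector is the standard Rook mds label, assumed here):
$ kubectl -n rook-ceph get pods -l app=rook-ceph-mds -o wide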
Test Plan:
PASS: Test pod allocation on controller nodes in STD.
PASS: Test nodeAffinity by turning off all nodes with
ceph-mon-placement label. Pod should be pending.
Story: 2011066
Task: 50232
Change-Id: Iba7c097b9f58826d01008c41fd0caa84a24a94a3
Signed-off-by: Caio Correa <caio.correa@windriver.com>
Adds the app.starlingx.io/component label to all pods to ensure that
the entire application runs on platform cores.
Also corrects the name of the app in setup.cfg.
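The label can be verified on the running pods with (a sketch):
$ kubectl -n rook-ceph get pods --show-labels | grep app.starlingx.io/component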
Test Plan:
PASS: build all app-rook-ceph packages successfully.
PASS: app-rook-ceph upload/apply/remove/delete on
SX platform.
PASS: cluster status HEALTH_OK.
PASS: all pods contain the app.starlingx.io/component=true
label.
Story: 2011066
Task: 50109
Change-Id: Iee3055fa916828f4e5627a072e245aa9aec850a9
Signed-off-by: Caio Correa <caio.correa@windriver.com>
The app is based on the old StarlingX Rook Ceph application.
This provides support for the latest versions of Rook Ceph
storage and packs it as a StarlingX Application.
Auto-increment of helm chart versions is already present in this
initial commit.
Support for Dual-Stack.
Partial IPv6 support was added: there is a bug with DX IPv6
configuration involving the floating monitor.
Remove/delete is successful for FluxCD; however, some residual
kubernetes assets remain on the system after the remove.
Rook Ceph version: 1.13.7
Test Plan:
PASS: build all app-rook-ceph packages successfully.
PASS: app-rook-ceph upload/apply/remove/delete on
SX/DX/DX+/Standard platforms.
PASS: create a volume using PVC through cephfs and rbd
storageClasses and test read/write on the corresponding
pools at SX/DX/DX+/Standard platforms.
Story: 2011066
Task: 49846
Change-Id: I7aa6b08a30676095c86a974eaca79084b2f06859
Signed-off-by: Caio Correa <caio.correa@windriver.com>