Automated cleaning by runbook

This enables an operator to override Ironic's autogenerated cleaning
step functionality, instead providing a runbook to be used for
automated cleaning.

Operators will be able to configure runbooks globally, by resource
class, and as a node override. Configuration exists to enable/disable
this functionality at the will of the deployer, and defaults to
maintaining existing behavior. Runbooks are also validated, by default,
against node traits and will fail cleaning on a mismatch; this
behavior is also configurable.

Unit tests generated and fixed by the various different AI agents I've
been trying out through the lifetime of this change, then heavily edited.

Generated-By: Cursor, Jetbrains Junie, claude-code
Closes-bug: #2100545
Change-Id: I7c312885793ee72b1ca8c415354b9e73a3dac9d7
This commit is contained in:
Jay Faulkner
2025-03-21 14:06:25 -07:00
parent 1c3dde2013
commit febb6e24a0
13 changed files with 563 additions and 41 deletions

View File

@@ -0,0 +1,28 @@
# All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
"""
Mapping of values for CONF.conductor.automated_clean_step_source, representing
different possible sources for cleaning steps.
"""
AUTOGENERATED = 'autogenerated'
"Use traditional Ironic method with automatically determined cleaning steps"
RUNBOOK = 'runbook'
"Use a runbook for automated cleaning; fail if no runbook is configured."
HYBRID = 'hybrid'
"Use a runbook if configured; use autogenerated steps if not."

View File

@@ -29,7 +29,8 @@ LOG = log.getLogger(__name__)
@task_manager.require_exclusive_lock
def do_node_clean(task, clean_steps=None, disable_ramdisk=False):
def do_node_clean(task, clean_steps=None, disable_ramdisk=False,
automated_with_steps=False):
"""Internal RPC method to perform cleaning of a node.
:param task: a TaskManager instance with an exclusive lock on its node
@@ -38,9 +39,11 @@ def do_node_clean(task, clean_steps=None, disable_ramdisk=False):
For more information, see the clean_steps parameter
of :func:`ConductorManager.do_node_clean`.
:param disable_ramdisk: Whether to skip booting ramdisk for cleaning.
:param automated_with_steps: Indicates this is a declarative automated
clean powered by a runbook.
"""
node = task.node
manual_clean = clean_steps is not None
manual_clean = clean_steps is not None and automated_with_steps is False
clean_type = 'manual' if manual_clean else 'automated'
LOG.debug('Starting %(type)s cleaning for node %(node)s',
{'type': clean_type, 'node': node.uuid})
@@ -77,10 +80,11 @@ def do_node_clean(task, clean_steps=None, disable_ramdisk=False):
return utils.cleaning_error_handler(task, msg)
utils.wipe_cleaning_internal_info(task)
if manual_clean:
if clean_steps:
node.set_driver_internal_info('clean_steps', clean_steps)
node.set_driver_internal_info('cleaning_disable_ramdisk',
disable_ramdisk)
node.set_driver_internal_info('declarative_cleaning', True)
task.node.save()
utils.node_update_cache(task)
@@ -120,7 +124,9 @@ def do_node_clean(task, clean_steps=None, disable_ramdisk=False):
try:
conductor_steps.set_node_cleaning_steps(
task, disable_ramdisk=disable_ramdisk)
task, disable_ramdisk=disable_ramdisk,
use_existing_steps=bool(clean_steps)
)
except Exception as e:
# Catch all exceptions and follow the error handling
# path so things are cleaned up properly.

View File

@@ -51,6 +51,7 @@ from oslo_utils import excutils
from oslo_utils import timeutils
from oslo_utils import uuidutils
from ironic.common import automated_clean_methods
from ironic.common import boot_devices
from ironic.common import driver_factory
from ironic.common import exception
@@ -1181,9 +1182,74 @@ class ConductorManager(base_manager.BaseConductorManager):
# instance_uuid, refresh the node to get these changes.
node.refresh()
# Get steps for automated cleaning, if we are configured to use
# a cleaning runbook for this node. clean_steps will be None
# in the case we need autogenerated cleaning steps
clean_steps, dis_rd = self._get_steps_for_automated_cleaning(task)
# Begin cleaning
task.process_event('clean')
cleaning.do_node_clean(task)
cleaning.do_node_clean(task,
clean_steps=clean_steps,
disable_ramdisk=dis_rd,
automated_with_steps=bool(clean_steps))
def _get_steps_for_automated_cleaning(self, task):
"""Return steps for automated cleaning, if configured
Automated cleaning steps are determined two ways:
- Configured explicitly via runbook
- Autogenerated based on driver and step priorities -- usually
from IPA hardware managers
This method returns the steps used if the node is configured
for automated cleaning via runbook. If the default, autogenerated
steps are to be used, we return none.
:param node: A node object
:returns: tuple of: None or list of cleaning steps, disable_ramdisk
"""
node = task.node
rb_source = CONF.conductor.automated_cleaning_step_source
rb_validate = CONF.conductor.automated_cleaning_runbook_validate_traits
if rb_source == automated_clean_methods.AUTOGENERATED:
# Do not attempt to use a cleaning runbook, just return None
return None, False
runbook_ident = (
CONF.conductor.automated_cleaning_runbook_by_resource_class.get(
node.resource_class, CONF.conductor.automated_cleaning_runbook
)
)
if CONF.conductor.automated_cleaning_runbook_from_node:
runbook_ident = node.driver_info.get('cleaning_runbook',
runbook_ident)
if runbook_ident:
runbook = self.get_runbook(task.context, runbook_ident)
if rb_validate and (runbook.name not in
node.traits.get_trait_names()):
msg = _(
"Automated cleaning runbook %(rb)s is not valid for "
"node %(node)s. Runbook name must match a node trait."
) % {'rb': runbook.name, 'node': node.uuid}
# NOTE(JayF): Cleaning failures due to misconfiguration
# logged at error, here and below.
LOG.error(msg)
raise exception.NodeCleaningFailure(node=node, message=msg)
return convert_steps(runbook.steps), runbook.disable_ramdisk
elif rb_source == automated_clean_methods.RUNBOOK:
msg = _("Unable to perform cleaning for node %(node)s: "
"configuration requires runbook for automated "
"cleaning and none was configured." % {'node': node.uuid})
LOG.error(msg)
raise exception.NodeCleaningFailure(node=node, message=msg)
else:
# Node is using automated_clean_methods.HYBRID and does not have
# a configured runbook; return "None" to signal normal cleaning.
return None, False
@METRICS.timer('ConductorManager.do_node_clean')
@messaging.expected_exceptions(exception.InvalidParameterValue,
@@ -1249,8 +1315,8 @@ class ConductorManager(base_manager.BaseConductorManager):
task.process_event(
'clean',
callback=self._spawn_worker,
call_args=(cleaning.do_node_clean, task, clean_steps,
disable_ramdisk),
call_args=(cleaning.do_node_clean, task,
clean_steps, disable_ramdisk, False),
err_handler=utils.provisioning_error_handler,
target_state=states.MANAGEABLE)
except exception.InvalidState:
@@ -1332,10 +1398,15 @@ class ConductorManager(base_manager.BaseConductorManager):
if node.retired:
raise exception.NodeIsRetired(op=_('providing'),
node=node.uuid)
steps, dis_rd = self._get_steps_for_automated_cleaning(task)
# Begin cleaning
task.process_event(
'provide',
callback=self._spawn_worker,
call_args=(cleaning.do_node_clean, task),
call_args=(
cleaning.do_node_clean,
task, steps, dis_rd, bool(steps)),
err_handler=utils.provisioning_error_handler)
return
@@ -3995,11 +4066,28 @@ class ConductorManager(base_manager.BaseConductorManager):
"%(device_types)s from node %(node)s: %(exc)s",
device_types=device_types)
@messaging.expected_exceptions(exception.NotFound)
def get_runbook(self, context, runbook_ident):
"""Get runbook from the UUID or logical name.
:param runbook_ident: the UUID or logical name of a runbook.
:returns: object.Runbook
:raises: InvalidUuidOrName if the name or uuid provided is not valid.
:raises: RunbookNotFound if the runbook is not found.
"""
# If runbook_ident is instead a valid UUID, treat it as a UUID.
if uuidutils.is_uuid_like(runbook_ident):
return objects.Runbook.get_by_uuid(
context, runbook_ident)
else:
return objects.Runbook.get_by_name(
context, runbook_ident)
# NOTE(TheJulia): This is the end of the class definition for the
# conductor manager. Methods for RPC and stuffs should go above this
# point in the File. Everything below is a helper or periodic.
@METRICS.timer('get_vendor_passthru_metadata')
def get_vendor_passthru_metadata(route_dict):
d = {}
@@ -4230,3 +4318,22 @@ def do_detach_virtual_media(task, device_types):
for device_type in device_types:
file_name = _virtual_media_file_name(task, device_type)
image_utils.cleanup_remote_image(task, file_name)
def convert_steps(steps):
"""Removes nonserializable fields from steps."""
fixed_steps = []
for step in steps:
result = {
'interface': step['interface'],
'step': step['step'],
'args': step['args'],
}
if 'priority' in step:
result['priority'] = step['priority']
elif 'order' in step:
result['order'] = step['order']
fixed_steps.append(result)
return fixed_steps

View File

@@ -293,15 +293,22 @@ def _get_verify_steps(task, enabled=False, sort=True):
return verify_steps
def set_node_cleaning_steps(task, disable_ramdisk=False):
def set_node_cleaning_steps(task, disable_ramdisk=False,
use_existing_steps=False):
"""Set up the node with clean step information for cleaning.
For automated cleaning, get the clean steps from the driver.
For manual cleaning, the user's clean steps are known but need to be
validated against the driver's clean steps.
:param task: A TaskManager object
:param disable_ramdisk: If `True`, only steps with requires_ramdisk=False
are accepted.
:param use_existing_steps: If `True`, validate existing steps from
driver_internal_info['clean_steps'] which comes from manual cleaning
API call or a runbook-based declarative automated cleaning. If
`False`, retrieve cleaning steps from the driver (implicit
automated cleaning).
:raises: InvalidParameterValue if there is a problem with the user's
clean steps.
:raises: NodeCleaningFailure if there was a problem getting the
@@ -309,14 +316,9 @@ def set_node_cleaning_steps(task, disable_ramdisk=False):
"""
node = task.node
# For manual cleaning, the target provision state is MANAGEABLE, whereas
# for automated cleaning, it is AVAILABLE.
manual_clean = node.target_provision_state == states.MANAGEABLE
if not manual_clean:
# Get the prioritized steps for automated cleaning
steps = _get_cleaning_steps(task, enabled=True)
else:
# NOTE(JayF): This is true when doing manual cleaning or when doing
# cleaning via runbook.
if use_existing_steps is True:
# For manual cleaning, the list of cleaning steps was specified by the
# user and already saved in node.driver_internal_info['clean_steps'].
# Now that we know what the driver's available clean steps are, we can
@@ -324,7 +326,13 @@ def set_node_cleaning_steps(task, disable_ramdisk=False):
steps = _validate_user_clean_steps(
task, node.driver_internal_info['clean_steps'],
disable_ramdisk=disable_ramdisk)
else:
# Get the prioritized steps for automated cleaning
steps = _get_cleaning_steps(task, enabled=True)
# For manual cleaning, the target provision state is MANAGEABLE, whereas
# for automated cleaning, it is AVAILABLE.
manual_clean = node.target_provision_state == states.MANAGEABLE
LOG.debug('List of the steps for %(type)s cleaning of node %(node)s: '
'%(steps)s', {'type': 'manual' if manual_clean else 'automated',
'node': node.uuid,

View File

@@ -676,6 +676,7 @@ def wipe_cleaning_internal_info(task):
node.del_driver_internal_info('clean_step_index')
node.del_driver_internal_info('cleaning_disable_ramdisk')
node.del_driver_internal_info('steps_validated')
node.del_driver_internal_info('declarative_cleaning')
async_steps.remove_node_flags(node)

View File

@@ -18,6 +18,7 @@
from oslo_config import cfg
from oslo_config import types
from ironic.common import automated_clean_methods
from ironic.common import boot_modes
from ironic.common.i18n import _
from ironic.common import lessee_sources
@@ -146,12 +147,71 @@ opts = [
'state. When enabled, the particular steps '
'performed to clean a node depend on which driver '
'that node is managed by; see the individual '
'driver\'s documentation for details. '
'NOTE: The introduction of the cleaning operation '
'causes instance deletion to take significantly '
'longer. In an environment where all tenants are '
'trusted (eg, because there is only one tenant), '
'this option could be safely disabled.')),
'driver\'s documentation for details.')),
cfg.StrOpt('automated_cleaning_step_source',
default='autogenerated',
mutable=True,
choices=[
(automated_clean_methods.AUTOGENERATED, _(
'Collects steps from hardware interfaces and orders '
'by priority. This provides the original Ironic '
'cleaning behavior originally implemented in Kilo.')),
(automated_clean_methods.RUNBOOK, _(
'Runs cleaning via a runbook specified in '
'configuration or node driver_info. If a runbook '
'is not specified while automated_clean is enabled, '
'cleaning will fail.')),
(automated_clean_methods.HYBRID, _(
'Runs cleaning via a runbook if one is specified in '
'configuration or node driver_info. If a runbook '
'is not specified while automated_clean is enabled, '
'Ironic will fallback to \'autogenerated\' '
'cleaning steps.'))
],
help=_('Determines how automated_cleaning is performed; the '
'default, \'autogenerated\' collects steps from '
'hardware interfaces, then ordering by priority; '
'\'runbook\' requires a runbook to be specified in '
'config or driver_info, which is then used to clean the'
'node; \'hybrid\' uses a runbook if available, '
'and falls-back to autogenerated cleaning steps if not.'
)),
cfg.StrOpt('automated_cleaning_runbook',
default=None,
mutable=True,
help=_('If set and [conductor]/automated_clean_step_source '
'is set to \'hybrid\' or \'runbook\', the runbook '
'UUID or name provided here will be used during '
'automated_cleaning for nodes which do not have a '
'resource_class-specific runbook or runbook set in '
'driver_info.')),
cfg.DictOpt('automated_cleaning_runbook_by_resource_class',
default={},
mutable=True,
help=_('A dictionary of key-value pairs of node '
'resource_class and runbook UUID or name which will be '
'used to clean the node if '
'[conductor]automated_clean_step_source is set to '
'\'hybrid\' or \'runbook\' and a more specific runbook '
'has not been configured in driver_info.')),
cfg.BoolOpt('automated_cleaning_runbook_from_node',
default=False,
mutable=True,
help=_('When enabled, allows an administrator to configure '
'a runbook in '
'node[\'driver_info\'][\'cleaning_runbook\'] to '
'use for that node when '
'[conductor]automated_clean_step_source is set to '
'\'hybrid\' or \'runbook\'. '
'NOTE: This will permit any user with access to edit '
'node[\'driver_info\'] to circumvent cleaning.')),
cfg.BoolOpt('automated_cleaning_runbook_validate_traits',
default=True,
mutable=True,
help=_('When enabled, this option requires validation of a '
'runbook before it\'s used for automated cleaning. '
'Nodes configured with a runbook that is not validated '
'for use via trait matching will fail to clean.')),
cfg.BoolOpt('allow_provisioning_in_maintenance',
default=True,
mutable=True,

View File

@@ -570,8 +570,11 @@ class HeartbeatMixin(object):
task.resume_cleaning()
# First, cache the clean steps
self.refresh_clean_steps(task)
use_existing_steps = node.driver_internal_info.get(
'declarative_cleaning', False)
# Then set/verify node clean steps and start cleaning
conductor_steps.set_node_cleaning_steps(task)
conductor_steps.set_node_cleaning_steps(
task, use_existing_steps=use_existing_steps)
cleaning.continue_node_clean(task)
else:
msg = _('Node failed to check cleaning progress')

View File

@@ -51,7 +51,8 @@ class CommonMixIn(object):
if node is None:
node = self._create_node(**node_attrs)
task = mock.Mock(spec_set=['node', 'release_resources',
'spawn_after', 'process_event', 'driver'])
'spawn_after', 'process_event',
'driver', 'context'])
task.node = node
return task

View File

@@ -439,7 +439,8 @@ class DoNodeCleanTestCase(db_base.DbTestCase):
node.refresh()
self.assertEqual(states.CLEANFAIL, node.provision_state)
self.assertEqual(tgt_prov_state, node.target_provision_state)
mock_steps.assert_called_once_with(mock.ANY, disable_ramdisk=False)
mock_steps.assert_called_once_with(mock.ANY, disable_ramdisk=False,
use_existing_steps=mock.ANY)
self.assertFalse(node.maintenance)
self.assertIsNone(node.fault)
@@ -470,7 +471,8 @@ class DoNodeCleanTestCase(db_base.DbTestCase):
node.refresh()
self.assertEqual(states.CLEANFAIL, node.provision_state)
self.assertEqual(tgt_prov_state, node.target_provision_state)
mock_steps.assert_called_once_with(mock.ANY, disable_ramdisk=False)
mock_steps.assert_called_once_with(mock.ANY, disable_ramdisk=False,
use_existing_steps=mock.ANY)
self.assertTrue(mock_power.called)
def test__do_node_clean_automated_steps_fail(self):
@@ -491,13 +493,13 @@ class DoNodeCleanTestCase(db_base.DbTestCase):
autospec=True)
def __do_node_clean(self, mock_power_valid, mock_network_valid,
mock_next_step, mock_steps, clean_steps=None,
disable_ramdisk=False):
if clean_steps:
disable_ramdisk=False, automated_with_steps=False):
if clean_steps and not automated_with_steps:
tgt_prov_state = states.MANAGEABLE
else:
tgt_prov_state = states.AVAILABLE
def set_steps(task, disable_ramdisk=None):
def set_steps(task, disable_ramdisk=None, use_existing_steps=None):
dii = task.node.driver_internal_info
dii['clean_steps'] = self.clean_steps
task.node.driver_internal_info = dii
@@ -516,7 +518,8 @@ class DoNodeCleanTestCase(db_base.DbTestCase):
with task_manager.acquire(
self.context, node.uuid, shared=False) as task:
cleaning.do_node_clean(task, clean_steps=clean_steps,
disable_ramdisk=disable_ramdisk)
disable_ramdisk=disable_ramdisk,
automated_with_steps=automated_with_steps)
node.refresh()
@@ -529,7 +532,8 @@ class DoNodeCleanTestCase(db_base.DbTestCase):
mock_next_step.assert_called_once_with(
task, 0, disable_ramdisk=disable_ramdisk)
mock_steps.assert_called_once_with(
task, disable_ramdisk=disable_ramdisk)
task, disable_ramdisk=disable_ramdisk,
use_existing_steps=mock.ANY)
if clean_steps:
self.assertEqual(clean_steps,
node.driver_internal_info['clean_steps'])
@@ -1204,6 +1208,12 @@ class DoNodeCleanTestCase(db_base.DbTestCase):
def test_continue_node_clean_no_skip_step(self):
self._continue_node_clean(skip=False)
def test__do_node_clean_automated_with_steps(self):
"""Test automated cleaning with steps from runbook."""
self.__do_node_clean(
clean_steps=self.clean_steps,
automated_with_steps=True)
class DoNodeCleanAbortTestCase(db_base.DbTestCase):
@mock.patch.object(fake.FakeDeploy, 'tear_down_cleaning', autospec=True)

View File

@@ -2551,7 +2551,7 @@ class DoNodeTearDownTestCase(mgr_utils.ServiceSetUpMixin, db_base.DbTestCase):
self.assertNotIn('automatic_lessee', node.driver_internal_info)
self.assertNotIn('is_source_a_path', node.driver_internal_info)
mock_tear_down.assert_called_once_with(task.driver.deploy, task)
mock_clean.assert_called_once_with(task)
mock_clean.assert_called_once_with(task, mock.ANY, mock.ANY, mock.ANY)
self.assertEqual({}, port.internal_info)
mock_unbind.assert_called_once_with('foo', context=mock.ANY)
if enabled_console:
@@ -2600,7 +2600,8 @@ class DoNodeTearDownTestCase(mgr_utils.ServiceSetUpMixin, db_base.DbTestCase):
self.assertIsNone(node.last_error)
self.assertEqual({}, node.instance_info)
mock_tear_down.assert_called_once_with(mock.ANY, mock.ANY)
mock_clean.assert_called_once_with(mock.ANY)
mock_clean.assert_called_once_with(mock.ANY, mock.ANY, mock.ANY,
mock.ANY)
if is_rescue_state:
mock_rescue_clean.assert_called_once_with(mock.ANY, mock.ANY)
else:
@@ -2735,7 +2736,8 @@ class DoProvisioningActionTestCase(mgr_utils.ServiceSetUpMixin,
self.assertEqual(states.AVAILABLE, node.target_provision_state)
self.assertIsNone(node.last_error)
mock_spawn.assert_called_with(self.service,
cleaning.do_node_clean, mock.ANY)
cleaning.do_node_clean,
mock.ANY, mock.ANY, mock.ANY, mock.ANY)
@mock.patch('ironic.conductor.manager.ConductorManager._spawn_worker',
autospec=True)
@@ -2974,7 +2976,8 @@ class DoNodeCleanTestCase(mgr_utils.ServiceSetUpMixin, db_base.DbTestCase):
mock_power_valid.assert_called_once_with(mock.ANY, mock.ANY)
mock_network_valid.assert_called_once_with(mock.ANY, mock.ANY)
mock_spawn.assert_called_with(
self.service, cleaning.do_node_clean, mock.ANY, clean_steps, False)
self.service, cleaning.do_node_clean, mock.ANY, clean_steps,
False, False)
node.refresh()
# Node will be moved to CLEANING
self.assertEqual(states.CLEANING, node.provision_state)
@@ -3007,7 +3010,8 @@ class DoNodeCleanTestCase(mgr_utils.ServiceSetUpMixin, db_base.DbTestCase):
mock_power_valid.assert_called_once_with(mock.ANY, mock.ANY)
mock_network_valid.assert_called_once_with(mock.ANY, mock.ANY)
mock_spawn.assert_called_with(
self.service, cleaning.do_node_clean, mock.ANY, clean_steps, False)
self.service, cleaning.do_node_clean, mock.ANY, clean_steps,
False, False)
node.refresh()
# Make sure states were rolled back
self.assertEqual(prv_state, node.provision_state)
@@ -6732,6 +6736,194 @@ class ParallelPowerSyncTestCase(mgr_utils.CommonMixIn, db_base.DbTestCase):
queue_mock.return_value.put.assert_has_calls(expected_calls)
@mgr_utils.mock_record_keepalive
@mock.patch.object(task_manager, 'acquire', autospec=True)
class GetStepsForAutomatedCleaningTestCase(mgr_utils.ServiceSetUpMixin,
mgr_utils.CommonMixIn,
db_base.DbTestCase):
def _set_node_trait(self, name):
self.node.traits = objects.TraitList.create(context=self.context,
node_id=self.node.id,
traits=[name])
def setUp(self):
super(GetStepsForAutomatedCleaningTestCase, self).setUp()
self._start_service()
self.node = obj_utils.create_test_node(
self.context,
driver='fake-hardware',
uuid=uuidutils.generate_uuid())
self.node.create()
self.task = self._create_task(node=self.node)
self.rb_name = 'CUSTOM_BM_RB'
self._set_node_trait(self.rb_name)
@mock.patch.object(manager.ConductorManager, 'get_runbook', autospec=True)
def test_gsfac_runbook_by_resource_class(self, mock_get_runbook, mt):
CONF.set_override('automated_cleaning_step_source', 'runbook',
group='conductor')
CONF.set_override('automated_cleaning_runbook_by_resource_class',
{'baremetal': self.rb_name},
group='conductor')
self.node.resource_class = 'baremetal'
rb = mock.Mock(spec_set=objects.Runbook)
rb.name = self.rb_name
exp_steps = [{'interface': 'deploy', 'step': 'test_step', 'args': {}}]
rb.steps = exp_steps
rb.disable_ramdisk = True
mock_get_runbook.return_value = rb
steps, dr = self.service._get_steps_for_automated_cleaning(self.task)
mock_get_runbook.assert_called_once_with(mock.ANY, mock.ANY,
self.rb_name)
self.assertEqual(exp_steps, steps)
self.assertTrue(dr)
@mock.patch.object(manager.ConductorManager, 'get_runbook', autospec=True)
def test_gsfac_fallback_to_default_runbook(self, mock_get_runbook, mt):
rb_to_use = 'CUSTOM_DEFAULT'
CONF.set_override('automated_cleaning_step_source', 'runbook',
group='conductor')
CONF.set_override('automated_cleaning_runbook', rb_to_use,
group='conductor')
CONF.set_override('automated_cleaning_runbook_by_resource_class',
{'other': 'CUSTOM_OTHER'}, group='conductor')
self.node.resource_class = 'baremetal'
self._set_node_trait(rb_to_use)
rb = mock.Mock(spec_set=objects.Runbook)
rb.name = rb_to_use
exp_steps = [{'interface': 'deploy', 'step': 'test_step', 'args': {}}]
rb.steps = exp_steps
rb.disable_ramdisk = True
mock_get_runbook.return_value = rb
steps, dr = self.service._get_steps_for_automated_cleaning(self.task)
mock_get_runbook.assert_called_once_with(mock.ANY, mock.ANY, rb_to_use)
self.assertEqual(exp_steps, steps)
self.assertTrue(dr)
@mock.patch.object(manager.ConductorManager, 'get_runbook', autospec=True)
def test_gsfac_autogenerated(self, mock_get_runbook, mt):
CONF.set_override('automated_cleaning_step_source', 'autogenerated',
group='conductor')
CONF.set_override('automated_cleaning_runbook_by_resource_class',
{'baremetal': self.rb_name},
group='conductor')
self.node.resource_class = 'baremetal'
steps, dr = self.service._get_steps_for_automated_cleaning(self.task)
self.assertIsNone(steps)
self.assertFalse(dr)
mock_get_runbook.assert_not_called()
@mock.patch.object(manager.ConductorManager, 'get_runbook', autospec=True)
def test_gsfac_runbook_from_node(self, mock_get_runbook, mt):
rb_to_use = 'CUSTOM_NODE_RB'
CONF.set_override('automated_cleaning_step_source', 'runbook',
group='conductor')
CONF.set_override('automated_cleaning_runbook_from_node', True,
group='conductor')
self.node.driver_info = {'cleaning_runbook': rb_to_use}
self._set_node_trait(rb_to_use)
rb = mock.Mock(spec_set=objects.Runbook)
rb.name = rb_to_use
exp_steps = [{'interface': 'deploy', 'step': 'test_step', 'args': {}}]
rb.steps = exp_steps
rb.disable_ramdisk = True
mock_get_runbook.return_value = rb
steps, dr = self.service._get_steps_for_automated_cleaning(self.task)
mock_get_runbook.assert_called_once_with(mock.ANY, mock.ANY,
rb_to_use)
self.assertEqual(steps, exp_steps)
self.assertTrue(dr)
@mock.patch.object(manager.ConductorManager, 'get_runbook', autospec=True)
def test_gsfac_hybrid_no_runbook(self, mock_get_runbook, mt):
CONF.set_override('automated_cleaning_step_source', 'hybrid',
group='conductor')
CONF.set_override('automated_cleaning_runbook', None,
group='conductor')
CONF.set_override('automated_cleaning_runbook_by_resource_class',
{}, group='conductor')
steps, dr = self.service._get_steps_for_automated_cleaning(self.task)
self.assertFalse(mock_get_runbook.called)
self.assertIsNone(steps)
self.assertFalse(dr)
@mock.patch.object(manager.ConductorManager, 'get_runbook', autospec=True)
def test_gsfac_no_runbook_required(self, mock_get_runbook, mt):
CONF.set_override('automated_cleaning_step_source', 'runbook',
group='conductor')
CONF.set_override('automated_cleaning_runbook', None,
group='conductor')
CONF.set_override('automated_cleaning_runbook_by_resource_class',
{}, group='conductor')
self.assertRaises(exception.NodeCleaningFailure,
self.service._get_steps_for_automated_cleaning,
self.task)
self.assertFalse(mock_get_runbook.called)
@mock.patch.object(manager.ConductorManager, 'get_runbook', autospec=True)
def test_gsfac_runbook_incompatible_with_node(self, mock_get_runbook, mt):
bad_rb_name = "CUSTOM_INCOMPATIBLE_RB"
CONF.set_override('automated_cleaning_step_source', 'runbook',
group='conductor')
CONF.set_override('automated_cleaning_runbook_by_resource_class',
{'baremetal': bad_rb_name},
group='conductor')
# NOTE(JayF): We specifically are not adding bad_rb_name to traits,
# which means this should be considered incompatible
self.node.resource_class = 'baremetal'
rb = mock.Mock(spec_set=objects.Runbook)
rb.name = bad_rb_name
# exp_steps = [{'interface': 'deploy', 'step': 'test_step', 'args': {}}]
# rb.steps = exp_steps
# rb.disable_ramdisk = True
mock_get_runbook.return_value = rb
self.assertRaises(exception.NodeCleaningFailure,
self.service._get_steps_for_automated_cleaning,
self.task)
# NOTE(JayF): get_runbook is called in this case because we need to
# fetch the runbook to know it's incompatible
self.assertTrue(mock_get_runbook.called)
@mock.patch.object(manager.ConductorManager, 'get_runbook', autospec=True)
def test_gsfac_runbook_incompatible_with_node_ignored(
self, mock_get_runbook, mt):
bad_rb_name = "CUSTOM_INCOMPATIBLE_RB"
CONF.set_override('automated_cleaning_runbook_validate_traits', False,
group='conductor')
CONF.set_override('automated_cleaning_step_source', 'runbook',
group='conductor')
CONF.set_override('automated_cleaning_runbook_by_resource_class',
{'baremetal': bad_rb_name},
group='conductor')
# NOTE(JayF): We specifically are not adding bad_rb_name to traits,
# which means this should be considered incompatible
self.node.resource_class = 'baremetal'
rb = mock.Mock(spec_set=objects.Runbook)
rb.name = bad_rb_name
exp_steps = [{'interface': 'deploy', 'step': 'test_step', 'args': {}}]
rb.steps = exp_steps
rb.disable_ramdisk = True
mock_get_runbook.return_value = rb
steps, dr = self.service._get_steps_for_automated_cleaning(self.task)
mock_get_runbook.assert_called_once_with(mock.ANY, mock.ANY,
bad_rb_name)
self.assertEqual(steps, exp_steps)
self.assertTrue(dr)
@mock.patch.object(task_manager, 'acquire', autospec=True)
@mock.patch.object(manager.ConductorManager, '_mapped_to_this_conductor',
autospec=True)

View File

@@ -843,7 +843,8 @@ class NodeCleaningStepsTestCase(db_base.DbTestCase):
with task_manager.acquire(
self.context, node.uuid, shared=False) as task:
conductor_steps.set_node_cleaning_steps(task)
conductor_steps.set_node_cleaning_steps(task,
use_existing_steps=False)
node.refresh()
self.assertEqual(self.clean_steps,
node.driver_internal_info['clean_steps'])
@@ -870,7 +871,8 @@ class NodeCleaningStepsTestCase(db_base.DbTestCase):
with task_manager.acquire(
self.context, node.uuid, shared=False) as task:
conductor_steps.set_node_cleaning_steps(task)
conductor_steps.set_node_cleaning_steps(task,
use_existing_steps=True)
node.refresh()
self.assertEqual(clean_steps,
node.driver_internal_info['clean_steps'])

View File

@@ -334,7 +334,32 @@ class HeartbeatMixinTest(AgentDeployMixinBaseTest):
mock_touch.assert_called_once_with(mock.ANY)
mock_refresh.assert_called_once_with(mock.ANY, task, 'clean')
mock_clean.assert_called_once_with(task)
mock_set_steps.assert_called_once_with(task)
mock_set_steps.assert_called_once_with(task, use_existing_steps=False)
@mock.patch.object(objects.node.Node, 'touch_provisioning', autospec=True)
@mock.patch.object(agent_base.HeartbeatMixin,
'refresh_steps', autospec=True)
@mock.patch.object(conductor_steps, 'set_node_cleaning_steps',
autospec=True)
@mock.patch.object(cleaning, 'continue_node_clean', autospec=True)
def test_heartbeat_resume_clean_declarative(self, mock_clean,
mock_set_steps,
mock_refresh, mock_touch):
# Test declarative cleaning case where use_existing_steps=True
self.node.clean_step = {}
self.node.provision_state = states.CLEANWAIT
info = self.node.driver_internal_info
info['declarative_cleaning'] = True
self.node.driver_internal_info = info
self.node.save()
with task_manager.acquire(
self.context, self.node.uuid, shared=False) as task:
self.deploy.heartbeat(task, 'http://127.0.0.1:8080', '1.0.0')
mock_touch.assert_called_once_with(mock.ANY)
mock_refresh.assert_called_once_with(mock.ANY, task, 'clean')
mock_clean.assert_called_once_with(task)
mock_set_steps.assert_called_once_with(task, use_existing_steps=True)
@mock.patch.object(manager_utils, 'cleaning_error_handler', autospec=True)
@mock.patch.object(objects.node.Node, 'touch_provisioning', autospec=True)
@@ -1217,6 +1242,42 @@ class ContinueCleaningTest(AgentDeployMixinBaseTest):
self.deploy.continue_cleaning(task)
error_mock.assert_called_once_with(task, mock.ANY, traceback=False)
@mock.patch.object(cleaning, 'continue_node_clean', autospec=True)
@mock.patch.object(agent_client.AgentClient, 'get_commands_status',
autospec=True)
def test_continue_cleaning_declarative_flag_preserved(self, status_mock,
clean_mock):
# Test that declarative_cleaning flag is preserved during cleaning
self.node.clean_step = {
'priority': 10,
'interface': 'deploy',
'step': 'erase_devices',
'reboot_requested': False
}
info = self.node.driver_internal_info
info['declarative_cleaning'] = True
self.node.driver_internal_info = info
self.node.save()
status_mock.return_value = [{
'command_status': 'SUCCEEDED',
'command_name': 'execute_clean_step',
'command_result': {
'clean_step': self.node.clean_step
}
}]
with task_manager.acquire(self.context, self.node['uuid'],
shared=False) as task:
self.deploy.continue_cleaning(task)
clean_mock.assert_called_once_with(task)
self.assertEqual(states.CLEANING, task.node.provision_state)
self.assertEqual(states.AVAILABLE,
task.node.target_provision_state)
# Verify the flag is still present during cleaning
self.assertIn('declarative_cleaning',
task.node.driver_internal_info)
self.assertTrue(
task.node.driver_internal_info['declarative_cleaning'])
class ContinueServiceTest(AgentDeployMixinBaseTest):

View File

@@ -0,0 +1,43 @@
features:
- |
Ironic operators can now assign specific steps to run during automated
cleaning instead of relying on the driver-based implicit steps typically
used for automated cleaning.
To generally opt-in to this behavior, first set
``[conductor]/automated_cleaning_step_source`` to either 'hybrid' or
'runbook'. A setting of 'autogenerated' (default), preserves existing
behavior. Setting 'runbook' causes Ironic to attempt to find and use a
configured runbook for automated cleaning, and fails cleaning if one is
not found. The 'hybrid' setting causes Ironic to attempt to find and use a
configured runbook for automated cleaning. If one is not found, we fallback
to the default autogenerated cleaning.
Configuration for cleaning runbooks is tiered to allow maximum flexibility;
Ironic will try to find a cleaning runbook in the following places. Values
representing the runbook name or the runbook UUID are accepted; and the
most specific value found is used. First, ``node.driver_info['cleaning_runbook']``
is a node specific override, disabled by default. To enable it, set
``[conductor]automated_cleaning_runbook_from_node`` to True. Next,
``[conductor]automated_cleaning_runbook_by_resource_class`` allows you to
map resource classes to the expected runbook for them to use in
automated cleaning. The global default, and final fallback, is
``[conductor]automated_cleaning_runbook``.
As with normal runbook usage, the runbook name must match an active trait
in node.traits. This behavior can be changed by setting
``[conductor]automated_cleaning_runbook_validate_traits`` to False. A cleaning
attempt that resolves to an incompatible runbook will cause an error
and leave the node uncleaned in a clean fail state.
security:
- |
This change permits declarative cleaning via use of runbooks. Please note
there is no validation that a runbook performs typical, expected cleaning
actions such as a disk wipe. Operators should be careful to ensure they
are using sufficient steps to securely wipe the system.
Additionally, operators of multitenant Ironic clusters should be careful
before setting ``[conductor]automated_cleaning_runbook_from_node`` to True,
as it may permit a node owner to render cleaning ineffective. This is not
a concern in deployments utilizing node.lessee via automated_lessee.