diff --git a/doc/source/admin/index.rst b/doc/source/admin/index.rst index a2f0792fdf..204c6205fa 100644 --- a/doc/source/admin/index.rst +++ b/doc/source/admin/index.rst @@ -58,6 +58,7 @@ Advanced Topics Tuning Ironic Role Based Access Control Deploying with Anaconda + Steps .. toctree:: :hidden: diff --git a/doc/source/admin/steps.rst b/doc/source/admin/steps.rst new file mode 100644 index 0000000000..89bc6eac91 --- /dev/null +++ b/doc/source/admin/steps.rst @@ -0,0 +1,94 @@ +===== +Steps +===== + +What are steps? +=============== + +Steps are exactly that, steps to achieve a goal, and in most cases, they +are what an operator requested. + +However, originally they were the internal list of actions to achieve to +perform *automated cleaning*. The conductor would determine a list of +steps or actions to take by generating a list of steps from data the +conductor via drivers, the ``ironic-python-agent``, and any loaded +hardware managers determined to be needed. + +As time passed and Ironic's capabilities were extended, this was extended +to *manual cleaning*, and later into *deploy steps*, and *deploy templates* +allowing an operator to request for firmware to be updated by a driver, or +RAID to be configured by the agent prior to the machine being released +to the end user for use. + +Reserved Functional Steps +========================= +In the execution of the cleaning, and deployment steps frameworks, some step +names are reserved for specific functions which can be invoked by a user to +perform specific actions. + ++-----------+----------------------------------------------------------+ +| Step Name | Description | ++-----------+----------------------------------------------------------+ +| hold | Pauses the execution of the steps by moving the node | +| | from the current ``deploy wait`` or ``clean wait`` state | +| | to the appropriate "hold" state, such as ``deploy hold`` | +| | or ``clean hold``. The process can be resumed by sending | +| | a ``unhold`` verb to the provision state API endpoint | +| | which will result in the process resuming upon the next | +| | heartbeat operation. During this time, heartbeat | +| | operations will continue be recorded by Ironic, but will | +| | not be acted upon, preventing the node from timing out. | +| | | +| | This step cannot be used against a child node in the | +| | context of being requested when executing against a | +| | parent node. | +| | | +| | The use case for this verb is if you have external | +| | automation or processes which need to be executed in the | +| | entire process to achieve the overall goal. | ++-----------+----------------------------------------------------------+ +| power_on | Powers on the node, which may be useful if a node's | +| | power must be toggled multiple times to enable | +| | embedded behavior such as to boot from network. | +| | This step can be executed against child nodes. | ++-----------+----------------------------------------------------------+ +| power_off | Turn the node power off via the conductor. | +| | This step can be used against child nodes. When used | +| | outside of the context of a child node, any agent token | +| | metadata is also removed as so the machine can reboot | +| | back to the agent, if applicable. | ++-----------+----------------------------------------------------------+ +| reboot | Reboot the node utilizing the conductor. This generally | +| | signals for power to be turned off and back on, however | +| | driver specific code may request an CPU interrupt based | +| | reset. This step can be executed on child nodes. | ++-----------+----------------------------------------------------------+ + +In the these cases, the interface upon which the method is expected is +ignored, and the step is acted upon based upon just the step's name. + + +Example +------- + +In this example, we utilize the cleaning step ``erase_devices`` and then +trigger hold of the node. In this specific case the node will enter +a ``clean hold`` state. + +.. code-block:: json + + { + "target":"clean", + "clean_steps": [{ + "interface": "deploy", + "step": "erase_devices" + }, + { + "interface": "deploy", + "step": "hold" + }] + } + +Once you have completed whatever action which needed to be performed while +the node was in a held state, you will need to issue an unhold provision +state command, via the API or command line to inform the node to proceed. diff --git a/doc/source/contributor/webapi-version-history.rst b/doc/source/contributor/webapi-version-history.rst index b9f5f171e2..a08b674c2c 100644 --- a/doc/source/contributor/webapi-version-history.rst +++ b/doc/source/contributor/webapi-version-history.rst @@ -2,6 +2,15 @@ REST API Version History ======================== +1.85 (Bobcat) +------------- + +This version adds a new provision state change verb called ``unhold`` +to be utilized with the new ``provision_state`` values ``clean hold`` +and ``deploy hold``. The verb instructs Ironic to remove the node +from it's present hold and to resume it's prior cleaning or +deployment process. + 1.84 (Bobcat) ------------- diff --git a/doc/source/images/states.png b/doc/source/images/states.png new file mode 100644 index 0000000000..e63be47e48 Binary files /dev/null and b/doc/source/images/states.png differ diff --git a/doc/source/images/states.svg b/doc/source/images/states.svg deleted file mode 100644 index f4d7ed0440..0000000000 --- a/doc/source/images/states.svg +++ /dev/null @@ -1,511 +0,0 @@ - - - - - - -Ironic states - - -enroll - -enroll - - -verifying - -verifying - - -enroll->verifying - - -manage (via API) - - -verifying->enroll - - -fail - - -manageable - -manageable - - -verifying->manageable - - -done - - -cleaning - -cleaning - - -manageable->cleaning - - -provide (via API) - - -manageable->cleaning - - -clean (via API) - - -inspecting - -inspecting - - -manageable->inspecting - - -inspect (via API) - - -adopting - -adopting - - -manageable->adopting - - -adopt (via API) - - -cleaning->manageable - - -manage - - -available - -available - - -cleaning->available - - -done - - -clean failed - -clean failed - - -cleaning->clean failed - - -fail - - -clean wait - -clean wait - - -cleaning->clean wait - - -wait - - -inspecting->manageable - - -done - - -inspect failed - -inspect failed - - -inspecting->inspect failed - - -fail - - -inspect wait - -inspect wait - - -inspecting->inspect wait - - -wait - - -active - -active - - -adopting->active - - -done - - -adopt failed - -adopt failed - - -adopting->adopt failed - - -fail - - -available->manageable - - -manage (via API) - - -deploying - -deploying - - -available->deploying - - -active (via API) - - -deploying->active - - -done - - -deploy failed - -deploy failed - - -deploying->deploy failed - - -fail - - -wait call-back - -wait call-back - - -deploying->wait call-back - - -wait - - -active->deploying - - -rebuild (via API) - - -deleting - -deleting - - -active->deleting - - -deleted (via API) - - -rescuing - -rescuing - - -active->rescuing - - -rescue (via API) - - -deleting->cleaning - - -clean - - -error - -error - - -deleting->error - - -error - - -rescue - -rescue - - -rescuing->rescue - - -done - - -rescue wait - -rescue wait - - -rescuing->rescue wait - - -wait - - -rescue failed - -rescue failed - - -rescuing->rescue failed - - -fail - - -error->deploying - - -rebuild (via API) - - -error->deleting - - -deleted (via API) - - -rescue->deleting - - -deleted (via API) - - -rescue->rescuing - - -rescue (via API) - - -unrescuing - -unrescuing - - -rescue->unrescuing - - -unrescue (via API) - - -unrescuing->active - - -done - - -unrescue failed - -unrescue failed - - -unrescuing->unrescue failed - - -fail - - -deploy failed->deploying - - -rebuild (via API) - - -deploy failed->deploying - - -active (via API) - - -deploy failed->deleting - - -deleted (via API) - - -wait call-back->deploying - - -resume - - -wait call-back->deleting - - -deleted (via API) - - -wait call-back->deploy failed - - -fail - - -clean failed->manageable - - -manage (via API) - - -clean wait->cleaning - - -resume - - -clean wait->clean failed - - -fail - - -clean wait->clean failed - - -abort (via API) - - -inspect failed->manageable - - -manage (via API) - - -inspect failed->inspecting - - -inspect (via API) - - -inspect wait->manageable - - -done - - -inspect wait->inspect failed - - -fail - - -inspect wait->inspect failed - - -abort (via API) - - -adopt failed->manageable - - -manage (via API) - - -adopt failed->adopting - - -adopt (via API) - - -rescue wait->deleting - - -deleted (via API) - - -rescue wait->rescuing - - -resume - - -rescue wait->rescue failed - - -fail - - -rescue wait->rescue failed - - -abort (via API) - - -rescue failed->deleting - - -deleted (via API) - - -rescue failed->rescuing - - -rescue (via API) - - -rescue failed->unrescuing - - -unrescue (via API) - - -unrescue failed->deleting - - -deleted (via API) - - -unrescue failed->rescuing - - -rescue (via API) - - -unrescue failed->unrescuing - - -unrescue (via API) - - - diff --git a/doc/source/user/states.rst b/doc/source/user/states.rst index d13cbbe35b..c6eb8b70c7 100644 --- a/doc/source/user/states.rst +++ b/doc/source/user/states.rst @@ -17,7 +17,7 @@ API-initiated-transitions that are possible from non-stable states. The events for these API-initiated transitions are indicated with '(via API)'. Internally, the conductor initiates the other transitions (depicted in gray). -.. figure:: ../images/states.svg +.. figure:: ../images/states.png :width: 660px :align: left :alt: Ironic state transitions diff --git a/ironic/api/controllers/v1/node.py b/ironic/api/controllers/v1/node.py index 1a779b8384..94e1e5d844 100644 --- a/ironic/api/controllers/v1/node.py +++ b/ironic/api/controllers/v1/node.py @@ -122,7 +122,8 @@ _DEFAULT_RETURN_FIELDS = ['instance_uuid', 'maintenance', 'power_state', PROVISION_ACTION_STATES = (ir_states.VERBS['manage'], ir_states.VERBS['provide'], ir_states.VERBS['abort'], - ir_states.VERBS['adopt']) + ir_states.VERBS['adopt'], + ir_states.VERBS['unhold']) _NODES_CONTROLLER_RESERVED_WORDS = None @@ -1129,6 +1130,13 @@ class NodeStatesController(rest.RestController): if not api_utils.allow_inspect_abort(): raise exception.NotAcceptable() + if target == ir_states.VERBS['unhold']: + # NOTE(TheJulia): There is no solid reason to do state checks + # as well, other than add additional complexity as multiple + # states are involved. + if not api_utils.allow_unhold_verb(): + raise exception.NotAcceptable() + self._do_provision_action(rpc_node, target, configdrive, clean_steps, deploy_steps, rescue_password, disable_ramdisk) diff --git a/ironic/api/controllers/v1/utils.py b/ironic/api/controllers/v1/utils.py index feaefbdb78..23b9c24a26 100644 --- a/ironic/api/controllers/v1/utils.py +++ b/ironic/api/controllers/v1/utils.py @@ -1951,6 +1951,11 @@ def allow_status_in_heartbeat(): return api.request.version.minor >= versions.MINOR_72_HEARTBEAT_STATUS +def allow_unhold_verb(): + """Check if the unhold verb may be passed to the API""" + return api.request.version.minor >= versions.MINOR_85_UNHOLD_VERB + + def check_allow_deploy_steps(target, deploy_steps): """Check if deploy steps are allowed""" diff --git a/ironic/api/controllers/v1/versions.py b/ironic/api/controllers/v1/versions.py index 827f549a63..797de17f7a 100644 --- a/ironic/api/controllers/v1/versions.py +++ b/ironic/api/controllers/v1/versions.py @@ -122,6 +122,7 @@ BASE_VERSION = 1 # v1.82: Add node sharding capability # v1.83: Add child node modeling # v1.84: Add ramdisk callback to continue inspection. +# v1.85: Add unhold verb MINOR_0_JUNO = 0 MINOR_1_INITIAL_VERSION = 1 MINOR_2_AVAILABLE_STATE = 2 @@ -207,6 +208,7 @@ MINOR_81_NODE_INVENTORY = 81 MINOR_82_NODE_SHARD = 82 MINOR_83_PARENT_CHILD_NODES = 83 MINOR_84_CONTINUE_INSPECTION = 84 +MINOR_85_UNHOLD_VERB = 85 # When adding another version, update: # - MINOR_MAX_VERSION @@ -214,7 +216,7 @@ MINOR_84_CONTINUE_INSPECTION = 84 # explanation of what changed in the new version # - common/release_mappings.py, RELEASE_MAPPING['master']['api'] -MINOR_MAX_VERSION = MINOR_84_CONTINUE_INSPECTION +MINOR_MAX_VERSION = MINOR_85_UNHOLD_VERB # String representations of the minor and maximum versions _MIN_VERSION_STRING = '{}.{}'.format(BASE_VERSION, MINOR_1_INITIAL_VERSION) diff --git a/ironic/common/release_mappings.py b/ironic/common/release_mappings.py index ab8d3aa464..2fc39ed4d3 100644 --- a/ironic/common/release_mappings.py +++ b/ironic/common/release_mappings.py @@ -574,7 +574,7 @@ RELEASE_MAPPING = { } }, 'master': { - 'api': '1.84', + 'api': '1.85', 'rpc': '1.56', 'objects': { 'Allocation': ['1.1'], diff --git a/ironic/common/states.py b/ironic/common/states.py index ae87b32668..b3f913da44 100644 --- a/ironic/common/states.py +++ b/ironic/common/states.py @@ -52,6 +52,7 @@ VERBS = { 'adopt': 'adopt', 'rescue': 'rescue', 'unrescue': 'unrescue', + 'unhold': 'unhold', } """ Mapping of state-changing events that are PUT to the REST API @@ -129,6 +130,9 @@ This is mainly a target provision state used during deployment. A successfully deployed node should go to ACTIVE status. """ +DEPLOYHOLD = 'deploy hold' +""" Node is being held by a deploy step. """ + DELETING = 'deleting' """ Node is actively being torn down. """ @@ -159,6 +163,9 @@ the driver to finish a cleaning step. CLEANFAIL = 'clean failed' """ Node failed cleaning. This requires operator intervention to resolve. """ +CLEANHOLD = 'clean hold' +""" Node is a holding state due to a clean step. """ + ERROR = 'error' """ An error occurred during node processing. @@ -354,11 +361,13 @@ machine.add_state(VERIFYING, target=MANAGEABLE, **watchers) machine.add_state(DEPLOYING, target=ACTIVE, **watchers) machine.add_state(DEPLOYWAIT, target=ACTIVE, **watchers) machine.add_state(DEPLOYFAIL, target=ACTIVE, **watchers) +machine.add_state(DEPLOYHOLD, target=ACTIVE, **watchers) # Add clean* states machine.add_state(CLEANING, target=AVAILABLE, **watchers) machine.add_state(CLEANWAIT, target=AVAILABLE, **watchers) machine.add_state(CLEANFAIL, target=AVAILABLE, **watchers) +machine.add_state(CLEANHOLD, target=AVAILABLE, **watchers) # Add delete* states machine.add_state(DELETING, target=AVAILABLE, **watchers) @@ -393,11 +402,19 @@ machine.add_transition(DEPLOYFAIL, DEPLOYING, 'deploy') # A deployment may also wait on external callbacks machine.add_transition(DEPLOYING, DEPLOYWAIT, 'wait') +machine.add_transition(DEPLOYING, DEPLOYHOLD, 'hold') +machine.add_transition(DEPLOYWAIT, DEPLOYHOLD, 'hold') machine.add_transition(DEPLOYWAIT, DEPLOYING, 'resume') # A deployment waiting on callback may time out machine.add_transition(DEPLOYWAIT, DEPLOYFAIL, 'fail') +# Return the node into a deploying state from holding +machine.add_transition(DEPLOYHOLD, DEPLOYWAIT, 'unhold') + +# A node in deploy hold may also be aborted +machine.add_transition(DEPLOYHOLD, DEPLOYFAIL, 'abort') + # A deployment may complete machine.add_transition(DEPLOYING, ACTIVE, 'done') @@ -435,8 +452,16 @@ machine.add_transition(CLEANWAIT, CLEANFAIL, 'abort') # Cleaning may also wait on external callbacks machine.add_transition(CLEANING, CLEANWAIT, 'wait') +machine.add_transition(CLEANING, CLEANHOLD, 'hold') +machine.add_transition(CLEANWAIT, CLEANHOLD, 'hold') machine.add_transition(CLEANWAIT, CLEANING, 'resume') +# A node in a clean hold step may also be aborted +machine.add_transition(CLEANHOLD, CLEANFAIL, 'abort') + +# Return the node back to cleaning +machine.add_transition(CLEANHOLD, CLEANWAIT, 'unhold') + # An operator may want to move a CLEANFAIL node to MANAGEABLE, to perform # other actions like cleaning machine.add_transition(CLEANFAIL, MANAGEABLE, 'manage') diff --git a/ironic/conductor/cleaning.py b/ironic/conductor/cleaning.py index 1472151a79..01c2aac37e 100644 --- a/ironic/conductor/cleaning.py +++ b/ironic/conductor/cleaning.py @@ -159,7 +159,6 @@ def do_next_clean_step(task, step_index, disable_ramdisk=None): LOG.info('Executing %(kind)s cleaning on node %(node)s, remaining steps: ' '%(steps)s', {'node': node.uuid, 'steps': steps, 'kind': 'manual' if manual_clean else 'automated'}) - # Execute each step until we hit an async step or run out of steps for ind, step in enumerate(steps): # Save which step we're about to start so we can restart @@ -171,10 +170,20 @@ def do_next_clean_step(task, step_index, disable_ramdisk=None): result = None try: if not eocn: - interface = getattr(task.driver, step.get('interface')) LOG.info('Executing %(step)s on node %(node)s', {'step': step, 'node': node.uuid}) - if not conductor_steps.use_reserved_step_handler(task, step): + use_step_handler = conductor_steps.use_reserved_step_handler( + task, step) + if use_step_handler: + if use_step_handler == conductor_steps.EXIT_STEPS: + # Exit the step, i.e. hold step + return + # if use_step_handler == conductor_steps.USED_HANDLER + # Then we have completed the needful in the handler, + # but since there is no other value to check now, + # we know we just need to skip execute_deploy_step + else: + interface = getattr(task.driver, step.get('interface')) result = interface.execute_clean_step(task, step) else: LOG.info('Executing %(step)s on child nodes for node ' @@ -234,7 +243,6 @@ def do_next_clean_step(task, step_index, disable_ramdisk=None): return utils.cleaning_error_handler(task, msg) LOG.info('Node %(node)s finished clean step %(step)s', {'node': node.uuid, 'step': step}) - if CONF.agent.deploy_logs_collect == 'always' and not disable_ramdisk: driver_utils.collect_ramdisk_logs(task.node, label='cleaning') diff --git a/ironic/conductor/deployments.py b/ironic/conductor/deployments.py index 3ffb54c39f..ba49b4b902 100644 --- a/ironic/conductor/deployments.py +++ b/ironic/conductor/deployments.py @@ -278,7 +278,18 @@ def do_next_deploy_step(task, step_index): interface = getattr(task.driver, step.get('interface')) LOG.info('Executing %(step)s on node %(node)s', {'step': step, 'node': node.uuid}) - if not conductor_steps.use_reserved_step_handler(task, step): + use_step_handler = conductor_steps.use_reserved_step_handler( + task, step) + if use_step_handler: + if use_step_handler == conductor_steps.EXIT_STEPS: + # Exit the step, i.e. hold step + return + # if use_step_handler == conductor_steps.USED_HANDLER + # Then we have completed the needful in the handler, + # but since there is no other value to check now, + # we know we just need to skip execute_deploy_step + else: + interface = getattr(task.driver, step.get('interface')) result = interface.execute_deploy_step(task, step) else: LOG.info('Executing %(step)s on child nodes for node ' diff --git a/ironic/conductor/manager.py b/ironic/conductor/manager.py index 076fea2441..fc53595908 100644 --- a/ironic/conductor/manager.py +++ b/ironic/conductor/manager.py @@ -1304,10 +1304,21 @@ class ConductorManager(base_manager.BaseConductorManager): if (action == states.VERBS['abort'] and node.provision_state in (states.CLEANWAIT, states.RESCUEWAIT, - states.INSPECTWAIT)): + states.INSPECTWAIT, + states.CLEANHOLD, + states.DEPLOYHOLD)): self._do_abort(task) return + if (action == states.VERBS['unhold'] + and node.provision_state in (states.CLEANHOLD, + states.DEPLOYHOLD)): + # NOTE(TheJulia): Release the node from the hold, which + # allows the next heartbeat action to pick the node back + # up and it continue it's operation. + task.process_event('unhold') + return + try: task.process_event(action) except exception.InvalidState: @@ -1319,7 +1330,7 @@ class ConductorManager(base_manager.BaseConductorManager): """Handle node abort for certain states.""" node = task.node - if node.provision_state == states.CLEANWAIT: + if node.provision_state in (states.CLEANWAIT, states.CLEANHOLD): # Check if the clean step is abortable; if so abort it. # Otherwise, indicate in that clean step, that cleaning # should be aborted after that step is done. @@ -1368,6 +1379,18 @@ class ConductorManager(base_manager.BaseConductorManager): if node.provision_state == states.INSPECTWAIT: return inspection.abort_inspection(task) + if node.provision_state == states.DEPLOYHOLD: + # Immediately break agent API interaction + # and align with do_node_tear_down + utils.remove_agent_url(task.node) + # And then call the teardown + task.process_event( + 'abort', + callback=self._spawn_worker, + call_args=(self._do_node_tear_down, task, + task.node.provision_state), + err_handler=utils.provisioning_error_handler) + @METRICS.timer('ConductorManager._sync_power_states') @periodics.periodic(spacing=CONF.conductor.sync_power_state_interval, enabled=CONF.conductor.sync_power_state_interval > 0) diff --git a/ironic/conductor/steps.py b/ironic/conductor/steps.py index 59429bb8a1..19953338d8 100644 --- a/ironic/conductor/steps.py +++ b/ironic/conductor/steps.py @@ -80,6 +80,10 @@ RESERVED_STEP_HANDLER_MAPPING = { 'reboot': [utils.node_power_action, states.REBOOT], } +# values to enable declariation of how to handle reserved step names +USED_HANDLER = 'used_handler' +EXIT_STEPS = 'exit_steps' + def _clean_step_key(step): """Sort by priority, then interface priority in event of tie. @@ -811,7 +815,11 @@ def validate_user_deploy_steps_and_templates(task, deploy_steps=None, def use_reserved_step_handler(task, step): - """Returns True if reserved step execution is used, otherwise False. + """Returns guidance for reserved step execution or process is used. + + This method is utilized to handle some specific cases with the execution + of steps. For example, reserved step names, or reserved names which + have specific meaning in the state machine. :param task: a TaskManager object. :param step: The requested step. @@ -822,5 +830,10 @@ def use_reserved_step_handler(task, step): method = call_to_use[0] parameter = call_to_use[1] method(task, parameter) - return True - return False + return USED_HANDLER + if step_name == 'hold': + task.process_event('hold') + return EXIT_STEPS + # If we've reached this point, we're going to return None as + # there is no work for us to do. This allows the caller to + # take its normal path. diff --git a/ironic/drivers/modules/agent_base.py b/ironic/drivers/modules/agent_base.py index 7732840d59..94f316e380 100644 --- a/ironic/drivers/modules/agent_base.py +++ b/ironic/drivers/modules/agent_base.py @@ -84,19 +84,22 @@ VENDOR_PROPERTIES = { } __HEARTBEAT_RECORD_ONLY = (states.ENROLL, states.MANAGEABLE, states.AVAILABLE, - states.CLEANING, states.DEPLOYING, states.RESCUING) + states.CLEANING, states.DEPLOYING, states.RESCUING, + states.DEPLOYHOLD, states.CLEANHOLD) _HEARTBEAT_RECORD_ONLY = frozenset(__HEARTBEAT_RECORD_ONLY) _HEARTBEAT_ALLOWED = (states.DEPLOYWAIT, states.CLEANWAIT, states.RESCUEWAIT, # These are allowed but don't cause any actions since # they're also in HEARTBEAT_RECORD_ONLY. - states.DEPLOYING, states.CLEANING, states.RESCUING) + states.DEPLOYING, states.CLEANING, states.RESCUING, + states.DEPLOYHOLD, states.CLEANHOLD) HEARTBEAT_ALLOWED = frozenset(_HEARTBEAT_ALLOWED) _FASTTRACK_HEARTBEAT_ALLOWED = (states.DEPLOYWAIT, states.CLEANWAIT, states.RESCUEWAIT, states.ENROLL, states.MANAGEABLE, states.AVAILABLE, - states.DEPLOYING) + states.DEPLOYING, states.CLEANHOLD, + states.DEPLOYHOLD) FASTTRACK_HEARTBEAT_ALLOWED = frozenset(_FASTTRACK_HEARTBEAT_ALLOWED) diff --git a/ironic/tests/unit/api/controllers/v1/test_node.py b/ironic/tests/unit/api/controllers/v1/test_node.py index f4f17f69ee..083bab870b 100644 --- a/ironic/tests/unit/api/controllers/v1/test_node.py +++ b/ironic/tests/unit/api/controllers/v1/test_node.py @@ -6403,6 +6403,92 @@ ORHMKeXMO8fcK0By7CiMKwHSXCoEQgfQhWwpMdSsO8LgHCjh87DQc= """ self.assertEqual(http_client.BAD_REQUEST, ret.status_code) self.assertEqual(0, mock_dpa.call_count) + @mock.patch.object(rpcapi.ConductorAPI, 'do_provisioning_action', + autospec=True) + def test_unhold_cleanhold(self, mock_dpa): + self.node.provision_state = states.CLEANHOLD + self.node.save() + + ret = self.put_json('/nodes/%s/states/provision' % self.node.uuid, + {'target': states.VERBS['unhold']}, + headers={api_base.Version.string: "1.85"}) + self.assertEqual(http_client.ACCEPTED, ret.status_code) + self.assertEqual(b'', ret.body) + mock_dpa.assert_called_once_with(mock.ANY, mock.ANY, self.node.uuid, + states.VERBS['unhold'], + 'test-topic') + + @mock.patch.object(rpcapi.ConductorAPI, 'do_provisioning_action', + autospec=True) + def test_abort_cleanhold(self, mock_dpa): + self.node.provision_state = states.CLEANHOLD + self.node.save() + + ret = self.put_json('/nodes/%s/states/provision' % self.node.uuid, + {'target': states.VERBS['abort']}, + headers={api_base.Version.string: "1.85"}) + self.assertEqual(http_client.ACCEPTED, ret.status_code) + self.assertEqual(b'', ret.body) + mock_dpa.assert_called_once_with(mock.ANY, mock.ANY, self.node.uuid, + states.VERBS['abort'], + 'test-topic') + + @mock.patch.object(rpcapi.ConductorAPI, 'do_provisioning_action', + autospec=True) + def test_unhold_deployhold(self, mock_dpa): + self.node.provision_state = states.DEPLOYHOLD + self.node.save() + + ret = self.put_json('/nodes/%s/states/provision' % self.node.uuid, + {'target': states.VERBS['unhold']}, + headers={api_base.Version.string: "1.85"}) + self.assertEqual(http_client.ACCEPTED, ret.status_code) + self.assertEqual(b'', ret.body) + mock_dpa.assert_called_once_with(mock.ANY, mock.ANY, self.node.uuid, + states.VERBS['unhold'], + 'test-topic') + + @mock.patch.object(rpcapi.ConductorAPI, 'do_provisioning_action', + autospec=True) + def test_abort_deployhold(self, mock_dpa): + self.node.provision_state = states.DEPLOYHOLD + self.node.save() + + ret = self.put_json('/nodes/%s/states/provision' % self.node.uuid, + {'target': states.VERBS['abort']}, + headers={api_base.Version.string: "1.85"}) + self.assertEqual(http_client.ACCEPTED, ret.status_code) + self.assertEqual(b'', ret.body) + mock_dpa.assert_called_once_with(mock.ANY, mock.ANY, self.node.uuid, + states.VERBS['abort'], + 'test-topic') + + @mock.patch.object(rpcapi.ConductorAPI, 'do_provisioning_action', + autospec=True) + def test_unhold_cleanhold_not_allowed(self, mock_dpa): + self.node.provision_state = states.CLEANHOLD + self.node.save() + + ret = self.put_json('/nodes/%s/states/provision' % self.node.uuid, + {'target': states.VERBS['unhold']}, + headers={api_base.Version.string: "1.84"}, + expect_errors=True) + self.assertEqual(http_client.NOT_ACCEPTABLE, ret.status_code) + mock_dpa.assert_not_called() + + @mock.patch.object(rpcapi.ConductorAPI, 'do_provisioning_action', + autospec=True) + def test_unhold_deployhold_not_allowed(self, mock_dpa): + self.node.provision_state = states.DEPLOYHOLD + self.node.save() + + ret = self.put_json('/nodes/%s/states/provision' % self.node.uuid, + {'target': states.VERBS['unhold']}, + headers={api_base.Version.string: "1.84"}, + expect_errors=True) + self.assertEqual(http_client.NOT_ACCEPTABLE, ret.status_code) + mock_dpa.assert_not_called() + def test_set_console_mode_enabled(self): with mock.patch.object(rpcapi.ConductorAPI, 'set_console_mode', diff --git a/ironic/tests/unit/conductor/test_cleaning.py b/ironic/tests/unit/conductor/test_cleaning.py index 6b3451fef6..ec79de3eaa 100644 --- a/ironic/tests/unit/conductor/test_cleaning.py +++ b/ironic/tests/unit/conductor/test_cleaning.py @@ -1136,6 +1136,36 @@ class DoNodeCleanTestCase(db_base.DbTestCase): def test__do_next_clean_step_manual_bad_step_return_value(self): self._do_next_clean_step_bad_step_return_value(manual=True) + def _test_do_next_clean_step_handles_hold(self, start_state): + node = obj_utils.create_test_node( + self.context, driver='fake-hardware', + provision_state=start_state, + driver_internal_info={ + 'clean_steps': [ + { + 'step': 'hold', + 'priority': 10, + 'interface': 'power' + } + ], + 'clean_step_index': None}, + clean_step=None) + + with task_manager.acquire( + self.context, node.uuid, shared=False) as task: + cleaning.do_next_clean_step(task, 0) + node.refresh() + + self.assertEqual(states.CLEANHOLD, node.provision_state) + + def test_do_next_clean_step_handles_hold_from_active(self): + # Start is from the conductor + self._test_do_next_clean_step_handles_hold(states.CLEANING) + + def test_do_next_clean_step_handles_hold_from_wait(self): + # Start is the continuation from a heartbeat. + self._test_do_next_clean_step_handles_hold(states.CLEANWAIT) + @mock.patch.object(cleaning, 'do_next_clean_step', autospec=True) def _continue_node_clean(self, mock_next_step, skip=True): # test that skipping current step mechanism works @@ -1168,7 +1198,8 @@ class DoNodeCleanTestCase(db_base.DbTestCase): class DoNodeCleanAbortTestCase(db_base.DbTestCase): @mock.patch.object(fake.FakeDeploy, 'tear_down_cleaning', autospec=True) - def _test_do_node_clean_abort(self, clean_step, tear_mock): + def _test_do_node_clean_abort(self, clean_step, + tear_mock=None): node = obj_utils.create_test_node( self.context, driver='fake-hardware', provision_state=states.CLEANWAIT, @@ -1228,6 +1259,24 @@ class DoNodeCleanAbortTestCase(db_base.DbTestCase): self.assertTrue(task.node.maintenance) self.assertEqual('clean failure', task.node.fault) + @mock.patch.object(fake.FakeDeploy, 'tear_down_cleaning', autospec=True) + def test__do_node_cleanhold_abort_tear_down_fail(self, tear_mock): + tear_mock.side_effect = Exception('Surprise') + + node = obj_utils.create_test_node( + self.context, driver='fake-hardware', + provision_state=states.CLEANHOLD, + target_provision_state=states.MANAGEABLE, + clean_step={'step': 'hold', 'abortable': True}) + + with task_manager.acquire(self.context, node.uuid) as task: + cleaning.do_node_clean_abort(task) + tear_mock.assert_called_once_with(task.driver.deploy, task) + self.assertIsNotNone(task.node.last_error) + self.assertIsNotNone(task.node.maintenance_reason) + self.assertTrue(task.node.maintenance) + self.assertEqual('clean failure', task.node.fault) + class DoNodeCleanTestChildNodes(db_base.DbTestCase): def setUp(self): diff --git a/ironic/tests/unit/conductor/test_deployments.py b/ironic/tests/unit/conductor/test_deployments.py index 4834f2da4a..61de1b54af 100644 --- a/ironic/tests/unit/conductor/test_deployments.py +++ b/ironic/tests/unit/conductor/test_deployments.py @@ -966,6 +966,34 @@ class DoNextDeployStepTestCase(mgr_utils.ServiceSetUpMixin, mock_execute.assert_called_once_with( mock.ANY, mock.ANY, self.deploy_steps[0]) + def _test_do_next_deploy_step_handles(self, start_state): + node = obj_utils.create_test_node( + self.context, driver='fake-hardware', + provision_state=start_state, + driver_internal_info={ + 'deploy_steps': [ + { + 'step': 'hold', + 'priority': 10, + 'interface': 'deploy' + } + ], + 'deploy_step_index': None}, + clean_step=None) + with task_manager.acquire( + self.context, node.uuid, shared=False) as task: + deployments.do_next_deploy_step(task, 0) + node.refresh() + self.assertEqual(states.DEPLOYHOLD, node.provision_state) + + def test_do_next_deploy_step_handles_hold_from_active(self): + # Prior step/action was out of the conductor + self._test_do_next_deploy_step_handles(states.DEPLOYING) + + def test_do_next_deploy_step_handles_hold_from_wait(self): + # Prior step was async in a wait state + self._test_do_next_deploy_step_handles(states.DEPLOYWAIT) + @mock.patch.object(deployments, 'do_next_deploy_step', autospec=True) @mock.patch.object(conductor_steps, '_get_steps', autospec=True) def _continue_node_deploy(self, mock__get_steps, mock_next_step, diff --git a/ironic/tests/unit/conductor/test_manager.py b/ironic/tests/unit/conductor/test_manager.py index 4e0be81c93..b8d4ccebd5 100644 --- a/ironic/tests/unit/conductor/test_manager.py +++ b/ironic/tests/unit/conductor/test_manager.py @@ -2764,6 +2764,53 @@ class DoProvisioningActionTestCase(mgr_utils.ServiceSetUpMixin, self.assertEqual(states.CLEANWAIT, node.provision_state) self.assertEqual(states.AVAILABLE, node.target_provision_state) + @mock.patch('ironic.conductor.manager.ConductorManager._spawn_worker', + autospec=True) + def _do_provision_action_abort_from_cleanhold(self, mock_spawn, + manual=False): + tgt_prov_state = states.MANAGEABLE if manual else states.AVAILABLE + node = obj_utils.create_test_node( + self.context, driver='fake-hardware', + provision_state=states.CLEANHOLD, + target_provision_state=tgt_prov_state) + + self._start_service() + self.service.do_provisioning_action(self.context, node.uuid, 'abort') + node.refresh() + # Node will be moved to tgt_prov_state after cleaning, not tested here + self.assertEqual(states.CLEANFAIL, node.provision_state) + self.assertEqual(tgt_prov_state, node.target_provision_state) + self.assertEqual('By request, the clean operation was aborted', + node.last_error) + mock_spawn.assert_called_with( + self.service, cleaning.do_node_clean_abort, mock.ANY) + + def test_do_provision_action_abort_cleanhold_automated_clean(self): + self._do_provision_action_abort_from_cleanhold() + + def test_do_provision_action_abort_cleanhold_manual_clean(self): + self._do_provision_action_abort_from_cleanhold(manual=True) + + @mock.patch('ironic.conductor.manager.ConductorManager._spawn_worker', + autospec=True) + def test_do_provision_action_abort_from_deployhold(self, mock_spawn): + node = obj_utils.create_test_node( + self.context, driver='fake-hardware', + provision_state=states.DEPLOYHOLD, + driver_internal_info={ + 'agent_url': 'https://foo.bar/' + }) + + self._start_service() + self.service.do_provisioning_action(self.context, node.uuid, 'abort') + node.refresh() + mock_spawn.assert_called_with( + self.service, + self.service._do_node_tear_down, + mock.ANY, + states.DEPLOYHOLD) + self.assertNotIn('agent_url', node.driver_internal_info) + @mgr_utils.mock_record_keepalive class DoNodeCleanTestCase(mgr_utils.ServiceSetUpMixin, db_base.DbTestCase): @@ -3059,6 +3106,90 @@ class DoNodeCleanTestCase(mgr_utils.ServiceSetUpMixin, db_base.DbTestCase): self.service.continue_node_clean(self.context, node.uuid) self._stop_service() + @mock.patch('ironic.conductor.manager.ConductorManager._spawn_worker', + autospec=True) + def test_do_provision_action_unlocks_cleaning_manual(self, mock_spawn): + node = obj_utils.create_test_node( + self.context, driver='fake-hardware', + provision_state=states.CLEANHOLD, + target_provision_state=states.MANAGEABLE, + driver_internal_info={ + 'clean_steps': [ + {'step': 'hold', 'priority': 9, 'interface': 'power'}, + {'step': 'update_firmware', 'priority': 10, + 'interface': 'power'} + ], + 'clean_step_index': 1 + } + ) + self._start_service() + self.service.do_provisioning_action(self.context, node.uuid, 'unhold') + node.refresh() + self.assertIsNone(node.last_error) + self.assertEqual(states.CLEANWAIT, node.provision_state) + self.service.continue_node_clean(self.context, node.uuid) + node.refresh() + self.assertIsNone(node.last_error) + self.assertEqual(states.CLEANING, node.provision_state) + self._stop_service() + + @mock.patch('ironic.conductor.manager.ConductorManager._spawn_worker', + autospec=True) + def test_do_provision_action_unlocks_cleaning_automated(self, mock_spawn): + node = obj_utils.create_test_node( + self.context, driver='fake-hardware', + provision_state=states.CLEANHOLD, + # NOTE(TheJulia): This actually tests should this occur with + # automated cleaning. While not an explicit feature, we don't + # want things to go sideways in this case. + target_provision_state=states.AVAILABLE, + driver_internal_info={ + 'clean_steps': [ + {'step': 'hold', 'priority': 9, 'interface': 'power'}, + {'step': 'update_firmware', 'priority': 10, + 'interface': 'power'} + ], + 'clean_step_index': 1 + } + ) + self._start_service() + self.service.do_provisioning_action(self.context, node.uuid, 'unhold') + node.refresh() + self.assertIsNone(node.last_error) + self.assertEqual(states.CLEANWAIT, node.provision_state) + self.service.continue_node_clean(self.context, node.uuid) + node.refresh() + self.assertIsNone(node.last_error) + self.assertEqual(states.CLEANING, node.provision_state) + self._stop_service() + + @mock.patch('ironic.conductor.manager.ConductorManager._spawn_worker', + autospec=True) + def test_do_provision_action_unlocks_deploying(self, mock_spawn): + node = obj_utils.create_test_node( + self.context, driver='fake-hardware', + provision_state=states.DEPLOYHOLD, + target_provision_state=states.ACTIVE, + driver_internal_info={ + 'clean_steps': [ + {'step': 'hold', 'priority': 9, 'interface': 'power'}, + {'step': 'update_firmware', 'priority': 10, + 'interface': 'power'} + ], + 'deploy_step_index': 1 + } + ) + self._start_service() + self.service.do_provisioning_action(self.context, node.uuid, 'unhold') + node.refresh() + self.assertIsNone(node.last_error) + self.assertEqual(states.DEPLOYWAIT, node.provision_state) + self.service.continue_node_deploy(self.context, node.uuid) + node.refresh() + self.assertIsNone(node.last_error) + self.assertEqual(states.DEPLOYING, node.provision_state) + self._stop_service() + class DoNodeRescueTestCase(mgr_utils.CommonMixIn, mgr_utils.ServiceSetUpMixin, db_base.DbTestCase): @@ -8275,7 +8406,6 @@ class NodeHistoryRecordCleanupTestCase(mgr_utils.ServiceSetUpMixin, self.assertEqual(8, len(events)) self.service._manage_node_history(self.context) events = objects.NodeHistory.list(self.context) - print(events) self.assertEqual(6, len(events)) events = objects.NodeHistory.list_by_node_id(self.context, 10) self.assertEqual(2, len(events)) diff --git a/ironic/tests/unit/drivers/modules/test_agent_base.py b/ironic/tests/unit/drivers/modules/test_agent_base.py index f8b23e9fbf..b9e09f324e 100644 --- a/ironic/tests/unit/drivers/modules/test_agent_base.py +++ b/ironic/tests/unit/drivers/modules/test_agent_base.py @@ -231,7 +231,8 @@ class HeartbeatMixinTest(AgentDeployMixinBaseTest): autospec=True) def test_heartbeat_noops_in_wrong_state(self, next_step_mock, log_mock): allowed = {states.DEPLOYWAIT, states.CLEANWAIT, states.RESCUEWAIT, - states.DEPLOYING, states.CLEANING, states.RESCUING} + states.DEPLOYING, states.CLEANING, states.RESCUING, + states.DEPLOYHOLD, states.CLEANHOLD} for state in set(states.machine.states) - allowed: for m in (next_step_mock, log_mock): m.reset_mock() @@ -463,8 +464,9 @@ class HeartbeatMixinTest(AgentDeployMixinBaseTest): task, 'Node failed to perform rescue operation: some failure') @mock.patch.object(agent_base.LOG, 'error', autospec=True) - def test_heartbeat_records_cleaning_deploying(self, log_mock): - for provision_state in (states.CLEANING, states.DEPLOYING): + def test_heartbeat_records_when_appropriate(self, log_mock): + for provision_state in (states.CLEANING, states.DEPLOYING, + states.CLEANHOLD, states.DEPLOYHOLD): self.node.driver_internal_info = {} self.node.provision_state = provision_state self.node.save() diff --git a/releasenotes/notes/add-hold-states-7be5804d6f3a119a.yaml b/releasenotes/notes/add-hold-states-7be5804d6f3a119a.yaml new file mode 100644 index 0000000000..5eec68bc32 --- /dev/null +++ b/releasenotes/notes/add-hold-states-7be5804d6f3a119a.yaml @@ -0,0 +1,18 @@ +--- +features: + - | + Adds a ``clean hold`` and a ``deploy hold`` provision state in which + baremetal nodes can be put in utilizing specialed ``hold`` cleaning + and deployment steps. Allowing for patterns and processes where + Ironic's work is intentionally paused to allow for any external or + operator processes to take place. In these new states, a ``unhold`` + provision state verb can be used to inform Ironic to proceed. + The ``abort`` verb is also a possible option should operators wish + to start over. + - | + Adds the ability to send an ``unhold`` provision state verb utilizing + API version *1.84*. +other: + - | + Fixes the generated state machine diagram and updates it to match the + current state of the code. diff --git a/tools/states_to_dot.py b/tools/states_to_dot.py index 10c10972b7..c44dcbb4cc 100755 --- a/tools/states_to_dot.py +++ b/tools/states_to_dot.py @@ -1,4 +1,4 @@ -#!/usr/bin/env python +#!/usr/bin/env python3 # Copyright (C) 2014 Yahoo! Inc. All Rights Reserved. # diff --git a/tox.ini b/tox.ini index 57ea77e267..e049113400 100644 --- a/tox.ini +++ b/tox.ini @@ -35,6 +35,7 @@ deps = {[testenv]deps} -r{toxinidir}/driver-requirements.txt [testenv:genstates] +allowlist_externals = {toxinidir}/tools/states_to_dot.py deps = {[testenv]deps} pydot2 commands = {toxinidir}/tools/states_to_dot.py -f {toxinidir}/doc/source/images/states.svg --format svg @@ -92,7 +93,7 @@ deps = -c{env:TOX_CONSTRAINTS_FILE:https://releases.openstack.org/constraints/upper/master} -r{toxinidir}/requirements.txt -r{toxinidir}/doc/requirements.txt -commands = sphinx-build -b html -W doc/source doc/build/html +commands = sphinx-build -b html doc/source doc/build/html [testenv:pdf-docs] allowlist_externals = make