Enabling VDU auto-healing
This spec aims to add a new action "vdu_autohealing" that brings back the failed VDU, instead of the existing "respawn" action, which deletes the entire VNF and creates a new one.

Change-Id: Idb4f22faa9327ee8d63e545aaa8916d1ec7d3f68
Implements: blueprint vdu-auto-healing
Co-Authored-By: Tushar Patil <tushar.vitthal.patil@gmail.com>
Co-Authored-By: Bhagyashri Shewale <bhagyashri.shewale@nttdata.com>
parent cf9b74dcf3
commit 5a6c75d55d
specs/stein/vdu-auto-healing.rst

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

================
VDU auto healing
================

https://blueprints.launchpad.net/tacker/+spec/vdu-auto-healing

With the anti-affinity policy now in place, it is possible to deploy highly
available VNF applications (high availability is handled by the
application running on the VDUs). However, if one of the VDUs stops
responding, the existing ``respawn`` action deletes the entire stack and
creates a new one.

Our plan is to add a new action, ``vdu_autohealing``, that brings back only
the failed VDU instead of deleting the entire stack and creating a new one.

Problem description
===================

If one of the VDUs is not responding, there is currently no way to bring
back only that failed VDU, because the existing ``respawn`` action deletes
the entire stack and creates a new one. If all VDUs are deleted and
replaced by new ones, high availability is compromised, as there will be
some downtime until the VDUs are back up.

Our plan is to add a new action, ``vdu_autohealing``, that brings back only
the failed VDU. This lets another VDU in the VNF switch over to master and
keep services uninterrupted.

Proposed change
===============

Add a new action, ``vdu_autohealing``, which works in two steps. First, it
marks the VDU and the CPs assigned to that VDU as unhealthy (this is the
resource status on the Heat side) using the Heat ``resource-mark-unhealthy``
API. Second, it updates the stack, which brings the VDU and CPs marked as
unhealthy back to ``CHECK_COMPLETE``. Internally, when the stack is
updated, Heat deletes the VDU and CPs marked as unhealthy and replaces them
with new ones. The Heat APIs will be called from the respective infra
driver used by the VNF. Presently, we are going to support this action only
for the ``openstack`` infra driver. After the stack is updated, the action
keeps checking the status of the VNF until it becomes ``CREATE_COMPLETE``;
until that point, monitoring of that particular VNF is stopped.
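
The two steps described above can be sketched roughly as follows. This is
only an illustration under stated assumptions, not the actual
implementation: the function name ``heal_vdu`` and its parameters are
hypothetical, and ``heat`` is assumed to be an object shaped like a
python-heatclient client that exposes ``resources.mark_unhealthy()`` and
``stacks.update()``.

.. code-block:: python

  def heal_vdu(heat, stack_id, vdu_name, cp_names,
               reason="VDU monitoring failed"):
      """Mark a VDU and its CPs unhealthy, then update the stack.

      Hypothetical helper: ``heat`` is assumed to behave like a
      python-heatclient client object.
      """
      marked = []
      # Step 1: mark the VDU and every CP attached to it as unhealthy.
      for resource_name in [vdu_name] + list(cp_names):
          heat.resources.mark_unhealthy(
              stack_id=stack_id,
              resource_name=resource_name,
              mark_unhealthy=True,
              resource_status_reason=reason,
          )
          marked.append(resource_name)
      # Step 2: update the stack, reusing its existing template and
      # parameters (assumption), so that Heat replaces the resources
      # marked as unhealthy.
      heat.stacks.update(stack_id, existing=True)
      return marked

Passing the client in as an argument keeps the sketch independent of any
particular infra driver and makes it easy to exercise with a stub client in
unit tests.
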

Note: There is no plan to support the new ``vdu_autohealing`` action for
policy type ``tosca.policies.tacker.alarming``, so this action will not be
included in the constant ``DEFAULT_ALARM_ACTIONS`` defined in
``tacker.plugins.common.constants``. If this action needs to be supported
for alarm policies, ``metadata`` must be passed to the place where the
action is invoked. This metadata contains the name of the
``metering.server_group`` specified in the VDU metadata. In the action
itself, based on the policy type and ``metadata``, we can scan the VNFD
template available in the ``vnf_dict`` parameter to get the VDU name. Once
the VDU name is known, the same logic can be reused for auto-healing.
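
The VDU lookup mentioned in the note could look something like the sketch
below. It is not part of this spec's planned work and rests on assumptions:
the function name ``find_vdus_by_server_group`` is hypothetical, the VNFD
template is assumed to be already parsed into a dictionary, and the
``metering.server_group`` key is assumed to sit under each VDU's
``properties.metadata``.

.. code-block:: python

  VDU_NODE_TYPE = "tosca.nodes.nfv.VDU.Tacker"


  def find_vdus_by_server_group(vnfd_template, server_group):
      """Return the names of VDU nodes tagged with ``server_group``.

      Hypothetical helper: ``vnfd_template`` is assumed to be the VNFD
      template as a parsed dictionary.
      """
      topology = vnfd_template.get("topology_template", {})
      node_templates = topology.get("node_templates", {})
      matches = []
      for name, node in node_templates.items():
          # Only VDU nodes are candidates; skip CPs, VLs and others.
          if node.get("type") != VDU_NODE_TYPE:
              continue
          # Assumed location of the server group name in the template.
          metadata = node.get("properties", {}).get("metadata") or {}
          if metadata.get("metering.server_group") == server_group:
              matches.append(name)
      return matches

The helper returns a list rather than a single name because nothing in the
template format prevents several VDUs from sharing one server group.
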

An example VNFD is shown below. The VDU is monitored using any of the
existing monitoring drivers. If monitoring of a specific VDU fails, the new
``vdu_autohealing`` action is executed.

.. code-block:: yaml
  :caption: Example VNFD

  tosca_definitions_version: tosca_simple_profile_for_nfv_1_0_0
  description: "Monitoring policy action : vdu_autohealing"
  topology_template:
    node_templates:
      VDU:
        type: tosca.nodes.nfv.VDU.Tacker
        # ...snip...
        properties:
          monitoring_policy:
            name: ping
            parameters:
              monitoring_delay: 45
              count: 3
              interval: 1
              timeout: 2
            actions:
              failure: vdu_autohealing

Alternatives
------------

The alternative solution is the same one described in the alternatives
section of the vdu-affinity-policy spec [#f1]_. In an NSD, we can add two
VNFs, one active and one standby. If Tacker detects any issue during
monitoring, the ``respawn`` action deletes that particular VNF and creates
a new one. If the failed VNF is the active one, the standby VNF becomes
active and continues serving requests without interruption.

Data model impact
-----------------

None

REST API impact
---------------

None

Security impact
---------------

None

Notifications impact
--------------------

None

Other end user impact
---------------------

None

Performance Impact
------------------

None

Other deployer impact
---------------------

None

Developer impact
----------------

None

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  Bhagyashri Shewale <bhagyashri.shewale@nttdata.com>

Other contributors:
  Hiroyuki Jo <jo.hiroyuki@lab.ntt.co.jp>

  Tushar Patil <tushar.vitthal.patil@gmail.com>

Work Items
----------

* Add the ``vdu_autohealing`` action to mark the VDU status as unhealthy
  and update the stack
* Unit Tests
* Functional Tests
* Update documentation

Dependencies
============

None

Testing
=======

Unit and functional tests are sufficient to test the ``vdu_autohealing``
action.

Documentation Impact
====================

* Add a VNFD TOSCA template under samples to show how to configure the
  ``vdu_autohealing`` action.

* Add the new ``vdu_autohealing`` action to the Tacker Monitoring Framework
  documentation [#f2]_.

References
==========

.. [#f1] https://specs.openstack.org/openstack/tacker-specs/specs/rocky/vdu-affinity-policy.html
.. [#f2] https://docs.openstack.org/tacker/latest/contributor/monitor-api.html