Load Balancer Member Respawning
As a cloud operator, whenever a load balancer member node fails, I want the load balancer to stop directing traffic to the failed member and a new member to be spawned in its place.
Fault class
- Hardware failure
- Software error
- Network failure
OpenStack projects used
- OpenStack Aodh (telemetry alarm service)
- OpenStack Heat (orchestration)
- OpenStack Octavia (load balancer as a service)
Remediation class
- Reactive
Fault detection
From the Octavia admin guide:
Octavia will use the health information from the underlying load balancing application to determine the health of members. This information will be streamed to the Octavia database and made available via the status tree or other API methods.
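As a rough illustration of how member health could be read from the status tree, the sketch below walks a tree-shaped dictionary and collects members reported as ERROR. The nesting (statuses, loadbalancer, listeners, pools, members) and the operating_status field follow the general shape of the Octavia status tree response, but the sample data, the IDs, and the helper name failed_members are hypothetical and should be checked against the Octavia API reference before use.

```python
def failed_members(status_tree):
    """Return (pool_id, member_id) pairs whose operating_status is ERROR.

    The tree layout is an assumption modelled on the Octavia status tree;
    failed_members is a hypothetical helper, not part of any OpenStack client.
    """
    failed = []
    lb = status_tree["statuses"]["loadbalancer"]
    for listener in lb.get("listeners", []):
        for pool in listener.get("pools", []):
            for member in pool.get("members", []):
                if member.get("operating_status") == "ERROR":
                    failed.append((pool["id"], member["id"]))
    return failed


# Hypothetical sample data; all IDs are made up for illustration.
sample_tree = {
    "statuses": {
        "loadbalancer": {
            "id": "lb-1",
            "operating_status": "DEGRADED",
            "listeners": [
                {
                    "id": "listener-1",
                    "pools": [
                        {
                            "id": "pool-1",
                            "members": [
                                {"id": "member-a", "operating_status": "ONLINE"},
                                {"id": "member-b", "operating_status": "ERROR"},
                            ],
                        }
                    ],
                }
            ],
        }
    }
}

print(failed_members(sample_tree))  # [('pool-1', 'member-b')]
```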
In addition, an Aodh alarm is defined to detect load balancer member node failure and to trigger an alarm action that notifies Heat. The loadbalancer_member_health alarm rule type was added to Aodh in April 2019. At the time of writing, a patch is under review to add a Heat resource for creating this alarm type automatically via Heat templates; this document is to be updated later with sample Heat templates.
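Until sample Heat templates are available, the following sketch shows what a request body for such an alarm might look like when built by hand. It is not taken from this document: the rule key loadbalancer_member_health_rule and its fields pool_id, stack_id and autoscaling_group_id are assumptions based on the Aodh alarm rule of that type, the trust+heat:// action is likewise an assumption, and the helper name and all IDs are hypothetical. Verify every field against the Aodh API reference before relying on it.

```python
import json


def member_health_alarm(name, pool_id, stack_id, autoscaling_group_id, action_url):
    """Build an alarm body for a loadbalancer_member_health alarm.

    Hypothetical helper; the field names are assumptions to be checked
    against the Aodh API reference.
    """
    return {
        "name": name,
        "type": "loadbalancer_member_health",
        "loadbalancer_member_health_rule": {
            "pool_id": pool_id,  # Octavia pool whose members are watched
            "stack_id": stack_id,  # Heat stack that owns the member instances
            "autoscaling_group_id": autoscaling_group_id,  # group inside that stack
        },
        "alarm_actions": [action_url],  # where Aodh sends the notification
        "repeat_actions": False,
    }


# Placeholder IDs and an assumed action URL, for illustration only.
body = member_health_alarm(
    name="lb-member-health",
    pool_id="POOL_ID",
    stack_id="STACK_ID",
    autoscaling_group_id="ASG_ID",
    action_url="trust+heat://",
)
print(json.dumps(body, indent=2))
```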
Inputs, decision-making, and remediation
- Octavia's built-in behavior automatically stops directing traffic to the unresponsive member node.
- Heat receives the Aodh alarm regarding the unresponsive member node, and according to the behavior defined in the stack template, spawns a new instance to replace the unresponsive member node.
- Octavia detects when the new member node is operational and begins directing some traffic to the new node.
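The three steps above can be sketched as a small in-memory simulation. This is only a toy model of the control flow, not how Octavia, Aodh or Heat are implemented: the Member and Pool classes, the remediate function and the web-N naming scheme are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Member:
    name: str
    healthy: bool = True


@dataclass
class Pool:
    members: list = field(default_factory=list)

    def routable(self):
        # Steps 1 and 3: traffic goes only to members that pass health checks.
        return [m.name for m in self.members if m.healthy]


def remediate(pool, new_names):
    """Step 2: replace each unhealthy member with a newly spawned one.

    Returns the names of the members that were replaced.
    """
    replaced = []
    for i, member in enumerate(pool.members):
        if not member.healthy:
            pool.members[i] = Member(next(new_names))
            replaced.append(member.name)
    return replaced


pool = Pool([Member("web-0"), Member("web-1"), Member("web-2")])
pool.members[1].healthy = False  # simulate a member failure

print(pool.routable())  # ['web-0', 'web-2']

names = (f"web-{n}" for n in count(3))  # hypothetical naming scheme
print(remediate(pool, names))  # ['web-1']
print(pool.routable())  # ['web-0', 'web-3', 'web-2']
```

In the real deployment the replacement is asynchronous: the new member only starts receiving traffic once Octavia's health checks report it as operational.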
Existing implementation(s)
A demo video is available here.