From dcd3f516d2fa44c4056a307a11f6e14433476fb0 Mon Sep 17 00:00:00 2001
From: Matt Riedemann
Date: Fri, 25 Oct 2019 16:42:09 -0400
Subject: [PATCH] doc: add troubleshooting guide for cleaning up orphaned
 allocations

While we do not have an automated fix for bug 1829479, this provides a
troubleshooting document for working around that issue: allocations from
a server that was evacuated from a down host need to be cleaned up
manually before the resource provider and associated compute
node/service can be deleted.

In general this is also a useful guide for linking up the various
resources and terms in nova with how they are reflected in placement,
along with the relevant commands, which is probably something we should
do more of in our docs.

Change-Id: I120e1ddd7946a371888bfc890b5979f2e19288cd
Related-Bug: #1829479
---
 doc/source/admin/support-compute.rst          |   7 +
 .../troubleshooting/orphaned-allocations.rst  | 183 ++++++++++++++++++
 doc/source/cli/nova-manage.rst                |   2 +
 3 files changed, 192 insertions(+)
 create mode 100644 doc/source/admin/troubleshooting/orphaned-allocations.rst

diff --git a/doc/source/admin/support-compute.rst b/doc/source/admin/support-compute.rst
index 1264649f73f4..579d0d36eaf5 100644
--- a/doc/source/admin/support-compute.rst
+++ b/doc/source/admin/support-compute.rst
@@ -9,6 +9,13 @@
 a compute node to the instances that run on that node. Another common problem
 is trying to run 32-bit images on a 64-bit compute node. This section shows
 you how to troubleshoot Compute.
 
+.. todo:: Move the sections below into sub-pages for readability.
+
+.. toctree::
+   :maxdepth: 1
+
+   troubleshooting/orphaned-allocations.rst
+
 Compute service logging
 -----------------------
diff --git a/doc/source/admin/troubleshooting/orphaned-allocations.rst b/doc/source/admin/troubleshooting/orphaned-allocations.rst
new file mode 100644
index 000000000000..a8ea832ba843
--- /dev/null
+++ b/doc/source/admin/troubleshooting/orphaned-allocations.rst
@@ -0,0 +1,183 @@
+Orphaned resource allocations
+=============================
+
+Problem
+-------
+
+There are orphaned resource allocations in the placement service which can
+cause resource providers to:
+
+* Appear to the scheduler to be more utilized than they really are
+* Prevent deletion of compute services
+
+One scenario in which this could happen is when a compute service host is
+having problems, so the administrator forces it down and evacuates servers
+from it. Note that in this case "evacuates" refers to the server
+``evacuate`` action, not live migrating all servers from the running
+compute service. Assume the compute host is down and fenced.
+
+In this case, the servers have allocations tracked in placement against both
+the down source compute node and their current destination compute host. For
+example, here is a server *vm1* which has been evacuated from node *devstack1*
+to node *devstack2*:
+
+.. 
code-block:: console + + $ openstack --os-compute-api-version 2.53 compute service list --service nova-compute + +--------------------------------------+--------------+-----------+------+---------+-------+----------------------------+ + | ID | Binary | Host | Zone | Status | State | Updated At | + +--------------------------------------+--------------+-----------+------+---------+-------+----------------------------+ + | e3c18c2d-9488-4863-b728-f3f292ec5da8 | nova-compute | devstack1 | nova | enabled | down | 2019-10-25T20:13:51.000000 | + | 50a20add-cc49-46bd-af96-9bb4e9247398 | nova-compute | devstack2 | nova | enabled | up | 2019-10-25T20:13:52.000000 | + | b92afb2e-cd00-4074-803e-fff9aa379c2f | nova-compute | devstack3 | nova | enabled | up | 2019-10-25T20:13:53.000000 | + +--------------------------------------+--------------+-----------+------+---------+-------+----------------------------+ + $ vm1=$(openstack server show vm1 -f value -c id) + $ openstack server show $vm1 -f value -c OS-EXT-SRV-ATTR:host + devstack2 + +The server now has allocations against both *devstack1* and *devstack2* +resource providers in the placement service: + +.. code-block:: console + + $ devstack1=$(openstack resource provider list --name devstack1 -f value -c uuid) + $ devstack2=$(openstack resource provider list --name devstack2 -f value -c uuid) + $ openstack resource provider show --allocations $devstack1 + +-------------+-----------------------------------------------------------------------------------------------------------+ + | Field | Value | + +-------------+-----------------------------------------------------------------------------------------------------------+ + | uuid | 9546fce4-9fb5-4b35-b277-72ff125ad787 | + | name | devstack1 | + | generation | 6 | + | allocations | {u'a1e6e0b2-9028-4166-b79b-c177ff70fbb7': {u'resources': {u'VCPU': 1, u'MEMORY_MB': 512, u'DISK_GB': 1}}} | + +-------------+-----------------------------------------------------------------------------------------------------------+ + $ openstack resource provider show --allocations $devstack2 + +-------------+-----------------------------------------------------------------------------------------------------------+ + | Field | Value | + +-------------+-----------------------------------------------------------------------------------------------------------+ + | uuid | 52d0182d-d466-4210-8f0d-29466bb54feb | + | name | devstack2 | + | generation | 3 | + | allocations | {u'a1e6e0b2-9028-4166-b79b-c177ff70fbb7': {u'resources': {u'VCPU': 1, u'MEMORY_MB': 512, u'DISK_GB': 1}}} | + +-------------+-----------------------------------------------------------------------------------------------------------+ + $ openstack --os-placement-api-version 1.12 resource provider allocation show $vm1 + +--------------------------------------+------------+------------------------------------------------+----------------------------------+----------------------------------+ + | resource_provider | generation | resources | project_id | user_id | + +--------------------------------------+------------+------------------------------------------------+----------------------------------+----------------------------------+ + | 9546fce4-9fb5-4b35-b277-72ff125ad787 | 6 | {u'VCPU': 1, u'MEMORY_MB': 512, u'DISK_GB': 1} | 2f3bffc5db2b47deb40808a4ed2d7c7a | 2206168427c54d92ae2b2572bb0da9af | + | 52d0182d-d466-4210-8f0d-29466bb54feb | 3 | {u'VCPU': 1, u'MEMORY_MB': 512, u'DISK_GB': 1} | 2f3bffc5db2b47deb40808a4ed2d7c7a | 2206168427c54d92ae2b2572bb0da9af | 
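+   +--------------------------------------+------------+------------------------------------------------+----------------------------------+----------------------------------+
+
+If you just want the consumer (server) UUIDs holding allocations against the
+down host, the same information is available from the JSON output of the
+:command:`openstack resource provider show --allocations` command used
+above. The following is a minimal sketch, assuming the ``jq`` utility is
+installed; it simply lists the keys of the ``allocations`` field shown
+above:
+
+.. code-block:: console
+
+   $ openstack resource provider show --allocations $devstack1 -f json \
+       | jq -r '.allocations | keys[]'
+   a1e6e0b2-9028-4166-b79b-c177ff70fbb7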
+
+One way to find all servers that were evacuated from *devstack1* is:
+
+.. code-block:: console
+
+   $ nova migration-list --source-compute devstack1 --migration-type evacuation
+   +----+--------------------------------------+-------------+-----------+----------------+--------------+-------------+--------+--------------------------------------+------------+------------+----------------------------+----------------------------+------------+
+   | Id | UUID                                 | Source Node | Dest Node | Source Compute | Dest Compute | Dest Host   | Status | Instance UUID                        | Old Flavor | New Flavor | Created At                 | Updated At                 | Type       |
+   +----+--------------------------------------+-------------+-----------+----------------+--------------+-------------+--------+--------------------------------------+------------+------------+----------------------------+----------------------------+------------+
+   | 1  | 8a823ba3-e2e9-4f17-bac5-88ceea496b99 | devstack1   | devstack2 | devstack1      | devstack2    | 192.168.0.1 | done   | a1e6e0b2-9028-4166-b79b-c177ff70fbb7 | None       | None       | 2019-10-25T17:46:35.000000 | 2019-10-25T17:46:37.000000 | evacuation |
+   +----+--------------------------------------+-------------+-----------+----------------+--------------+-------------+--------+--------------------------------------+------------+------------+----------------------------+----------------------------+------------+
+
+Trying to delete the resource provider for *devstack1* will fail while there
+are allocations against it:
+
+.. code-block:: console
+
+   $ openstack resource provider delete $devstack1
+   Unable to delete resource provider 9546fce4-9fb5-4b35-b277-72ff125ad787: Resource provider has allocations. (HTTP 409)
+
+Solution
+--------
+
+Using the example resources above, remove the allocation for server *vm1*
+from the *devstack1* resource provider.
+
+Note that we do not use
+:command:`openstack resource provider allocation delete` here because that
+would remove the allocations for the server from *all* resource providers,
+including *devstack2* where it is now running. Instead, we use
+:command:`openstack resource provider allocation set` to overwrite the
+allocations and only retain the *devstack2* provider allocations. If you do
+remove all allocations for a given server, you can heal them later. See
+`Using heal_allocations`_ for details.
+
+.. TODO: Update this when openstack resource provider allocation set has a
+   --no-provider option to remove a specific provider from the allocations,
+   see https://storyboard.openstack.org/#!/story/2006779.
+
+.. 
code-block:: console
+
+   $ openstack --os-placement-api-version 1.12 resource provider allocation set $vm1 \
+     --project-id 2f3bffc5db2b47deb40808a4ed2d7c7a \
+     --user-id 2206168427c54d92ae2b2572bb0da9af \
+     --allocation rp=52d0182d-d466-4210-8f0d-29466bb54feb,VCPU=1 \
+     --allocation rp=52d0182d-d466-4210-8f0d-29466bb54feb,MEMORY_MB=512 \
+     --allocation rp=52d0182d-d466-4210-8f0d-29466bb54feb,DISK_GB=1
+   +--------------------------------------+------------+------------------------------------------------+----------------------------------+----------------------------------+
+   | resource_provider                    | generation | resources                                      | project_id                       | user_id                          |
+   +--------------------------------------+------------+------------------------------------------------+----------------------------------+----------------------------------+
+   | 52d0182d-d466-4210-8f0d-29466bb54feb | 4          | {u'VCPU': 1, u'MEMORY_MB': 512, u'DISK_GB': 1} | 2f3bffc5db2b47deb40808a4ed2d7c7a | 2206168427c54d92ae2b2572bb0da9af |
+   +--------------------------------------+------------+------------------------------------------------+----------------------------------+----------------------------------+
+
+Now the *devstack1* resource provider can be deleted:
+
+.. code-block:: console
+
+   $ openstack resource provider delete $devstack1
+
+And the related compute service can be deleted if desired:
+
+.. code-block:: console
+
+   $ openstack --os-compute-api-version 2.53 compute service delete e3c18c2d-9488-4863-b728-f3f292ec5da8
+
+For more details on the resource provider commands used in this guide, refer
+to the `osc-placement plugin documentation`_.
+
+.. _osc-placement plugin documentation: https://docs.openstack.org/osc-placement/latest/
+
+Using heal_allocations
+~~~~~~~~~~~~~~~~~~~~~~
+
+If you have a particularly troublesome allocation consumer and just want to
+delete its allocations from all providers, you can use the
+:command:`openstack resource provider allocation delete` command and then
+heal the allocations for the consumer using the
+:ref:`heal_allocations command <heal_allocations_cli>`. For example:
+
+.. code-block:: console
+
+   $ openstack resource provider allocation delete $vm1
+   $ nova-manage placement heal_allocations --verbose --instance $vm1
+   Looking for instances in cell: 04879596-d893-401c-b2a6-3d3aa096089d(cell1)
+   Found 1 candidate instances.
+   Successfully created allocations for instance a1e6e0b2-9028-4166-b79b-c177ff70fbb7.
+   Processed 1 instances.
+   $ openstack resource provider allocation show $vm1
+   +--------------------------------------+------------+------------------------------------------------+
+   | resource_provider                    | generation | resources                                      |
+   +--------------------------------------+------------+------------------------------------------------+
+   | 52d0182d-d466-4210-8f0d-29466bb54feb | 5          | {u'VCPU': 1, u'MEMORY_MB': 512, u'DISK_GB': 1} |
+   +--------------------------------------+------------+------------------------------------------------+
+
+Note that deleting allocations and then relying on ``heal_allocations`` may
+not always be the best solution, since healing allocations does not account
+for some things:
+
+* `Migration-based allocations`_ would be lost if manually deleted during a
+  resize. These are allocations tracked by the migration resource record
+  on the source compute service during a migration.
+* Healing nested resource allocations is not supported before the
+  20.0.0 (Train) release.
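+
+If ``heal_allocations`` cannot re-create the allocations you deleted (for
+example, nested resource allocations on a pre-Train deployment), they can be
+restored manually with the same
+:command:`openstack resource provider allocation set` command shown in the
+solution above, as long as you recorded the original values first. A minimal
+sketch, reusing the project, user and provider values from this example:
+
+.. code-block:: console
+
+   $ openstack --os-placement-api-version 1.12 resource provider allocation set $vm1 \
+     --project-id 2f3bffc5db2b47deb40808a4ed2d7c7a \
+     --user-id 2206168427c54d92ae2b2572bb0da9af \
+     --allocation rp=52d0182d-d466-4210-8f0d-29466bb54feb,VCPU=1 \
+     --allocation rp=52d0182d-d466-4210-8f0d-29466bb54feb,MEMORY_MB=512 \
+     --allocation rp=52d0182d-d466-4210-8f0d-29466bb54feb,DISK_GB=1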
+
+If you do use the ``heal_allocations`` command to clean up allocations for a
+specific troublesome instance, it is recommended to take note of what the
+allocations were before you remove them, in case you need to reset them
+manually later. Use the :command:`openstack resource provider allocation show`
+command to get the allocations for a consumer before deleting them, e.g.:
+
+.. code-block:: console
+
+   $ openstack --os-placement-api-version 1.12 resource provider allocation show $vm1
+
+.. _Migration-based allocations: https://specs.openstack.org/openstack/nova-specs/specs/queens/implemented/migration-allocations.html
diff --git a/doc/source/cli/nova-manage.rst b/doc/source/cli/nova-manage.rst
index 6e3bc54ea8e5..8f86b37db19c 100644
--- a/doc/source/cli/nova-manage.rst
+++ b/doc/source/cli/nova-manage.rst
@@ -545,6 +545,8 @@ Nova Cells v2
 Placement
 ~~~~~~~~~
 
+.. _heal_allocations_cli:
+
 ``nova-manage placement heal_allocations [--max-count <max_count>] [--verbose] [--skip-port-allocations] [--dry-run] [--instance <instance_uuid>]``
   Iterates over non-cell0 cells looking for instances which do not have
   allocations in the Placement service and which are not undergoing a task