Fix: Race between attachment and volume deletion
There are cases where requests to delete an attachment made by Nova can race
other third-party requests to delete the overall volume.

This has been observed when running cinder-csi, where it first requests that
Nova detaches a volume before itself requesting that the overall volume is
deleted once it becomes `available`.

This is a cinder race condition, and like most race conditions it is not
simple to explain. Some context on the issue:

- Cinder API uses the volume "status" field as a locking mechanism to prevent
  concurrent request processing on the same volume.
- Most cinder operations are asynchronous, so the API returns before the
  operation has been completed by the cinder-volume service, but the
  attachment operations such as creating/updating/deleting an attachment are
  synchronous, so the API only returns to the caller after the cinder-volume
  service has completed the operation.
- Our current code **incorrectly** modifies the status of the volume both on
  the cinder-volume and the cinder-api services on the attachment delete
  operation.

The actual sequence of events that leads to the issue reported in this bug is:

[Cinder-CSI]
- Requests Nova to detach volume (Request R1)

[Nova]
- R1: Asks cinder-api to delete the attachment and **waits**

[Cinder-API]
- R1: Checks the status of the volume
- R1: Sends terminate connection request (R1) to cinder-volume and **waits**

[Cinder-Volume]
- R1: Asks the driver to terminate the connection
- R1: The driver asks the backend to unmap and unexport the volume
- R1: The last attachment is removed from the DB and the status of the volume
  is changed in the DB to "available"

[Cinder-CSI]
- Checks that there are no attachments in the volume and asks Cinder to delete
  it (Request R2)

[Cinder-API]
- R2: Checks that the volume's status is valid. It doesn't have attachments
  and is available, so it can be deleted.
- R2: Tells cinder-volume to delete the volume and returns immediately.

[Cinder-Volume]
- R2: Volume is deleted and DB entry is deleted
- R1: Finishes the termination of the connection

[Cinder-API]
- R1: Now that cinder-volume has finished the termination the code continues
- R1: Tries to modify the volume in the DB
- R1: DB layer raises VolumeNotFound since the volume has been deleted from
  the DB
- R1: VolumeNotFound is converted to HTTP 404 status code which is returned
  to Nova

[Nova]
- R1: Cinder responds with 404 on the attachment delete request
- R1: Nova leaves the volume as attached, since the attachment delete failed

At this point the Cinder and Nova DBs are out of sync, because Nova thinks
that the attachment is connected while Cinder has detached the volume and
even deleted it.

Hardening is also being done on the Nova side [2] to accept that the volume
attachment may be gone.

This patch fixes the issue mentioned above, but there is a request on
Cinder-CSI [1] to use Nova as the source of truth regarding its attachments
that, when implemented, would also fix the issue.

[1]: https://github.com/kubernetes/cloud-provider-openstack/issues/1645
[2]: https://review.opendev.org/q/topic:%2522bug/1937084%2522+project:openstack/nova

Closes-Bug: #1937084
Change-Id: Iaf149dadad5791e81a3c0efd089d0ee66a1a5614
parent 0cd1b7c275
commit 2ec2222841
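
A generic sketch (editor's illustration, not Cinder code) of the "status as a
lock" mechanism described in the commit message above: a request only proceeds
if it can atomically flip the volume status, so whichever request loses the
race sees zero rows updated and has to back off. Table name and status values
are illustrative only:

    from sqlalchemy import create_engine, text

    engine = create_engine('sqlite:///:memory:')
    with engine.begin() as conn:
        conn.execute(text("CREATE TABLE volumes (id TEXT PRIMARY KEY, status TEXT)"))
        conn.execute(text("INSERT INTO volumes VALUES ('vol-1', 'available')"))

    def try_lock(conn, volume_id, expected, new_status):
        # Compare-and-swap on the status column; True means this request
        # now "owns" the volume for its operation.
        res = conn.execute(
            text("UPDATE volumes SET status = :new "
                 "WHERE id = :vid AND status = :expected"),
            {'new': new_status, 'vid': volume_id, 'expected': expected})
        return res.rowcount == 1

    with engine.begin() as conn:
        print(try_lock(conn, 'vol-1', 'available', 'deleting'))  # True: we won the race
        print(try_lock(conn, 'vol-1', 'available', 'deleting'))  # False: status already changed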
@@ -63,6 +63,9 @@ from cinder.volume import volume_utils
 CONF = cfg.CONF
 LOG = logging.getLogger(__name__)
 
+# Map with cases where attach status differs from volume status
+ATTACH_STATUS_MAP = {'attached': 'in-use', 'detached': 'available'}
+
 
 options.set_defaults(CONF, connection='sqlite:///$state_path/cinder.sqlite')
 
@@ -1727,74 +1730,83 @@ def volume_include_in_cluster(context, cluster, partial_rename=True,
                                partial_rename, filters)
 
 
+def _get_statuses_from_attachments(context, session, volume_id):
+    """Get volume status and attach_status based on existing attachments."""
+    # NOTE: Current implementation ignores attachments on error attaching,
+    # since they will not have been used by any consumer because os-brick's
+    # connect_volume has not been called yet. This leads to cases where a
+    # volume will be in 'available' state yet have attachments.
+
+    # If we sort status of attachments alphabetically, ignoring errors, the
+    # first element will be the attachment status for the volume:
+    # attached > attaching > detaching > reserved
+    attach_status = session.query(models.VolumeAttachment.attach_status).\
+        filter_by(deleted=False).\
+        filter_by(volume_id=volume_id).\
+        filter(~models.VolumeAttachment.attach_status.startswith('error_')).\
+        order_by(models.VolumeAttachment.attach_status.asc()).\
+        limit(1).\
+        scalar()
+
+    # No volume attachment records means the volume is detached.
+    attach_status = attach_status or 'detached'
+
+    # Check cases where volume status is different from attach status, and
+    # default to the same value if it's not one of those cases.
+    status = ATTACH_STATUS_MAP.get(attach_status, attach_status)
+
+    return (status, attach_status)
+
+
 @require_admin_context
 def volume_detached(context, volume_id, attachment_id):
-    """This updates a volume attachment and marks it as detached.
+    """Delete an attachment and update the volume accordingly.
 
-    This method also ensures that the volume entry is correctly
-    marked as either still attached/in-use or detached/available
-    if this was the last detachment made.
+    After marking the attachment as detached the method will decide the status
+    and attach_status values for the volume based on the current status and the
+    remaining attachments and their status.
+
+    Volume status may be changed to: in-use, attaching, detaching, reserved, or
+    available.
+
+    Volume attach_status will be changed to one of: attached, attaching,
+    detaching, reserved, or detached.
     """
 
     # NOTE(jdg): This is a funky band-aid for the earlier attempts at
     # multiattach, it's a bummer because these things aren't really being used
     # but at the same time we don't want to break them until we work out the
     # new proposal for multi-attach
-    remain_attachment = True
     session = get_session()
     with session.begin():
+        # Only load basic volume info necessary to check various status and use
+        # the volume row as a lock with the for_update.
+        volume = _volume_get(context, volume_id, session=session,
+                             joined_load=False, for_update=True)
+
         try:
             attachment = _attachment_get(context, attachment_id,
                                          session=session)
+            attachment_updates = attachment.delete(session)
         except exception.VolumeAttachmentNotFound:
             attachment_updates = None
-            attachment = None
 
-        if attachment:
-            now = timeutils.utcnow()
-            attachment_updates = {
-                'attach_status': fields.VolumeAttachStatus.DETACHED,
-                'detach_time': now,
-                'deleted': True,
-                'deleted_at': now,
-                'updated_at': attachment.updated_at,
-            }
-            attachment.update(attachment_updates)
-            attachment.save(session=session)
-            del attachment_updates['updated_at']
+        status, attach_status = _get_statuses_from_attachments(context,
+                                                               session,
+                                                               volume_id)
 
-        attachment_list = None
-        volume_ref = _volume_get(context, volume_id,
-                                 session=session)
-        volume_updates = {'updated_at': volume_ref.updated_at}
-        if not volume_ref.volume_attachment:
-            # NOTE(jdg): We kept the old arg style allowing session exclusively
-            # for this one call
-            attachment_list = volume_attachment_get_all_by_volume_id(
-                context, volume_id, session=session)
-            remain_attachment = False
-        if attachment_list and len(attachment_list) > 0:
-            remain_attachment = True
+        volume_updates = {'updated_at': volume.updated_at,
+                          'attach_status': attach_status}
 
-        if not remain_attachment:
-            # Hide status update from user if we're performing volume migration
-            # or uploading it to image
-            if ((not volume_ref.migration_status and
-                    not (volume_ref.status == 'uploading')) or
-                    volume_ref.migration_status in ('success', 'error')):
-                volume_updates['status'] = 'available'
-
-            volume_updates['attach_status'] = (
-                fields.VolumeAttachStatus.DETACHED)
-        else:
-            # Volume is still attached
-            volume_updates['status'] = 'in-use'
-            volume_updates['attach_status'] = (
-                fields.VolumeAttachStatus.ATTACHED)
+        # Hide volume status update to available on volume migration or upload,
+        # as status is updated later on those flows.
+        if ((attach_status != 'detached')
+                or (not volume.migration_status and volume.status != 'uploading')
+                or volume.migration_status in ('success', 'error')):
+            volume_updates['status'] = status
 
-        volume_ref.update(volume_updates)
-        volume_ref.save(session=session)
+        volume.update(volume_updates)
+        volume.save(session=session)
         del volume_updates['updated_at']
         return (volume_updates, attachment_updates)
 
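
A quick, standalone check (not part of the patch) of the ordering trick that
_get_statuses_from_attachments relies on: sorted alphabetically, 'attached' <
'attaching' < 'detaching' < 'reserved', so the first remaining non-error
attachment status is the "most attached" one, and ATTACH_STATUS_MAP translates
it into the volume status:

    ATTACH_STATUS_MAP = {'attached': 'in-use', 'detached': 'available'}

    def statuses(remaining):
        # remaining: attach_status values of the non-error attachments left
        attach_status = sorted(remaining)[0] if remaining else 'detached'
        return ATTACH_STATUS_MAP.get(attach_status, attach_status), attach_status

    print(statuses(['reserved', 'detaching', 'attached']))  # ('in-use', 'attached')
    print(statuses(['reserved']))                           # ('reserved', 'reserved')
    print(statuses([]))                                     # ('available', 'detached')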
@@ -1878,11 +1890,14 @@ def _volume_get_query(context, session=None, project_only=False,
 
 
 @require_context
-def _volume_get(context, volume_id, session=None, joined_load=True):
+def _volume_get(context, volume_id, session=None, joined_load=True,
+                for_update=False):
     result = _volume_get_query(context, session=session, project_only=True,
                                joined_load=joined_load)
     if joined_load:
         result = result.options(joinedload('volume_type.extra_specs'))
+    if for_update:
+        result = result.with_for_update()
     result = result.filter_by(id=volume_id).first()
 
     if not result:
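
A minimal, generic SQLAlchemy sketch (model and names are illustrative, not
Cinder's) of the row-lock pattern the new for_update flag enables: SELECT ...
FOR UPDATE keeps the volume row locked until the transaction ends, so a
concurrent delete of the same volume has to wait for the detach transaction to
commit:

    from sqlalchemy import Column, String, create_engine
    from sqlalchemy.orm import Session, declarative_base

    Base = declarative_base()

    class Volume(Base):                      # stand-in for cinder's Volume model
        __tablename__ = 'volumes'
        id = Column(String, primary_key=True)
        status = Column(String)

    engine = create_engine('sqlite:///:memory:')
    Base.metadata.create_all(engine)
    with Session(engine) as session, session.begin():
        session.add(Volume(id='vol-1', status='in-use'))

    with Session(engine) as session, session.begin():
        volume = (session.query(Volume)
                  .filter_by(id='vol-1')
                  .with_for_update()         # no-op on SQLite, a real row lock on MySQL/PostgreSQL
                  .first())
        volume.status = 'available'          # changed while the row is held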
@@ -56,8 +56,10 @@ class CinderBase(models.TimestampMixin,
     def delete(self, session):
         """Delete this object."""
         updated_values = self.delete_values()
+        updated_values['updated_at'] = self.updated_at
         self.update(updated_values)
         self.save(session=session)
+        del updated_values['updated_at']
         return updated_values
 
 
@@ -393,6 +395,14 @@ class VolumeAttachment(BASE, CinderBase):
     # Stores a serialized json dict of host connector information from brick.
     connector = sa.Column(sa.Text)
 
+    @staticmethod
+    def delete_values():
+        now = timeutils.utcnow()
+        return {'deleted': True,
+                'deleted_at': now,
+                'attach_status': 'detached',
+                'detach_time': now}
+
 
 class VolumeType(BASE, CinderBase):
     """Represent possible volume_types of volumes offered."""
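
How the two model changes above fit together, shown as a short usage sketch
(only names from these hunks; the dict contents are the values delete_values()
produces): VolumeAttachment.delete_values() supplies the soft-delete values,
CinderBase.delete() applies and saves them, and the caller gets the applied
values back, which is how volume_detached builds attachment_updates without
re-reading the row:

    # Inside volume_detached's transaction (see the api.py hunk above):
    attachment_updates = attachment.delete(session)
    # attachment_updates now looks roughly like:
    # {'deleted': True, 'deleted_at': <utcnow>,
    #  'attach_status': 'detached', 'detach_time': <utcnow>}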
@@ -172,13 +172,13 @@ class AttachmentsAPITestCase(test.TestCase):
         self.controller.delete(req, attachment.id)
 
         volume2 = objects.Volume.get_by_id(self.ctxt, volume1.id)
-        if status == 'reserved':
-            self.assertEqual('detached', volume2.attach_status)
-            self.assertRaises(
-                exception.VolumeAttachmentNotFound,
-                objects.VolumeAttachment.get_by_id, self.ctxt, attachment.id)
-        else:
-            self.assertEqual('attached', volume2.attach_status)
+        # Volume and attachment status is changed on the API service
+        self.assertEqual('detached', volume2.attach_status)
+        self.assertEqual('available', volume2.status)
+        self.assertRaises(
+            exception.VolumeAttachmentNotFound,
+            objects.VolumeAttachment.get_by_id, self.ctxt, attachment.id)
+        if status != 'reserved':
             mock_delete.assert_called_once_with(req.environ['cinder.context'],
                                                 attachment.id, mock.ANY)
@@ -18,7 +18,6 @@ from oslo_utils import importutils
 from cinder import context
 from cinder import db
 from cinder.db.sqlalchemy import api as sqla_db
-from cinder import exception
 from cinder.objects import fields
 from cinder.objects import volume_attachment
 from cinder.tests.unit.api.v2 import fakes as v2_fakes
@@ -137,13 +136,21 @@ class AttachmentManagerTestCase(test.TestCase):
         attachment_ref = db.volume_attachment_get(
             self.context,
             attachment_ref['id'])
 
+        vref.refresh()
+        expected_status = (vref.status, vref.attach_status,
+                           attachment_ref.attach_status)
+
         self.manager.attachment_delete(self.context,
                                        attachment_ref['id'],
                                        vref)
-        self.assertRaises(exception.VolumeAttachmentNotFound,
-                          db.volume_attachment_get,
-                          self.context,
-                          attachment_ref.id)
+        # Manager doesn't change the resource status. It is changed on the API
+        attachment_ref = db.volume_attachment_get(self.context,
+                                                  attachment_ref.id)
+        vref.refresh()
+        self.assertEqual(
+            expected_status,
+            (vref.status, vref.attach_status, attachment_ref.attach_status))
 
     def test_attachment_delete_remove_export_fail(self):
         """attachment_delete removes attachment on remove_export failure."""
@@ -160,15 +167,18 @@ class AttachmentManagerTestCase(test.TestCase):
         attach = db.volume_attach(self.context, values)
         # Confirm the volume OVO has the attachment before the deletion
         vref.refresh()
+        expected_vol_status = (vref.status, vref.attach_status)
         self.assertEqual(1, len(vref.volume_attachment))
 
         self.manager.attachment_delete(self.context, attach.id, vref)
 
-        # Attachment has been removed from the DB
-        self.assertRaises(exception.VolumeAttachmentNotFound,
-                          db.volume_attachment_get, self.context, attach.id)
-        # Attachment has been removed from the volume OVO attachment list
-        self.assertEqual(0, len(vref.volume_attachment))
+        # Manager doesn't change the resource status. It is changed on the API
+        attachment = db.volume_attachment_get(self.context, attach.id)
+        self.assertEqual(attach.attach_status, attachment.attach_status)
+
+        vref = db.volume_get(self.context, vref.id)
+        self.assertEqual(expected_vol_status,
+                         (vref.status, vref.attach_status))
 
     def test_attachment_delete_multiple_attachments(self):
         volume_params = {'status': 'available'}
@@ -644,11 +644,13 @@ class DBAPIVolumeTestCase(BaseTest):
                   'instance_uuid': instance_uuid,
                   'attach_status': fields.VolumeAttachStatus.ATTACHING, }
         attachment = db.volume_attach(self.ctxt, values)
+        db.volume_attached(self.ctxt, attachment.id, instance_uuid,
+                           None, '/tmp')
         values2 = {'volume_id': volume.id,
                    'instance_uuid': fake.OBJECT_ID,
                    'attach_status': fields.VolumeAttachStatus.ATTACHING, }
-        db.volume_attach(self.ctxt, values2)
-        db.volume_attached(self.ctxt, attachment.id,
+        attachment2 = db.volume_attach(self.ctxt, values2)
+        db.volume_attached(self.ctxt, attachment2.id,
                            instance_uuid,
                            None, '/tmp')
         volume_updates, attachment_updates = (
@@ -2261,53 +2261,26 @@ class API(base.Base):
         volume = attachment.volume
 
         if attachment.attach_status == fields.VolumeAttachStatus.RESERVED:
-            volume_utils.notify_about_volume_usage(ctxt, volume,
-                                                   "detach.start")
-            volume.finish_detach(attachment.id)
-            do_notify = True
+            volume_utils.notify_about_volume_usage(ctxt,
+                                                   volume, "detach.start")
         else:
-            self.volume_rpcapi.attachment_delete(ctxt,
-                                                 attachment.id,
-                                                 volume)
-            do_notify = False
-        status_updates = {'status': 'available',
-                          'attach_status': 'detached'}
-        remaining_attachments = AO_LIST.get_all_by_volume_id(ctxt, volume.id)
-        LOG.debug("Remaining volume attachments: %s", remaining_attachments,
-                  resource=volume)
-
-        # NOTE(jdg) Try and figure out the > state we have left and set that
-        # attached > attaching > > detaching > reserved
-        pending_status_list = []
-        for attachment in remaining_attachments:
-            pending_status_list.append(attachment.attach_status)
-            LOG.debug("Adding status of: %s to pending status list "
-                      "for volume.", attachment.attach_status,
-                      resource=volume)
-
-        LOG.debug("Pending status list for volume during "
-                  "attachment-delete: %s",
-                  pending_status_list, resource=volume)
-        if 'attached' in pending_status_list:
-            status_updates['status'] = 'in-use'
-            status_updates['attach_status'] = 'attached'
-        elif 'attaching' in pending_status_list:
-            status_updates['status'] = 'attaching'
-            status_updates['attach_status'] = 'attaching'
-        elif 'detaching' in pending_status_list:
-            status_updates['status'] = 'detaching'
-            status_updates['attach_status'] = 'detaching'
-        elif 'reserved' in pending_status_list:
-            status_updates['status'] = 'reserved'
-            status_updates['attach_status'] = 'reserved'
-
-        volume.status = status_updates['status']
-        volume.attach_status = status_updates['attach_status']
-        volume.save()
-
-        if do_notify:
-            volume_utils.notify_about_volume_usage(ctxt, volume, "detach.end")
-        return remaining_attachments
+            # Generate the detach.start notification on the volume service to
+            # include the host doing the operation.
+            self.volume_rpcapi.attachment_delete(ctxt, attachment.id, volume)
+
+        # Trigger attachments lazy load (missing since volume was loaded in the
+        # attachment without joined tables). With them loaded the finish_detach
+        # call removes the detached one from the list, and the notification and
+        # return have the right attachment list.
+        volume.volume_attachment
+        volume.finish_detach(attachment.id)
+
+        # Do end notification on the API so it comes after finish_detach.
+        # Doing it on the volume service leads to race condition from bug
+        # #1937084, and doing the notification there with the finish here leads
+        # to bug #1916980.
+        volume_utils.notify_about_volume_usage(ctxt, volume, "detach.end")
+        return volume.volume_attachment
 
 
 class HostAPI(base.Base):
@@ -4951,8 +4951,6 @@ class VolumeManager(manager.CleanableManager,
             # Failures on detach_volume and remove_export are not considered
             # failures in terms of detaching the volume.
             pass
-        vref.finish_detach(attachment_id)
-        self._notify_about_volume_usage(context, vref, "detach.end")
 
     # Replication group API (Tiramisu)
     def enable_replication(self,
@@ -0,0 +1,6 @@
+---
+fixes:
+  - |
+    `Bug #1937084 <https://bugs.launchpad.net/cinder/+bug/1937084>`_: Fixed
+    race condition between delete attachment and delete volume that can leave
+    deleted volumes stuck as attached to instances.