[ovn][metadata] Remove metadata readiness mechanism

Prior to this patch, the metadata agent was writing into SB
database when a network had been provisioned with metadata
on a particular chassis.

Then, neutron-server would wait for that event to happen with
a 15s timeout before sending the vif-plugged event to Nova.

By removing this mechanism:

1) We'll save writes to OVN SB database which, in highly loaded
systems and at scale reduces significantly the load on ovsdb-server.

2) Ignoring healthchecks (that still requires write to the SB DB),
we can make OVN metadata agent to connect to slave instances when
using active-backup OVN databases since writes are not needed.

3) There's a chance that the VM boots very fast and requests
metadata before the service is ready but since the timeout was
15 seconds, we can safely rely on the the cloud-init retries.

Signed-off-by: Daniel Alvarez Sanchez <dalvarez@redhat.com>
Change-Id: Ia6cd7a9a3b9662a9a8ce106e01a93c357c255956
This commit is contained in:
Daniel Alvarez Sanchez 2021-05-18 16:50:12 +02:00 committed by Rodolfo Alonso
parent 7e98d18927
commit 36ba1cc319
8 changed files with 7 additions and 274 deletions

View File

@ -147,15 +147,8 @@ In neutron-ovn-metadata-agent.
Southbound database and look for all rows with the ``chassis`` column set
to the host the agent is running on. For all those entries, make sure a
metadata proxy instance is spawned for every ``datapath`` (Neutron
network) those ports are attached to. The agent will keep record of the
list of networks it currently has proxies running on by updating the
``external-ids`` key ``neutron-metadata-proxy-networks`` of the OVN
``Chassis`` record in the OVN Southbound database that corresponds to this
host. As an example, this key would look like
``neutron-metadata-proxy-networks=NET1_UUID,NET4_UUID`` meaning that this
chassis is hosting one or more VM's connected to networks 1 and 4 so we
should have a metadata proxy instance running for each. Ensure any running
metadata proxies no longer needed are torn down.
network) those ports are attached to. Ensure any running metadata proxies
no longer needed are torn down.
* Open and maintain a connection to the OVN Northbound database (using the
ovsdbapp library). On first connection, and anytime a reconnect happens:
@ -218,9 +211,6 @@ Launching a metadata proxy includes:
* Starting haproxy in this network namespace.
* Add the network UUID to ``external-ids:neutron-metadata-proxy-networks`` on
the Chassis table for our chassis in OVN Southbound database.
Tearing down a metadata proxy includes:
* Removing the network UUID from our chassis.
@ -239,18 +229,12 @@ have to deal with the complexity of it (haproxy instances, network namespaces,
etcetera). In this case, the agent would not create the neutron ports needed
for metadata.
There could be a race condition when the first VM for a certain network boots
on a hypervisor if it does so before the metadata proxy instance has been
spawned.
Right now, the ``vif-plugged`` event to Nova is sent out when the up column
in the OVN Northbound database's Logical_Switch_Port table changes to True,
indicating that the VIF is now up. To overcome this race condition we want
to wait until all network UUID's to which this VM is connected to are present
in ``external-ids:neutron-metadata-proxy-networks`` on the Chassis table
for our chassis in OVN Southbound database. This will delay the event to Nova
until the metadata proxy instance is up and running on the host ensuring the
VM will be able to get the metadata on boot.
indicating that the VIF is now up. There could be a race condition when the
first VM for a certain network boots on a hypervisor if it does so before the
metadata proxy instance has been spawned. Fortunately, retries on cloud-init
should eventually fetch metadata even when this might happen.
Alternatives Considered
-----------------------

View File

@ -335,8 +335,6 @@ class MetadataAgent(object):
This function will shutdown metadata proxy if it's running and delete
the VETH pair, the OVS port and the namespace.
"""
self.update_chassis_metadata_networks(datapath, remove=True)
# TODO(dalvarez): Remove this in Y cycle when we are sure that all
# namespaces will be created with the Neutron network UUID and not
# anymore with the OVN datapath UUID.
@ -515,7 +513,6 @@ class MetadataAgent(object):
self.conf, bind_address=n_const.METADATA_V4_IP,
network_id=net_name)
self.update_chassis_metadata_networks(net_name)
return namespace
def ensure_all_networks_provisioned(self):
@ -541,30 +538,3 @@ class MetadataAgent(object):
namespaces.append(netns)
return namespaces
# NOTE(lucasagomes): Even tho the metadata agent is a multi-process
# application, there's only one Southbound database IDL instance in
# the agent which handles the OVSDB events therefore we do not need
# the external=True parameter in the @synchronized decorator.
@lockutils.synchronized(CHASSIS_METADATA_LOCK)
def update_chassis_metadata_networks(self, datapath, remove=False):
"""Update metadata networks hosted in this chassis.
Add or remove a datapath from the list of current datapaths that
we're currently serving metadata.
"""
current_dps = self.sb_idl.get_chassis_metadata_networks(self.chassis)
updated = False
if remove:
if datapath in current_dps:
current_dps.remove(datapath)
updated = True
else:
if datapath not in current_dps:
current_dps.append(datapath)
updated = True
if updated:
with self.sb_idl.create_transaction(check_error=True) as txn:
txn.add(self.sb_idl.set_chassis_metadata_networks(
self.chassis, current_dps))

View File

@ -43,7 +43,6 @@ from neutron.common.ovn import acl as ovn_acl
from neutron.common.ovn import constants as ovn_const
from neutron.common.ovn import extensions as ovn_extensions
from neutron.common.ovn import utils as ovn_utils
from neutron.common import utils as n_utils
from neutron.conf.plugins.ml2.drivers.ovn import ovn_conf
from neutron.db import ovn_hash_ring_db
from neutron.db import ovn_revision_numbers_db
@ -64,11 +63,6 @@ import neutron.wsgi
LOG = log.getLogger(__name__)
METADATA_READY_WAIT_TIMEOUT = 15
class MetadataServiceReadyWaitTimeoutException(Exception):
pass
class OVNPortUpdateError(n_exc.BadRequest):
@ -993,7 +987,6 @@ class OVNMechanismDriver(api.MechanismDriver):
LOG.info("OVN reports status up for port: %s", port_id)
self._update_dnat_entry_if_needed(port_id)
self._wait_for_metadata_provisioned_if_needed(port_id)
admin_context = n_context.get_admin_context()
provisioning_blocks.provisioning_complete(
@ -1088,61 +1081,6 @@ class OVNMechanismDriver(api.MechanismDriver):
if phynet in phynets}
segment_service_db.map_segment_to_hosts(context, segment.id, hosts)
def _wait_for_metadata_provisioned_if_needed(self, port_id):
"""Wait for metadata service to be provisioned.
Wait until metadata service has been setup for this port in the chassis
it resides. If metadata is disabled or DHCP is not enabled for its
subnets, this function will return right away.
"""
if ovn_conf.is_ovn_metadata_enabled() and self._sb_ovn:
# Wait until metadata service has been setup for this port in the
# chassis it resides.
result = (
self._sb_ovn.get_logical_port_chassis_and_datapath(port_id))
if not result:
LOG.warning("Logical port %s doesn't exist in OVN", port_id)
return
chassis, datapath = result
if not chassis:
LOG.warning("Logical port %s is not bound to a "
"chassis", port_id)
return
# Check if the port belongs to some IPv4 subnet with DHCP enabled.
context = n_context.get_admin_context()
port = self._plugin.get_port(context, port_id)
port_subnet_ids = set(
ip['subnet_id'] for ip in port['fixed_ips'] if
n_utils.get_ip_version(ip['ip_address']) == const.IP_VERSION_4)
if not port_subnet_ids:
# The port doesn't belong to any IPv4 subnet
return
subnets = self._plugin.get_subnets(context, filters=dict(
network_id=[port['network_id']], ip_version=[4],
enable_dhcp=True))
subnet_ids = set(
s['id'] for s in subnets if s['id'] in port_subnet_ids)
if not subnet_ids:
return
try:
n_utils.wait_until_true(
lambda: datapath in
self._sb_ovn.get_chassis_metadata_networks(chassis),
timeout=METADATA_READY_WAIT_TIMEOUT,
exception=MetadataServiceReadyWaitTimeoutException)
except MetadataServiceReadyWaitTimeoutException:
# If we reach this point it means that metadata agent didn't
# provision the datapath for this port on its chassis. Either
# the agent is not running or it crashed. We'll complete the
# provisioning block though.
LOG.warning("Metadata service is not ready for port %s, check"
" neutron-ovn-metadata-agent status/logs.",
port_id)
def patch_plugin_merge(self, method_name, new_fn, op=operator.add):
old_method = getattr(self._plugin, method_name)

View File

@ -833,25 +833,6 @@ class OvsdbSbOvnIdl(sb_impl_idl.OvnSbApiIdlImpl, Backend):
('type', '=', 'localport'))
return next(iter(cmd.execute(check_error=True)), None)
def get_chassis_metadata_networks(self, chassis_name):
"""Return a list with the metadata networks the chassis is hosting."""
try:
chassis = self.lookup('Chassis', chassis_name)
except idlutils.RowNotFound:
LOG.warning("Couldn't find Chassis named %s in OVN while looking "
"for metadata networks", chassis_name)
return []
proxy_networks = chassis.external_ids.get(
'neutron-metadata-proxy-networks', None)
return proxy_networks.split(',') if proxy_networks else []
def set_chassis_metadata_networks(self, chassis, networks):
nets = ','.join(networks) if networks else ''
# TODO(twilson) This could just use DbSetCommand
return cmd.UpdateChassisExtIdsCommand(
self, chassis, {'neutron-metadata-proxy-networks': nets},
if_exists=True)
def set_chassis_neutron_description(self, chassis, description,
agent_type):
desc_key = (ovn_const.OVN_AGENT_METADATA_DESC_KEY

View File

@ -123,13 +123,6 @@ class TestSbApi(BaseOvnIdlTest):
val = str(uuid.uuid4())
self.assertIsNone(self.api.get_metadata_port_network(val))
def test_set_get_chassis_metadata_networks(self):
name = self.data['chassis'][0]['name']
nets = [str(uuid.uuid4()) for _ in range(3)]
self.api.set_chassis_metadata_networks(name, nets).execute(
check_error=True)
self.assertEqual(nets, self.api.get_chassis_metadata_networks(name))
def _create_bound_port_with_ip(self):
chassis, switch, port, binding = self._add_switch_port(
self.data['chassis'][0]['name'])

View File

@ -178,10 +178,7 @@ class TestMetadataAgent(base.BaseTestCase):
Check that the VETH pair, OVS port and namespace associated to this
namespace are deleted and the metadata proxy is destroyed.
"""
with mock.patch.object(self.agent,
'update_chassis_metadata_networks'),\
mock.patch.object(
ip_netns, 'exists', return_value=True),\
with mock.patch.object(ip_netns, 'exists', return_value=True),\
mock.patch.object(
ip_lib, 'device_exists', return_value=True),\
mock.patch.object(
@ -235,9 +232,6 @@ class TestMetadataAgent(base.BaseTestCase):
ip_wrap, 'add_veth',
return_value=[ip_lib.IPDevice('ip1'),
ip_lib.IPDevice('ip2')]) as add_veth,\
mock.patch.object(
self.agent,
'update_chassis_metadata_networks') as update_chassis,\
mock.patch.object(
driver.MetadataDriver,
'spawn_monitored_metadata_proxy') as spawn_mdp, \
@ -273,53 +267,4 @@ class TestMetadataAgent(base.BaseTestCase):
spawn_mdp.assert_called_once_with(
mock.ANY, 'namespace', 80, mock.ANY,
bind_address=n_const.METADATA_V4_IP, network_id='1')
# Check that the chassis has been updated with the datapath.
update_chassis.assert_called_once_with('1')
mock_checksum.assert_called_once_with('namespace')
def _test_update_chassis_metadata_networks_helper(
self, dp, remove, expected_dps, txn_called=True):
current_dps = ['0', '1', '2']
with mock.patch.object(self.agent.sb_idl,
'get_chassis_metadata_networks',
return_value=current_dps),\
mock.patch.object(self.agent.sb_idl,
'set_chassis_metadata_networks',
retrurn_value=True),\
mock.patch.object(self.agent.sb_idl,
'create_transaction') as create_txn_mock:
self.agent.update_chassis_metadata_networks(dp, remove=remove)
updated_dps = self.agent.sb_idl.get_chassis_metadata_networks(
self.agent.chassis)
self.assertEqual(updated_dps, expected_dps)
self.assertEqual(create_txn_mock.called, txn_called)
def test_update_chassis_metadata_networks_add(self):
dp = '4'
remove = False
expected_dps = ['0', '1', '2', '4']
self._test_update_chassis_metadata_networks_helper(
dp, remove, expected_dps)
def test_update_chassis_metadata_networks_remove(self):
dp = '2'
remove = True
expected_dps = ['0', '1']
self._test_update_chassis_metadata_networks_helper(
dp, remove, expected_dps)
def test_update_chassis_metadata_networks_add_dp_exists(self):
dp = '2'
remove = False
expected_dps = ['0', '1', '2']
self._test_update_chassis_metadata_networks_helper(
dp, remove, expected_dps, txn_called=False)
def test_update_chassis_metadata_networks_remove_no_dp(self):
dp = '3'
remove = True
expected_dps = ['0', '1', '2']
self._test_update_chassis_metadata_networks_helper(
dp, remove, expected_dps, txn_called=False)

View File

@ -807,51 +807,3 @@ class TestNBImplIdlOvn(TestDBImplIdlOvn):
lb_row = self._find_ovsdb_fake_row(self.lb_table, 'name', 'lb_2')
lb = self.nb_ovn_idl.get_floatingip_in_nat_or_lb(fip_id)
self.assertEqual(lb['_uuid'], lb_row.uuid)
class TestSBImplIdlOvn(TestDBImplIdlOvn):
fake_set = {
'chassis': [
{'name': 'fake-chassis',
'external_ids': {'neutron-metadata-proxy-networks': 'fake-id'}}],
}
def setUp(self):
super(TestSBImplIdlOvn, self).setUp()
self.chassis_table = fakes.FakeOvsdbTable.create_one_ovsdb_table()
self._tables = {}
self._tables['Chassis'] = self.chassis_table
with mock.patch.object(impl_idl_ovn.OvsdbSbOvnIdl, 'from_worker',
return_value=mock.Mock()):
with mock.patch.object(ovs_idl.Backend, 'autocreate_indices',
create=True):
impl_idl_ovn.OvsdbSbOvnIdl.ovsdb_connection = None
self.sb_ovn_idl = impl_idl_ovn.OvsdbSbOvnIdl(mock.MagicMock())
self.sb_ovn_idl.idl.tables = self._tables
def _load_sb_db(self):
fake_chassis = TestSBImplIdlOvn.fake_set['chassis']
self._load_ovsdb_fake_rows(self.chassis_table, fake_chassis)
@mock.patch.object(impl_idl_ovn.LOG, 'warning')
def test_get_chassis_metadata_networks_chassis_empty(self, mock_log):
chassis_name = 'fake-chassis'
result = self.sb_ovn_idl.get_chassis_metadata_networks(chassis_name)
mock_log.assert_called_once_with(
"Couldn't find Chassis named %s in OVN while looking "
"for metadata networks", chassis_name)
self.assertEqual([], result)
@mock.patch.object(impl_idl_ovn.LOG, 'warning')
def test_get_chassis_metadata_networks(self, mock_log):
chassis_name = 'fake-chassis'
self._load_sb_db()
result = self.sb_ovn_idl.get_chassis_metadata_networks(chassis_name)
self.assertFalse(mock_log.called)
self.assertEqual(['fake-id'], result)

View File

@ -42,7 +42,6 @@ from neutron.common.ovn import acl as ovn_acl
from neutron.common.ovn import constants as ovn_const
from neutron.common.ovn import hash_ring_manager
from neutron.common.ovn import utils as ovn_utils
from neutron.common import utils as n_utils
from neutron.conf.plugins.ml2.drivers.ovn import ovn_conf
from neutron.db import db_base_plugin_v2
from neutron.db import ovn_revision_numbers_db
@ -960,9 +959,6 @@ class TestOVNMechanismDriver(TestOVNMechanismDriverBase):
'provisioning_complete') as pc, \
mock.patch.object(self.mech_driver,
'_update_dnat_entry_if_needed') as ude, \
mock.patch.object(
self.mech_driver,
'_wait_for_metadata_provisioned_if_needed') as wmp, \
mock.patch.object(self.mech_driver, '_should_notify_nova',
return_value=is_compute_port):
self.mech_driver.set_port_status_up(port1['port']['id'])
@ -973,7 +969,6 @@ class TestOVNMechanismDriver(TestOVNMechanismDriverBase):
provisioning_blocks.L2_AGENT_ENTITY
)
ude.assert_called_once_with(port1['port']['id'])
wmp.assert_called_once_with(port1['port']['id'])
# If the port does NOT bellong to compute, do not notify Nova
# about it's status changes
@ -1064,31 +1059,6 @@ class TestOVNMechanismDriver(TestOVNMechanismDriverBase):
)
ude.assert_called_once_with(port1['port']['id'], False)
def _test__wait_for_metadata_provisioned_if_needed(self, enable_dhcp,
wait_expected):
with self.network(tenant_id='test') as net1, \
self.subnet(network=net1,
enable_dhcp=enable_dhcp) as subnet1, \
self.port(subnet=subnet1, set_context=True,
tenant_id='test') as port1, \
mock.patch.object(n_utils, 'wait_until_true') as wut, \
mock.patch.object(ovn_conf, 'is_ovn_metadata_enabled',
return_value=True):
self.mech_driver._wait_for_metadata_provisioned_if_needed(
port1['port']['id'])
if wait_expected:
self.assertEqual(1, wut.call_count)
else:
wut.assert_not_called()
def test__wait_for_metadata_provisioned_if_needed(self):
self._test__wait_for_metadata_provisioned_if_needed(
enable_dhcp=True, wait_expected=True)
def test__wait_for_metadata_provisioned_if_needed_not_needed(self):
self._test__wait_for_metadata_provisioned_if_needed(
enable_dhcp=False, wait_expected=False)
def test_bind_port_unsupported_vnic_type(self):
fake_port = fakes.FakePort.create_one_port(
attrs={'binding:vnic_type': 'unknown'}).info()