[ovn][metadata] Remove metadata readiness mechanism
Prior to this patch, the metadata agent wrote to the OVN Southbound database when a network had been provisioned with metadata on a particular chassis. neutron-server would then wait for that event, with a 15-second timeout, before sending the vif-plugged event to Nova. Removing this mechanism:

1) Saves writes to the OVN Southbound database which, in highly loaded systems and at scale, significantly reduces the load on ovsdb-server.

2) Health checks aside (those still require writes to the SB DB), allows the OVN metadata agent to connect to slave instances when using active-backup OVN databases, since writes are no longer needed.

3) There is a chance that a VM boots very fast and requests metadata before the service is ready but, since the timeout was only 15 seconds, we can safely rely on the cloud-init retries.

Signed-off-by: Daniel Alvarez Sanchez <dalvarez@redhat.com>
Change-Id: Ia6cd7a9a3b9662a9a8ce106e01a93c357c255956
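The mechanism being removed was a poll-with-timeout on the Southbound database: neutron-server polled until the agent had marked the datapath as provisioned, or 15 seconds elapsed. As a minimal sketch of that pattern (a simplified stand-in for neutron's `neutron.common.utils.wait_until_true`, not the actual implementation):

```python
import time


class MetadataServiceReadyWaitTimeoutException(Exception):
    """Raised when the readiness predicate never became true in time."""


def wait_until_true(predicate, timeout=15, sleep=1,
                    exception=MetadataServiceReadyWaitTimeoutException):
    """Poll ``predicate`` until it returns True or ``timeout`` elapses.

    Mirrors the behaviour the removed code relied on: keep calling the
    predicate, sleeping between attempts, and raise ``exception`` once
    the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            raise exception()
        time.sleep(sleep)
```

With a 15-second budget and cloud-init retrying metadata requests on its own, the commit argues this server-side wait adds SB DB writes without buying much safety.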
parent 7e98d18927
commit 36ba1cc319
@@ -147,15 +147,8 @@ In neutron-ovn-metadata-agent.
   Southbound database and look for all rows with the ``chassis`` column set
   to the host the agent is running on. For all those entries, make sure a
   metadata proxy instance is spawned for every ``datapath`` (Neutron
-  network) those ports are attached to. The agent will keep record of the
-  list of networks it currently has proxies running on by updating the
-  ``external-ids`` key ``neutron-metadata-proxy-networks`` of the OVN
-  ``Chassis`` record in the OVN Southbound database that corresponds to this
-  host. As an example, this key would look like
-  ``neutron-metadata-proxy-networks=NET1_UUID,NET4_UUID`` meaning that this
-  chassis is hosting one or more VM's connected to networks 1 and 4 so we
-  should have a metadata proxy instance running for each. Ensure any running
-  metadata proxies no longer needed are torn down.
+  network) those ports are attached to. Ensure any running metadata proxies
+  no longer needed are torn down.
 
 * Open and maintain a connection to the OVN Northbound database (using the
   ovsdbapp library). On first connection, and anytime a reconnect happens:
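The removed documentation describes the ``external-ids`` key format, e.g. ``neutron-metadata-proxy-networks=NET1_UUID,NET4_UUID``. A short sketch of hypothetical helpers (names are illustrative, not from the source) showing how that comma-separated value maps to and from a Python list:

```python
# Key under which the (now removed) mechanism stored the network list
# in the Chassis table's external_ids column.
KEY = 'neutron-metadata-proxy-networks'


def parse_proxy_networks(external_ids):
    """Return the list of network UUIDs stored under KEY, or [] if unset."""
    value = external_ids.get(KEY)
    return value.split(',') if value else []


def serialize_proxy_networks(networks):
    """Comma-join the network list; an empty list becomes ''."""
    return ','.join(networks)
```

This is the same encoding the SB IDL getter/setter removed later in this diff used.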
@@ -218,9 +211,6 @@ Launching a metadata proxy includes:
 
 * Starting haproxy in this network namespace.
 
-* Add the network UUID to ``external-ids:neutron-metadata-proxy-networks`` on
-  the Chassis table for our chassis in OVN Southbound database.
-
 Tearing down a metadata proxy includes:
 
 * Removing the network UUID from our chassis.
@@ -239,18 +229,12 @@ have to deal with the complexity of it (haproxy instances, network namespaces,
 etcetera). In this case, the agent would not create the neutron ports needed
 for metadata.
 
-There could be a race condition when the first VM for a certain network boots
-on a hypervisor if it does so before the metadata proxy instance has been
-spawned.
-
 Right now, the ``vif-plugged`` event to Nova is sent out when the up column
 in the OVN Northbound database's Logical_Switch_Port table changes to True,
-indicating that the VIF is now up. To overcome this race condition we want
-to wait until all network UUID's to which this VM is connected to are present
-in ``external-ids:neutron-metadata-proxy-networks`` on the Chassis table
-for our chassis in OVN Southbound database. This will delay the event to Nova
-until the metadata proxy instance is up and running on the host ensuring the
-VM will be able to get the metadata on boot.
+indicating that the VIF is now up. There could be a race condition when the
+first VM for a certain network boots on a hypervisor if it does so before the
+metadata proxy instance has been spawned. Fortunately, retries on cloud-init
+should eventually fetch metadata even when this might happen.
 
 Alternatives Considered
 -----------------------
@@ -335,8 +335,6 @@ class MetadataAgent(object):
         This function will shutdown metadata proxy if it's running and delete
         the VETH pair, the OVS port and the namespace.
         """
-        self.update_chassis_metadata_networks(datapath, remove=True)
-
         # TODO(dalvarez): Remove this in Y cycle when we are sure that all
         # namespaces will be created with the Neutron network UUID and not
         # anymore with the OVN datapath UUID.
@@ -515,7 +513,6 @@ class MetadataAgent(object):
             self.conf, bind_address=n_const.METADATA_V4_IP,
             network_id=net_name)
 
-        self.update_chassis_metadata_networks(net_name)
         return namespace
 
     def ensure_all_networks_provisioned(self):
@@ -541,30 +538,3 @@ class MetadataAgent(object):
             namespaces.append(netns)
 
         return namespaces
-
-    # NOTE(lucasagomes): Even tho the metadata agent is a multi-process
-    # application, there's only one Southbound database IDL instance in
-    # the agent which handles the OVSDB events therefore we do not need
-    # the external=True parameter in the @synchronized decorator.
-    @lockutils.synchronized(CHASSIS_METADATA_LOCK)
-    def update_chassis_metadata_networks(self, datapath, remove=False):
-        """Update metadata networks hosted in this chassis.
-
-        Add or remove a datapath from the list of current datapaths that
-        we're currently serving metadata.
-        """
-        current_dps = self.sb_idl.get_chassis_metadata_networks(self.chassis)
-        updated = False
-        if remove:
-            if datapath in current_dps:
-                current_dps.remove(datapath)
-                updated = True
-        else:
-            if datapath not in current_dps:
-                current_dps.append(datapath)
-                updated = True
-
-        if updated:
-            with self.sb_idl.create_transaction(check_error=True) as txn:
-                txn.add(self.sb_idl.set_chassis_metadata_networks(
-                    self.chassis, current_dps))
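The list manipulation inside the removed ``update_chassis_metadata_networks`` can be isolated as a pure function, which makes its add/remove semantics (and when a transaction would have been needed) easy to see. This is a sketch, not code from the patch:

```python
def updated_metadata_networks(current, datapath, remove=False):
    """Return (new_list, changed) without mutating ``current``.

    Mirrors the removed logic: adding is a no-op if the datapath is
    already present, removing is a no-op if it is absent. ``changed``
    corresponds to whether an OVSDB transaction would have been issued.
    """
    nets = list(current)
    if remove:
        if datapath in nets:
            nets.remove(datapath)
            return nets, True
    elif datapath not in nets:
        nets.append(datapath)
        return nets, True
    return nets, False
```

The ``changed`` flag matches the ``txn_called`` expectations in the unit tests removed further down in this diff.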
@@ -43,7 +43,6 @@ from neutron.common.ovn import acl as ovn_acl
 from neutron.common.ovn import constants as ovn_const
 from neutron.common.ovn import extensions as ovn_extensions
 from neutron.common.ovn import utils as ovn_utils
-from neutron.common import utils as n_utils
 from neutron.conf.plugins.ml2.drivers.ovn import ovn_conf
 from neutron.db import ovn_hash_ring_db
 from neutron.db import ovn_revision_numbers_db
@@ -64,11 +63,6 @@ import neutron.wsgi
 
 
 LOG = log.getLogger(__name__)
-METADATA_READY_WAIT_TIMEOUT = 15
-
-
-class MetadataServiceReadyWaitTimeoutException(Exception):
-    pass
 
 
 class OVNPortUpdateError(n_exc.BadRequest):
@@ -993,7 +987,6 @@ class OVNMechanismDriver(api.MechanismDriver):
         LOG.info("OVN reports status up for port: %s", port_id)
 
         self._update_dnat_entry_if_needed(port_id)
-        self._wait_for_metadata_provisioned_if_needed(port_id)
 
         admin_context = n_context.get_admin_context()
         provisioning_blocks.provisioning_complete(
@@ -1088,61 +1081,6 @@ class OVNMechanismDriver(api.MechanismDriver):
                     if phynet in phynets}
         segment_service_db.map_segment_to_hosts(context, segment.id, hosts)
 
-    def _wait_for_metadata_provisioned_if_needed(self, port_id):
-        """Wait for metadata service to be provisioned.
-
-        Wait until metadata service has been setup for this port in the chassis
-        it resides. If metadata is disabled or DHCP is not enabled for its
-        subnets, this function will return right away.
-        """
-        if ovn_conf.is_ovn_metadata_enabled() and self._sb_ovn:
-            # Wait until metadata service has been setup for this port in the
-            # chassis it resides.
-            result = (
-                self._sb_ovn.get_logical_port_chassis_and_datapath(port_id))
-            if not result:
-                LOG.warning("Logical port %s doesn't exist in OVN", port_id)
-                return
-            chassis, datapath = result
-            if not chassis:
-                LOG.warning("Logical port %s is not bound to a "
-                            "chassis", port_id)
-                return
-
-            # Check if the port belongs to some IPv4 subnet with DHCP enabled.
-            context = n_context.get_admin_context()
-            port = self._plugin.get_port(context, port_id)
-            port_subnet_ids = set(
-                ip['subnet_id'] for ip in port['fixed_ips'] if
-                n_utils.get_ip_version(ip['ip_address']) == const.IP_VERSION_4)
-            if not port_subnet_ids:
-                # The port doesn't belong to any IPv4 subnet
-                return
-
-            subnets = self._plugin.get_subnets(context, filters=dict(
-                network_id=[port['network_id']], ip_version=[4],
-                enable_dhcp=True))
-
-            subnet_ids = set(
-                s['id'] for s in subnets if s['id'] in port_subnet_ids)
-            if not subnet_ids:
-                return
-
-            try:
-                n_utils.wait_until_true(
-                    lambda: datapath in
-                    self._sb_ovn.get_chassis_metadata_networks(chassis),
-                    timeout=METADATA_READY_WAIT_TIMEOUT,
-                    exception=MetadataServiceReadyWaitTimeoutException)
-            except MetadataServiceReadyWaitTimeoutException:
-                # If we reach this point it means that metadata agent didn't
-                # provision the datapath for this port on its chassis. Either
-                # the agent is not running or it crashed. We'll complete the
-                # provisioning block though.
-                LOG.warning("Metadata service is not ready for port %s, check"
-                            " neutron-ovn-metadata-agent status/logs.",
-                            port_id)
-
     def patch_plugin_merge(self, method_name, new_fn, op=operator.add):
         old_method = getattr(self._plugin, method_name)
 
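The removed method only waited when the port had an IPv4 address on a DHCP-enabled IPv4 subnet. That gate can be sketched standalone (a simplified illustration with hypothetical dict shapes, using stdlib ``ipaddress`` in place of neutron's ``get_ip_version`` helper):

```python
import ipaddress


def needs_metadata_wait(port, subnets):
    """True only if the port has an IPv4 fixed IP on a DHCP-enabled
    IPv4 subnet; otherwise the removed code returned without waiting.

    ``port`` is assumed to look like {'fixed_ips': [{'subnet_id': ...,
    'ip_address': ...}]} and ``subnets`` like [{'id': ..., 'ip_version':
    ..., 'enable_dhcp': ...}], mirroring the neutron API shapes.
    """
    port_subnet_ids = {
        ip['subnet_id'] for ip in port['fixed_ips']
        if ipaddress.ip_address(ip['ip_address']).version == 4}
    dhcp_v4 = {s['id'] for s in subnets
               if s['ip_version'] == 4 and s['enable_dhcp']}
    return bool(port_subnet_ids & dhcp_v4)
```

Only when this returned True did the driver poll the chassis ``external-ids`` before completing the provisioning block.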
@@ -833,25 +833,6 @@ class OvsdbSbOvnIdl(sb_impl_idl.OvnSbApiIdlImpl, Backend):
             ('type', '=', 'localport'))
         return next(iter(cmd.execute(check_error=True)), None)
 
-    def get_chassis_metadata_networks(self, chassis_name):
-        """Return a list with the metadata networks the chassis is hosting."""
-        try:
-            chassis = self.lookup('Chassis', chassis_name)
-        except idlutils.RowNotFound:
-            LOG.warning("Couldn't find Chassis named %s in OVN while looking "
-                        "for metadata networks", chassis_name)
-            return []
-        proxy_networks = chassis.external_ids.get(
-            'neutron-metadata-proxy-networks', None)
-        return proxy_networks.split(',') if proxy_networks else []
-
-    def set_chassis_metadata_networks(self, chassis, networks):
-        nets = ','.join(networks) if networks else ''
-        # TODO(twilson) This could just use DbSetCommand
-        return cmd.UpdateChassisExtIdsCommand(
-            self, chassis, {'neutron-metadata-proxy-networks': nets},
-            if_exists=True)
-
     def set_chassis_neutron_description(self, chassis, description,
                                         agent_type):
         desc_key = (ovn_const.OVN_AGENT_METADATA_DESC_KEY
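A small detail in the removed getter is worth noting: it guards with ``if proxy_networks else []`` because splitting an empty string does not give an empty list. A sketch (hypothetical helper name) demonstrating the edge case the guard handles:

```python
def decode_nets(value):
    """Decode a comma-separated network list from external_ids.

    ''.split(',') returns [''] rather than [], so the truthiness check
    is required to map an unset/empty value to an empty list.
    """
    return value.split(',') if value else []
```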
@@ -123,13 +123,6 @@ class TestSbApi(BaseOvnIdlTest):
         val = str(uuid.uuid4())
         self.assertIsNone(self.api.get_metadata_port_network(val))
 
-    def test_set_get_chassis_metadata_networks(self):
-        name = self.data['chassis'][0]['name']
-        nets = [str(uuid.uuid4()) for _ in range(3)]
-        self.api.set_chassis_metadata_networks(name, nets).execute(
-            check_error=True)
-        self.assertEqual(nets, self.api.get_chassis_metadata_networks(name))
-
     def _create_bound_port_with_ip(self):
         chassis, switch, port, binding = self._add_switch_port(
             self.data['chassis'][0]['name'])
@@ -178,10 +178,7 @@ class TestMetadataAgent(base.BaseTestCase):
         Check that the VETH pair, OVS port and namespace associated to this
         namespace are deleted and the metadata proxy is destroyed.
         """
-        with mock.patch.object(self.agent,
-                               'update_chassis_metadata_networks'),\
-                mock.patch.object(
-                    ip_netns, 'exists', return_value=True),\
+        with mock.patch.object(ip_netns, 'exists', return_value=True),\
                 mock.patch.object(
                     ip_lib, 'device_exists', return_value=True),\
                 mock.patch.object(
@@ -235,9 +232,6 @@ class TestMetadataAgent(base.BaseTestCase):
                 ip_wrap, 'add_veth',
                 return_value=[ip_lib.IPDevice('ip1'),
                               ip_lib.IPDevice('ip2')]) as add_veth,\
-                mock.patch.object(
-                    self.agent,
-                    'update_chassis_metadata_networks') as update_chassis,\
                 mock.patch.object(
                     driver.MetadataDriver,
                     'spawn_monitored_metadata_proxy') as spawn_mdp, \
@@ -273,53 +267,4 @@ class TestMetadataAgent(base.BaseTestCase):
         spawn_mdp.assert_called_once_with(
             mock.ANY, 'namespace', 80, mock.ANY,
             bind_address=n_const.METADATA_V4_IP, network_id='1')
-        # Check that the chassis has been updated with the datapath.
-        update_chassis.assert_called_once_with('1')
         mock_checksum.assert_called_once_with('namespace')
-
-    def _test_update_chassis_metadata_networks_helper(
-            self, dp, remove, expected_dps, txn_called=True):
-        current_dps = ['0', '1', '2']
-        with mock.patch.object(self.agent.sb_idl,
-                               'get_chassis_metadata_networks',
-                               return_value=current_dps),\
-                mock.patch.object(self.agent.sb_idl,
-                                  'set_chassis_metadata_networks',
-                                  retrurn_value=True),\
-                mock.patch.object(self.agent.sb_idl,
-                                  'create_transaction') as create_txn_mock:
-
-            self.agent.update_chassis_metadata_networks(dp, remove=remove)
-            updated_dps = self.agent.sb_idl.get_chassis_metadata_networks(
-                self.agent.chassis)
-
-            self.assertEqual(updated_dps, expected_dps)
-            self.assertEqual(create_txn_mock.called, txn_called)
-
-    def test_update_chassis_metadata_networks_add(self):
-        dp = '4'
-        remove = False
-        expected_dps = ['0', '1', '2', '4']
-        self._test_update_chassis_metadata_networks_helper(
-            dp, remove, expected_dps)
-
-    def test_update_chassis_metadata_networks_remove(self):
-        dp = '2'
-        remove = True
-        expected_dps = ['0', '1']
-        self._test_update_chassis_metadata_networks_helper(
-            dp, remove, expected_dps)
-
-    def test_update_chassis_metadata_networks_add_dp_exists(self):
-        dp = '2'
-        remove = False
-        expected_dps = ['0', '1', '2']
-        self._test_update_chassis_metadata_networks_helper(
-            dp, remove, expected_dps, txn_called=False)
-
-    def test_update_chassis_metadata_networks_remove_no_dp(self):
-        dp = '3'
-        remove = True
-        expected_dps = ['0', '1', '2']
-        self._test_update_chassis_metadata_networks_helper(
-            dp, remove, expected_dps, txn_called=False)
@@ -807,51 +807,3 @@ class TestNBImplIdlOvn(TestDBImplIdlOvn):
         lb_row = self._find_ovsdb_fake_row(self.lb_table, 'name', 'lb_2')
         lb = self.nb_ovn_idl.get_floatingip_in_nat_or_lb(fip_id)
         self.assertEqual(lb['_uuid'], lb_row.uuid)
-
-
-class TestSBImplIdlOvn(TestDBImplIdlOvn):
-
-    fake_set = {
-        'chassis': [
-            {'name': 'fake-chassis',
-             'external_ids': {'neutron-metadata-proxy-networks': 'fake-id'}}],
-    }
-
-    def setUp(self):
-        super(TestSBImplIdlOvn, self).setUp()
-        self.chassis_table = fakes.FakeOvsdbTable.create_one_ovsdb_table()
-
-        self._tables = {}
-        self._tables['Chassis'] = self.chassis_table
-
-        with mock.patch.object(impl_idl_ovn.OvsdbSbOvnIdl, 'from_worker',
-                               return_value=mock.Mock()):
-            with mock.patch.object(ovs_idl.Backend, 'autocreate_indices',
-                                   create=True):
-                impl_idl_ovn.OvsdbSbOvnIdl.ovsdb_connection = None
-                self.sb_ovn_idl = impl_idl_ovn.OvsdbSbOvnIdl(mock.MagicMock())
-
-        self.sb_ovn_idl.idl.tables = self._tables
-
-    def _load_sb_db(self):
-        fake_chassis = TestSBImplIdlOvn.fake_set['chassis']
-        self._load_ovsdb_fake_rows(self.chassis_table, fake_chassis)
-
-    @mock.patch.object(impl_idl_ovn.LOG, 'warning')
-    def test_get_chassis_metadata_networks_chassis_empty(self, mock_log):
-        chassis_name = 'fake-chassis'
-        result = self.sb_ovn_idl.get_chassis_metadata_networks(chassis_name)
-
-        mock_log.assert_called_once_with(
-            "Couldn't find Chassis named %s in OVN while looking "
-            "for metadata networks", chassis_name)
-        self.assertEqual([], result)
-
-    @mock.patch.object(impl_idl_ovn.LOG, 'warning')
-    def test_get_chassis_metadata_networks(self, mock_log):
-        chassis_name = 'fake-chassis'
-        self._load_sb_db()
-        result = self.sb_ovn_idl.get_chassis_metadata_networks(chassis_name)
-
-        self.assertFalse(mock_log.called)
-        self.assertEqual(['fake-id'], result)
@@ -42,7 +42,6 @@ from neutron.common.ovn import acl as ovn_acl
 from neutron.common.ovn import constants as ovn_const
 from neutron.common.ovn import hash_ring_manager
 from neutron.common.ovn import utils as ovn_utils
-from neutron.common import utils as n_utils
 from neutron.conf.plugins.ml2.drivers.ovn import ovn_conf
 from neutron.db import db_base_plugin_v2
 from neutron.db import ovn_revision_numbers_db
@@ -960,9 +959,6 @@ class TestOVNMechanismDriver(TestOVNMechanismDriverBase):
                 'provisioning_complete') as pc, \
                 mock.patch.object(self.mech_driver,
                                   '_update_dnat_entry_if_needed') as ude, \
-                mock.patch.object(
-                    self.mech_driver,
-                    '_wait_for_metadata_provisioned_if_needed') as wmp, \
                 mock.patch.object(self.mech_driver, '_should_notify_nova',
                                   return_value=is_compute_port):
             self.mech_driver.set_port_status_up(port1['port']['id'])
@@ -973,7 +969,6 @@ class TestOVNMechanismDriver(TestOVNMechanismDriverBase):
             provisioning_blocks.L2_AGENT_ENTITY
         )
         ude.assert_called_once_with(port1['port']['id'])
-        wmp.assert_called_once_with(port1['port']['id'])
 
         # If the port does NOT bellong to compute, do not notify Nova
         # about it's status changes
@@ -1064,31 +1059,6 @@ class TestOVNMechanismDriver(TestOVNMechanismDriverBase):
         )
         ude.assert_called_once_with(port1['port']['id'], False)
 
-    def _test__wait_for_metadata_provisioned_if_needed(self, enable_dhcp,
-                                                       wait_expected):
-        with self.network(tenant_id='test') as net1, \
-                self.subnet(network=net1,
-                            enable_dhcp=enable_dhcp) as subnet1, \
-                self.port(subnet=subnet1, set_context=True,
-                          tenant_id='test') as port1, \
-                mock.patch.object(n_utils, 'wait_until_true') as wut, \
-                mock.patch.object(ovn_conf, 'is_ovn_metadata_enabled',
-                                  return_value=True):
-            self.mech_driver._wait_for_metadata_provisioned_if_needed(
-                port1['port']['id'])
-        if wait_expected:
-            self.assertEqual(1, wut.call_count)
-        else:
-            wut.assert_not_called()
-
-    def test__wait_for_metadata_provisioned_if_needed(self):
-        self._test__wait_for_metadata_provisioned_if_needed(
-            enable_dhcp=True, wait_expected=True)
-
-    def test__wait_for_metadata_provisioned_if_needed_not_needed(self):
-        self._test__wait_for_metadata_provisioned_if_needed(
-            enable_dhcp=False, wait_expected=False)
-
     def test_bind_port_unsupported_vnic_type(self):
         fake_port = fakes.FakePort.create_one_port(
             attrs={'binding:vnic_type': 'unknown'}).info()