Don't force_reconnect() on unhandled Idl exception

There's no reason to believe that reconnecting to ovsdb-server will
resolve an unhandled exception in python-ovs. In addition, since users
often subclass Idl and add their own notify() methods, there could be
exceptions thrown from that code.

The best we can do is log what is going on and rely on users to fix
the issue. Delaying with sleep() is usually a bad idea: if an ovsdb
reconnection is in progress, the sleep delays the calls to Idl.run()
that handle that reconnection over several calls.

Change-Id: Iab2177fb9fa653292a3805689895f98e0833dc4a
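
The behaviour described in the message can be illustrated with a small, self-contained sketch. It does not use python-ovs or ovsdbapp: FakeIdl and run_loop are hypothetical stand-ins invented for illustration, and handling one update per run() call is a simplifying assumption. It only shows the pattern of logging an exception raised from an overridden notify() and carrying on without sleeping or reconnecting.

```python
import logging

LOG = logging.getLogger(__name__)


class FakeIdl:
    """Hypothetical stand-in for a user's Idl subclass (not python-ovs)."""

    def __init__(self, updates):
        self.updates = list(updates)
        self.seen = []
        self.reconnects = 0

    def notify(self, update):
        # User-supplied hook: a bug here propagates out of run().
        if update == "bad":
            raise ValueError("bug in overridden notify()")
        self.seen.append(update)

    def run(self):
        # Simplification: handle at most one pending update per call.
        if self.updates:
            self.notify(self.updates.pop(0))

    def force_reconnect(self):
        # Counted only so the example can show it is never called.
        self.reconnects += 1


def run_loop(idl, iterations):
    """Call idl.run() repeatedly; log failures, never sleep or reconnect."""
    errors = 0
    for _ in range(iterations):
        try:
            idl.run()
        except Exception as e:
            # Log and move on to the next pass; later run() calls are
            # not delayed by a back-off sleep.
            LOG.exception(e)
            errors += 1
    return errors


idl = FakeIdl(["a", "bad", "b"])
errors = run_loop(idl, 3)
```

In this sketch the failing update is logged once, the updates on either side of it are still processed, and force_reconnect() is never invoked.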
Terry Wilson 2022-10-24 14:58:46 -05:00
parent 44756a4e1e
commit cd70d1e290

@@ -93,7 +93,6 @@ class Connection(object):
         self.thread.start()
 
     def run(self):
-        errors = 0
         while self.is_running:
             # If we fail in an Idl call, we could have missed an update
             # from the server, leaving us out of sync with ovsdb-server.
@@ -108,22 +107,10 @@ class Connection(object):
                     self.idl.run()
             except Exception as e:
                 # This shouldn't happen, but is possible if there is a bug
-                # in python-ovs
-                errors += 1
+                # in python-ovs, or an unhandled exception in overridden
+                # Idl.notify() code
                 LOG.exception(e)
-                with self.lock:
-                    self.idl.force_reconnect()
-                    try:
-                        idlutils.wait_for_change(self.idl, self.timeout)
-                    except Exception as e:
-                        # This could throw the same exception as idl.run()
-                        # or Exception("timeout"), either way continue
-                        LOG.exception(e)
-                sleep = min(2 ** errors, 60)
-                LOG.info("Trying to recover, sleeping %s seconds", sleep)
-                time.sleep(sleep)
                 continue
-            errors = 0
             txn = self.txns.get_nowait()
             if txn is not None:
                 try: