d9913370de
One of the biggest frustrations larger operators have is when they trigger a massive number of concurrent deployments. As one would expect, the memory utilization of the conductor goes up. Except, even with the default number of worker threads, if we're requested to convert 80 images at the same time, or to perform the write-out to the remote node at the same time, we will consume a large amount of system RAM. Or more specifically, qemu-img will consume a large amount of memory. If the amount of memory goes too low, the system can trigger OOMKiller which will slay processes using ram. Ideally, we do not want this to happen to our conductor process, much less the work that is being performed, so we need to add some guard rails to help keep us from entering into situations where we may compromise the conductor by taking on too much work. Adds a guard in the conductor to prevent multiple parallel deployment operations from running the conductor out of memory. With the defaults, the conductor will attempt to throttle back automatically and hold worker threads which will slow down the amount of work also proceeding through the conductor, as we are in a memory condition where we should be careful about the work. The defaults allow this to occur for a total of 15 seconds between re-check of available RAM, for a total number of six retries. The minimum default is 1024 (MB), as this is the amount of memory qemu-img allocates when trying to write images. This quite literally means no additional qemu-img process can spawn until the default memory situation has resolved itself. Change-Id: I69db0169c564c5b22abd0cb1b890f409c13b0ac2 |
||
---|---|---|
.. | ||
notes | ||
source |