system-config/doc/source/nodepool.rst
Clark Boylan 7bb12ad33e Remove nb01, nb02, and nb04 from config management
These servers have been replaced by new Noble servers (nb05, nb06,
nb07). These new servers have managed to build every one of our current
images except for gentoo, openeuler, and openeuler arm64. These three
images weren't building on the old system either.

There is a small amount of concern that removing the old servers without
letting them clean up the database after themselves may orphan some
zookeeper database records. However the current rockylinux-9 images were
both built by nb05 or nb06 and we don't have any old records from nb01
or nb02 remaining so it seems nodepool cleans up after itself properly.
Worst case we can probably do manual database edits.

We also remove the version specifier in the docker-compose.yaml file as
`docker compose` ignores it and emits a warning when it is present. Once
this change lands all of our nodepool builders will use `docker compose`
instead of `docker-compose` making this a safe cleanup.

Change-Id: Iab8d2b6493b78cc3711d64119da2da5d3456a25a
2025-03-20 09:29:16 -07:00


Nodepool

Nodepool is a service used by the OpenStack CI team to build devstack images and to deploy and manage a pool of test nodes from those images on cloud providers, for use in OpenStack project testing.

At a Glance

Hosts
  • nl01.opendev.org
  • nl02.opendev.org
  • nl03.opendev.org
  • nl04.opendev.org
  • nb05.opendev.org
  • nb06.opendev.org
  • nb07.opendev.org
  • zk04.opendev.org
  • zk05.opendev.org
  • zk06.opendev.org
Puppet
Configuration
  • nodepool/nodepool.yaml
  • nodepool/scripts/
  • nodepool/elements/
Projects
Bugs
Resources

Overview

Once per day, for every image type (and provider) configured in nodepool, a new image with cached data is built for use by devstack. Nodepool spins up new instances and tears down old ones as tests are queued up and completed. It always maintains a consistent number of available instances for tests, up to the limits set for the CI infrastructure.
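The replenishment behaviour described above can be sketched as a small calculation. This is a simplified sketch under stated assumptions, not nodepool's launcher code. `nodes_to_launch` and `needs_rebuild` are hypothetical helpers. `min_ready`, `max_servers` and `rebuild_age` are modelled on nodepool's min-ready, max-servers and rebuild-age settings, but nodepool combines these across labels, providers and cloud quotas in a more involved way.

```python
# Hypothetical helpers sketching the pool behaviour described above.
# min_ready, max_servers and rebuild_age echo nodepool configuration
# option names, but this arithmetic is a simplification, not nodepool code.

ONE_DAY = 24 * 60 * 60  # "once per day", in seconds


def needs_rebuild(image_age_seconds: int, rebuild_age: int = ONE_DAY) -> bool:
    """True once the newest image of a diskimage is at least rebuild_age old."""
    return image_age_seconds >= rebuild_age


def nodes_to_launch(ready: int, in_use: int, min_ready: int, max_servers: int) -> int:
    """Instances to boot for one label in one provider pool.

    Top the number of idle (ready) nodes back up to min_ready, but never let
    the pool's total node count exceed max_servers.
    """
    shortfall = max(0, min_ready - ready)
    headroom = max(0, max_servers - (ready + in_use))
    return min(shortfall, headroom)


print(nodes_to_launch(ready=1, in_use=8, min_ready=3, max_servers=10))  # 1
```

Take a pool with one ready node, eight nodes in use, a min_ready of 3 and a max_servers of 10. Only one node is launched: the label is two nodes short, but the pool has room for just one more server.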

ZooKeeper

Nodepool stores image metadata in ZooKeeper. We have a three-node ZooKeeper cluster running on zk04.opendev.org through zk06.opendev.org.
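Three members is the usual minimum for a fault-tolerant ensemble. ZooKeeper only makes progress (electing a leader and committing writes) while a strict majority, or quorum, of its servers can reach each other. The arithmetic is sketched below, with hypothetical helper names.

```python
# Majority-quorum arithmetic for a ZooKeeper ensemble such as zk04-zk06.


def quorum_size(members: int) -> int:
    """Smallest strict majority of the ensemble."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """Members that can be lost while a quorum is still available."""
    return members - quorum_size(members)


for n in (3, 4, 5):
    print(f"{n} members: quorum {quorum_size(n)}, tolerates {tolerated_failures(n)}")
```

A fourth server would raise the quorum to three without tolerating any extra failure. This is why ensembles are normally sized with an odd number of members.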

The Nodepool CLI should be sufficient to examine and alter any of the information stored in ZooKeeper. If advanced debugging is needed, however, zk-shell is recommended as an easy way to inspect and/or change data in ZooKeeper. To use it, run "pip install zk_shell" in a virtualenv, then run "zk-shell".

Bad Images

Since nodepool takes a while to build images, and generally only does it once per day, occasionally the images it produces may have significant behavior changes from the previous versions. For instance, a provider's base image or operating system package may update, or some of the scripts or system configuration that we apply to the images may change. If this occurs, it is easy to revert to the last good image.

Nodepool periodically deletes old images; however, it never deletes the current or next most recent image in the ready state for any image-provider combination. So if you find that the ubuntu-precise image is problematic, you can run:

$ sudo nodepool dib-image-list

+---------------------------+----------------+---------+-----------+----------+-------------+
| ID                        | Image          | Builder | Formats   | State    | Age         |
+---------------------------+----------------+---------+-----------+----------+-------------+
| ubuntu-precise-0000000001 | ubuntu-precise | nb01    | qcow2,vhd | ready    | 02:00:57:33 |
| ubuntu-precise-0000000002 | ubuntu-precise | nb01    | qcow2,vhd | ready    | 01:00:57:33 |
+---------------------------+----------------+---------+-----------+----------+-------------+

Image ubuntu-precise-0000000001 is the previous image and ubuntu-precise-0000000002 is the current image. Both are marked as ready, and the current image is simply the one with the shortest age.
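The following sketch picks the current and previous builds out of a dib-image-list table. It parses the sample output above, keeps only ready builds of one image, and orders them by age. Two points are assumptions. First, the Age column is taken to be days:hours:minutes:seconds, which is inferred from the sample, where the two builds differ by exactly one day. Second, the helper names are hypothetical. Real output could be fed in the same way, for example captured with subprocess, which is not shown here.

```python
from __future__ import annotations

# Sample output copied from the dib-image-list example above.
SAMPLE = """\
+---------------------------+----------------+---------+-----------+----------+-------------+
| ID                        | Image          | Builder | Formats   | State    | Age         |
+---------------------------+----------------+---------+-----------+----------+-------------+
| ubuntu-precise-0000000001 | ubuntu-precise | nb01    | qcow2,vhd | ready    | 02:00:57:33 |
| ubuntu-precise-0000000002 | ubuntu-precise | nb01    | qcow2,vhd | ready    | 01:00:57:33 |
+---------------------------+----------------+---------+-----------+----------+-------------+
"""


def age_to_seconds(age: str) -> int:
    """Convert an assumed days:hours:minutes:seconds age into seconds."""
    days, hours, minutes, seconds = (int(part) for part in age.split(":"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def parse_table(text: str) -> list[dict[str, str]]:
    """Turn the ASCII table into one dict per row, keyed by column header."""
    rows = [line for line in text.splitlines() if line.startswith("|")]
    cells = [[cell.strip() for cell in row.strip("|").split("|")] for row in rows]
    header, body = cells[0], cells[1:]
    return [dict(zip(header, row)) for row in body]


def current_and_previous(text: str, image: str) -> tuple[str, str | None]:
    """Return (current, previous) build IDs among the ready builds of `image`."""
    ready = [
        row for row in parse_table(text)
        if row["Image"] == image and row["State"] == "ready"
    ]
    if not ready:
        raise ValueError(f"no ready builds for {image}")
    # The current build is the ready build with the shortest age.
    ready.sort(key=lambda row: age_to_seconds(row["Age"]))
    previous = ready[1]["ID"] if len(ready) > 1 else None
    return ready[0]["ID"], previous


print(current_and_previous(SAMPLE, "ubuntu-precise"))
# ('ubuntu-precise-0000000002', 'ubuntu-precise-0000000001')
```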

Nodepool aggressively attempts to build and upload missing images, so if the problem with the image will not be solved by an immediate rebuild, builds of that image must first be disabled. To do so, add pause: True to the diskimage section for ubuntu-precise in nodepool.yaml.
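The edit amounts to setting one key on the matching entry of the diskimages list. In practice you edit nodepool.yaml itself. The sketch below shows the equivalent change on a configuration that has already been parsed into Python data, for example by a YAML parser (not shown). The helper `pause_diskimage` and the trimmed-down config are hypothetical illustrations.

```python
from __future__ import annotations

from typing import Any

# Hypothetical helper: pause builds of one diskimage in a nodepool
# configuration that has already been parsed into Python data. In YAML the
# result corresponds to:
#
#   diskimages:
#     - name: ubuntu-precise
#       pause: True


def pause_diskimage(config: dict[str, Any], name: str) -> dict[str, Any]:
    """Set pause: True on the diskimage called `name` and return that entry."""
    for diskimage in config.get("diskimages", []):
        if diskimage.get("name") == name:
            diskimage["pause"] = True
            return diskimage
    # Failing loudly avoids silently pausing nothing after a typo.
    raise KeyError(f"diskimage {name!r} not found")


# Illustrative config; a real nodepool.yaml has many more keys per entry.
config = {"diskimages": [{"name": "ubuntu-precise"}, {"name": "ubuntu-trusty"}]}
print(pause_diskimage(config, "ubuntu-precise"))
# {'name': 'ubuntu-precise', 'pause': True}
```

Only the named entry changes, so other images keep building normally while the problematic one is paused.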

Then delete the problematic image with:

$ sudo nodepool dib-image-delete ubuntu-precise-0000000002

All uploads corresponding to that image build will be deleted from providers before the image DIB files are deleted. The previous image will become the current image and nodepool will use it when creating new nodes. When nodepool next creates an image, it will still retain build #1 since it will still be considered the next-most-recent image.
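The retention rule explains why build #1 survives. Nodepool never deletes the newest or the next most recent ready build of an image. This minimal sketch assumes that a higher build number means a newer build and that only builds in the ready state count. `protected_builds` is a hypothetical helper, not nodepool's cleanup code.

```python
from __future__ import annotations

# Sketch of the retention rule: of all builds of one image that are in the
# "ready" state, the newest (current) and the next most recent are kept;
# anything older is eligible for cleanup. Assumes a higher build number
# means a newer build.


def protected_builds(builds: dict[int, str]) -> set[int]:
    """builds maps build number -> state; returns the build numbers to keep."""
    ready = sorted((num for num, state in builds.items() if state == "ready"), reverse=True)
    return set(ready[:2])


# Before the bad build is deleted: #2 is current, #1 is the fallback.
print(sorted(protected_builds({1: "ready", 2: "ready"})))  # [1, 2]
# After deleting #2 and building #3, #1 is still the next most recent.
print(sorted(protected_builds({1: "ready", 3: "ready"})))  # [1, 3]
# Once #4 is ready, #1 is no longer protected.
print(sorted(protected_builds({1: "ready", 3: "ready", 4: "ready"})))  # [3, 4]
```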

vhd-util

Creating images for Rackspace requires a patched version of vhd-util to convert the images into the appropriate VHD format. See the opendev/infra-vhd-util-deb repository for details of this custom package. It is installed on production hosts via a PPA built and published by jobs in that repository.