infra-specs/specs/cleanup-test-node-python.rst
Clark Boylan a6ea64f8aa Add storyboard story for test node cleanup
This was missing in the original spec. Additionally add ianw as
volunteer as most of the changes are pushed up and ready to go now that
we have agreement.

Change-Id: Icfedce70fa6c91a5ebafd94804982a326140e41d
2020-03-05 16:49:14 -08:00

Copyright 2020 OpenStack Foundation

This work is licensed under a Creative Commons Attribution 3.0
Unported License.
http://creativecommons.org/licenses/by/3.0/legalcode

Cleanup Test Node Python Installation

https://storyboard.openstack.org/#!/story/2007386

The OpenDev Nodepool builders use minimal distro elements to build our test node images up from scratch. We do this to reduce the size of the images, to control what goes into them, and to install glean, which is required for images to boot properly in some clouds. Unfortunately, because glean is a python project, installing it drags in a python toolchain (pip, virtualenv, etc.) from pypi. This can create problems if jobs later expect these tools to have been installed from distro packages.

Problem Description

As noted above we build our test node images from scratch. One of the reasons for this is to install the glean utility via pip. In order to do that we pull in latest pip to install glean for us. We also create several virtualenvs for os-testr, bindep, and a zuul-cloner compatibility shim. To do this we use latest pip to install latest virtualenv. Finally, we install tox using latest pip as many of our jobs leverage it to drive testing.

Historically this has been fine, as we have primarily tested python software that wants to install using up to date python development tools. Over time we have shifted toward being a general purpose CI platform, and jobs that don't want the latest python development tools have had to work around the decisions we made for our images.

Recently this was made worse by a virtualenv release that was incompatible with older virtualenv and the tools built around it. In debugging this we discovered that we use python3 -m venv and virtualenv on different platforms to create our system level virtualenvs for os-testr, bindep, and zuul-cloner. This resulted in different behaviors on different platforms and made debugging difficult.

Ideally we would use a consistent set of tooling for system level python utilities and avoid assuming global latest pip on the images entirely. This would lead to consistent behavior for our utilities across platforms, and jobs that aren't testing python from source can interact with the system in the manner they choose.

Proposed Change

All platforms we use today support python3 (including latest CentOS 7). This allows us to use python3 -m venv on all platforms to create system level virtualenvs for tools like os-testr, bindep, and zuul-cloner. Additionally, we can move glean and tox into system-level virtualenvs using python3 -m venv. If we do this we can avoid installing pip and virtualenv from pypi at a global level.
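
The per-tool environments described above can be sketched with the standard library venv module, which is what python3 -m venv runs. This is a minimal sketch under stated assumptions rather than the actual image element code: the helper names and the /opt/tool-venvs root in the usage note are hypothetical, and a POSIX layout (bin/python inside the environment) is assumed.

```python
import os
import venv


def create_tool_venv(root, tool, with_pip=True):
    """Create an isolated environment for one tool, like `python3 -m venv`.

    `root` is a hypothetical parent directory for all tool environments.
    """
    env_dir = os.path.join(root, tool)
    # symlinks=True matches what the command line tool does by default on POSIX.
    builder = venv.EnvBuilder(with_pip=with_pip, symlinks=True)
    builder.create(env_dir)
    return env_dir


def install_command(env_dir, package):
    """Return the command that installs `package` using the venv's own pip.

    Assumes a POSIX layout where the interpreter lives in <env>/bin/python.
    """
    python = os.path.join(env_dir, "bin", "python")
    return [python, "-m", "pip", "install", package]
```

An image build step could then call create_tool_venv("/opt/tool-venvs", "tox") and pass the result of install_command to subprocess.check_call. Because pip lives inside each environment, nothing is installed from pypi at the global level.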

This will give us consistent utility behavior across platforms and make life easier for jobs that don't assume the latest python development tools are preinstalled.

We will need to accommodate existing jobs that assume an up to date set of python development utilities. Jobs that use tox can simply refer to the tox that has been installed in a system level virtualenv. Jobs that need virtualenv and/or pip will need to install these tools at job runtime. We can update base jobs as necessary to do that automatically for most jobs. In order to reduce the cost of this installation we can precache get-pip.py as well as wheels for these tools and their dependencies.
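
As a rough illustration of the runtime installation from a precache, the commands a base job might run could be assembled as below. This is a sketch, not the actual base job implementation: the function names and cache paths are hypothetical. It relies on pip's --no-index and --find-links options, which get-pip.py also accepts, so that nothing is fetched from the network.

```python
import sys


def bootstrap_pip_command(get_pip_path, wheel_dir, python=sys.executable):
    """Command to bootstrap pip from a precached get-pip.py and local wheels."""
    # --no-index disables pypi lookups; --find-links points pip at local wheels.
    return [python, get_pip_path, "--no-index", "--find-links", wheel_dir]


def install_from_cache_command(packages, wheel_dir, python=sys.executable):
    """Command to install packages (e.g. virtualenv) using only cached wheels."""
    return [
        python, "-m", "pip", "install",
        "--no-index", "--find-links", wheel_dir,
        *packages,
    ]
```

For example, with a hypothetical cache at /opt/cache/pip, a job would first run bootstrap_pip_command("/opt/cache/pip/get-pip.py", "/opt/cache/pip/wheels") and then install_from_cache_command(["virtualenv"], "/opt/cache/pip/wheels"), each through subprocess.check_call.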

Alternatives

We could use distro packages for python development tools. These tend to end up out of date, and will result in different behaviors across platforms.

We could bootstrap everything at job runtime. This would put unwanted pressure on our mirrors and caches, as the vast majority of jobs would each install the same set of tools.

We could replace glean with a non python project. Unfortunately glean encodes so many cloud specific behaviors that rewriting it would be a fairly significant effort that we don't have time for.

We could continue with the current image build processes, but provide a zuul job role that cleans up python development tools for jobs that expect system packages.

Implementation

Assignee(s)

Primary assignee:

Ian Wienand (ianw)

Gerrit Topic

Use Gerrit topic "cleanup-test-image-python" for all patches related to this spec.

git-review -t cleanup-test-image-python

Work Items

  • Communicate this spec and its changes broadly as it has the chance to impact a number of projects, teams, and jobs.
  • Make the following changes on a new test image and label:
    • Remove pip-and-virtualenv from our image element dependency list.
    • Install python3 and python3-venv in all image builds.
    • Replace inconsistent system level virtualenvs with python3 -m venv virtualenvs.
    • Add new system level virtualenvs for glean and tox.
  • Apply the above changes to our production images and labels once tested and working.

Repositories

openstack/project-config will have its nodepool elements as well as nodepool-builder configs updated.

Servers

This will affect all of our single use test nodes.

DNS Entries

None

Documentation

We will update the OpenDev Test Environment docs: https://docs.openstack.org/infra/manual/testing.html

Security

N/A

Testing

We will apply these changes to a new image and label so that production images are unaffected. Once this new image/label is available in Zuul we can run a representative set of jobs against it to ensure the expected behavior.
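
One way to check the expected behavior on the new image is a small script run by a test job. The sketch below rests on assumptions: the function name and the venv root are hypothetical, a POSIX layout is assumed, and it reads the goal as meaning that no pip or virtualenv should be on the default PATH of a fresh node.

```python
import os
import shutil


def check_image(venv_root, tools, search_path=None):
    """Return a list of problems found on the node; an empty list means OK.

    `search_path` defaults to the PATH environment variable when None.
    """
    problems = []
    # Globally installed python development tools should no longer be present.
    for name in ("pip", "virtualenv"):
        found = shutil.which(name, path=search_path)
        if found is not None:
            problems.append(f"global {name} found at {found}")
    # Each tool should have its own environment with a usable interpreter.
    for tool in tools:
        python = os.path.join(venv_root, tool, "bin", "python")
        if not os.access(python, os.X_OK):
            problems.append(f"missing virtualenv for {tool}")
    return problems
```

A job could fail when check_image("/opt/tool-venvs", ["glean", "tox", "bindep", "os-testr"]) returns a non-empty list, where /opt/tool-venvs stands in for wherever the images actually place these environments.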

Dependencies

None