Add Airship 1 Flow Doc

This adds some Airship 1 documentation that Rodolfo authored to
describe the end-to-end Airship 1 deployment flow.

Change-Id: Ie1d2ecfa2cc3ba9883ffcfbc238b8f3499def9da
This commit is contained in:
Matt McEuen 2021-04-09 15:52:53 -05:00
parent 6ee51e9ca5
commit ff0c344f2a
3 changed files with 277 additions and 0 deletions

Binary file not shown.

After

Width:  |  Height:  |  Size: 153 KiB

View File

@ -0,0 +1,276 @@
****************************
Airship 1.0 Deployment Flows
****************************
.. |vspace| raw:: latex
\vspace{5mm}
.. image:: airship-1-flow.png
Airship 1.0 Deploy and Update Site Flow
#######################################
1. Pegleg facilitates cloning the repositories necessary to interact with
a site. Each site has a single site-definition.yaml which contains
the repositories that “compose” that site. These may be global
repositories, type level repositories (e.g. cruisers or cloud harbor),
and finally site-level repositories. These may be entirely different
repositories with different permissions. Pegleg facilitates cloning
all of these at the correct revisions according to the definition for
that site. Pegleg can be driven via a jenkins pipeline, which can be
further abstracted in something like an NC3C dashboard, or it can be
driven on the command line directly by imitating the behavior in the
pipeline.
|vspace|
2. Pegleg wears several different hats. The CI/CD workflows leverage
different pipelines in order to call upon these hats but under the
hood, its really just different command line flags on the pegleg CLI
command depending on what type of action is occurring. Pegleg can:
a. Generate (and re-generate/rotate) new secure secrets for a site
according to each secrets requirement (e.g. length, type, and so
on). For instance: UUIDs, passwords, keys, and so on.
|vspace|
b. Encrypt secrets, and Decrypt secrets. When secrets are encrypted,
they are wrapped in a YAML envelope containing metadata for each
secret. This allows for understanding when secrets are going to
expire, when they were last rotated, and so on. All deployment and
update pipelines for instance would leverage the decrypt functionality
in order to render the documents successfully.
|vspace|
c. Lint the YAML to ensure it is valid and meets certain basic syntax
criteria and deckhand does not have an issue processing rules
encountered. For instance, development gating pipelines that validate
changes to YAML would invoke pegleg in this way.
|vspace|
d. Render will actually process the documents through the deckhand
library, which will perform substitutions, pull in the secrets that
are referenced from the configuration YAML so you can see the target
document locally. This is effectively a very in-depth linting process
and again would be used in development gates and potentially to
fast-fail in deployment and update pipelines if there was an issue.
|vspace|
e. Collect will bundle up all the documents but not actually render them
which is appropriate for deployment and update pipelines as it sends
the documents through raw (but presumably with decrypted secrets)
because each cloud site has its own deckhand instance running
maintaining its own revision history capable of rendering the
documents in-site. It is used in every deployment and update pipeline
as the results of collect are what is sent to shipyard.
|vspace|
3. Once pegleg has decrypted the secrets in the document set within an
ephemeral jenkins pipeline, pegleg collect is called to assemble them
all, and finally that is piped to the shipyard client which will
publish them via REST API to a Shipyard API service running within the
site. There are two scenarios under which Shipyard may be running in
the site.
a. On the genesis host, which is a single node running Kubernetes in a
green-field site that will be expanded to a full cluster once more
nodes are provisioned.
|vspace|
b. On the control plane of a greenfield site, receiving a site-update
or expansion.
Simply put, the entire Shipyard workflow can be summarized as follows:
* Initial region/site data will be passed to Shipyard from either a
human operator or Jenkins
* The data (in YAML format) will be sent to Deckhand for validation and storage
* Shipyard will make use of the post-processed data from DeckHand to
interact with Drydock.
* Drydock will interact with Promenade to provision and deploy bare metal
nodes using Ubuntu MAAS and a resilient Kubernetes cluster will be created
at the end of the process
* Once the Kubernetes clusters are up and validated to be working properly,
Shipyard will interact with Armada to deploy OpenStack using OpenStack Helm
* Once the OpenStack cluster is deployed, Shipyard will trigger a workflow to
perform basic sanity health checks on the cluster
|vspace|
4. Shipyard will do a number of pre-validations before delivering the
document set to deckhand. Things such as a concurrency check, to
ensure we dont try to run updates in parallel unaware of each other.
It will also run a number of fail-fast validation checks.
|vspace|
5. Shipyard will leverage the deckhand client library to deliver the
documents to deckhand over its REST API, which will again validate
them and render them (which again involves performing all layering,
substitution, secret interpolation, and so on) and publishes a
document revision, so that there is an on-site record of every change
that has ever been requested. This document revision that is fully
rendered will be available at a deckhand REST API URL that can be
retrieved by various Airship sub-components.
|vspace|
6. Deckhand will store secrets within Barbican so that they are not
stored in clear text within a database, and the rendered document set
revision itself is stored directly in a database. Deckhand will
change every secret to a Barbican reference which will be rendered
on-demand by Deckhand whenever someone asks for that document revision
through the API.
|vspace|
7. At this point, with the documents stored in Deckhand, Shipyard will
perform another fail-fast step and ask each of the components
highlighted in yellow to perform a dry-run no-op validation of the
entire document set from their perspective. This means that Drydock
for instance, would be validating and acknowledging it would not have
any issue processing the document set it sees in Deckhand. This
helps ensure we do not encounter updates that fail in the middle
of the process. If a component is unhappy with the document set
we want to know early and fail before making any changes.
|vspace|
8. Shipyard will now invoke Drydock to provision baremetal hosts that
have not already been provisioned and continue to call back or poll
for when Drydock has completed this process. Airship has a concept
called deployment strategies because the hardware aspect of
deployment is not guaranteed or reliable, and we dont always want
failures here to block every other process in the stack. In other
words, our deployment strategies require that 100% of nodes marked
as control plane nodes must be provisioned successfully to
continue, but that a certain percentage of each rack of workers
could fail and we can still continue past the hardware
provisioning steps successfully. In other words, this is where we
introduce a threshold of failure.
|vspace|
9. Shipyard will send Drydock the Deckhand URL to obtain the document set
for itself for this update. Drydock will retrieve the entire document
set from Deckhand but it will only process documents it cares
about.
|vspace|
10. Drydock will process any Drydock/BootAction documents that have
external references in them to render those upfront before writing an
operating system to the physical host. Most importantly, this allows
Promenade to construct a host-specific join script. In other words,
Drydock calls out to the Promenade REST API to construct a join shell
script for each host and this is driven by Drydock/Bootaction
documents.
|vspace|
11. Drydock will orchestrate MaaS based on the document set. It does this
through several internal tasks, prepare_site, prepare_nodes, and
deploy_site. Within prepare_site, upfront orchestration of MaaS
occurs setting non-host specific settings via the MaaS API, such as
CIDRs, and VLANs. Within prepare_nodes, we identify hosts that
havent already been provisioned and then power cycle hosts, wait for
them to be discovered by MaaS, and then aligning and renaming them to
hosts in our static inventory. Then the host configuration is
orchestrated in MaaS so they have the proper networking and storage
configuration as well as receive the correct static overlays, like
Kubernetes join scripts, the correct Drivers, and so on, on
first-boot. Finally within Drydocks deploy_nodes task we orchestrate
several MaaS flows to actually provision the nodes with an operating
system where they execute any additional static scripts delivered on
first-boot.
|vspace|
12. During the deploy_nodes phase of Drydock, MaaS is effectively writing
an operating system to the baremetal nodes.
|vspace|
13. Driven by cloud-init on first boot post provision, the nodes will
actually make a rest call back to the MaaS API to inform it that
provisioning has completed and they have successfully booted up into
functional networking and have booted up successfully. Drydock can
use this status within MaaS to understand the nodes were provisioned
successfully.
|vspace|
14. The nodes run the Promenade generated shell script to join them to
Kubernetes. This host-specific script installs the appropriate
dependencies and joins the node as a Kubernetes node, either as a
worker, or as a control plane host depending on the hosts profile in
the YAML inventory.
|vspace|
15. Shipyard has been polling Drydock for completion of processing the
site update. Once the polling for Drydock provisioning completes,
Shipyard will move on to performing a similar request to Armada.
Armada is asked to update the site and given a Deckhand URL and
revision to pull from.
|vspace|
16. Armada pulls the rendered document set from Deckhand.
|vspace|
17. Armada then proceeds to help orchestrate any helm installs or upgrades
necessary in the site, and helps do this across a vast number of
charts, their ordering, and dependencies. Armada also supports
fetching Helm chart source and then building charts from source from
various local and remote locations, such as Git endpoints, tarballs or
local directories. It will also give the operator some indication of
what is about to change by assisting with diffs for both values,
values overrides, and actual template changes. Its functionality
extends beyond Helm, assisting in interacting with Kubernetes directly
to perform basic pre- and post-steps, such as removing completed or
failed jobs, running backup jobs, blocking on chart readiness, or
deleting resources that do not support upgrades. However, primarily,
it is an interface to support orchestrating Helm.
|vspace|
18. Armada effectively interacts with Tiller for installation (although it
may interact with k8s directly to poll, wait, remove jobs, and
otherwise help protect helm from failures). Tiller will then interact
with k8s to perform helm chart installations or upgrades.
Airship 1.0 Update Software Flow
################################
The Update Software flow (or “action” in Shipyard -- depicted with
green numbers in the image) is effectively a subset of the above flow.
It is used primarily to speed the process up by bypassing the Drydock
flow entirely. The reason for this is both speed as interacting with
MaaS is slow, as well as times where you want to avoid trying to
process hardware requests (e.g. waiting for Drydock to try and
provision a piece of failed hardware only to ultimately timeout some
time later before moving on to the next step because the deployment
strategy allows it).
Further Documentation
#####################
* https://airshipit.readthedocs.io/projects/shipyard/en/latest/
* https://airshipit.readthedocs.io/projects/pegleg/en/latest/
* https://airshipit.readthedocs.io/projects/armada/en/latest/
* https://airshipit.readthedocs.io/projects/promenade/en/latest/
* https://airshipit.readthedocs.io/projects/drydock/en/latest/
* https://airshipit.readthedocs.io/projects/deckhand/en/latest/
* https://airshipit.readthedocs.io/en/latest/

View File

@ -75,6 +75,7 @@ developers.
Seaworthy: Production-grade Airship <https://docs.airshipit.org/treasuremap/seaworthy.html>
develop/airship1-developers.rst
develop/conventions.rst
airship1/airship-1-flow.rst
Other Resources
---------------