Add Airship 1 Flow Doc
This adds some Airship 1 documentation that Rodolfo authored to describe the end-to-end Airship 1 deployment flow.

Change-Id: Ie1d2ecfa2cc3ba9883ffcfbc238b8f3499def9da
parent 6ee51e9ca5
commit ff0c344f2a
doc/source/airship1/airship-1-flow.png
Normal file
Binary file not shown. Size: 153 KiB
doc/source/airship1/airship-1-flow.rst
Normal file
@@ -0,0 +1,276 @@
****************************
Airship 1.0 Deployment Flows
****************************

.. |vspace| raw:: latex

   \vspace{5mm}

.. image:: airship-1-flow.png

Airship 1.0 Deploy and Update Site Flow
#######################################

1. Pegleg clones the repositories needed to interact with a site. Each
   site has a single site-definition.yaml that lists the repositories
   that “compose” that site: global repositories, type-level
   repositories (e.g. cruisers or cloud harbor), and site-level
   repositories. These may be entirely different repositories with
   different permissions. Pegleg clones all of them at the revisions
   given in the definition for that site. Pegleg can be driven from a
   Jenkins pipeline, which can be further abstracted behind something
   like an NC3C dashboard, or directly from the command line by
   imitating what the pipeline does.

|vspace|

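As a rough illustration of the idea in step 1, the sketch below turns a
site definition into a list of repositories to clone. The dictionary is a
simplified, hypothetical stand-in for a parsed site-definition.yaml (the
keys, URLs, and revisions are assumptions, not the real schema), and
``clone_plan`` is an illustrative helper, not part of Pegleg.

```python
# Hypothetical, simplified stand-in for the parsed contents of a
# site-definition.yaml; the real schema may differ.
SITE_DEFINITION = {
    "site_type": "cruiser",
    "repositories": {
        "global": {"url": "https://example.com/global.git", "revision": "v1.2"},
        "type": {"url": "https://example.com/cruiser.git", "revision": "v0.9"},
        "site": {"url": "https://example.com/site-a.git", "revision": "abc123"},
    },
}


def clone_plan(site_definition):
    """Return (name, url, revision) for every repository composing the site."""
    return [
        (name, repo["url"], repo["revision"])
        for name, repo in site_definition["repositories"].items()
    ]


for name, url, revision in clone_plan(SITE_DEFINITION):
    print(f"clone {url} at {revision} ({name})")
```

Keeping the revision next to each URL is what lets every repository be
checked out at exactly the state the site expects.
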
2. Pegleg wears several different hats. The CI/CD workflows use
   different pipelines to call upon these hats, but under the hood each
   one is just a different set of command line flags on the pegleg CLI,
   depending on the type of action being performed. Pegleg can:

   a. Generate (and regenerate or rotate) secure secrets for a site
      according to each secret’s requirements (e.g. length and type),
      such as UUIDs, passwords, and keys.

|vspace|

   b. Encrypt and decrypt secrets. Encrypted secrets are wrapped in a
      YAML envelope containing metadata for each secret, which makes it
      possible to tell when a secret will expire, when it was last
      rotated, and so on. All deployment and update pipelines, for
      instance, use the decrypt functionality in order to render the
      documents successfully.

|vspace|

   c. Lint the YAML to ensure that it is valid, that it meets certain
      basic syntax criteria, and that Deckhand will have no issue
      processing the rules it encounters. Development gating pipelines
      that validate changes to YAML, for instance, invoke Pegleg in
      this way.

|vspace|

   d. Render processes the documents through the Deckhand library,
      which performs substitutions and pulls in the secrets referenced
      from the configuration YAML, so you can see the target document
      locally. This is effectively a very in-depth linting process; it
      is used in development gates and potentially to fail fast in
      deployment and update pipelines when there is an issue.

|vspace|

   e. Collect bundles up all the documents without rendering them. This
      is appropriate for deployment and update pipelines, which send
      the documents through raw (but presumably with decrypted
      secrets), because each cloud site runs its own Deckhand instance
      that maintains its own revision history and can render the
      documents in-site. Collect is used in every deployment and update
      pipeline, since its output is what is sent to Shipyard.

|vspace|

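To make the secret generation described in 2a more concrete, here is a
minimal sketch that produces a secret according to a requested type and
length. The kind names (``uuid``, ``passphrase``, ``key``) and the
``generate_secret`` helper are assumptions made for illustration; this is
not Pegleg's implementation.

```python
import secrets
import string
import uuid


def generate_secret(kind, length=24):
    """Generate one secret of the requested (hypothetical) kind."""
    if kind == "uuid":
        return str(uuid.uuid4())
    if kind == "passphrase":
        alphabet = string.ascii_letters + string.digits
        return "".join(secrets.choice(alphabet) for _ in range(length))
    if kind == "key":
        # token_hex(n) returns n random bytes encoded as 2 * n hex characters.
        return secrets.token_hex(length)
    raise ValueError(f"unknown secret kind: {kind}")


print(len(generate_secret("passphrase", 32)))
```

The ``secrets`` module is used rather than ``random`` because it draws
from a cryptographically secure source.
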
3. Once Pegleg has decrypted the secrets in the document set within an
   ephemeral Jenkins pipeline, pegleg collect is called to assemble the
   documents, and the result is piped to the Shipyard client, which
   publishes it via REST API to a Shipyard API service running within
   the site. Shipyard may be running in the site under one of two
   scenarios:

   a. On the genesis host, which is a single node running Kubernetes in
      a greenfield site that will be expanded to a full cluster once
      more nodes are provisioned.

|vspace|

   b. On the control plane of a greenfield site, receiving a site-update
      or expansion.

   Simply put, the entire Shipyard workflow can be summarized as follows:

   * Initial region/site data is passed to Shipyard by either a human
     operator or Jenkins.
   * The data (in YAML format) is sent to Deckhand for validation and
     storage.
   * Shipyard uses the post-processed data from Deckhand to interact
     with Drydock.
   * Drydock interacts with Promenade to provision and deploy bare metal
     nodes using Ubuntu MaaS, and a resilient Kubernetes cluster is
     created at the end of the process.
   * Once the Kubernetes cluster is up and validated to be working
     properly, Shipyard interacts with Armada to deploy OpenStack using
     OpenStack Helm.
   * Once the OpenStack cluster is deployed, Shipyard triggers a
     workflow to perform basic sanity health checks on the cluster.

|vspace|

4. Shipyard performs a number of pre-validations before delivering the
   document set to Deckhand, such as a concurrency check to ensure that
   two updates are not run in parallel, unaware of each other. It also
   runs a number of fail-fast validation checks.

|vspace|

5. Shipyard uses the Deckhand client library to deliver the documents
   to Deckhand over its REST API. Deckhand validates and renders them
   again (performing all layering, substitution, secret interpolation,
   and so on) and publishes a document revision, so that there is an
   on-site record of every change that has ever been requested. The
   fully rendered document revision is available at a Deckhand REST API
   URL, from which the various Airship sub-components can retrieve it.

|vspace|

6. Deckhand stores secrets in Barbican so that they are not kept in
   clear text in a database, while the rendered document set revision
   itself is stored directly in a database. Deckhand replaces every
   secret with a Barbican reference, which it resolves on demand
   whenever someone asks for that document revision through the API.

|vspace|

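The reference-swapping idea in step 6 can be sketched as follows. The
in-memory ``FakeSecretStore`` stands in for a secret store such as
Barbican, and the ``secret-ref://`` scheme, the document layout, and both
helper functions are hypothetical; the sketch only shows the pattern of
persisting a reference and resolving it when the document is requested.

```python
import uuid


class FakeSecretStore:
    """In-memory stand-in for a secret store such as Barbican."""

    def __init__(self):
        self._secrets = {}

    def put(self, value):
        ref = f"secret-ref://{uuid.uuid4()}"  # hypothetical reference format
        self._secrets[ref] = value
        return ref

    def get(self, ref):
        return self._secrets[ref]


def store_document(doc, store):
    """Return a copy of doc that is safe to persist: data becomes a reference."""
    return {**doc, "data": store.put(doc["data"])}


def render_document(stored_doc, store):
    """Resolve the reference on demand, when the document is requested."""
    return {**stored_doc, "data": store.get(stored_doc["data"])}


store = FakeSecretStore()
secret_doc = {"name": "db-password", "data": "s3cr3t"}
persisted = store_document(secret_doc, store)
print(persisted["data"].startswith("secret-ref://"))
print(render_document(persisted, store) == secret_doc)
```
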
7. At this point, with the documents stored in Deckhand, Shipyard
   performs another fail-fast step and asks each of the components
   highlighted in yellow in the diagram to perform a dry-run, no-op
   validation of the entire document set from its own perspective.
   Drydock, for instance, validates and acknowledges that it would not
   have any issue processing the document set it sees in Deckhand. This
   helps ensure that updates do not fail in the middle of the process:
   if a component is unhappy with the document set, we want to know
   early and fail before making any changes.

|vspace|

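The validate-before-changing pattern from step 7 might look like the
sketch below. The validator callables, their return convention (a list of
error strings), and the function names are assumptions for illustration
and do not reflect Shipyard's actual interfaces.

```python
def validate_everywhere(document_set, validators):
    """Run every component's dry-run validator and collect all failures."""
    failures = {}
    for component, validate in validators.items():
        errors = validate(document_set)
        if errors:
            failures[component] = errors
    return failures


def deploy(document_set, validators, apply_changes):
    """Apply changes only if every component accepts the document set."""
    failures = validate_everywhere(document_set, validators)
    if failures:
        raise RuntimeError(f"validation failed, nothing was changed: {failures}")
    apply_changes(document_set)


# Hypothetical validators: each returns a list of error messages.
validators = {
    "drydock": lambda docs: [],
    "armada": lambda docs: [] if docs else ["empty document set"],
}
deploy(["doc-1"], validators, apply_changes=lambda docs: print("applied", docs))
```

Running all validators before raising means the operator sees every
complaint at once instead of fixing them one failed run at a time.
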
8. Shipyard now invokes Drydock to provision bare metal hosts that have
   not already been provisioned, and continues to poll (or call back)
   until Drydock has completed this process. Airship has a concept
   called deployment strategies, because the hardware aspect of
   deployment is not guaranteed to be reliable and we don’t always want
   failures here to block every other process in the stack. For
   example, our deployment strategies require that 100% of the nodes
   marked as control plane nodes be provisioned successfully in order
   to continue, but allow a certain percentage of each rack of workers
   to fail while still continuing past the hardware provisioning steps.
   In other words, this is where we introduce a threshold of failure.

|vspace|

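The threshold of failure described in step 8 can be expressed as a small
check per group of nodes. The data layout and function names below are
hypothetical and only illustrate the arithmetic; treating an empty group
as successful is a choice made for this sketch.

```python
def group_succeeded(results, min_percent):
    """results maps node name -> True if provisioning succeeded."""
    if not results:
        return True  # assumption: an empty group cannot fail
    succeeded = sum(1 for ok in results.values() if ok)
    # Integer comparison avoids floating point rounding at the boundary.
    return succeeded * 100 >= min_percent * len(results)


def may_continue(groups):
    """groups is a list of (results, min_percent); every group must pass."""
    return all(group_succeeded(results, pct) for results, pct in groups)


control_plane = {"cp1": True, "cp2": True, "cp3": True}
rack_a = {"w1": True, "w2": True, "w3": False, "w4": True}  # 3 of 4 succeeded

print(may_continue([(control_plane, 100), (rack_a, 75)]))
print(may_continue([(control_plane, 100), (rack_a, 90)]))
```

With a 75% threshold the rack with one failed worker is acceptable; with
a 90% threshold the same result stops the flow.
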
9. Shipyard sends Drydock the Deckhand URL from which Drydock can
   obtain the document set for this update itself. Drydock retrieves
   the entire document set from Deckhand, but only processes the
   documents it cares about.

|vspace|

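One simple way a component could pick out "the documents it cares about"
in step 9 is to filter on the schema prefix. The flat document layout
below is a simplification chosen for this sketch, and ``documents_for``
is an illustrative helper, not Drydock code.

```python
def documents_for(component, documents):
    """Keep only documents whose schema starts with '<component>/'."""
    prefix = f"{component}/"
    return [doc for doc in documents if doc["schema"].startswith(prefix)]


# Simplified documents: real documents carry more metadata than this.
document_set = [
    {"schema": "drydock/BootAction/v1", "name": "promjoin"},
    {"schema": "armada/Chart/v1", "name": "keystone"},
    {"schema": "drydock/BaremetalNode/v1", "name": "node-01"},
]

print([doc["name"] for doc in documents_for("drydock", document_set)])
```
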
10. Drydock processes any Drydock/BootAction documents that contain
    external references and renders those upfront, before an operating
    system is written to the physical host. Most importantly, this
    allows Promenade to construct a host-specific join script. In other
    words, Drydock calls out to the Promenade REST API to construct a
    join shell script for each host, and this is driven by
    Drydock/BootAction documents.

|vspace|

11. Drydock orchestrates MaaS based on the document set. It does this
    through several internal tasks: prepare_site, prepare_nodes, and
    deploy_nodes. Within prepare_site, upfront orchestration of MaaS
    occurs, setting non-host-specific settings such as CIDRs and VLANs
    via the MaaS API. Within prepare_nodes, we identify hosts that
    haven’t already been provisioned, power cycle them, wait for them
    to be discovered by MaaS, and then align and rename them to match
    the hosts in our static inventory. The host configuration is then
    orchestrated in MaaS so that the hosts have the proper networking
    and storage configuration and receive the correct static overlays,
    such as Kubernetes join scripts and the correct drivers, on first
    boot. Finally, within Drydock’s deploy_nodes task, we orchestrate
    several MaaS flows to actually provision the nodes with an
    operating system, after which they execute any additional static
    scripts delivered on first boot.

|vspace|

12. During the deploy_nodes phase of Drydock, MaaS effectively writes
    an operating system to the bare metal nodes.

|vspace|

13. Driven by cloud-init on first boot after provisioning, the nodes
    make a REST call back to the MaaS API to inform it that
    provisioning has completed and that they have booted successfully
    with functional networking. Drydock can use this status within MaaS
    to determine that the nodes were provisioned successfully.

|vspace|

14. The nodes run the Promenade-generated shell script to join
    Kubernetes. This host-specific script installs the appropriate
    dependencies and joins the node as a Kubernetes node, either as a
    worker or as a control plane host, depending on the host’s profile
    in the YAML inventory.

|vspace|

15. Meanwhile, Shipyard has been polling Drydock for completion of the
    site update. Once Drydock reports that provisioning is complete,
    Shipyard moves on to making a similar request to Armada. Armada is
    asked to update the site and is given a Deckhand URL and revision
    to pull from.

|vspace|

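The polling in step 15 follows a common pattern: ask for the status at a
fixed interval until a terminal state or a timeout is reached. The status
strings, default interval, and ``wait_for_completion`` helper below are
assumptions for illustration, not Shipyard's or Drydock's API.

```python
import time


def wait_for_completion(get_status, interval=30.0, timeout=3600.0, sleep=time.sleep):
    """Poll get_status() until it reports a terminal state or the timeout expires."""
    waited = 0.0
    while True:
        status = get_status()
        if status in ("complete", "failed"):  # hypothetical terminal states
            return status
        if waited >= timeout:
            raise TimeoutError(f"task still {status!r} after {waited:.0f}s")
        sleep(interval)
        waited += interval


# Simulated task that finishes on the third poll; sleep is stubbed out.
statuses = iter(["running", "running", "complete"])
print(wait_for_completion(lambda: next(statuses), sleep=lambda seconds: None))
```

Passing ``sleep`` as a parameter keeps the loop testable without waiting
in real time.
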
16. Armada pulls the rendered document set from Deckhand.

|vspace|

17. Armada then orchestrates any Helm installs or upgrades necessary in
    the site, handling a vast number of charts along with their
    ordering and dependencies. Armada also supports fetching Helm chart
    source and building charts from source from various local and
    remote locations, such as Git endpoints, tarballs, or local
    directories. It also gives the operator some indication of what is
    about to change by assisting with diffs for values, values
    overrides, and actual template changes. Its functionality extends
    beyond Helm: it interacts with Kubernetes directly to perform basic
    pre- and post-steps, such as removing completed or failed jobs,
    running backup jobs, blocking on chart readiness, or deleting
    resources that do not support upgrades. Primarily, however, it is
    an interface to support orchestrating Helm.

|vspace|

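Ordering charts by their dependencies, as mentioned in step 17, is a
topological sort. The sketch below uses Python's standard ``graphlib``
module with a hypothetical set of charts; it illustrates the general
technique only and is not how Armada itself sequences charts.

```python
from graphlib import TopologicalSorter  # available from Python 3.9

# Hypothetical charts mapped to the charts they depend on.
chart_dependencies = {
    "mariadb": set(),
    "rabbitmq": set(),
    "keystone": {"mariadb", "rabbitmq"},
    "glance": {"keystone"},
}

# static_order() yields each chart only after all of its dependencies.
install_order = list(TopologicalSorter(chart_dependencies).static_order())
print(install_order)
```

A cyclic dependency would make ``static_order()`` raise
``graphlib.CycleError``, which is a useful early failure.
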
18. Armada interacts with Tiller for installation (although it may
    interact with Kubernetes directly to poll, wait, remove jobs, and
    otherwise help protect Helm from failures). Tiller then interacts
    with Kubernetes to perform Helm chart installations or upgrades.

Airship 1.0 Update Software Flow
################################

The Update Software flow (or “action” in Shipyard, depicted with green
numbers in the image) is effectively a subset of the above flow. It
bypasses the Drydock flow entirely, for two reasons. The first is
speed, since interacting with MaaS is slow. The second is that there
are times when you want to avoid processing hardware requests at all
(e.g. waiting for Drydock to try to provision a piece of failed
hardware, only for it to time out some time later and move on to the
next step because the deployment strategy allows it).

Further Documentation
#####################

* https://airshipit.readthedocs.io/projects/shipyard/en/latest/
* https://airshipit.readthedocs.io/projects/pegleg/en/latest/
* https://airshipit.readthedocs.io/projects/armada/en/latest/
* https://airshipit.readthedocs.io/projects/promenade/en/latest/
* https://airshipit.readthedocs.io/projects/drydock/en/latest/
* https://airshipit.readthedocs.io/projects/deckhand/en/latest/
* https://airshipit.readthedocs.io/en/latest/

@@ -75,6 +75,7 @@ developers.
     Seaworthy: Production-grade Airship <https://docs.airshipit.org/treasuremap/seaworthy.html>
     develop/airship1-developers.rst
     develop/conventions.rst
+    airship1/airship-1-flow.rst

 Other Resources
 ---------------