Add Airship 1 Flow Doc
This adds some Airship 1 documentation that Rodolfo authored to describe the end-to-end Airship 1 deployment flow. Change-Id: Ie1d2ecfa2cc3ba9883ffcfbc238b8f3499def9da
This commit is contained in:
parent
6ee51e9ca5
commit
ff0c344f2a
BIN
doc/source/airship1/airship-1-flow.png
Normal file
BIN
doc/source/airship1/airship-1-flow.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 153 KiB |
276
doc/source/airship1/airship-1-flow.rst
Normal file
276
doc/source/airship1/airship-1-flow.rst
Normal file
@ -0,0 +1,276 @@
|
||||
****************************
|
||||
Airship 1.0 Deployment Flows
|
||||
****************************
|
||||
|
||||
.. |vspace| raw:: latex
|
||||
|
||||
\vspace{5mm}
|
||||
|
||||
.. image:: airship-1-flow.png
|
||||
|
||||
Airship 1.0 Deploy and Update Site Flow
|
||||
#######################################
|
||||
|
||||
|
||||
1. Pegleg facilitates cloning the repositories necessary to interact with
|
||||
a site. Each site has a single site-definition.yaml which contains
|
||||
the repositories that “compose” that site. These may be global
|
||||
repositories, type level repositories (e.g. cruisers or cloud harbor),
|
||||
and finally site-level repositories. These may be entirely different
|
||||
repositories with different permissions. Pegleg facilitates cloning
|
||||
all of these at the correct revisions according to the definition for
|
||||
that site. Pegleg can be driven via a jenkins pipeline, which can be
|
||||
further abstracted in something like an NC3C dashboard, or it can be
|
||||
driven on the command line directly by imitating the behavior in the
|
||||
pipeline.
|
||||
|
||||
|vspace|
|
||||
|
||||
2. Pegleg wears several different hats. The CI/CD workflows leverage
|
||||
different pipelines in order to call upon these hats but under the
|
||||
hood, it’s really just different command line flags on the pegleg CLI
|
||||
command depending on what type of action is occurring. Pegleg can:
|
||||
|
||||
a. Generate (and re-generate/rotate) new secure secrets for a site
|
||||
according to each secret’s requirement (e.g. length, type, and so
|
||||
on). For instance: UUIDs, passwords, keys, and so on.
|
||||
|
||||
|vspace|
|
||||
|
||||
b. Encrypt secrets, and Decrypt secrets. When secrets are encrypted,
|
||||
they are wrapped in a YAML envelope containing metadata for each
|
||||
secret. This allows for understanding when secrets are going to
|
||||
expire, when they were last rotated, and so on. All deployment and
|
||||
update pipelines for instance would leverage the decrypt functionality
|
||||
in order to render the documents successfully.
|
||||
|
||||
|vspace|
|
||||
|
||||
c. Lint the YAML to ensure it is valid and meets certain basic syntax
|
||||
criteria and deckhand does not have an issue processing rules
|
||||
encountered. For instance, development gating pipelines that validate
|
||||
changes to YAML would invoke pegleg in this way.
|
||||
|
||||
|vspace|
|
||||
|
||||
d. Render will actually process the documents through the deckhand
|
||||
library, which will perform substitutions, pull in the secrets that
|
||||
are referenced from the configuration YAML so you can see the target
|
||||
document locally. This is effectively a very in-depth linting process
|
||||
and again would be used in development gates and potentially to
|
||||
fast-fail in deployment and update pipelines if there was an issue.
|
||||
|
||||
|vspace|
|
||||
|
||||
e. Collect will bundle up all the documents but not actually render them
|
||||
which is appropriate for deployment and update pipelines as it sends
|
||||
the documents through raw (but presumably with decrypted secrets)
|
||||
because each cloud site has its own deckhand instance running
|
||||
maintaining its own revision history capable of rendering the
|
||||
documents in-site. It is used in every deployment and update pipeline
|
||||
as the results of collect are what is sent to shipyard.
|
||||
|
||||
|vspace|
|
||||
|
||||
3. Once pegleg has decrypted the secrets in the document set within an
|
||||
ephemeral jenkins pipeline, pegleg collect is called to assemble them
|
||||
all, and finally that is piped to the shipyard client which will
|
||||
publish them via REST API to a Shipyard API service running within the
|
||||
site. There are two scenarios under which Shipyard may be running in
|
||||
the site.
|
||||
|
||||
a. On the genesis host, which is a single node running Kubernetes in a
|
||||
green-field site that will be expanded to a full cluster once more
|
||||
nodes are provisioned.
|
||||
|
||||
|vspace|
|
||||
|
||||
b. On the control plane of a greenfield site, receiving a site-update
|
||||
or expansion.
|
||||
|
||||
Simply put, the entire Shipyard workflow can be summarized as follows:
|
||||
|
||||
* Initial region/site data will be passed to Shipyard from either a
|
||||
human operator or Jenkins
|
||||
* The data (in YAML format) will be sent to Deckhand for validation and storage
|
||||
* Shipyard will make use of the post-processed data from DeckHand to
|
||||
interact with Drydock.
|
||||
* Drydock will interact with Promenade to provision and deploy bare metal
|
||||
nodes using Ubuntu MAAS and a resilient Kubernetes cluster will be created
|
||||
at the end of the process
|
||||
* Once the Kubernetes clusters are up and validated to be working properly,
|
||||
Shipyard will interact with Armada to deploy OpenStack using OpenStack Helm
|
||||
* Once the OpenStack cluster is deployed, Shipyard will trigger a workflow to
|
||||
perform basic sanity health checks on the cluster
|
||||
|
||||
|vspace|
|
||||
|
||||
4. Shipyard will do a number of pre-validations before delivering the
|
||||
document set to deckhand. Things such as a concurrency check, to
|
||||
ensure we don’t try to run updates in parallel unaware of each other.
|
||||
It will also run a number of fail-fast validation checks.
|
||||
|
||||
|vspace|
|
||||
|
||||
5. Shipyard will leverage the deckhand client library to deliver the
|
||||
documents to deckhand over its REST API, which will again validate
|
||||
them and render them (which again involves performing all layering,
|
||||
substitution, secret interpolation, and so on) and publishes a
|
||||
document revision, so that there is an on-site record of every change
|
||||
that has ever been requested. This document revision that is fully
|
||||
rendered will be available at a deckhand REST API URL that can be
|
||||
retrieved by various Airship sub-components.
|
||||
|
||||
|vspace|
|
||||
|
||||
6. Deckhand will store secrets within Barbican so that they are not
|
||||
stored in clear text within a database, and the rendered document set
|
||||
revision itself is stored directly in a database. Deckhand will
|
||||
change every secret to a Barbican reference which will be rendered
|
||||
on-demand by Deckhand whenever someone asks for that document revision
|
||||
through the API.
|
||||
|
||||
|vspace|
|
||||
|
||||
7. At this point, with the documents stored in Deckhand, Shipyard will
|
||||
perform another fail-fast step and ask each of the components
|
||||
highlighted in yellow to perform a dry-run no-op validation of the
|
||||
entire document set from their perspective. This means that Drydock
|
||||
for instance, would be validating and acknowledging it would not have
|
||||
any issue processing the document set it sees in Deckhand. This
|
||||
helps ensure we do not encounter updates that fail in the middle
|
||||
of the process. If a component is unhappy with the document set
|
||||
we want to know early and fail before making any changes.
|
||||
|
||||
|vspace|
|
||||
|
||||
8. Shipyard will now invoke Drydock to provision baremetal hosts that
|
||||
have not already been provisioned and continue to call back or poll
|
||||
for when Drydock has completed this process. Airship has a concept
|
||||
called deployment strategies because the hardware aspect of
|
||||
deployment is not guaranteed or reliable, and we don’t always want
|
||||
failures here to block every other process in the stack. In other
|
||||
words, our deployment strategies require that 100% of nodes marked
|
||||
as control plane nodes must be provisioned successfully to
|
||||
continue, but that a certain percentage of each rack of workers
|
||||
could fail and we can still continue past the hardware
|
||||
provisioning steps successfully. In other words, this is where we
|
||||
introduce a threshold of failure.
|
||||
|
||||
|vspace|
|
||||
|
||||
9. Shipyard will send Drydock the Deckhand URL to obtain the document set
|
||||
for itself for this update. Drydock will retrieve the entire document
|
||||
set from Deckhand but it will only process documents it cares
|
||||
about.
|
||||
|
||||
|vspace|
|
||||
|
||||
10. Drydock will process any Drydock/BootAction documents that have
|
||||
external references in them to render those upfront before writing an
|
||||
operating system to the physical host. Most importantly, this allows
|
||||
Promenade to construct a host-specific join script. In other words,
|
||||
Drydock calls out to the Promenade REST API to construct a join shell
|
||||
script for each host and this is driven by Drydock/Bootaction
|
||||
documents.
|
||||
|
||||
|vspace|
|
||||
|
||||
11. Drydock will orchestrate MaaS based on the document set. It does this
|
||||
through several internal tasks, prepare_site, prepare_nodes, and
|
||||
deploy_site. Within prepare_site, upfront orchestration of MaaS
|
||||
occurs setting non-host specific settings via the MaaS API, such as
|
||||
CIDRs, and VLANs. Within prepare_nodes, we identify hosts that
|
||||
haven’t already been provisioned and then power cycle hosts, wait for
|
||||
them to be discovered by MaaS, and then aligning and renaming them to
|
||||
hosts in our static inventory. Then the host configuration is
|
||||
orchestrated in MaaS so they have the proper networking and storage
|
||||
configuration as well as receive the correct static overlays, like
|
||||
Kubernetes join scripts, the correct Drivers, and so on, on
|
||||
first-boot. Finally within Drydock’s deploy_nodes task we orchestrate
|
||||
several MaaS flows to actually provision the nodes with an operating
|
||||
system where they execute any additional static scripts delivered on
|
||||
first-boot.
|
||||
|
||||
|vspace|
|
||||
|
||||
12. During the deploy_nodes phase of Drydock, MaaS is effectively writing
|
||||
an operating system to the baremetal nodes.
|
||||
|
||||
|vspace|
|
||||
|
||||
13. Driven by cloud-init on first boot post provision, the nodes will
|
||||
actually make a rest call back to the MaaS API to inform it that
|
||||
provisioning has completed and they have successfully booted up into
|
||||
functional networking and have booted up successfully. Drydock can
|
||||
use this status within MaaS to understand the nodes were provisioned
|
||||
successfully.
|
||||
|
||||
|vspace|
|
||||
|
||||
14. The nodes run the Promenade generated shell script to join them to
|
||||
Kubernetes. This host-specific script installs the appropriate
|
||||
dependencies and joins the node as a Kubernetes node, either as a
|
||||
worker, or as a control plane host depending on the hosts profile in
|
||||
the YAML inventory.
|
||||
|
||||
|vspace|
|
||||
|
||||
15. Shipyard has been polling Drydock for completion of processing the
|
||||
site update. Once the polling for Drydock provisioning completes,
|
||||
Shipyard will move on to performing a similar request to Armada.
|
||||
Armada is asked to update the site and given a Deckhand URL and
|
||||
revision to pull from.
|
||||
|
||||
|vspace|
|
||||
|
||||
16. Armada pulls the rendered document set from Deckhand.
|
||||
|
||||
|vspace|
|
||||
|
||||
17. Armada then proceeds to help orchestrate any helm installs or upgrades
|
||||
necessary in the site, and helps do this across a vast number of
|
||||
charts, their ordering, and dependencies. Armada also supports
|
||||
fetching Helm chart source and then building charts from source from
|
||||
various local and remote locations, such as Git endpoints, tarballs or
|
||||
local directories. It will also give the operator some indication of
|
||||
what is about to change by assisting with diffs for both values,
|
||||
values overrides, and actual template changes. Its functionality
|
||||
extends beyond Helm, assisting in interacting with Kubernetes directly
|
||||
to perform basic pre- and post-steps, such as removing completed or
|
||||
failed jobs, running backup jobs, blocking on chart readiness, or
|
||||
deleting resources that do not support upgrades. However, primarily,
|
||||
it is an interface to support orchestrating Helm.
|
||||
|
||||
|vspace|
|
||||
|
||||
18. Armada effectively interacts with Tiller for installation (although it
|
||||
may interact with k8s directly to poll, wait, remove jobs, and
|
||||
otherwise help protect helm from failures). Tiller will then interact
|
||||
with k8s to perform helm chart installations or upgrades.
|
||||
|
||||
Airship 1.0 Update Software Flow
|
||||
################################
|
||||
|
||||
The Update Software flow (or “action” in Shipyard -- depicted with
|
||||
green numbers in the image) is effectively a subset of the above flow.
|
||||
It is used primarily to speed the process up by bypassing the Drydock
|
||||
flow entirely. The reason for this is both speed as interacting with
|
||||
MaaS is slow, as well as times where you want to avoid trying to
|
||||
process hardware requests (e.g. waiting for Drydock to try and
|
||||
provision a piece of failed hardware only to ultimately timeout some
|
||||
time later before moving on to the next step because the deployment
|
||||
strategy allows it).
|
||||
|
||||
Further Documentation
|
||||
#####################
|
||||
|
||||
* https://airshipit.readthedocs.io/projects/shipyard/en/latest/
|
||||
* https://airshipit.readthedocs.io/projects/pegleg/en/latest/
|
||||
* https://airshipit.readthedocs.io/projects/armada/en/latest/
|
||||
* https://airshipit.readthedocs.io/projects/promenade/en/latest/
|
||||
* https://airshipit.readthedocs.io/projects/drydock/en/latest/
|
||||
* https://airshipit.readthedocs.io/projects/deckhand/en/latest/
|
||||
* https://airshipit.readthedocs.io/en/latest/
|
||||
|
||||
|
@ -75,6 +75,7 @@ developers.
|
||||
Seaworthy: Production-grade Airship <https://docs.airshipit.org/treasuremap/seaworthy.html>
|
||||
develop/airship1-developers.rst
|
||||
develop/conventions.rst
|
||||
airship1/airship-1-flow.rst
|
||||
|
||||
Other Resources
|
||||
---------------
|
||||
|
Loading…
Reference in New Issue
Block a user