.. _az-environment-config:

=============================================
Example of multi-AZ environment configuration
=============================================

On this page, we provide an example configuration that can be used in
production environments with multiple Availability Zones.

It is an extended and more specific version of :ref:`pod-environment-config`,
so it is expected that you are familiar with the concepts and approaches
defined there.

To better understand why certain configuration options are applied in the
examples, it is also recommended to look through :ref:`configuring-inventory`.

Generic design
==============

The following design decisions were made in the example below:

* Three Availability Zones (AZs)
* Three infrastructure (control plane) hosts, each placed in a different
  Availability Zone
* Eight compute hosts: two compute hosts in each Availability Zone, plus two
  extra compute hosts in the first Availability Zone for a pinned CPU
  aggregate
* Three Ceph storage clusters provisioned with Ceph Ansible
* Compute hosts act as OVN gateway hosts
* Tunnel networks that are reachable between Availability Zones
* Public API, OpenStack external, and management networks are represented as
  stretched L2 networks between Availability Zones

.. image:: ../figures/az-layout-general.drawio.png
   :width: 100%

Load Balancing
==============

A load balancer (HAProxy) is usually deployed on the infrastructure hosts.
With the infrastructure hosts spread across Availability Zones, we need a more
complex design aimed at solving the following issues:

* Withstand a single Availability Zone failure
* Reduce the amount of cross-AZ traffic
* Spread load across Availability Zones

To address these challenges, the following changes to the basic design were made:

* Leverage DNS Round Robin (an A/AAAA record per AZ) for the Public API
* Define the Internal API FQDN through ``/etc/hosts`` overrides, which are
  unique per Availability Zone (see the sketch after this list)
* Define six Keepalived instances: three for public and three for internal
  VIPs
* Ensure HAProxy prioritizes backends from its own Availability Zone over
  "remote" ones

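Below is a minimal, hedged sketch of how the per-AZ ``/etc/hosts`` override
could look in AZ-specific group variables. The variable name
``openstack_host_custom_hosts_records`` as well as the FQDN and IP values are
assumptions for illustration only; verify the exact mechanism against the
``openstack_hosts`` role defaults in your release.

.. code:: yaml

   # /etc/openstack_deploy/group_vars/az1_all.yml (illustrative sketch)
   # Resolve the internal API FQDN to the AZ-local internal VIP so that
   # services in AZ1 reach the AZ1 HAProxy first.
   openstack_host_custom_hosts_records:
     - "172.29.236.21 internal.example.osa.cloud"
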
The example also deploys HAProxy with Keepalived in their own LXC containers,
in contrast to a more conventional bare metal deployment. See
:ref:`haproxy-in-lxc` for more details on how to do that.

.. image:: ../figures/az-balancing-diagram.drawio.png
   :width: 100%

Storage design complications
============================

There are multiple complications related to organizing storage when the
storage is not stretched between Availability Zones.

First, there is only a single controller in any given Availability Zone, while
multiple copies of ``cinder_volume`` need to run for each storage provider for
High Availability. As ``cinder_volume`` needs access to the storage network,
one of the best places for it is the ``ceph-mon`` hosts.

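Since each Availability Zone runs its own Ceph cluster, each AZ also ends up
with its own Cinder backend. The snippet below is a hedged sketch of what an
AZ-scoped backend definition could look like in the ``azN_all`` group
variables; the backend, pool, and user names are illustrative, and the
authoritative values ship in the ``user_variables.yml.az.example`` file
referenced at the end of this page.

.. code:: yaml

   # Sketch of a per-AZ RBD backend; backend_availability_zone ties the
   # backend to its Availability Zone for the Cinder scheduler.
   cinder_backends:
     rbd_az1:
       volume_driver: cinder.volume.drivers.rbd.RBDDriver
       rbd_pool: volumes
       rbd_ceph_conf: /etc/ceph/ceph.conf
       rbd_user: cinder
       volume_backend_name: rbd_az1
       backend_availability_zone: az1
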
Another challenge is to organize shared storage for Glance images, as a single
``rbd`` backend can no longer be used consistently across zones. While the
Glance Interoperable Import interface could be leveraged to sync images
between ``rbd`` backends, in practice not all clients and services can work
with Glance's `import API <https://docs.openstack.org/api-ref/image/v2/#interoperable-image-import>`_.
One of the most obvious solutions is to use the Swift API while configuring a
Ceph RadosGW policy to replicate the bucket between the independent instances
located in their respective Availability Zones.

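As a rough illustration of the Swift-backed approach: once the replicated
RadosGW endpoint is registered as the Swift service, Glance only needs to be
pointed at it. This is a hedged sketch only; the RadosGW replication policy
itself is configured on the Ceph side and is not covered here.

.. code:: yaml

   # Use the (RadosGW-provided) Swift API as the image store instead of a
   # per-AZ rbd pool.
   glance_default_store: swift
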
Last, but not least, there is the complication of Nova scheduling when
``cross_az_attach`` is disabled. Nova does not add an Availability Zone to an
instance's ``request_specs`` when the instance is created from a volume
directly, in contrast to creating the volume manually in advance and supplying
the volume UUID to the instance create API call. The problem with that
behavior is that Nova will attempt to live migrate or re-schedule instances
without an Availability Zone in ``request_specs`` to other AZs, which will
result in failure, as ``cross_az_attach`` is disabled. You can read more about
this in the Nova `bug report <https://bugs.launchpad.net/nova/+bug/2047182>`_.
In order to work around this bug you need to set a ``default_schedule_zone``
for Nova and Cinder, which ensures that an AZ is always defined in
``request_specs``. You can also go further and define an actual Availability
Zone as the ``default_schedule_zone``, making each controller have its own
default. As the load balancer will attempt to send requests only to "local"
backends first, this approach does distribute new VMs across all AZs when the
user does not supply an AZ explicitly. Otherwise, the "default" AZ would
accept significantly more new instances.

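A hedged sketch of what such overrides might look like in AZ-scoped group
variables is shown below. ``nova_nova_conf_overrides`` and
``cinder_cinder_conf_overrides`` are the standard OpenStack-Ansible
config-override mechanism; the exact values used by this scenario ship in the
``user_variables.yml.az.example`` file referenced at the end of this page.

.. code:: yaml

   # Assumes az_name is defined per AZ (see the "User variables" section).
   nova_nova_conf_overrides:
     DEFAULT:
       default_schedule_zone: "{{ az_name }}"
     cinder:
       cross_az_attach: False
   cinder_cinder_conf_overrides:
     DEFAULT:
       default_availability_zone: "{{ az_name }}"
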
Configuration examples
======================

Network configuration
~~~~~~~~~~~~~~~~~~~~~

Network CIDR/VLAN assignments
-----------------------------

The following CIDR assignments are used for this environment.

+-----------------------------+-----------------+------+
| Network                     | CIDR            | VLAN |
+=============================+=================+======+
| Management Network          | 172.29.236.0/22 | 10   |
+-----------------------------+-----------------+------+
| AZ1 Storage Network         | 172.29.244.0/24 | 20   |
+-----------------------------+-----------------+------+
| AZ1 Tunnel (Geneve) Network | 172.29.240.0/24 | 30   |
+-----------------------------+-----------------+------+
| AZ2 Storage Network         | 172.29.245.0/24 | 21   |
+-----------------------------+-----------------+------+
| AZ2 Tunnel (Geneve) Network | 172.29.241.0/24 | 31   |
+-----------------------------+-----------------+------+
| AZ3 Storage Network         | 172.29.246.0/24 | 22   |
+-----------------------------+-----------------+------+
| AZ3 Tunnel (Geneve) Network | 172.29.242.0/24 | 32   |
+-----------------------------+-----------------+------+
| Public API VIPs             | 203.0.113.0/28  | 400  |
+-----------------------------+-----------------+------+

IP assignments
--------------

The following host name and IP address assignments are used for this
environment.

+------------------+----------------+--------------------+----------------+
| Host name        | Management IP  | Tunnel (Geneve) IP | Storage IP     |
+==================+================+====================+================+
| infra1           | 172.29.236.11  |                    |                |
+------------------+----------------+--------------------+----------------+
| infra2           | 172.29.236.12  |                    |                |
+------------------+----------------+--------------------+----------------+
| infra3           | 172.29.236.13  |                    |                |
+------------------+----------------+--------------------+----------------+
| az1_ceph1        | 172.29.237.201 |                    | 172.29.244.201 |
+------------------+----------------+--------------------+----------------+
| az1_ceph2        | 172.29.237.202 |                    | 172.29.244.202 |
+------------------+----------------+--------------------+----------------+
| az1_ceph3        | 172.29.237.203 |                    | 172.29.244.203 |
+------------------+----------------+--------------------+----------------+
| az2_ceph1        | 172.29.238.201 |                    | 172.29.245.201 |
+------------------+----------------+--------------------+----------------+
| az2_ceph2        | 172.29.238.202 |                    | 172.29.245.202 |
+------------------+----------------+--------------------+----------------+
| az2_ceph3        | 172.29.238.203 |                    | 172.29.245.203 |
+------------------+----------------+--------------------+----------------+
| az3_ceph1        | 172.29.239.201 |                    | 172.29.246.201 |
+------------------+----------------+--------------------+----------------+
| az3_ceph2        | 172.29.239.202 |                    | 172.29.246.202 |
+------------------+----------------+--------------------+----------------+
| az3_ceph3        | 172.29.239.203 |                    | 172.29.246.203 |
+------------------+----------------+--------------------+----------------+
| az1_compute1     | 172.29.237.11  | 172.29.240.11      | 172.29.244.11  |
+------------------+----------------+--------------------+----------------+
| az1_compute2     | 172.29.237.12  | 172.29.240.12      | 172.29.244.12  |
+------------------+----------------+--------------------+----------------+
| az1_pin_compute1 | 172.29.237.13  | 172.29.240.13      | 172.29.244.13  |
+------------------+----------------+--------------------+----------------+
| az1_pin_compute2 | 172.29.237.14  | 172.29.240.14      | 172.29.244.14  |
+------------------+----------------+--------------------+----------------+
| az2_compute1     | 172.29.238.11  | 172.29.241.11      | 172.29.245.11  |
+------------------+----------------+--------------------+----------------+
| az2_compute2     | 172.29.238.12  | 172.29.241.12      | 172.29.245.12  |
+------------------+----------------+--------------------+----------------+
| az3_compute1     | 172.29.239.11  | 172.29.242.11      | 172.29.246.11  |
+------------------+----------------+--------------------+----------------+
| az3_compute2     | 172.29.239.12  | 172.29.242.12      | 172.29.246.12  |
+------------------+----------------+--------------------+----------------+

Host network configuration
--------------------------

Each host requires the correct network bridges to be implemented. In this
example, we leverage the ``systemd_networkd`` role, which performs the
configuration for us during the ``openstack_hosts`` execution and creates all
required VLANs and bridges. The only prerequisite is an SSH connection to the
host, so that Ansible can manage it.

.. note::

   The example assumes that the default gateway is set through the ``bond0``
   interface, which aggregates the ``eth0`` and ``eth1`` links.
   If your environment does not have ``eth0``, but instead has ``p1p1`` or
   some other interface name, ensure that all references to ``eth0`` are
   replaced with the appropriate name. The same applies to additional network
   interfaces.

.. literalinclude:: ../../../../etc/openstack_deploy/user_networks.yml.az.example

Deployment configuration
~~~~~~~~~~~~~~~~~~~~~~~~

Environment customizations
--------------------------

Deployed files in ``/etc/openstack_deploy/env.d`` allow the customization of
Ansible groups.

To deploy HAProxy in a container, we need to create a file
``/etc/openstack_deploy/env.d/haproxy.yml`` with the following content:

.. literalinclude:: ../../../../etc/openstack_deploy/env.d/haproxy.yml.container.example

As we are using Ceph for this environment, ``cinder-volume`` runs in a
container on the Ceph Monitor hosts. To achieve this, implement
``/etc/openstack_deploy/env.d/cinder.yml`` with the following content:

.. literalinclude:: ../../../../etc/openstack_deploy/env.d/cinder-volume.yml.container.example

In order to be able to execute a playbook only against hosts in a single
Availability Zone, as well as to be able to set AZ-specific variables, we need
to define extra groups. For that, create a file
``/etc/openstack_deploy/env.d/az.yml`` with the following content:

.. literalinclude:: ../../../../etc/openstack_deploy/env.d/az.yml.example

The above example will create the following groups:

* ``azN_hosts``, which will contain only the bare metal nodes
* ``azN_containers``, which will contain all containers that are spawned on
  the bare metal nodes that are part of the Availability Zone
* ``azN_all``, which will contain the ``azN_hosts`` and ``azN_containers``
  members

We also need to define a completely new set of groups for Ceph in order to
deploy multiple independent instances of it.

For that, create a file ``/etc/openstack_deploy/env.d/ceph.yml`` with the
following content:

.. literalinclude:: ../../../../etc/openstack_deploy/env.d/ceph.yml.az.example

Environment layout
------------------

The ``/etc/openstack_deploy/openstack_user_config.yml`` file defines the
environment layout.

For each AZ, a group will need to be defined containing all hosts within that
AZ.

Within the defined provider networks, ``address_prefix`` is used to override
the prefix of the key added to each host that contains IP address information.
We use AZ-specific prefixes for ``container``, ``tunnel``, or ``storage``.
``reference_group`` contains the name of a defined AZ group and is used to
limit the scope of each provider network to that group.

YAML anchors and aliases are used heavily in the example below to populate all
groups that might come in handy, while not repeating host definitions each
time. You can read more about the topic in the
`Ansible documentation <https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_advanced_syntax.html#yaml-anchors-and-aliases-sharing-variable-values>`_.

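The snippet below is a generic, simplified illustration of the anchor/alias
pattern only; the group and host names are placeholders, and the authoritative
layout ships in the example file referenced below.

.. code:: yaml

   # Define the AZ1 compute hosts once and anchor the mapping as "az1_computes".
   az1_compute_hosts: &az1_computes
     az1_compute1:
       ip: 172.29.237.11
     az1_compute2:
       ip: 172.29.237.12

   # Reuse the same mapping in another group via the alias and merge key.
   compute_hosts:
     <<: *az1_computes
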
The following configuration describes the layout for this environment.

.. literalinclude:: ../../../../etc/openstack_deploy/openstack_user_config.yml.az.example

User variables
--------------

In order to properly configure Availability Zones, we need to leverage
``group_vars`` and define the Availability Zone name used for each AZ there.
For this, create the following files:

* ``/etc/openstack_deploy/group_vars/az1_all.yml``
* ``/etc/openstack_deploy/group_vars/az2_all.yml``
* ``/etc/openstack_deploy/group_vars/az3_all.yml``

With content like below, where N is the AZ number corresponding to the file:

.. code:: yaml

   az_name: azN

As the load balancer for this environment is created in LXC containers on the
infrastructure hosts, we need to ensure the absence of a default route on the
``eth0`` interface of those containers. To prevent one from being added, we
override ``lxc_container_networks`` in the
``/etc/openstack_deploy/group_vars/haproxy/lxc_network.yml`` file:

.. literalinclude:: ../../../../etc/openstack_deploy/group_vars/haproxy/lxc_network.yml.example

Next, we want to ensure that HAProxy always points to the backend which is
considered "local" to it. For that, we switch the balancing algorithm to
``first`` and order the backends so that the one from the current Availability
Zone appears first in the list. This can be done by creating the file
``/etc/openstack_deploy/group_vars/haproxy/backend_overrides.yml`` with the
following content:

.. literalinclude:: ../../../../etc/openstack_deploy/group_vars/haproxy/first_backend.yml.az.example

We also need to define a couple of extra Keepalived instances in order to
support the DNS Round Robin approach, along with configuring Keepalived in
unicast mode. For that, create a file
``/etc/openstack_deploy/group_vars/haproxy/keepalived.yml`` with the following
content:

.. literalinclude:: ../../../../etc/openstack_deploy/group_vars/haproxy/keepalived.yml.az.example

In order to add support for multiple compute tiers (one with CPU overcommit
and one with pinned CPUs), you need to create a file
``/etc/openstack_deploy/group_vars/pinned_compute_hosts`` with the following
content:

.. code:: yaml

   nova_cpu_allocation_ratio: 1.0
   nova_ram_allocation_ratio: 1.0

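Pinned compute hosts also typically need a dedicated CPU set so that Nova can
actually pin instance vCPUs to host cores. A hedged sketch using the standard
config-override mechanism follows; the CPU range is an illustrative
placeholder, and if ``nova_nova_conf_overrides`` is already defined elsewhere
for these hosts, the sections need to be merged rather than declared twice.

.. code:: yaml

   # /etc/openstack_deploy/group_vars/pinned_compute_hosts (continued, sketch)
   nova_nova_conf_overrides:
     compute:
       # Host cores reserved for pinned instances; adjust to your CPU topology.
       cpu_dedicated_set: "4-31"
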
The rest of the variables can be defined in
``/etc/openstack_deploy/user_variables.yml``, but a lot of them will reference
the ``az_name`` variable, so its presence (along with the corresponding
groups) is vital for this scenario.

.. literalinclude:: ../../../../etc/openstack_deploy/user_variables.yml.az.example