As part of the effort to implement Ansible code linting in CI
(using ansible-lint) - we need to implement recommendations from
ansible-lint output [1].
One of them is to stop using local_action in favor of delegate_to -
to increase readability and match the style of typical ansible
tasks.
[1]: https://review.opendev.org/694779/
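For illustration, the lint-preferred form looks like this (task shape
illustrative, not a task from the repo):

    # before: local_action hides the module name behind a generic keyword
    - name: Check something on the deploy host
      local_action: command echo hello

    # after: delegate_to reads like any other ansible task
    - name: Check something on the deploy host
      command: echo hello
      delegate_to: localhost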
Partially implements: blueprint ansible-lint
Change-Id: I46c259ddad5a6aaf9c7301e6c44cd8a1d5c457d3
Introduce kolla_address filter.
Introduce put_address_in_context filter.
Add AF config to vars.
Address contexts:
- raw (default): <ADDR>
- memcache: inet6:[<ADDR>]
- url: [<ADDR>]
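A minimal sketch of the intended usage (variable and service names
illustrative):

    # resolve this host's address on the 'api' network,
    # honoring the configured address family (AF)
    api_interface_address: "{{ 'api' | kolla_address }}"

    # the 'url' context wraps IPv6 addresses in brackets, e.g. [fd::1]
    example_api_url: "http://{{ api_interface_address | put_address_in_context('url') }}:5000"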
Other changes:
globals.yml - mention just IP in comment
prechecks/port_checks (api_intf) - kolla_address handles validation
3x interface conditional (swift configs: replication/storage)
2x interface variable definition with hostname
(haproxy listens; api intf)
1x interface variable definition with hostname with bifrost exclusion
(baremetal pre-install /etc/hosts; api intf)
neutron's ml2 'overlay_ip_version' set to 6 for IPv6 on tunnel network
basic multinode source CI job for IPv6
prechecks for rabbitmq and qdrouterd use proper NSS database now
MariaDB Galera Cluster WSREP SST mariabackup workaround
(socat and IPv6)
Ceph naming workaround in CI
TODO: probably needs documenting
RabbitMQ IPv6-only proto_dist
Ceph ms switch to IPv6 mode
Remove neutron-server ml2_type_vxlan/vxlan_group setting
as it is not used (let's avoid any confusion), could break
setups without proper multicast routing if it ever started
working, and is IPv4-only anyway.
haproxy upgrade checks for slaves based on ipv6 addresses
TODO:
ovs-dpdk grabs the IPv4 network address (with prefix length / netmask);
this is not supported and is invalid by default because neutron_external
has no address.
No idea whether ovs-dpdk works at all at the moment.
ml2 for xenapi
Xen is not supported very well.
This would require working with XenAPI facts.
rp_filter setting
This would require meddling with ip6tables (there is no sysctl param).
By default nothing is dropped.
It is unlikely that we really need it.
ironic dnsmasq is configured IPv4-only
dnsmasq needs DHCPv6 options and testing in vivo.
KNOWN ISSUES (beyond us):
One cannot use an IPv6 address to reference the image for docker like
we currently do, see: https://github.com/moby/moby/issues/39033
(docker_registry; docker API 400 - invalid reference format).
Workaround: use a hostname/FQDN.
RabbitMQ may fail to bind to IPv6 if hostname resolves also to IPv4.
This is due to old RabbitMQ versions available in images.
IPv4 is preferred by default and may fail in the IPv6-only scenario.
This should be no problem in real life as IPv6-only is indeed IPv6-only.
Also, when new RabbitMQ (3.7.16/3.8+) makes it into images, this will
no longer be relevant as we supply all the necessary config.
See: https://github.com/rabbitmq/rabbitmq-server/pull/1982
For reliable runs, at least Ansible 2.8 is required (2.8.5 confirmed
to work well). Older Ansible versions are known to miss IPv6 addresses
in interface facts. This may affect redeploys, reconfigures and
upgrades which run after the VIP address is assigned.
See: https://github.com/ansible/ansible/issues/63227
Bifrost Train does not support IPv6 deployments.
See: https://storyboard.openstack.org/#!/story/2006689
Change-Id: Ia34e6916ea4f99e9522cd2ddde03a0a4776f7e2c
Implements: blueprint ipv6-control-plane
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
Use upstream Ansible modules for registration of services, endpoints,
users, projects, roles, and role grants.
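For example, registration now follows the upstream module shape,
roughly (a sketch; auth plumbing omitted):

    - name: Register the image service
      os_keystone_service:
        name: glance
        service_type: image
        state: present

    - name: Grant the admin role on the service project
      os_user_role:
        user: glance
        role: admin
        project: service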
Change-Id: I7c9138d422cc91c177fd8992347176bb54156b5a
Sometimes mgr dashboard enablement fails with the following message:
"Error ENOENT: all mgr daemons do not support module 'dashboard',
pass --force to force enablement"
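The error text itself points at the workaround; as a sketch, enablement
can be forced (whether the fix retries or forces is per the actual
change):

    - name: Enable the ceph mgr dashboard module, forcing enablement
      command: ceph mgr module enable dashboard --force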
Change-Id: Ie7052dbdccb855e02da849dbc207b5d1778e2c82
This commit adds the functionality for an operator to specify
their own trusted CA certificate file for interacting with the
Keystone API.
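A sketch of the operator-facing setting (variable name per this
blueprint; the path is illustrative):

    # globals.yml
    openstack_cacert: "/etc/pki/tls/certs/ca-bundle.crt"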
Implements: blueprint support-trusted-ca-certificate-file
Change-Id: I84f9897cc8e107658701fb309ec318c0f805883b
- add support for sha256 in bslurp module
- change sha1 to sha256 in ceph-mon ansible role
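Not the bslurp module itself, but the same sha256 idea expressed with
the stock stat module (illustrative):

    - name: Compute the sha256 checksum of the fetched keyring
      stat:
        path: /etc/ceph/ceph.client.admin.keyring
        checksum_algorithm: sha256
      register: keyring_stat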
Depends-On: https://review.opendev.org/655623
Change-Id: I25e28d150f2a8d4a7f87bb119d9fb1c46cfe926f
Closes-Bug: #1826327
In the current deployment of ceph, the node name of osd and the name
of mon are both IP, and other daemons use hostname.
This commit adds support for naming mon and osd nodes using hostname,
and does not change the default ip-named way.
Change-Id: I22bef72dcd8fc8bcd391ae30e4643520250fd556
1) ceph-nfs (ganesha-ceph) - use NFSv4 only
This is recommended upstream.
v3 and UDP require the portmapper (aka rpcbind), which we
do not want, except where the Ubuntu ganesha version (2.6)
forces it by requiring UDP to be enabled, see [1].
The issue has been fixed in 2.8, included in CentOS.
Additionally disable v3 helper protocols and kerberos
to avoid meaningless warnings.
2) ceph-nfs (ganesha-ceph) - do not export host dbus
It is not in use. This avoids the temptation to try
handling it on the host.
3) Properly handle ceph services deploy and upgrade
Upgrade runs deploy.
The order has been corrected - nfs goes after mds.
Additionally upgrade takes care of rgw for keystone
(for swift emulation).
4) Enhance ceph keyring module with error detection
Now it does not blindly try to create a keyring after
any failure. This used to hide the real issue.
5) Retry ceph admin keyring update until cluster works
Reordering the deployment caused an issue with the ceph cluster
not being fully operational before actions were taken on it.
6) CI: Remove osd df from collected logs as it may hang CI
Hangs are caused by healthy MON and no healthy MGR.
A descriptive note is left in its place.
7) CI: Add 5s timeout to ceph informational commands
This decreases the timeout from the default 300s.
[1] https://review.opendev.org/669315
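Item 7 boils down to wrapping the informational commands, along these
lines (a sketch):

    - name: Collect ceph status for logs without ever hanging CI
      command: timeout 5 ceph -s
      failed_when: false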
Change-Id: I1cf0ad10b80552f503898e723f0c4bd00a38f143
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
Docker has no restart policy named 'never'. It has 'no'.
This has bitten us already (see [1]) and might bite us again whenever
we want to change the restart policy to 'no'.
This patch makes our docker integration honor all valid restart policies
and only valid restart policies.
All relevant docker restart policy usages are patched as well.
I added some FIXMEs in the places relevant to the kolla-ansible docker
integration. They are not fixed here so as not to alter behavior.
[1] https://review.opendev.org/667363
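A sketch of the honored values (task shape illustrative):

    - name: Run a one-shot bootstrap container that must not restart
      kolla_docker:
        action: start_container
        name: bootstrap_example
        image: "{{ example_image_full }}"
        restart_policy: "no"  # Docker's real policy name; quoted, as bare no is YAML boolean False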
Change-Id: I1c9764fb9bbda08a71186091aced67433ad4e3d6
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
* Sometimes getting/creating ceph mds keyring fails, similar to https://tracker.ceph.com/issues/16255
Change-Id: I47587cbeb8be0e782c13ba7f40367409e2daa8a8
There are now several good tools for deploying Ceph, including Ceph
Ansible and ceph-deploy. Maintaining our own Ceph deployment is a
significant maintenance burden, and we should focus on our core mission
to deploy OpenStack. Given that this is currently a significant part
of kolla-ansible, we will need a long deprecation period and a
migration path to another tool.
Change-Id: Ic603c85c04d8794580a19f9efaa7a8589565f4f6
Partially-Implements: blueprint remove-ceph
Since we have different upgrade paths, we must use the actually
installed Ceph release name when doing require-osd-release
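In task form this amounts to something like (variable name
illustrative):

    - name: Pin require-osd-release to the actually installed release
      command: "ceph osd require-osd-release {{ installed_ceph_release }}"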
Closes-Bug: #1832989
Change-Id: I6aaa4b4ac0fb739f7ad885c13f55b6db969996a2
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
Many tasks that use Docker already specify 'become', but
not all. This change ensures that all tasks using the following
modules have 'become':
* kolla_docker
* kolla_ceph_keyring
* kolla_toolbox
* kolla_container_facts
It also adds become for 'command' tasks that use docker CLI.
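For the CLI case, the resulting task shape is (illustrative):

    - name: List containers using the docker CLI
      become: true
      command: docker ps --all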
Change-Id: I4a5ebcedaccb9261dbc958ec67e8077d7980e496
Several config file permissions are incorrect on the host. In general,
files should be 0660, and directories and executables 0770.
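In Ansible terms the target modes look like this (paths illustrative):

    - name: Config files are readable/writable by owner and group only
      file:
        path: "{{ node_config_directory }}/some-service/some.conf"
        mode: "0660"

    - name: Directories and executables additionally get the execute bit
      file:
        path: "{{ node_config_directory }}/some-service"
        state: directory
        mode: "0770"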
Change-Id: Id276ac1864f280554e98b937f2845bb424d521de
Closes-Bug: #1821579
We're duplicating code to build the keystone URLs in nearly every
config, where we've already done it in group_vars. Replace the
redundancy with a variable that does the same thing.
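The idea, sketched (exact variable names and ports per group_vars):

    # group_vars/all.yml
    keystone_internal_url: "{{ internal_protocol }}://{{ kolla_internal_fqdn }}:{{ keystone_internal_port }}/v3"
    # service config templates can then simply render:
    #   auth_url = {{ keystone_internal_url }}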
Change-Id: I207d77870e2535c1cdcbc5eaf704f0448ac85a7a
When ceph_mon or ceph_osd fails to start, adding the debug option
will print more info. Currently, when the ceph_mon and ceph_osd
containers fail to start, 'docker logs ceph_mon' prints nothing.
Closes-Bug: #1815707
Change-Id: I3c5086019808a9738714f5279ec74cbb9b7a8587
When ceph_nfs is enabled, deployment fails because there is no
ganesha config file and the 'ganesha.nfs' command needs root
privileges to run. The ceph_nfs Dockerfile will be modified
accordingly; please review:
https://review.openstack.org/#/c/630510/
Change-Id: I347107bc33733061ad043bffe38ecc1d16770afc
Closes-Bug: #1811581
With this change, an operator can stop a service container
without stopping all services on a host.
This change is the starting point for fast-forward upgrade support.
In subsequent changes, new flags will be introduced to disable
stopping dataplane services during upgrades.
Change-Id: Ifde7a39d7d8596ef0d7405ecf1ac1d49a459d9ef
Implements: blueprint support-stop-containers
The bug comes from a ceph change [0], included since ceph
v11.0.0. The `osd crush update on start` logic was moved from
`ceph-osd-prestart.sh` into the ceph-osd startup process, so ceph-osd
creates buckets by node hostname automatically, whereas kolla creates
buckets by node IP.
To reduce confusion and the ceph upgrade impact, disabling `osd crush
update on start` is the better choice.
[0] a28b71e3c9
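In config management terms the workaround is a one-liner (task shape
illustrative):

    - name: Disable automatic crush bucket updates on OSD start
      ini_file:
        path: /etc/ceph/ceph.conf
        section: osd
        option: osd crush update on start
        value: "false"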
Change-Id: Ibbeac9505c9957319126267dbe6bd7a2cac11f0c
Closes-Bug: #1801662
Having all services in one giant haproxy file makes altering
configuration for a service both painful and dangerous. Each service
should be configured with a simple set of variables and rendered with a
single unified template.
Two new templates are available:
* haproxy_single_service_listen.cfg.j2: close to the original style, but
only one service per file
* haproxy_single_service_split.cfg.j2: using the newer haproxy syntax
for separated frontend and backend
For now the default will be the single listen block, for ease of
transition.
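A sketch of the per-service variables such a template consumes (shape
illustrative, not the final schema):

    keystone:
      haproxy:
        keystone_internal:
          enabled: true
          mode: http
          external: false
          port: "{{ keystone_internal_port }}"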
Change-Id: I6e237438fbc0aa3c89a3c8bd706a53b74e71904b
The current bluestore disk label naming is inconsistent with
filestore. The filestore naming format is that disk prefixes
belonging to the same osd are the same while the suffixes
differ.
This patch makes bluestore follow the same disk naming scheme.
Change-Id: I71dda29fc4a6765300ce7bb173d2c448c24f6eca
The ResellerAdmin role is used to give users the object storage
administration role in their projects.
It is required to pass the object storage quota tests [1] of the
DefCore (OpenStack Powered) certification test suite.
[1] tempest.api.object_storage.test_account_quotas*
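With the upstream modules this is a small task (sketch; auth plumbing
omitted):

    - name: Ensure the ResellerAdmin role exists
      os_keystone_role:
        name: ResellerAdmin
        state: present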
Related-Bug: #1700729
Change-Id: Id976827aa7da271e54b77476f175f06bd1a00cc8
Currently the test_list_containers tempest test [1] fails
because the Accept-Ranges header does not exist; see ceph bug [2].
Rgw_swift_enforce_content_length ensures Content-Length and
Accept-Ranges in dynamically generated account & container listings.
[1] tempest.api.object_storage.test_account_services.AccountTest.test_list_containers
[2] http://tracker.ceph.com/issues/21554
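The rgw option can be rendered into ceph.conf along these lines
(section name illustrative):

    - name: Enforce Content-Length in rgw-generated listings
      ini_file:
        path: /etc/ceph/ceph.conf
        section: client.rgw
        option: rgw_swift_enforce_content_length
        value: "true"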
Related-Bug: #1783456
Change-Id: I9b5fcc361f0bc0e521302d2df1974aabf6f4a7e7
The object versioning test [1] is required for the RefStack test suite.
Swift has it enabled by default [2].
It is also needed for ceph-rgw.
[1]
tempest.api.object_storage.test_object_version.ContainerTest.test_versioned_container
[2] https://review.openstack.org/#/c/517281/
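Enabling it for ceph-rgw is analogous (section name illustrative):

    - name: Enable Swift-style object versioning in rgw
      ini_file:
        path: /etc/ceph/ceph.conf
        section: client.rgw
        option: rgw_swift_versioning_enabled
        value: "true"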
Related-Bug: #1729583
Change-Id: If89636f77d87bab75e8e7bcf16cc784e83184bc6
By default ceph-rgw is not completely compatible with the Swift API,
because of a restriction in the Swift INFO API. [0]
This patch improves ceph-rgw compatibility with the Swift API. It is
controlled by the option "ceph_rgw_compatibility" in
ansible/group_vars/all.yml.
After changing the option, run the "reconfigure" command to enable it.
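Operator-facing usage, sketched:

    # ansible/group_vars/all.yml (option named in this change)
    ceph_rgw_compatibility: true
    # then apply it:
    #   kolla-ansible reconfigure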
Closes-Bug: #1783456
[0] https://github.com/ceph/ceph/pull/17967
Change-Id: Ibf3eb52280e197965caef08a44ae226c4f884cb5
Signed-off-by: tone.zhang <tone.zhang@arm.com>
This patch enables the ceph mgr prometheus exporter.
If enable_prometheus_ceph_mgr_exporter is set to true,
the ceph mgr prometheus plugin is enabled on the hosts that are part
of the ceph-mgr group, and the exporter is added to the
prometheus-server configuration file.
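Operator-facing switch, sketched:

    # globals.yml (option named in this change)
    enable_prometheus_ceph_mgr_exporter: "yes"
    # under the hood the role enables the mgr plugin, roughly:
    #   ceph mgr module enable prometheus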
Change-Id: Ia2f879401e585e6043f69cc5e3ab1a1f72f7f033
Support Kolla Ceph deploying bluestore OSDs with Kolla-ansible.
Please refer to [1] for bluestore OSD configuration.
The patch includes:
1. Set the Ceph OSD store type in group_vars/all.yml. The default
value is "bluestore" in Rocky.
2. Make Kolla Ceph deploy bluestore OSDs with Kolla-ansible.
3. Update the gate test configuration for the Ceph bluestore OSD test.
[1]: specs/kolla-ceph-bluestore.rst
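The store type switch, sketched (variable name as described above;
treat as illustrative):

    # group_vars/all.yml
    ceph_osd_store_type: "bluestore"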
Partially-Implements: blueprint kolla-ceph-bluestore
Depends-On: I00eaa600a5e9ad4c1ebca2eeb523bca3d7a25128
Change-Id: I14f20a00654dff32c36d078ebb9005d91a3e60b2
Signed-off-by: Tone Zhang <tone.zhang@arm.com>
Add become to all tasks that use the module "kolla_docker"
Change-Id: I4309c4011687b88ec31d739fd8f834fe2326ff10
Partial-Implements: blueprint ansible-specific-task-become
Patch [0] left two variables for authentication: one is
openstack_swift_auth and the other is the (nonexistent)
openstack_ceph_rgw_auth, used by the ceph_rgw start_keystone task.
This patch leaves only openstack_ceph_rgw_auth.
Closes-Bug: #1769463
[0] 84ade4e149
Change-Id: I1cc522d91f8258f4ca23afc10a0a2a2b35c1ff68
Signed-off-by: Jorge Niedbalski <jorge.niedbalski@linaro.org>
- rename action and serial to kolla_ansible and kolla_serial
- use become instead of "sudo <command>" in shell tasks
- remove quotes from failed_when and changed_when in rabbitmq tasks
Change-Id: I78cb60168aaa40bb6439198283546b7faf33917c
Implements: blueprint migrate-to-ansible-2-2-0