6370 Commits

Author SHA1 Message Date
jinyuanliu
3ccb176f13 ADD venus for kolla-ansible
This project [1] provides a one-stop solution for log collection,
cleaning, indexing, analysis, alerting, visualization, report generation
and other needs. It helps operators and maintainers quickly retrieve
logs to solve problems, grasp the operational health of the platform,
and improve the level of platform management.

[1] https://wiki.openstack.org/wiki/Venus

Change-Id: If3562bbed6181002b76831bab54f863041c5a885
2022-03-17 20:35:08 +08:00
Zuul
668fecf397 Merge "Adds etcd endpoints as a Prometheus scrape target" 2022-03-16 17:55:00 +00:00
Mark Goddard
d2d4b53d47 libvirt: support SASL authentication
In Kolla Ansible OpenStack deployments, by default, libvirt is
configured to allow read-write access via an unauthenticated,
unencrypted TCP connection, using the internal API network.  This is to
facilitate migration between hosts.

By default, Kolla Ansible does not use encryption for services on the
internal network (and did not support it until Ussuri). However, most
other services on the internal network are at least authenticated
(usually via passwords), ensuring that they cannot be used by anyone
with access to the network, unless they have credentials.

The main issue here is the lack of authentication. Any client with
access to the internal network is able to connect to the libvirt TCP
port and make arbitrary changes to the hypervisor. This could include
starting a VM, modifying an existing VM, etc. Given the flexibility of
the domain options, it could be seen as equivalent to having root access
to the hypervisor.

Kolla Ansible has supported libvirt TLS [1] since the Train release, using
client and server certificates for mutual authentication and encryption.
However, this feature is not enabled by default, and requires
certificates to be generated for each compute host.

This change adds support for libvirt SASL authentication, and enables it
by default. This provides a base level of security. Deployments requiring
further security should use libvirt TLS.
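From a client's point of view, SASL means a username and password must
be supplied when connecting to the libvirt TCP port. The Python sketch
below shows one way to do that with the libvirt-python bindings
(`libvirt.openAuth`). The host name, username and password used with it
are hypothetical placeholders, not values taken from this change, and
the credential callback can be exercised without a libvirt daemon.

```python
from typing import Any, Callable, List

# Credential type values from libvirt's virConnectCredentialType enum
# (identical to libvirt.VIR_CRED_AUTHNAME / libvirt.VIR_CRED_PASSPHRASE).
VIR_CRED_AUTHNAME = 2
VIR_CRED_PASSPHRASE = 5


def make_sasl_callback(username: str, password: str) -> Callable[[List[List[Any]], Any], int]:
    """Return a libvirt auth callback that answers SASL username/password prompts."""

    def request_cred(credentials: List[List[Any]], user_data: Any) -> int:
        # Each credential is [type, prompt, challenge, default, result];
        # libvirt reads the answer from index 4.
        for cred in credentials:
            if cred[0] == VIR_CRED_AUTHNAME:
                cred[4] = username
            elif cred[0] == VIR_CRED_PASSPHRASE:
                cred[4] = password
            else:
                return -1  # unsupported credential type: fail the handshake
        return 0

    return request_cred


def connect(host: str, username: str, password: str):
    """Open a read-write libvirt connection over TCP, authenticating via SASL."""
    import libvirt  # requires the libvirt-python package

    auth = [[VIR_CRED_AUTHNAME, VIR_CRED_PASSPHRASE],
            make_sasl_callback(username, password), None]
    # flags=0 requests a read-write connection.
    return libvirt.openAuth(f"qemu+tcp://{host}/system", auth, 0)
```

For example, `connect("compute01", "nova", "secret")` would open
`qemu+tcp://compute01/system`; with wrong or missing credentials,
`libvirt.openAuth` raises `libvirt.libvirtError` instead of granting
access.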

[1] https://docs.openstack.org/kolla-ansible/latest/reference/compute/libvirt-guide.html#libvirt-tls

Depends-On: https://review.opendev.org/c/openstack/kolla/+/833021
Closes-Bug: #1964013
Change-Id: Ia91ceeb609e4cdb144433122b443028c0278b71e
2022-03-10 16:57:16 +00:00
Zuul
f26b9cd8ad Merge "Fix prechecks for "Ironic iPXE" container" 2022-03-10 09:38:57 +00:00
Zuul
da476a7fea Merge "Explicitly unset net.ipv4.ip_forward sysctl" 2022-03-09 15:40:32 +00:00
Zuul
801c2d8fc2 Merge "[TrivialFix] Remove old comment" 2022-03-08 18:12:42 +00:00
Zuul
02a3cbcde3 Merge "Make cron logfile minsize,maxsize configurable" 2022-03-08 16:33:27 +00:00
Nathan Taylor
0f2794a075 Adds etcd endpoints as a Prometheus scrape target
Add an "enable_prometheus_etcd_integration" configuration parameter,
which can be used to configure Prometheus to scrape etcd metrics
endpoints. It defaults to true only when both "enable_prometheus" and
"enable_etcd" are enabled.
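The default can be modelled in a few lines of Python. This only
illustrates the boolean logic, assuming Ansible-style truthy strings
such as "yes"; the real default is a Jinja expression in Kolla Ansible's
variables, and `as_bool` is a simplified, hypothetical stand-in for
Ansible's `bool` filter.

```python
def as_bool(value) -> bool:
    """Rough stand-in for Ansible's `bool` filter: accepts bools and common string spellings."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in {"yes", "true", "1", "on", "y"}


def default_etcd_integration(enable_prometheus, enable_etcd) -> bool:
    # Scrape etcd only when both Prometheus and etcd are deployed.
    return as_bool(enable_prometheus) and as_bool(enable_etcd)
```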

Change-Id: I7a0b802c5687e2d508e06baf55e355d9761e806f
2022-03-08 08:42:19 -07:00
Mark Goddard
caf33be54b Explicitly unset net.ipv4.ip_forward sysctl
While I8bb398e299aa68147004723a18d3a1ec459011e5 stopped setting
the net.ipv4.ip_forward sysctl, this change explicitly removes the
option from the Kolla sysctl config file. In the absence of another
source for this sysctl, it should revert to the default of 0 after the
next reboot.

A deployer looking to more aggressively change the value may set
neutron_l3_agent_host_ipv4_ip_forward to 0. Any deployments still
relying on the previous value may set
neutron_l3_agent_host_ipv4_ip_forward to 1.
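The effect on the Kolla sysctl file can be sketched in Python.
`remove_sysctl_key` is a hypothetical helper (Kolla Ansible does this
with Ansible tasks, not this code): it drops every assignment of one key
from sysctl.conf-style text and keeps comments and other keys. Removing
the line does not touch the running kernel value, which is why the
default of 0 only applies after the next reboot.

```python
def remove_sysctl_key(conf_text: str, key: str) -> str:
    """Return conf_text without lines assigning `key` (sysctl.conf syntax: key = value).

    Simplification: keys written with slash separators (net/ipv4/...) are not matched.
    """
    kept = []
    for line in conf_text.splitlines():
        stripped = line.strip()
        # Comments and blank lines are kept unchanged.
        if stripped and not stripped.startswith(("#", ";")):
            name = stripped.split("=", 1)[0].strip()
            if name == key:
                continue  # drop this assignment
        kept.append(line)
    return "\n".join(kept) + ("\n" if kept else "")
```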

Related-Bug: #1945453

Change-Id: I9b39307ad8d6c51e215fe3d3bc56aab998d218ec
2022-03-07 17:31:46 +00:00
Radosław Piliszek
833c45ea82 [TrivialFix] Remove old comment
Ironic is dropping default_boot_option, and the new default has
been around for quite a while now, so let's remove this old
scary comment.

Change-Id: I80d645cb97251ac63e04d7ec1c87d4600d17d4ee
2022-03-04 21:22:48 +01:00
Radosław Piliszek
19c5f2f033 Fix prechecks for "Ironic iPXE" container
Since I30c2ad2bf2957ac544942aefae8898cdc8a61ec6 this container
is always enabled and thus the port should always be checked.

Change-Id: I94a70d89123611899872061bd69593280d0a68c4
2022-03-04 18:50:11 +01:00
Radosław Piliszek
87f7586340 Ironic: Avoid setting deprecated pxe_append_params
Set kernel_append_params instead.

Change-Id: I4fb42d376636dc363cd86950ed37de4a3d28df73
2022-03-04 18:11:43 +01:00
Zuul
44517dd7b7 Merge "Add Rocky Linux support as Host OS" 2022-03-03 15:45:35 +00:00
Zuul
5dc04b9f47 Merge "rabbitmq: add node parameter in rabbitmq_user call" 2022-03-03 12:47:03 +00:00
Michal Nasiadka
7080ccfc3d Add Rocky Linux support as Host OS
Depends-On: https://review.opendev.org/c/openstack/ansible-collection-kolla/+/831642
Change-Id: I70dcd2d0cade52a23b3e219b7e0aaa31193ec938
2022-03-03 09:59:16 +00:00
IDerr
38729dc39c rabbitmq: add node parameter in rabbitmq_user call
Change-Id: I4cf48620f03d67ea4a9ef327afbf3b1ebe28550b
Closes-Bug: #1946506
2022-03-02 12:57:42 +00:00
Zuul
30e0c01413 Merge "Remove grafana [session] configuration" 2022-02-28 13:42:20 +00:00
Zuul
09db789a65 Merge "Fix hard coded OIDC response type" 2022-02-28 13:42:17 +00:00
Zuul
5e58d6d502 Merge "Add openvswitch and prometheus to logrotate" 2022-02-24 10:37:34 +00:00
Juan Pablo Suazo
80ee3f2e5c Add openvswitch and prometheus to logrotate
Closes-Bug: #1961795

Change-Id: I5547cce5c389846ed216bb898b78e45b8f231e1e
2022-02-24 08:03:17 +00:00
Zuul
6e267aed1d Merge "Remove classic queue mirroring for internal RabbitMQ" 2022-02-23 11:43:26 +00:00
Piotr Parczewski
d32197271f Fix hard coded OIDC response type
Closes-bug: 1959781
Change-Id: If574d2242aa6a875dcf624d95495e6cec6fefddd
2022-02-23 10:57:33 +01:00
Mark Goddard
a6768dd33b Fix location of release note for ironic-neutron-agent healthcheck
TrivialFix

Change-Id: Id85a5d69e1222b616705e24885252425c92af527
2022-02-22 12:12:00 +00:00
Pierre Riteau
f37562827d Remove grafana [session] configuration
These configuration settings were removed in Grafana 6.2. Instead, we
could use [remote_cache], but it is not required since Grafana uses the
database settings by default.

Change-Id: I37966027aea9039b2ecba4214444507e9d87f513
2022-02-22 10:26:37 +01:00
Zuul
d25d490e4d Merge "cloudkitty: fix URL used for Prometheus collector" 2022-02-22 09:07:51 +00:00
Zuul
8ff7b51fef Merge "Install openstack.kolla collection" 2022-02-21 21:51:10 +00:00
Zuul
0ed32c82b9 Merge "ironic: sync default inspection UEFI iPXE bootloader with Ironic" 2022-02-21 21:35:58 +00:00
Zuul
63706667e1 Merge "Add support for deploying Prometheus libvirt exporter" 2022-02-21 21:35:55 +00:00
Doug Szumski
6bfe1927f0 Remove classic queue mirroring for internal RabbitMQ
When OpenStack is deployed with Kolla-Ansible, by default there
are no durable queues or exchanges created by the OpenStack
services in RabbitMQ. In Rabbit terminology, not being durable
is referred to as `transient`, and this means that the queue
is generally held in memory.

Whether OpenStack services create durable or transient queues is
traditionally controlled by the oslo.messaging config option:
`amqp_durable_queues`. In Kolla-Ansible, this remains set to
the default of `False` in all services. The only `durable`
objects are the `amq*` exchanges which are internal to RabbitMQ.
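For reference, the option lives in the `[oslo_messaging_rabbit]` section
of each service's configuration file. This Python sketch uses
`configparser` to render such a fragment with the value left at its
default of `false`; it illustrates the setting only and is not how Kolla
Ansible's templates produce service configuration.

```python
import configparser
import io


def render_rabbit_section(durable: bool = False) -> str:
    """Render an [oslo_messaging_rabbit] fragment setting amqp_durable_queues."""
    config = configparser.ConfigParser()
    config["oslo_messaging_rabbit"] = {"amqp_durable_queues": str(durable).lower()}
    buffer = io.StringIO()
    config.write(buffer)  # writes "key = value" lines under the section header
    return buffer.getvalue()
```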

More recently, oslo.messaging has introduced support for
Quorum queues [7]. These are a successor to durable classic
queues; however, it isn't yet clear if they are a good fit for
OpenStack in general [8].

For clustered RabbitMQ deployments, Kolla-Ansible configures all
queues as `replicated` [1]. Replication occurs over all nodes
in the cluster. RabbitMQ refers to this as 'mirroring of classic
queues'.

In summary, this means that a multi-node Kolla-Ansible deployment
will end up with a large number of transient, mirrored queues
and exchanges. However, the RabbitMQ documentation warns against
this, stating that 'For replicated queues, the only reasonable
option is to use durable queues' [2]. This is discussed
further in the following bug report: [3].

Whilst we could try enabling the `amqp_durable_queues` option
for each service (this is suggested in [4]), there are
a number of complexities with this approach, including but not limited to:

1) RabbitMQ is planning to remove classic queue mirroring in
   favor of 'Quorum queues' in a forthcoming release [5].
2) Durable queues will be written to disk, which may cause
   performance problems at scale. Note that this includes
   Quorum queues which are always durable.
3) Potential for race conditions and other complexity
   discussed recently on the mailing list under:
   `[ops] [kolla] RabbitMQ High Availability`

The remaining option, proposed here, is to use classic
non-mirrored queues everywhere, and rely on services to recover
if the node hosting a queue or exchange they are using fails.
There is some discussion of this approach in [6]. The downside
of potential message loss needs to be weighed against the real
upsides of increasing the performance of RabbitMQ, and moving
to a configuration which is officially supported and hopefully
more stable. In the future, we can then consider promoting
specific queues to quorum queues, in cases where message loss
can result in failure states which are hard to recover from.

[1] https://www.rabbitmq.com/ha.html
[2] https://www.rabbitmq.com/queues.html
[3] https://github.com/rabbitmq/rabbitmq-server/issues/2045
[4] https://wiki.openstack.org/wiki/Large_Scale_Configuration_Rabbit
[5] https://blog.rabbitmq.com/posts/2021/08/4.0-deprecation-announcements/
[6] https://fuel-ccp.readthedocs.io/en/latest/design/ref_arch_1000_nodes.html#replication
[7] https://bugs.launchpad.net/oslo.messaging/+bug/1942933
[8] https://www.rabbitmq.com/quorum-queues.html#use-cases

Partial-Bug: #1954925
Change-Id: I91d0e23b22319cf3fdb7603f5401d24e3b76a56e
2022-02-21 18:54:04 +00:00
Pierre Riteau
b36c91b684 cloudkitty: fix URL used for Prometheus collector
The Prometheus HTTP API is reachable under /api/v1. Without this fix,
CloudKitty receives 404 errors from Prometheus.
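The fix amounts to pointing CloudKitty at the API root rather than the
server root. A Python sketch of that normalisation; `prometheus_api_url`
is a hypothetical helper, and the address and port in the test values
are illustrative only.

```python
from urllib.parse import urlsplit, urlunsplit


def prometheus_api_url(base_url: str) -> str:
    """Return the Prometheus HTTP API root (/api/v1) for a server base URL."""
    parts = urlsplit(base_url)
    path = parts.path.rstrip("/")
    # Append /api/v1 unless the URL already points at the API root.
    if not path.endswith("/api/v1"):
        path += "/api/v1"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

Queries such as `/query` are then resolved relative to the returned URL,
e.g. `http://192.0.2.10:9091/api/v1/query`.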

Change-Id: Ie872da5ccddbcb8028b8b57022e2427372ed474e
2022-02-21 18:06:13 +01:00
Mark Goddard
f63f1f3082 Install openstack.kolla collection
This change adds an Ansible Galaxy requirements file including the
openstack.kolla collection. A new 'kolla-ansible install-deps' command
is provided to install the requirements.

With the new collection in place, this change also switches to using the
baremetal role from the openstack.kolla collection, and removes the
baremetal role from this repository.

Depends-On: https://review.opendev.org/c/openstack/ansible-collection-kolla/+/820168

Change-Id: I9708f57b4bb9d64eb4903c253684fe0d9147bd4a
2022-02-21 14:26:48 +00:00
Zuul
83fa907961 Merge "Add support for VMware First Class Disk (FCD)" 2022-02-21 11:07:00 +00:00
Pierre Riteau
b210dcd6e2 Configure node-exporter to report correct file system metrics
Without this configuration, all mount points report the same
utilisation metrics [1]. With the rslave option, all root mounts from
the host are visible in the container, so we can remove the bind mounts
for /proc and /sys.
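The node_exporter README [1] runs the container with the host root
filesystem mounted read-only with `rslave` propagation and points the
exporter at it via `--path.rootfs`. A Python sketch that assembles the
equivalent `docker run` argument list; the `/host` mount point and image
tag follow that README, while Kolla Ansible applies the setting through
its own container tooling rather than this code.

```python
from typing import List


def node_exporter_docker_args(image: str = "quay.io/prometheus/node-exporter:latest",
                              rootfs: str = "/host") -> List[str]:
    """Build a `docker run` argv that exposes the host root filesystem to node_exporter."""
    return [
        "docker", "run", "-d",
        "--net=host",  # share the host network namespace
        "--pid=host",  # share the host PID namespace
        # Read-only bind of the host root; rslave lets host mount events propagate in.
        "-v", f"/:{rootfs}:ro,rslave",
        image,
        f"--path.rootfs={rootfs}",  # tell node_exporter where the host root is mounted
    ]
```

The list can be passed directly to `subprocess.run(..., check=True)`,
which avoids any shell quoting of the volume specification.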

[1] https://github.com/prometheus/node_exporter#docker

Change-Id: I4087dc81f9d1fa5daa24b9df6daf1f9e1ccd702f
Closes-Bug: #1961438
2022-02-18 18:36:22 +01:00
Zuul
b668e27356 Merge "Add support for VMware NSXP" 2022-02-18 12:04:41 +00:00
alecorps
812e03f75e Add support for VMware First Class Disk (FCD)
An FCD, also known as an Improved Virtual Disk (IVD) or
Managed Virtual Disk, is a named virtual disk independent of
a virtual machine. Using FCDs for Cinder volumes eliminates
the need for shadow virtual machines.
This patch adds Kolla support.

Change-Id: Ic0b66269e6d32762e786c95cf6da78cb201d2765
2022-02-18 11:15:14 +00:00
Pierre Riteau
dcba829792 Allow to define extra parameters for Prometheus exporters
The following variables are added:

* prometheus_blackbox_exporter_cmdline_extras
* prometheus_elasticsearch_exporter_cmdline_extras
* prometheus_haproxy_exporter_cmdline_extras
* prometheus_memcached_exporter_cmdline_extras
* prometheus_mysqld_exporter_cmdline_extras
* prometheus_node_exporter_cmdline_extras
* prometheus_openstack_exporter_cmdline_extras

Change-Id: I5da2031b9367115384045775c515628e2acb1aa4
2022-02-18 10:12:22 +01:00
Alban Lecorps
458c8b13df Add support for VMware NSXP
NSXP provides OpenStack support for the NSX Policy platform. It has
been supported by Neutron since the Stein release. This patch adds
Kolla support.

This adds a new neutron_plugin_agent type 'vmware_nsxp'. The plugin
does not run any neutron agents.

Change-Id: I9e9d8f07e586bdc143d293e572031368af7f3fca
2022-02-17 08:59:14 +00:00
Zuul
df4b46a31b Merge "Fix fluentd v1 buffer syntax issue" 2022-02-16 20:55:37 +00:00
Zuul
c37c9b59b4 Merge "Refactor fluentd syslog logging" 2022-02-16 20:55:11 +00:00
Zuul
facd64ef26 Merge "[haproxy] optionally set socket to allow admin commands" 2022-02-16 00:33:06 +00:00
Michal Nasiadka
fcdba9e850 CI: Fix new ansible-lint failures
Change-Id: I27b0e42fba93a35c6d878d108bf1e7fdebc9e3db
2022-02-15 07:42:53 +00:00
Will Szumski
033db44f1c Adds prometheus_scrape_interval
Grafana needs to know the scrape interval in order to compute
$__rate_interval. Its default is 15s, which does not match the Kolla
default of 60s. If this is not set, graphs that use rate queries show
"no data" when zoomed in, because the computed rate interval becomes
too short to contain enough samples at the real scrape interval.
The recommendation is to use a common scrape interval for all
jobs. See:

- https://grafana.com/blog/2020/09/28/new-in-grafana-7.2-__rate_interval-for-prometheus-rate-queries-that-just-work/
- https://stackoverflow.com/questions/66369969/set-scrape-interval-in-provisioned-prometheus-data-source-in-grafana
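Grafana documents `$__rate_interval` as
`max($__interval + scrape interval, 4 * scrape interval)`. The Python
sketch below applies that formula to a zoomed-in panel with a 15s
`$__interval` while Prometheus actually scrapes every 60s.
`samples_in_window` is a rough approximation added for illustration;
PromQL's `rate()` needs at least two samples in the window.

```python
def rate_interval(interval_s: float, scrape_interval_s: float) -> float:
    """Grafana's $__rate_interval: max($__interval + scrape interval, 4 * scrape interval)."""
    return max(interval_s + scrape_interval_s, 4 * scrape_interval_s)


def samples_in_window(window_s: float, actual_scrape_s: float) -> int:
    """Approximate number of scraped samples that fall inside a range window."""
    return int(window_s // actual_scrape_s)


# Grafana left at its 15s default: a 60s window holds about 1 sample, so rate() is empty.
# Grafana told the real 60s interval: a 240s window holds about 4 samples.
```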

Change-Id: I7e5c1e20c7b66b64cbd333f669ef8d8da60daaa8
2022-02-14 11:10:44 +00:00
Isaac Prior
b3e2fcc793 Fix fluentd v1 buffer syntax issue
Change-Id: I5b3ab3ab8153cda283dec772bf1393af0caf4137
Closes-Bug: 1919179
2022-02-11 11:33:38 +00:00
Michal Nasiadka
b97832dd4f Refactor fluentd syslog logging
Co-Authored-By: Mark Goddard <mark@stackhpc.com>

Change-Id: I75ca59d981bcd2dd51faa296ab0b4223a891f5cb
2022-02-11 11:33:38 +00:00
Pierre Riteau
50edb94ded neutron: fix placement endpoint type configuration
Change-Id: I3362bd283eb7fb80f5da70f2a388f89f220617ea
Closes-Bug: #1960503
2022-02-10 13:14:32 +01:00
Mark Goddard
556d979930 ironic: sync default inspection UEFI iPXE bootloader with Ironic
The bootloader used to boot Ironic nodes in UEFI boot mode during
inspection when iPXE is enabled has been changed from ipxe.efi to
snponly.efi. This is in line with the default UEFI iPXE bootloader used
in Ironic since the Xena release. The bootloader may be changed via
ironic_dnsmasq_uefi_ipxe_boot_file.

Note that snponly.efi was not available in the ironic-pxe image
prior to I79e78dca550262fc86b092a036f9ea96b214ab48.

Related-Bug: #1959203

Change-Id: I879db340769cc1b076e77313dff15876e27fcac4
2022-02-10 11:46:54 +00:00
Zuul
9fcbbfad75 Merge "Fix Apparmor libvirt profile removal" 2022-02-10 10:36:06 +00:00
Imran Hussain
f4bfab57bd [haproxy] optionally set socket to allow admin commands
Allow operators to set the HAProxy socket to admin level.
This is done via the haproxy_socket_level_admin flag, which
defaults to "no".
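Such a flag presumably toggles HAProxy's `level admin` option on the
`stats socket` directive. A Python sketch of that rendering logic; the
function name and socket path are hypothetical, and the yes/no handling
mirrors the flag's default of "no".

```python
def stats_socket_line(path: str, level_admin: str = "no") -> str:
    """Render an HAProxy `stats socket` directive, adding `level admin` when enabled."""
    line = f"stats socket {path} mode 660"
    # Without an explicit level, HAProxy applies its default socket level.
    if level_admin.strip().lower() in {"yes", "true", "1"}:
        line += " level admin"
    return line
```

At admin level the runtime API also accepts state-changing commands such
as `disable server`, which is why the option stays off by default.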

Closes-Bug: 1960215

Signed-off-by: Imran Hussain <ih@imranh.co.uk>
Change-Id: Ia0da89288d68f5803ace1934c013053f12343195
2022-02-09 17:21:18 +00:00
Zuul
54e543ac34 Merge "octavia: drop warning about certificate changes" 2022-02-09 07:45:40 +00:00
Zuul
211c34b40e Merge "Glance: add lock_path setting" 2022-02-08 17:24:15 +00:00