Move documentation from osh-infra

This is to prepare for the upcoming merger
with the openstack-helm-infra repo.

The doc files are copied from the openstack-helm-infra
repo w/o the commit history. The list of authors
is attached to the commit.

Co-Authored-By: Al Lau <al070e@att.com>
Co-Authored-By: Andreas Jaeger <aj@suse.com>
Co-Authored-By: astebenkova <astebenkova@mirantis.com>
Co-Authored-By: Chinasubbareddy Mallavarapu <cr3938@att.com>
Co-Authored-By: Gage Hugo <gagehugo@gmail.com>
Co-Authored-By: jinyuanliu <liujinyuan@inspur.com>
Co-Authored-By: John Haan <sjin.han@sk.com>
Co-Authored-By: Leontii Istomin <listomin@mirantis.com>
Co-Authored-By: lijunjie <lijunjie@cloudin.cn>
Co-Authored-By: Matthew Heler <matthew.heler@hotmail.com>
Co-Authored-By: Parsons, Cliff (cp769u) <cp769u@att.com>
Co-Authored-By: pengyuesheng <pengyuesheng@gohighsec.com>
Co-Authored-By: Q.hongtao <qihongtao@inspur.com>
Co-Authored-By: Roman Gorshunov <roman.gorshunov@att.com>
Co-Authored-By: Stephen Taylor <stephen.taylor.1@att.com>
Co-Authored-By: Steven Fitzpatrick <steven.fitzpatrick@att.com>
Co-Authored-By: Steve Wilkerson <sw5822@att.com>
Co-Authored-By: Steve Wilkerson <wilkers.steve@gmail.com>
Co-Authored-By: sunxifa <sunxifa@inspur.com>
Co-Authored-By: Tin Lam <tin@irrational.io>
Co-Authored-By: Tin Lam <t@lam.wtf>
Co-Authored-By: wangjiaqi07 <wangjiaqi07@inspur.com>
Change-Id: I6a4166f5d4d69279ebd56c66f74e2cbc8cbd17dd

@@ -11,6 +11,9 @@ Contents:
chart/index
devref/index
testing/index
monitoring/index
logging/index
upgrade/index
troubleshooting/index
specs/index

@@ -0,0 +1,196 @@
Elasticsearch
=============
The Elasticsearch chart in openstack-helm-infra provides a distributed data
store to index and analyze logs generated from the OpenStack-Helm services.
The chart contains templates for:
- Elasticsearch client nodes
- Elasticsearch data nodes
- Elasticsearch master nodes
- An Elasticsearch exporter for providing cluster metrics to Prometheus
- A cronjob for Elastic Curator to manage data indices
Authentication
--------------
The Elasticsearch deployment includes a sidecar container that runs an Apache
reverse proxy to add authentication capabilities for Elasticsearch. The
username and password are configured under the Elasticsearch entry in the
endpoints section of the chart's values.yaml.
The configuration for Apache can be found under the conf.httpd key, and uses a
helm-toolkit function that allows for including gotpl entries in the template
directly. This allows the use of other templates, like the endpoint lookup
function templates, directly in the configuration for Apache.
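For example, the authentication credentials could be overridden with a sketch
like the following (the key layout is an assumption and may differ between chart
versions; the values shown are placeholders):

::

    endpoints:
      elasticsearch:
        auth:
          admin:
            username: admin
            password: changeme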
Elasticsearch Service Configuration
-----------------------------------
The Elasticsearch service configuration file can be modified with a combination
of pod environment variables and entries in the values.yaml file. Elasticsearch
does not require much configuration out of the box, and the default values for
these configuration settings are meant to provide a highly available cluster by
default.
The vital entries in this configuration file are:
- path.data: The path at which to store the indexed data
- path.repo: The location of any snapshot repositories to backup indexes
- bootstrap.memory_lock: Ensures none of the JVM is swapped to disk
- discovery.zen.minimum_master_nodes: Minimum required masters for the cluster
The bootstrap.memory_lock entry ensures none of the JVM will be swapped to disk
during execution, and setting this value to false will negatively affect the
health of your Elasticsearch nodes. The discovery.zen.minimum_master_nodes flag
sets the minimum number of masters required for your Elasticsearch cluster
to be considered healthy and functional.
To read more about Elasticsearch's configuration file, please see the official
documentation_.
.. _documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/important-settings.html
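As an illustration, the settings above could be tuned with an override sketched
as follows (the nesting of these keys under conf.elasticsearch is an assumption
and may vary between chart versions):

::

    conf:
      elasticsearch:
        config:
          bootstrap:
            memory_lock: true
          discovery:
            zen:
              minimum_master_nodes: 2
          path:
            data: /usr/share/elasticsearch/data
            repo: /usr/share/elasticsearch/repo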
Elastic Curator
---------------
The Elasticsearch chart contains a cronjob to run Elastic Curator at specified
intervals to manage the lifecycle of your indices. Curator can:
- Take and send a snapshot of your indexes to a specified snapshot repository
- Delete indexes older than a specified length of time
- Restore indexes with previous index snapshots
- Reindex an index into a new or preexisting index
The full list of supported Curator actions can be found in the actions_ section of
the official Curator documentation. The list of options available for those
actions can be found in the options_ section of the Curator documentation.
.. _actions: https://www.elastic.co/guide/en/elasticsearch/client/curator/current/actions.html
.. _options: https://www.elastic.co/guide/en/elasticsearch/client/curator/current/options.html
Curator's configuration is handled via entries in Elasticsearch's values.yaml
file and must be overridden to achieve your index lifecycle management
needs. Please note that any unused field should be left blank, as an entry of
"None" will result in an exception, as Curator will read it as a Python NoneType
insead of a value of None.
The section for Curator's service configuration can be found at:
::
conf:
curator:
config:
client:
hosts:
- elasticsearch-logging
port: 9200
url_prefix:
use_ssl: False
certificate:
client_cert:
client_key:
ssl_no_validate: False
http_auth:
timeout: 30
master_only: False
logging:
loglevel: INFO
logfile:
logformat: default
blacklist: ['elasticsearch', 'urllib3']
Curator's actions are configured in the following section:
::
conf:
curator:
action_file:
actions:
1:
action: delete_indices
description: "Clean up ES by deleting old indices"
options:
timeout_override:
continue_if_exception: False
ignore_empty_list: True
disable_action: True
filters:
- filtertype: age
source: name
direction: older
timestring: '%Y.%m.%d'
unit: days
unit_count: 30
field:
stats_result:
epoch:
exclude: False
The Elasticsearch chart contains commented example actions for deleting and
snapshotting indexes older than 30 days. Please note these actions are provided as a
reference and are disabled by default to avoid any unexpected behavior against
your indexes.
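For reference, a snapshot action analogous to the commented examples might be
sketched as below; the repository name is hypothetical and the action is left
disabled:

::

    conf:
      curator:
        action_file:
          actions:
            2:
              action: snapshot
              description: "Snapshot indices older than 30 days"
              options:
                repository: logstash_snapshots
                name: "snapshot-%Y.%m.%d"
                wait_for_completion: True
                ignore_empty_list: True
                disable_action: True
              filters:
              - filtertype: age
                source: name
                direction: older
                timestring: '%Y.%m.%d'
                unit: days
                unit_count: 30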
Elasticsearch Exporter
----------------------
The Elasticsearch chart contains templates for an exporter to provide metrics
for Prometheus. These metrics provide insight into the performance and overall
health of your Elasticsearch cluster. Please note monitoring for Elasticsearch
is disabled by default, and must be enabled with the following override:
::
monitoring:
prometheus:
enabled: true
The Elasticsearch exporter uses the same service annotations as the other
exporters, and no additional configuration is required for Prometheus to target
the Elasticsearch exporter for scraping. The Elasticsearch exporter is
configured with command line flags, and the flags' default values can be found
under the following key in the values.yaml file:
::
conf:
prometheus_elasticsearch_exporter:
es:
all: true
timeout: 20s
These configuration keys control the following behaviors:
- es.all: Gather information from all nodes, not just the connecting node
- es.timeout: Timeout for metrics queries
More information about the Elasticsearch exporter can be found on the exporter's
GitHub_ page.
.. _GitHub: https://github.com/prometheus-community/elasticsearch_exporter
Snapshot Repositories
---------------------
Before Curator can store snapshots in a specified repository, Elasticsearch must
register the configured repository. To achieve this, the Elasticsearch chart
contains a job for registering an S3 snapshot repository backed by RADOS Gateway.
This job is disabled by default as the curator actions for snapshots are
disabled by default. To enable the snapshot job, the
conf.elasticsearch.snapshots.enabled flag must be set to true. The following
configuration keys are relevant:
- conf.elasticsearch.snapshots.enabled: Enable snapshot repositories
- conf.elasticsearch.snapshots.bucket: Name of the RGW s3 bucket to use
- conf.elasticsearch.snapshots.repositories: Name of repositories to create
More information about Elasticsearch repositories can be found in the official
Elasticsearch snapshot_ documentation:
.. _snapshot: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-snapshots.html#_repositories
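For example, the snapshot job could be enabled with an override sketched as
follows (the bucket and repository names are illustrative, and the shape of the
repositories entry may differ between chart versions):

::

    conf:
      elasticsearch:
        snapshots:
          enabled: true
          bucket: elasticsearch-snapshots
          repositories:
            - logstash_snapshots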

@@ -0,0 +1,279 @@
Fluent-logging
===============
The fluent-logging chart in openstack-helm-infra provides the base for a
centralized logging platform for OpenStack-Helm. The chart combines two
services, Fluentbit and Fluentd, to gather logs generated by the services,
filter on or add metadata to logged events, then forward them to Elasticsearch
for indexing.
Fluentbit
---------
Fluentbit runs as a log-collecting component on each host in the cluster, and
can be configured to target specific log locations on the host. The Fluentbit_
configuration schema can be found on the official Fluentbit website.
.. _Fluentbit: http://fluentbit.io/documentation/0.12/configuration/schema.html
Fluentbit provides a set of plug-ins for ingesting and filtering various log
types. These plug-ins include:
- Tail: Tails a defined file for logged events
- Kube: Adds Kubernetes metadata to a logged event
- Systemd: Provides ability to collect logs from the journald daemon
- Syslog: Provides the ability to collect logs from a Unix socket (TCP or UDP)
The complete list of plugins can be found in the configuration_ section of the
Fluentbit documentation.
.. _configuration: http://fluentbit.io/documentation/current/configuration/
Fluentbit uses parsers to turn unstructured log entries into structured entries
to make processing and filtering events easier. The two formats supported are
JSON maps and regular expressions. More information about Fluentbit's parsing
abilities can be found in the parsers_ section of Fluentbit's documentation.
.. _parsers: http://fluentbit.io/documentation/current/parser/
Fluentbit's service and parser configurations are defined via the values.yaml
file, which allows for custom definitions of inputs, filters and outputs for
your logging needs.
Fluentbit's configuration can be found under the following key:
::
conf:
fluentbit:
- service:
header: service
Flush: 1
Daemon: Off
Log_Level: info
Parsers_File: parsers.conf
- containers_tail:
header: input
Name: tail
Tag: kube.*
Path: /var/log/containers/*.log
Parser: docker
DB: /var/log/flb_kube.db
Mem_Buf_Limit: 5MB
- kube_filter:
header: filter
Name: kubernetes
Match: kube.*
Merge_JSON_Log: On
- fluentd_output:
header: output
Name: forward
Match: "*"
Host: ${FLUENTD_HOST}
Port: ${FLUENTD_PORT}
Fluentbit is configured by default to capture logs at the info log level. To
change this, override the Log_Level key with the appropriate levels, which are
documented in Fluentbit's configuration_.
Fluentbit's parser configuration can be found under the following key:
::
conf:
parsers:
- docker:
header: parser
Name: docker
Format: json
Time_Key: time
Time_Format: "%Y-%m-%dT%H:%M:%S.%L"
Time_Keep: On
The values for the fluentbit and parsers keys are consumed by a fluent-logging
helper template that produces the appropriate configurations for the relevant
sections. Each list item (keys prefixed with a '-') represents a section in the
configuration files, and the arbitrary name of the list item should represent a
logical description of the section defined. The header key represents the type
of definition (filter, input, output, service or parser), and the remaining
entries will be rendered as space delimited configuration keys and values. For
example, the definitions above would result in the following:
::
[SERVICE]
Daemon false
Flush 1
Log_Level info
Parsers_File parsers.conf
[INPUT]
DB /var/log/flb_kube.db
Mem_Buf_Limit 5MB
Name tail
Parser docker
Path /var/log/containers/*.log
Tag kube.*
[FILTER]
Match kube.*
Merge_JSON_Log true
Name kubernetes
[OUTPUT]
Host ${FLUENTD_HOST}
Match *
Name forward
Port ${FLUENTD_PORT}
[PARSER]
Format json
Name docker
Time_Format %Y-%m-%dT%H:%M:%S.%L
Time_Keep true
Time_Key time
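Following the same structure, an additional input section, for example one that
reads the systemd journal for the kubelet, could be sketched as an override like
the one below (the section name and filter value are illustrative):

::

    conf:
      fluentbit:
        - kubelet_journal:
            header: input
            Name: systemd
            Tag: journal.*
            Path: /var/log/journal
            Systemd_Filter: _SYSTEMD_UNIT=kubelet.service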
Fluentd
-------
Fluentd runs as a forwarding service that receives event entries from Fluentbit
and routes them to the appropriate destination. By default, Fluentd will route
all entries received from Fluentbit to Elasticsearch for indexing. The
Fluentd_ configuration schema can be found at the official Fluentd website.
.. _Fluentd: https://docs.fluentd.org/v0.12/articles/config-file
Fluentd's configuration is handled in the values.yaml file in fluent-logging.
Similar to Fluentbit, configuration overrides provide flexibility in defining
custom routes for tagged log events. The configuration can be found under the
following key:
::
conf:
fluentd:
- fluentbit_forward:
header: source
type: forward
port: "#{ENV['FLUENTD_PORT']}"
bind: 0.0.0.0
- elasticsearch:
header: match
type: elasticsearch
expression: "**"
include_tag_key: true
host: "#{ENV['ELASTICSEARCH_HOST']}"
port: "#{ENV['ELASTICSEARCH_PORT']}"
logstash_format: true
buffer_chunk_limit: 10M
buffer_queue_limit: 32
flush_interval: "20"
max_retry_wait: 300
disable_retry_limit: ""
The values for the fluentd keys are consumed by a fluent-logging helper template
that produces appropriate configurations for each directive desired. The list
items (keys prefixed with a '-') represent sections in the configuration file,
and the name of each list item should represent a logical description of the
section defined. The header key represents the type of definition (name of the
fluentd plug-in used), and the expression key is used when the plug-in requires
a pattern to match against (example: matches on certain input patterns). The
remaining entries will be rendered as space delimited configuration keys and
values. For example, the definition above would result in the following:
::
<source>
bind 0.0.0.0
port "#{ENV['FLUENTD_PORT']}"
@type forward
</source>
<match **>
buffer_chunk_limit 10M
buffer_queue_limit 32
disable_retry_limit
flush_interval 20s
host "#{ENV['ELASTICSEARCH_HOST']}"
include_tag_key true
logstash_format true
max_retry_wait 300
port "#{ENV['ELASTICSEARCH_PORT']}"
@type elasticsearch
</match>
Some fluentd plug-ins require nested definitions. The fluentd helper template
can handle these definitions with the following structure:
::
conf:
td_agent:
- fluentbit_forward:
header: source
type: forward
port: "#{ENV['FLUENTD_PORT']}"
bind: 0.0.0.0
- log_transformer:
header: filter
type: record_transformer
expression: "foo.bar"
inner_def:
- record_transformer:
header: record
hostname: my_host
tag: my_tag
In this example, the inner_def list will generate a nested configuration
entry in the log_transformer section. The nested definitions are handled by
supplying a list as the value for an arbitrary key, and the list value will
indicate the entry should be handled as a nested definition. The helper
template will render the above example key/value pairs as the following:
::
<source>
bind 0.0.0.0
port "#{ENV['FLUENTD_PORT']}"
@type forward
</source>
<filter foo.bar>
<record>
hostname my_host
tag my_tag
</record>
@type record_transformer
</filter>
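As a further illustration of custom routing, a match section that discards
events with a particular tag could be sketched as follows (the tag pattern is
illustrative; null is the core Fluentd output plug-in that drops events):

::

    conf:
      fluentd:
        - discard_noise:
            header: match
            type: "null"
            expression: "noisy.service.**"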
Fluentd Exporter
----------------------
The fluent-logging chart contains templates for an exporter to provide metrics
for Fluentd. These metrics provide insight into Fluentd's performance. Please
note monitoring for Fluentd is disabled by default, and must be enabled with the
following override:
::
monitoring:
prometheus:
enabled: true
The Fluentd exporter uses the same service annotations as the other exporters,
and no additional configuration is required for Prometheus to target the
Fluentd exporter for scraping. The Fluentd exporter is configured with command
line flags, and the flags' default values can be found under the following key
in the values.yaml file:
::
conf:
fluentd_exporter:
log:
format: "logger:stdout?json=true"
level: "info"
These configuration keys control the following behaviors:
- log.format: Define the logger used and format of the output
- log.level: Log level for the exporter to use
More information about the Fluentd exporter can be found on the exporter's
GitHub_ page.
.. _GitHub: https://github.com/V3ckt0r/fluentd_exporter

@@ -0,0 +1,11 @@
OpenStack-Helm Logging
======================
Contents:
.. toctree::
:maxdepth: 2
elasticsearch
fluent-logging
kibana

@@ -0,0 +1,76 @@
Kibana
======
The Kibana chart in OpenStack-Helm Infra provides visualization for logs indexed
into Elasticsearch. These visualizations provide the means to view logs captured
from services deployed in the cluster and targeted for collection by Fluentbit.
Authentication
--------------
The Kibana deployment includes a sidecar container that runs an Apache reverse
proxy to add authentication capabilities for Kibana. The username and password
are configured under the Kibana entry in the endpoints section of the chart's
values.yaml.
The configuration for Apache can be found under the conf.httpd key, and uses a
helm-toolkit function that allows for including gotpl entries in the template
directly. This allows the use of other templates, like the endpoint lookup
function templates, directly in the configuration for Apache.
Configuration
-------------
Kibana's configuration is driven by the chart's values.yaml file. The configuration
options are found under the following keys:
::
conf:
elasticsearch:
pingTimeout: 1500
preserveHost: true
requestTimeout: 30000
shardTimeout: 0
startupTimeout: 5000
i18n:
defaultLocale: en
kibana:
defaultAppId: discover
index: .kibana
logging:
quiet: false
silent: false
verbose: false
ops:
interval: 5000
server:
host: localhost
maxPayloadBytes: 1048576
port: 5601
ssl:
enabled: false
The case of the sub-keys is important as these values are injected into
Kibana's configuration configmap with the toYaml function. More information on
the configuration options and available settings can be found in the official
Kibana documentation_.
.. _documentation: https://www.elastic.co/guide/en/kibana/current/settings.html
Installation
------------
.. code-block:: bash
helm install --namespace=<namespace> local/kibana --name=kibana
Setting Time Field
------------------
For Kibana to read the logs from Elasticsearch's indexes, the time field must be
set manually after Kibana has successfully deployed. Upon visiting the Kibana
dashboard for the first time, a prompt will appear to choose the time field from
a drop-down menu. The default time field for Elasticsearch indexes is
'@timestamp'. Once this field is selected, the default view for querying log
entries can be found by selecting the "Discover" tab.

@@ -0,0 +1,89 @@
Grafana
=======
The Grafana chart in OpenStack-Helm Infra provides default dashboards for the
metrics gathered with Prometheus. The default dashboards include visualizations
for metrics on: Ceph, Kubernetes, nodes, containers, MySQL, RabbitMQ, and
OpenStack.
Configuration
-------------
Grafana
~~~~~~~
Grafana's configuration is driven by the chart's values.yaml file, and the
relevant configuration entries are under the following key:
::
conf:
grafana:
paths:
server:
database:
session:
security:
users:
log:
log.console:
dashboards.json:
grafana_net:
These keys correspond to sections in the grafana.ini configuration file, and the
to_ini helm-toolkit function will render these values into the appropriate
format in grafana.ini. The list of options for these keys can be found in the
official Grafana configuration_ documentation.
.. _configuration: https://grafana.com/docs/installation/configuration/
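For instance, a few grafana.ini entries could be overridden with a sketch like
the following (the values are illustrative only):

::

    conf:
      grafana:
        log:
          mode: console
          level: info
        server:
          protocol: http
          root_url: "%(protocol)s://%(domain)s:%(http_port)s/"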
Prometheus Data Source
~~~~~~~~~~~~~~~~~~~~~~
Grafana requires configured data sources for gathering metrics for display in
its dashboards. The configuration options for datasources are found under the
following key in Grafana's values.yaml file:
::
conf:
provisioning:
datasources:
monitoring:
name: prometheus
type: prometheus
access: proxy
orgId: 1
editable: true
basicAuth: true
The Grafana chart will use the keys under each entry beneath
.conf.provisioning.datasources as inputs to a helper template that will render
the appropriate configuration for the data source. The key for each data source
(monitoring in the above example) should map to an entry in the endpoints
section in the chart's values.yaml, as the data source's URL and authentication
credentials will be populated from the values defined in the corresponding endpoint entry.
Dashboards
~~~~~~~~~~
Grafana adds dashboards during installation using dashboard definitions written in YAML under
the following key:
::
conf:
dashboards:
These YAML definitions are transformed to JSON and added to Grafana's
configuration configmap and mounted to the Grafana pods dynamically, allowing for
flexibility in defining and adding custom dashboards to Grafana. Dashboards can
be added by inserting a new key along with a YAML dashboard definition as the
value. Additional dashboards can be found by searching on Grafana's dashboards_
page or you can define your own. A json-to-YAML tool, such as json2yaml_ , will
help transform any custom or new dashboards from JSON to YAML.
.. _json2yaml: https://www.json2yaml.com/
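A skeletal custom dashboard entry might be sketched like this (the key name and
fields are illustrative; a real entry would carry a full Grafana dashboard JSON
model expressed in YAML):

::

    conf:
      dashboards:
        my_custom_dashboard:
          title: My Custom Dashboard
          uid: my-custom-dashboard
          timezone: browser
          refresh: 1m
          panels: []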

@@ -0,0 +1,11 @@
OpenStack-Helm Monitoring
=========================
Contents:
.. toctree::
:maxdepth: 2
grafana
prometheus
nagios

@@ -0,0 +1,365 @@
Nagios
======
The Nagios chart in openstack-helm-infra can be used to provide an alarming
service that's tightly coupled to an OpenStack-Helm deployment. The Nagios
chart uses a custom Nagios core image that includes plugins developed to query
Prometheus directly for scraped metrics and triggered alarms, query the Ceph
manager endpoints directly to determine the health of a Ceph cluster, and to
query Elasticsearch for logged events that meet certain criteria (experimental).
Authentication
--------------
The Nagios deployment includes a sidecar container that runs an Apache reverse
proxy to add authentication capabilities for Nagios. The username and password
are configured under the nagios entry in the endpoints section of the chart's
values.yaml.
The configuration for Apache can be found under the conf.httpd key, and uses a
helm-toolkit function that allows for including gotpl entries in the template
directly. This allows the use of other templates, like the endpoint lookup
function templates, directly in the configuration for Apache.
Image Plugins
-------------
The Nagios image used contains custom plugins that can be used for the defined
service check commands. These plugins include:
- check_prometheus_metric.py: Query Prometheus for a specific metric and value
- check_exporter_health_metric.sh: Nagios plugin to query prometheus exporter
- check_rest_get_api.py: Check REST API status
- check_update_prometheus_hosts.py: Queries Prometheus, updates Nagios config
- query_prometheus_alerts.py: Nagios plugin to query prometheus ALERTS metric
More information about the Nagios image and plugins can be found here_.
.. _here: https://github.com/att-comdev/nagios
Nagios Service Configuration
----------------------------
The Nagios service is configured via the following section in the chart's
values file:
::
conf:
nagios:
nagios:
log_file: /opt/nagios/var/log/nagios.log
cfg_file:
- /opt/nagios/etc/nagios_objects.cfg
- /opt/nagios/etc/objects/commands.cfg
- /opt/nagios/etc/objects/contacts.cfg
- /opt/nagios/etc/objects/timeperiods.cfg
- /opt/nagios/etc/objects/templates.cfg
- /opt/nagios/etc/objects/prometheus_discovery_objects.cfg
object_cache_file: /opt/nagios/var/objects.cache
precached_object_file: /opt/nagios/var/objects.precache
resource_file: /opt/nagios/etc/resource.cfg
status_file: /opt/nagios/var/status.dat
status_update_interval: 10
nagios_user: nagios
nagios_group: nagios
check_external_commands: 1
command_file: /opt/nagios/var/rw/nagios.cmd
lock_file: /var/run/nagios.lock
temp_file: /opt/nagios/var/nagios.tmp
temp_path: /tmp
event_broker_options: -1
log_rotation_method: d
log_archive_path: /opt/nagios/var/log/archives
use_syslog: 1
log_service_retries: 1
log_host_retries: 1
log_event_handlers: 1
log_initial_states: 0
log_current_states: 1
log_external_commands: 1
log_passive_checks: 1
service_inter_check_delay_method: s
max_service_check_spread: 30
service_interleave_factor: s
host_inter_check_delay_method: s
max_host_check_spread: 30
max_concurrent_checks: 60
check_result_reaper_frequency: 10
max_check_result_reaper_time: 30
check_result_path: /opt/nagios/var/spool/checkresults
max_check_result_file_age: 3600
cached_host_check_horizon: 15
cached_service_check_horizon: 15
enable_predictive_host_dependency_checks: 1
enable_predictive_service_dependency_checks: 1
soft_state_dependencies: 0
auto_reschedule_checks: 0
auto_rescheduling_interval: 30
auto_rescheduling_window: 180
service_check_timeout: 60
host_check_timeout: 60
event_handler_timeout: 60
notification_timeout: 60
ocsp_timeout: 5
perfdata_timeout: 5
retain_state_information: 1
state_retention_file: /opt/nagios/var/retention.dat
retention_update_interval: 60
use_retained_program_state: 1
use_retained_scheduling_info: 1
retained_host_attribute_mask: 0
retained_service_attribute_mask: 0
retained_process_host_attribute_mask: 0
retained_process_service_attribute_mask: 0
retained_contact_host_attribute_mask: 0
retained_contact_service_attribute_mask: 0
interval_length: 1
check_workers: 4
check_for_updates: 1
bare_update_check: 0
use_aggressive_host_checking: 0
execute_service_checks: 1
accept_passive_service_checks: 1
execute_host_checks: 1
accept_passive_host_checks: 1
enable_notifications: 1
enable_event_handlers: 1
process_performance_data: 0
obsess_over_services: 0
obsess_over_hosts: 0
translate_passive_host_checks: 0
passive_host_checks_are_soft: 0
check_for_orphaned_services: 1
check_for_orphaned_hosts: 1
check_service_freshness: 1
service_freshness_check_interval: 60
check_host_freshness: 0
host_freshness_check_interval: 60
additional_freshness_latency: 15
enable_flap_detection: 1
low_service_flap_threshold: 5.0
high_service_flap_threshold: 20.0
low_host_flap_threshold: 5.0
high_host_flap_threshold: 20.0
date_format: us
use_regexp_matching: 1
use_true_regexp_matching: 0
daemon_dumps_core: 0
use_large_installation_tweaks: 0
enable_environment_macros: 0
debug_level: 0
debug_verbosity: 1
debug_file: /opt/nagios/var/nagios.debug
max_debug_file_size: 1000000
allow_empty_hostgroup_assignment: 1
illegal_macro_output_chars: "`~$&|'<>\""
Nagios CGI Configuration
------------------------
The Nagios CGI configuration is defined via the following section in the chart's
values file:
::
conf:
nagios:
cgi:
main_config_file: /opt/nagios/etc/nagios.cfg
physical_html_path: /opt/nagios/share
url_html_path: /nagios
show_context_help: 0
use_pending_states: 1
use_authentication: 0
use_ssl_authentication: 0
authorized_for_system_information: "*"
authorized_for_configuration_information: "*"
authorized_for_system_commands: nagiosadmin
authorized_for_all_services: "*"
authorized_for_all_hosts: "*"
authorized_for_all_service_commands: "*"
authorized_for_all_host_commands: "*"
default_statuswrl_layout: 4
ping_syntax: /bin/ping -n -U -c 5 $HOSTADDRESS$
refresh_rate: 90
result_limit: 100
escape_html_tags: 1
action_url_target: _blank
notes_url_target: _blank
lock_author_names: 1
navbar_search_for_addresses: 1
navbar_search_for_aliases: 1
Nagios Host Configuration
-------------------------
The Nagios chart includes a single host definition for the Prometheus instance
queried for metrics. The host definition can be found under the following
values key:
::
conf:
nagios:
hosts:
- prometheus:
use: linux-server
host_name: prometheus
alias: "Prometheus Monitoring"
address: 127.0.0.1
hostgroups: prometheus-hosts
check_command: check-prometheus-host-alive
The address for the Prometheus host is defined by the PROMETHEUS_SERVICE
environment variable in the deployment template, which is determined by the
monitoring entry in the Nagios chart's endpoints section. The endpoint is then
available as a macro for Nagios to use in all Prometheus based queries. For
example:
::
- check_prometheus_host_alive:
command_name: check-prometheus-host-alive
command_line: "$USER1$/check_rest_get_api.py --url $USER2$ --warning_response_seconds 5 --critical_response_seconds 10"
The $USER2$ macro above corresponds to the Prometheus endpoint defined in the
PROMETHEUS_SERVICE environment variable. All checks that use the
prometheus-hosts hostgroup will map back to the Prometheus host defined by this
endpoint.
Nagios HostGroup Configuration
------------------------------
The Nagios chart includes configuration values for defined host groups under the
following values key:
::
conf:
nagios:
host_groups:
- prometheus-hosts:
hostgroup_name: prometheus-hosts
alias: "Prometheus Virtual Host"
- base-os:
hostgroup_name: base-os
alias: "base-os"
These hostgroups are used to define which group of hosts should be targeted by
a particular nagios check. An example of a check that targets Prometheus for a
specific metric query would be:
::
- check_ceph_monitor_quorum:
use: notifying_service
hostgroup_name: prometheus-hosts
service_description: "CEPH_quorum"
check_command: check_prom_alert!ceph_monitor_quorum_low!CRITICAL- ceph monitor quorum does not exist!OK- ceph monitor quorum exists
check_interval: 60
An example of a check that targets all hosts for a base-os type check (memory
usage, latency, etc) would be:
::
- check_memory_usage:
use: notifying_service
service_description: Memory_usage
check_command: check_memory_usage
hostgroup_name: base-os
These two host groups allow for a wide range of targeted checks for determining
the status of all components of an OpenStack-Helm deployment.
Nagios Command Configuration
----------------------------
The Nagios chart includes configuration values for the command definitions Nagios
will use when executing service checks. These values are found under the
following key:
::
conf:
nagios:
commands:
- send_service_snmp_trap:
command_name: send_service_snmp_trap
command_line: "$USER1$/send_service_trap.sh '$USER8$' '$HOSTNAME$' '$SERVICEDESC$' $SERVICESTATEID$ '$SERVICEOUTPUT$' '$USER4$' '$USER5$'"
- send_host_snmp_trap:
command_name: send_host_snmp_trap
command_line: "$USER1$/send_host_trap.sh '$USER8$' '$HOSTNAME$' $HOSTSTATEID$ '$HOSTOUTPUT$' '$USER4$' '$USER5$'"
- send_service_http_post:
command_name: send_service_http_post
command_line: "$USER1$/send_http_post_event.py --type service --hostname '$HOSTNAME$' --servicedesc '$SERVICEDESC$' --state_id $SERVICESTATEID$ --output '$SERVICEOUTPUT$' --monitoring_hostname '$HOSTNAME$' --primary_url '$USER6$' --secondary_url '$USER7$'"
- send_host_http_post:
command_name: send_host_http_post
command_line: "$USER1$/send_http_post_event.py --type host --hostname '$HOSTNAME$' --state_id $HOSTSTATEID$ --output '$HOSTOUTPUT$' --monitoring_hostname '$HOSTNAME$' --primary_url '$USER6$' --secondary_url '$USER7$'"
- check_prometheus_host_alive:
command_name: check-prometheus-host-alive
command_line: "$USER1$/check_rest_get_api.py --url $USER2$ --warning_response_seconds 5 --critical_response_seconds 10"
The list of defined commands can be modified with configuration overrides, which
allows for the ability to define commands specific to an infrastructure deployment.
These commands can include querying Prometheus for metrics on dependencies for a
service to determine whether an alert should be raised, executing checks on each
host to determine network latency or file system usage, or checking each node
for issues with ntp clock skew.
Note: Since the conf.nagios.commands key contains a list of the defined commands,
the entire contents of conf.nagios.commands must be overridden if additional
commands are desired, because Helm replaces list values wholesale rather than
merging them.
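For example, adding a custom command means repeating the existing command
definitions in the override alongside the new entry. A truncated sketch is shown
below; the check_ntp_clock_skew command and its plugin are hypothetical:

::

    conf:
      nagios:
        commands:
          - send_service_snmp_trap:
              command_name: send_service_snmp_trap
              command_line: "$USER1$/send_service_trap.sh '$USER8$' '$HOSTNAME$' '$SERVICEDESC$' $SERVICESTATEID$ '$SERVICEOUTPUT$' '$USER4$' '$USER5$'"
          # ... the remaining default commands must be repeated here ...
          - check_ntp_clock_skew:
              command_name: check_ntp_clock_skew
              command_line: "$USER1$/check_ntp_time.py --warning 5 --critical 10"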
Nagios Service Check Configuration
----------------------------------
The Nagios chart includes configuration values for the service checks Nagios
will execute. These service check commands can be found under the following
key:
::
conf:
nagios:
services:
- notifying_service:
name: notifying_service
use: generic-service
flap_detection_enabled: 0
process_perf_data: 0
contact_groups: snmp_and_http_notifying_contact_group
check_interval: 60
notification_interval: 120
retry_interval: 30
register: 0
- check_ceph_health:
use: notifying_service
hostgroup_name: base-os
service_description: "CEPH_health"
check_command: check_ceph_health
check_interval: 300
- check_hosts_health:
use: generic-service
hostgroup_name: prometheus-hosts
service_description: "Nodes_health"
check_command: check_prom_alert!K8SNodesNotReady!CRITICAL- One or more nodes are not ready.
check_interval: 60
- check_prometheus_replicas:
use: notifying_service
hostgroup_name: prometheus-hosts
service_description: "Prometheus_replica-count"
check_command: check_prom_alert_with_labels!replicas_unavailable_statefulset!statefulset="prometheus"!statefulset {statefulset} has lesser than configured replicas
check_interval: 60
The Nagios service configurations define the checks Nagios will perform. These
checks contain keys for defining: the service type to use, the host group to
target, the description of the service check, the command the check should use,
and the interval at which to trigger the service check. These services can also
be extended to provide additional insight into the overall status of a
particular service, including advanced checks for determining its overall health
and liveness. For example, a service check could trigger an alarm for the
OpenStack services when Nagios detects that the relevant database or message
queue has become unresponsive.

@@ -0,0 +1,338 @@
Prometheus
==========
The Prometheus chart in openstack-helm-infra provides a time series database and
a strong querying language for monitoring various components of OpenStack-Helm.
Prometheus gathers metrics by scraping defined service endpoints or pods at
specified intervals and indexing them in the underlying time series database.
Authentication
--------------
The Prometheus deployment includes a sidecar container that runs an Apache
reverse proxy to add authentication capabilities for Prometheus. The
username and password are configured under the monitoring entry in the endpoints
section of the chart's values.yaml.
The configuration for Apache can be found under the conf.httpd key, and uses a
helm-toolkit function that allows for including gotpl entries in the template
directly. This allows the use of other templates, like the endpoint lookup
function templates, directly in the configuration for Apache.
Prometheus Service Configuration
--------------------------------
The Prometheus service is configured via command line flags set during runtime.
These flags include: setting the configuration file, setting log levels, setting
characteristics of the time series database, and enabling the web admin API for
snapshot support. These settings can be configured via the values tree at:
::
conf:
prometheus:
command_line_flags:
log.level: info
query.max_concurrency: 20
query.timeout: 2m
storage.tsdb.path: /var/lib/prometheus/data
storage.tsdb.retention: 7d
web.enable_admin_api: false
web.enable_lifecycle: false
The Prometheus configuration file contains the definitions for scrape targets
and the location of the rules files for triggering alerts on scraped metrics.
The configuration file is defined in the values file, and can be found at:
::
conf:
prometheus:
scrape_configs: |
By defining the configuration via the values file, an operator can override all
configuration components of the Prometheus deployment at runtime.
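A minimal sketch of such an override, assuming the scrape_configs key holds the
entire Prometheus configuration file as a literal block, could look like:

::

    conf:
      prometheus:
        scrape_configs: |
          global:
            scrape_interval: 60s
          scrape_configs:
            - job_name: kubernetes-service-endpoints
              kubernetes_sd_configs:
                - role: endpoints
              relabel_configs:
                - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
                  action: keep
                  regex: true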
Kubernetes Endpoint Configuration
---------------------------------
The Prometheus chart in openstack-helm-infra uses the built-in service discovery
mechanisms for Kubernetes endpoints and pods to automatically configure scrape
targets. Functions added to helm-toolkit allow configuration of these targets
via annotations that can be applied to any service or pod that exposes metrics
for Prometheus, whether it is a service for an application-specific exporter or
an application that provides a metrics endpoint via its own service. The values
consumed by these functions correspond to entries in the monitoring tree under
the prometheus key in a chart's values.yaml file.
The function definitions are below:
::
{{- define "helm-toolkit.snippets.prometheus_service_annotations" -}}
{{- $config := index . 0 -}}
{{- if $config.scrape }}
prometheus.io/scrape: {{ $config.scrape | quote }}
{{- end }}
{{- if $config.scheme }}
prometheus.io/scheme: {{ $config.scheme | quote }}
{{- end }}
{{- if $config.path }}
prometheus.io/path: {{ $config.path | quote }}
{{- end }}
{{- if $config.port }}
prometheus.io/port: {{ $config.port | quote }}
{{- end }}
{{- end -}}
::
{{- define "helm-toolkit.snippets.prometheus_pod_annotations" -}}
{{- $config := index . 0 -}}
{{- if $config.scrape }}
prometheus.io/scrape: {{ $config.scrape | quote }}
{{- end }}
{{- if $config.path }}
prometheus.io/path: {{ $config.path | quote }}
{{- end }}
{{- if $config.port }}
prometheus.io/port: {{ $config.port | quote }}
{{- end }}
{{- end -}}
These functions render the following annotations:
- prometheus.io/scrape: Must be set to true for Prometheus to scrape the target
- prometheus.io/scheme: Overrides the scheme used to scrape the target if not http
- prometheus.io/path: Overrides the path used to scrape the target's metrics if not /metrics
- prometheus.io/port: Overrides the port on which to scrape metrics if not the service's default port
Each chart that can be targeted for monitoring by Prometheus has a prometheus
section under a monitoring tree in the chart's values.yaml, and Prometheus
monitoring is disabled by default for those services. Example values for the
required entries can be found in the following monitoring configuration for the
prometheus-node-exporter chart:
::
monitoring:
prometheus:
enabled: false
node_exporter:
scrape: true
If the prometheus.enabled key is set to true, the annotations are set on the
targeted service or pod as the condition for applying the annotations evaluates
to true. For example:
::
{{- $prometheus_annotations := $envAll.Values.monitoring.prometheus.node_exporter }}
---
apiVersion: v1
kind: Service
metadata:
name: {{ tuple "node_metrics" "internal" . | include "helm-toolkit.endpoints.hostname_short_endpoint_lookup" }}
labels:
{{ tuple $envAll "node_exporter" "metrics" | include "helm-toolkit.snippets.kubernetes_metadata_labels" | indent 4 }}
annotations:
{{- if .Values.monitoring.prometheus.enabled }}
{{ tuple $prometheus_annotations | include "helm-toolkit.snippets.prometheus_service_annotations" | indent 4 }}
{{- end }}
Kubelet, API Server, and cAdvisor
---------------------------------
The Prometheus chart includes scrape target configurations for the kubelet, the
Kubernetes API servers, and cAdvisor. These targets are configured based on
a kubeadm deployed Kubernetes cluster, as OpenStack-Helm uses kubeadm to deploy
Kubernetes in the gates. These configurations may need to change based on your
chosen method of deployment. Please note the cAdvisor metrics will not be
captured if the kubelet was started with the following flag:
::
--cadvisor-port=0
To enable the gathering of the kubelet's custom metrics, the following flag must
be set:
::
--enable-custom-metrics
Installation
------------
The Prometheus chart can be installed with the following command:
.. code-block:: bash
helm install --namespace=openstack local/prometheus --name=prometheus
The above command results in a Prometheus deployment configured to automatically
discover services with the necessary annotations for scraping and to
gather metrics from the kubelet, the Kubernetes API servers, and cAdvisor.
Extending Prometheus
--------------------
Prometheus can target various exporters to gather metrics related to specific
applications to extend visibility into an OpenStack-Helm deployment. Currently,
openstack-helm-infra contains charts for:
- prometheus-kube-state-metrics: Provides additional Kubernetes metrics
- prometheus-node-exporter: Provides metrics for nodes and the Linux kernel
- prometheus-openstack-metrics-exporter: Provides metrics for OpenStack services
Kube-State-Metrics
~~~~~~~~~~~~~~~~~~
The prometheus-kube-state-metrics chart provides metrics for Kubernetes objects
as well as metrics for kube-scheduler and kube-controller-manager. Information
on the specific metrics available via the kube-state-metrics service can be
found in the kube-state-metrics_ documentation.
The prometheus-kube-state-metrics chart can be installed with the following:
.. code-block:: bash
helm install --namespace=kube-system local/prometheus-kube-state-metrics --name=prometheus-kube-state-metrics
.. _kube-state-metrics: https://github.com/kubernetes/kube-state-metrics/tree/master/Documentation
Node Exporter
~~~~~~~~~~~~~
The prometheus-node-exporter chart provides hardware and operating system metrics
exposed by the Linux kernel. Information on the specific metrics available via
the Node exporter can be found on the Node_exporter_ GitHub page.
The prometheus-node-exporter chart can be installed with the following:
.. code-block:: bash
helm install --namespace=kube-system local/prometheus-node-exporter --name=prometheus-node-exporter
.. _Node_exporter: https://github.com/prometheus/node_exporter
OpenStack Exporter
~~~~~~~~~~~~~~~~~~
The prometheus-openstack-exporter chart provides metrics specific to the
OpenStack services. The exporter's source code can be found here_. While the
metrics provided are by no means comprehensive, they will be expanded upon.
Please note the OpenStack exporter requires the creation of a Keystone user to
successfully gather metrics. To create the required user, the chart uses the
same keystone user management job the OpenStack service charts use.
The prometheus-openstack-exporter chart can be installed with the following:
.. code-block:: bash
helm install --namespace=openstack local/prometheus-openstack-exporter --name=prometheus-openstack-exporter
.. _here: https://github.com/att-comdev/openstack-metrics-collector
Other exporters
~~~~~~~~~~~~~~~
Certain charts in OpenStack-Helm include templates for application-specific
Prometheus exporters, which keeps the monitoring of those services tightly coupled
to the chart. The templates for these exporters can be found in the monitoring
subdirectory in the chart. These exporters are disabled by default, and can be
enabled by setting the appropriate flag in the monitoring.prometheus key of the
chart's values.yaml file. The charts containing exporters include:
- Elasticsearch_
- RabbitMQ_
- MariaDB_
- Memcached_
- Fluentd_
- Postgres_
.. _Elasticsearch: https://github.com/prometheus-community/elasticsearch_exporter
.. _RabbitMQ: https://github.com/kbudde/rabbitmq_exporter
.. _MariaDB: https://github.com/prometheus/mysqld_exporter
.. _Memcached: https://github.com/prometheus/memcached_exporter
.. _Fluentd: https://github.com/V3ckt0r/fluentd_exporter
.. _Postgres: https://github.com/wrouesnel/postgres_exporter
Ceph
~~~~
Starting with Luminous, Ceph can export metrics with the ceph-mgr prometheus module.
This module can be enabled in Ceph's values.yaml under the ceph_mgr_enabled_plugins
key by appending prometheus to the list of enabled modules. After enabling the
prometheus module, metrics can be scraped on the ceph-mgr service endpoint. This
relies on the Prometheus annotations attached to the ceph-mgr service template, and
these annotations can be modified in the endpoints section of Ceph's values.yaml
file. Information on the specific metrics available via the prometheus module
can be found in the Ceph prometheus_ module documentation.
.. _prometheus: http://docs.ceph.com/docs/master/mgr/prometheus/
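A sketch of the corresponding override is shown below; because list values are
replaced wholesale by overrides, any modules that should remain enabled (status
is shown here as an assumed default) must be repeated alongside prometheus:

::

    ceph_mgr_enabled_plugins:
      - status
      - prometheus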
Prometheus Dashboard
--------------------
Prometheus includes a dashboard that can be accessed via the exposed
Prometheus endpoint (NodePort or otherwise). This dashboard will give you a
view of your scrape targets' state, the configuration values for Prometheus's
scrape jobs and command line flags, a view of any alerts triggered based on the
defined rules, and a means for using PromQL to query scraped metrics. The
Prometheus dashboard is a useful tool for verifying Prometheus is configured
appropriately and to verify the status of any services targeted for scraping via
the Prometheus service discovery annotations.
Rules Configuration
-------------------
Prometheus provides a querying language that can operate on defined rules which
allow for the generation of alerts on specific metrics. The Prometheus chart in
openstack-helm-infra defines these rules via the values.yaml file. Defining them
in the values file gives operators the flexibility to provide specific
rules via overrides at installation. The following rules keys are provided:
::
values:
conf:
rules:
alertmanager:
etcd3:
kube_apiserver:
kube_controller_manager:
kubelet:
kubernetes:
rabbitmq:
mysql:
ceph:
openstack:
custom:
These keys provide recording and alert rules for all infrastructure
components of an OpenStack-Helm deployment. If you wish to exclude rules for a
component, leave the tree empty in an overrides file. To read more
about Prometheus recording and alert rules definitions, please see the official
Prometheus recording_ and alert_ rules documentation.
.. _recording: https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/
.. _alert: https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/
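For instance, a custom alert rule could be supplied under the custom key with
an override sketched as follows (the group name, alert name, and job label are
illustrative):

::

    conf:
      rules:
        custom:
          groups:
            - name: custom.rules
              rules:
                - alert: NodeExporterDown
                  expr: up{job="node-exporter"} == 0
                  for: 5m
                  labels:
                    severity: warning
                  annotations:
                    summary: A node-exporter target has been unreachable for more than 5 minutes.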
Note: Prometheus releases prior to 2.0 used gotpl to define rules. Prometheus
2.0 changed the rules format to YAML, making them much easier to read. The
Prometheus chart in openstack-helm-infra uses Prometheus 2.0 by default to take
advantage of changes to the underlying storage layer and the handling of stale
data. The chart will not support overrides for Prometheus versions below 2.0,
as the command line flags for the service changed between versions.
The wide range of exporters included in OpenStack-Helm coupled with the ability
to define rules with configuration overrides allows for the addition of custom
alerting and recording rules to fit an operator's monitoring needs. Adding new
rules or modifying existing rules requires overrides for either an existing key
under conf.rules or the addition of a new key under conf.rules. The addition
of custom rules can be used to define complex checks that can be extended for
determining the liveness or health of infrastructure components.

@@ -0,0 +1,979 @@
.. -*- coding: utf-8 -*-
.. NOTE TO MAINTAINERS: use rst2html script to convert .rst to .html
rst2html ./failure-domain.rst ./failure-domain.html
open ./failure-domain.html
==============================
Failure Domains in CRUSH Map
==============================
.. contents::
.. sectnum::
Overview
========
The `CRUSH Map <http://docs.ceph.com/docs/master/rados/operations/crush-map/?highlight=hammer%20profile>`__ in a Ceph cluster is best visualized
as an inverted tree. The hierarchical layout describes the physical
topology of the Ceph cluster. Through the physical topology, failure
domains are conceptualized from the different branches in the inverted
tree. CRUSH rules are created and map to failure domains with data
placement policy to distribute the data.
The internal nodes (non-leaves and non-root) in the hierarchy are identified
as buckets. Each bucket is a hierarchical aggregation of storage locations
and their assigned weights. These are the types defined by CRUSH as the
supported buckets.
::
# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root
This guide describes the host and rack buckets and their role in constructing
a CRUSH Map with separate failure domains. Once a Ceph cluster is configured
with the expected CRUSH Map and Rule, the PGs of the designated pool are
verified with a script (**utils-checkPGs.py**) to ensure that the OSDs in all the PGs
reside in separate failure domains.
Ceph Environment
================
The ceph commands and scripts described in this write-up are executed as
Linux user root on one of the orchestration nodes and in one of the ceph monitors
deployed as kubernetes pods. The root user has the credentials to execute
all the ceph commands.
On a kubernetes cluster, a separate namespace named **ceph** is configured
for the ceph cluster. Include the **ceph** namespace in **kubectl** commands
when interacting with the ceph pods.
A kubernetes pod is a collection of docker containers sharing a network
and mount namespace. It is the basic unit of deployment in the kubernetes
cluster. The node in the kubernetes cluster where the orchestration
operations are performed needs access to the **kubectl** command. In this
guide, this node is referred to as the orchestration node. On this
node, you can list all the pods that are deployed. To execute a command
in a given pod, use **kubectl** to locate the name of the pod and switch
to it to execute the command.
Orchestration Node
------------------
To gain access to the kubernetes orchestration node, use your login
credential and the authentication procedure assigned to you. For
environments set up with SSH key-based access, your id_rsa.pub (generated
through the ssh-keygen) public key should be in your ~/.ssh/authorized_keys
file on the orchestration node.
The kubernetes and ceph commands require the root login credential to
execute. Your Linux login requires the *sudo* privilege to execute
commands as user root. On the orchestration node, acquire root privileges
with your Linux login through the *sudo* command.
::
[orchestration]$ sudo -i
<Your Linux login's password>:
[orchestration]#
Kubernetes Pods
---------------
On the orchestration node, execute the **kubectl** command to list the
specific set of pods with the **--selector** option. This **kubectl**
command lists all the ceph monitor pods.
::
[orchestration]# kubectl get pods -n ceph --selector component=mon
NAME READY STATUS RESTARTS AGE
ceph-mon-85mlt 2/2 Running 0 9d
ceph-mon-9mpnb 2/2 Running 0 9d
ceph-mon-rzzqr 2/2 Running 0 9d
ceph-mon-snds8 2/2 Running 0 9d
ceph-mon-snzwx 2/2 Running 0 9d
The following **kubectl** command lists the Ceph OSD pods.
::
[orchestration]# kubectl get pods -n ceph --selector component=osd
NAME READY STATUS RESTARTS AGE
ceph-osd-default-166a1044-95s74 2/2 Running 0 9d
ceph-osd-default-166a1044-bglnm 2/2 Running 0 9d
ceph-osd-default-166a1044-lq5qq 2/2 Running 0 9d
ceph-osd-default-166a1044-lz6x6 2/2 Running 0 9d
. . .
To list all the pods in all the namespaces, execute this **kubectl** command.
::
[orchestration]# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
ceph ceph-bootstrap-rpzld 0/1 Completed 0 10d
ceph ceph-cephfs-client-key-generator-pvzs6 0/1 Completed 0 10d
Execute Commands in Pods
^^^^^^^^^^^^^^^^^^^^^^^^
To execute multiple commands in a pod, you can switch to the execution
context of the pod with a /bin/bash session.
::
[orchestration]# kubectl exec -it ceph-mon-85mlt -n ceph -- /bin/bash
[ceph-mon]# ceph status
cluster:
id: 07c31d0f-bcc6-4db4-aadf-2d2a0f13edb8
health: HEALTH_OK
services:
mon: 5 daemons, quorum host1,host2,host3,host4,host5
mgr: host6(active), standbys: host1
mds: cephfs-1/1/1 up {0=mds-ceph-mds-7cb4f57cc-prh87=up:active}, 1 up:standby
osd: 72 osds: 72 up, 72 in
rgw: 2 daemons active
data:
pools: 20 pools, 3944 pgs
objects: 86970 objects, 323 GB
usage: 1350 GB used, 79077 GB / 80428 GB avail
pgs: 3944 active+clean
io:
client: 981 kB/s wr, 0 op/s rd, 84 op/s wr
To verify that you are executing within the context of a pod, display the
content of the */proc/self/cgroup* control group file. The *kubepods* entry
in the cgroup file shows that you're executing in a docker container of a pod.
::
[ceph-mon]# cat /proc/self/cgroup
11:hugetlb:/kubepods/besteffort/podafb3689c-8c5b-11e8-be6a-246e96290f14/ff6cbc58348a44722ee6a493845b9c2903fabdce80d0902d217cc4d6962d7b53
. . .
Exit the pod to resume the orchestration node's execution context.
::
[ceph-mon]# exit
[orchestration]#
To verify that you are executing in the orchestration node's context, display
the */proc/self/cgroup* control group file. You would not see the *kubepods*
docker container in the output.
::
[orchestration]# cat /proc/self/cgroup
11:blkio:/user.slice
10:freezer:/
9:hugetlb:/
. . .
It is also possible to run the ceph commands via **kubectl exec**
without switching to a pod's container.
::
[orchestration]# kubectl exec ceph-mon-9mpnb -n ceph -- ceph status
cluster:
id: 07c31d0f-bcc6-4db4-aadf-2d2a0f13edb8
health: HEALTH_OK
. . .
Failure Domains
===============
A failure domain provides fault isolation for the data and corresponds
to a branch in the hierarchical topology. To protect against data loss, OSDs
that are allocated to PGs should be chosen from different failure
domains. Losing a branch takes down only the OSDs in that branch;
OSDs in the other branches are not affected.
In a data center, baremetal hosts are typically installed in a
rack (a refrigerator-sized cabinet). Multiple racks with hosts in each rack
are used to provision the OSDs running on each host. A rack is envisioned
as a branch in the CRUSH topology.
To provide data redundancy, ceph maintains multiple copies of the data. The
total number of copies to store for each piece of data is determined by the
**osd_pool_default_size** ceph.conf parameter. With this parameter set
to 3, each piece of data has 3 copies that get stored in a pool. Each
copy is stored on a different OSD allocated from a different failure domain.
Host
----
Choosing host as the failure domain provides weaker protection against
data loss.
To illustrate, a Ceph cluster has been provisioned with six hosts and four
OSDs on each host. The hosts are enclosed in respective racks where each
rack contains two hosts.
In the configuration of the Ceph cluster, without explicit instructions on
where the host and rack buckets should be placed, Ceph would create a
CRUSH map without the rack buckets. The CRUSH rule that gets created uses
the host as the failure domain. With the size (replica count) of a pool set
to 3, the OSDs in all the PGs are allocated from different hosts.
::
root=default
├── host1
│   ├── osd.1
│   ├── osd.2
│   ├── osd.3
│   └── osd.4
├── host2
│   ├── osd.5
│   ├── osd.6
│   ├── osd.7
│   └── osd.8
├── host3
│   ├── osd.9
│   ├── osd.10
│   ├── osd.11
│   └── osd.12
├── host4
│   ├── osd.13
│   ├── osd.14
│   ├── osd.15
│   └── osd.16
├── host5
│   ├── osd.17
│   ├── osd.18
│   ├── osd.19
│   └── osd.20
└── host6
├── osd.21
├── osd.22
├── osd.23
└── osd.24
This ceph cluster has a CRUSH rule that uses the host as the
failure domain.
::
# ceph osd crush rule ls
replicated_host
# ceph osd crush rule dump replicated_host
{
"rule_id": 0,
"rule_name": "replicated_host",
"ruleset": 0,
"type": 1,
"min_size": 1,
"max_size": 10,
"steps": [
{
"op": "take",
"item": -1,
"item_name": "default"
},
{
"op": "chooseleaf_firstn",
"num": 0,
"type": "host" },
{
"op": "emit"
}
]
}
Verify the CRUSH rule that is assigned to the ceph pool. In this
example, the rbd pool is used.
::
# ceph osd pool get rbd crush_rule
crush_rule: replicated_host
# ceph osd pool get rbd size
size: 3
# ceph osd pool get rbd pg_num
pg_num: 1024
To verify that the OSDs in all the PGs are allocated from different
hosts, invoke the **utils-checkPGs.py** utility on the ceph pool. The offending
PGs are printed to stdout.
::
# /tmp/utils-checkPGs.py rbd
Checking PGs in pool rbd ... Passed
With host as the failure domain, it is quite possible for some of the PGs to
have OSDs allocated from different hosts that are located in the same
rack. For example, one PG might have OSD numbers [1, 8, 13]. OSDs 1 and 8
are found on hosts located in rack1. If rack1 suffers a catastrophic
failure, PGs with OSDs allocated from the hosts in rack1 would be severely
degraded.
Rack
----
Choosing rack as the failure domain provides better protection against data
loss.
To prevent PGs with OSDs allocated from hosts that are located in the same
rack, configure the CRUSH hierarchy with rack buckets. Each rack bucket
contains the hosts that reside in the same physical rack. A CRUSH rule is
then configured with rack as the failure domain.
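In an OpenStack-Helm deployment, this is typically driven through chart values
rather than by hand-editing the CRUSH map. A rough sketch is shown below,
assuming the ceph-osd and ceph-client charts expose keys along these lines; the
exact key names may differ between chart versions:

::

    # ceph-osd values override (assumed key layout)
    conf:
      storage:
        failure_domain: rack

    # ceph-client values override (assumed key layout)
    conf:
      pool:
        default:
          crush_rule: rack_replicated_rule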
In the following hierarchical topology, the Ceph cluster was configured with
three rack buckets. Each bucket has two hosts. In pools that were created
with the CRUSH rule set to rack, the OSDs in all the PGs are allocated from
distinct racks.
::
root=default
├── rack1
│   ├── host1
│   │   ├── osd.1
│   │   ├── osd.2
│   │   ├── osd.3
│   │   └── osd.4
│   └── host2
│   ├── osd.5
│   ├── osd.6
│   ├── osd.7
│   └── osd.8
├── rack2
│   ├── host3
│   │   ├── osd.9
│   │   ├── osd.10
│   │   ├── osd.11
│   │   └── osd.12
│   └── host4
│   ├── osd.13
│   ├── osd.14
│   ├── osd.15
│   └── osd.16
└── rack3
├── host5
│   ├── osd.17
│   ├── osd.18
│   ├── osd.19
│   └── osd.20
└── host6
├── osd.21
├── osd.22
├── osd.23
└── osd.24
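If the rack buckets were not created automatically (for example through the
ceph-osd chart overrides described later in this document), a hierarchy like
the one above can be built manually. A sketch, with illustrative bucket and
host names:
::
    # ceph osd crush add-bucket rack1 rack
    # ceph osd crush move rack1 root=default
    # ceph osd crush move host1 rack=rack1
    # ceph osd crush move host2 rack=rack1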
Verify the Ceph cluster has a CRUSH rule with rack as the failure domain.
::
# ceph osd crush rule ls
rack_replicated_rule
# ceph osd crush rule dump rack_replicated_rule
{
"rule_id": 2,
"rule_name": "rack_replicated_rule",
"ruleset": 2,
"type": 1,
"min_size": 1,
"max_size": 10,
"steps": [
{
"op": "take",
"item": -1,
"item_name": "default"
},
{
"op": "chooseleaf_firstn",
"num": 0,
"type": "rack"
},
{
"op": "emit"
}
]
}
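If such a rule does not exist yet, it can typically be created with
**ceph osd crush rule create-replicated**; the rule name, root and
failure-domain type below mirror the example above:
::
    # ceph osd crush rule create-replicated rack_replicated_rule default rack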
Create a Ceph pool with its CRUSH rule set to the rack-based rule.
::
# ceph osd pool create rbd 2048 2048 replicated rack_replicated_rule
pool 'rbd' created
# ceph osd pool get rbd crush_rule
crush_rule: rack_replicated_rule
# ceph osd pool get rbd size
size: 3
# ceph osd pool get rbd pg_num
pg_num: 2048
Invoke the **utils-checkPGs.py** script on the pool to verify that there are no PGs
with OSDs allocated from the same rack. The offending PGs are printed to
stdout.
::
# /tmp/utils-checkPGs.py rbd
Checking PGs in pool rbd ... Passed
CRUSH Map and Rule
==================
On a properly configured Ceph cluster, there are different ways to view
the CRUSH hierarchy.
ceph CLI
--------
Print the CRUSH hierarchy to stdout with the ceph CLI.
::
root@host5:/# ceph osd crush tree
ID CLASS WEIGHT TYPE NAME
-1 78.47974 root default
-15 26.15991 rack rack1
-2 13.07996 host host1
0 hdd 1.09000 osd.0
1 hdd 1.09000 osd.1
2 hdd 1.09000 osd.2
3 hdd 1.09000 osd.3
4 hdd 1.09000 osd.4
5 hdd 1.09000 osd.5
6 hdd 1.09000 osd.6
7 hdd 1.09000 osd.7
8 hdd 1.09000 osd.8
9 hdd 1.09000 osd.9
10 hdd 1.09000 osd.10
11 hdd 1.09000 osd.11
-5 13.07996 host host2
12 hdd 1.09000 osd.12
13 hdd 1.09000 osd.13
14 hdd 1.09000 osd.14
15 hdd 1.09000 osd.15
16 hdd 1.09000 osd.16
17 hdd 1.09000 osd.17
18 hdd 1.09000 osd.18
19 hdd 1.09000 osd.19
20 hdd 1.09000 osd.20
21 hdd 1.09000 osd.21
22 hdd 1.09000 osd.22
23 hdd 1.09000 osd.23
-16 26.15991 rack rack2
-13 13.07996 host host3
53 hdd 1.09000 osd.53
54 hdd 1.09000 osd.54
58 hdd 1.09000 osd.58
59 hdd 1.09000 osd.59
64 hdd 1.09000 osd.64
65 hdd 1.09000 osd.65
66 hdd 1.09000 osd.66
67 hdd 1.09000 osd.67
68 hdd 1.09000 osd.68
69 hdd 1.09000 osd.69
70 hdd 1.09000 osd.70
71 hdd 1.09000 osd.71
-9 13.07996 host host4
36 hdd 1.09000 osd.36
37 hdd 1.09000 osd.37
38 hdd 1.09000 osd.38
39 hdd 1.09000 osd.39
40 hdd 1.09000 osd.40
41 hdd 1.09000 osd.41
42 hdd 1.09000 osd.42
43 hdd 1.09000 osd.43
44 hdd 1.09000 osd.44
45 hdd 1.09000 osd.45
46 hdd 1.09000 osd.46
47 hdd 1.09000 osd.47
-17 26.15991 rack rack3
-11 13.07996 host host5
48 hdd 1.09000 osd.48
49 hdd 1.09000 osd.49
50 hdd 1.09000 osd.50
51 hdd 1.09000 osd.51
52 hdd 1.09000 osd.52
55 hdd 1.09000 osd.55
56 hdd 1.09000 osd.56
57 hdd 1.09000 osd.57
60 hdd 1.09000 osd.60
61 hdd 1.09000 osd.61
62 hdd 1.09000 osd.62
63 hdd 1.09000 osd.63
-7 13.07996 host host6
24 hdd 1.09000 osd.24
25 hdd 1.09000 osd.25
26 hdd 1.09000 osd.26
27 hdd 1.09000 osd.27
28 hdd 1.09000 osd.28
29 hdd 1.09000 osd.29
30 hdd 1.09000 osd.30
31 hdd 1.09000 osd.31
32 hdd 1.09000 osd.32
33 hdd 1.09000 osd.33
34 hdd 1.09000 osd.34
35 hdd 1.09000 osd.35
root@host5:/#
To see the weight and affinity of each OSD:
::
root@host5:/# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 78.47974 root default
-15 26.15991 rack rack1
-2 13.07996 host host1
0 hdd 1.09000 osd.0 up 1.00000 1.00000
1 hdd 1.09000 osd.1 up 1.00000 1.00000
2 hdd 1.09000 osd.2 up 1.00000 1.00000
3 hdd 1.09000 osd.3 up 1.00000 1.00000
4 hdd 1.09000 osd.4 up 1.00000 1.00000
5 hdd 1.09000 osd.5 up 1.00000 1.00000
6 hdd 1.09000 osd.6 up 1.00000 1.00000
7 hdd 1.09000 osd.7 up 1.00000 1.00000
8 hdd 1.09000 osd.8 up 1.00000 1.00000
9 hdd 1.09000 osd.9 up 1.00000 1.00000
10 hdd 1.09000 osd.10 up 1.00000 1.00000
11 hdd 1.09000 osd.11 up 1.00000 1.00000
-5 13.07996 host host2
12 hdd 1.09000 osd.12 up 1.00000 1.00000
13 hdd 1.09000 osd.13 up 1.00000 1.00000
14 hdd 1.09000 osd.14 up 1.00000 1.00000
15 hdd 1.09000 osd.15 up 1.00000 1.00000
16 hdd 1.09000 osd.16 up 1.00000 1.00000
17 hdd 1.09000 osd.17 up 1.00000 1.00000
18 hdd 1.09000 osd.18 up 1.00000 1.00000
19 hdd 1.09000 osd.19 up 1.00000 1.00000
20 hdd 1.09000 osd.20 up 1.00000 1.00000
21 hdd 1.09000 osd.21 up 1.00000 1.00000
22 hdd 1.09000 osd.22 up 1.00000 1.00000
23 hdd 1.09000 osd.23 up 1.00000 1.00000
crushtool CLI
-------------
To extract the CRUSH map from a running cluster and decompile it into ASCII text:
::
# ceph osd getcrushmap -o /tmp/cm.bin
100
# crushtool -d /tmp/cm.bin -o /tmp/cm.rack.ascii
# cat /tmp/cm.rack.ascii
. . .
# buckets
host host1 {
id -2 # do not change unnecessarily
id -3 class hdd # do not change unnecessarily
# weight 13.080
alg straw2
hash 0 # rjenkins1
item osd.0 weight 1.090
item osd.1 weight 1.090
item osd.2 weight 1.090
item osd.3 weight 1.090
item osd.4 weight 1.090
item osd.5 weight 1.090
item osd.6 weight 1.090
item osd.7 weight 1.090
item osd.8 weight 1.090
item osd.9 weight 1.090
item osd.10 weight 1.090
item osd.11 weight 1.090
}
host host2 {
id -5 # do not change unnecessarily
id -6 class hdd # do not change unnecessarily
# weight 13.080
alg straw2
hash 0 # rjenkins1
item osd.12 weight 1.090
item osd.13 weight 1.090
item osd.14 weight 1.090
item osd.15 weight 1.090
item osd.16 weight 1.090
item osd.18 weight 1.090
item osd.19 weight 1.090
item osd.17 weight 1.090
item osd.20 weight 1.090
item osd.21 weight 1.090
item osd.22 weight 1.090
item osd.23 weight 1.090
}
rack rack1 {
id -15 # do not change unnecessarily
id -20 class hdd # do not change unnecessarily
# weight 26.160
alg straw2
hash 0 # rjenkins1
item host1 weight 13.080
item host2 weight 13.080
}
. . .
root default {
id -1 # do not change unnecessarily
id -4 class hdd # do not change unnecessarily
# weight 78.480
alg straw2
hash 0 # rjenkins1
item rack1 weight 26.160
item rack2 weight 26.160
item rack3 weight 26.160
}
# rules
rule replicated_rack {
id 2
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type rack
step emit
}
# end crush map
The **utils-checkPGs.py** script reads the same data from memory, constructs
the failure domains with their OSDs, and verifies the OSDs in each PG against
the constructed failure domains.
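For a quick manual spot check of the same PG-to-OSD mapping the script works
from, the acting sets can be listed directly on a ceph-mon pod:
::
    # ceph pg dump pgs_brief | head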
Configure the Failure Domain in CRUSH Map
=========================================
The Ceph ceph-osd, ceph-client and cinder charts accept configuration parameters to set the failure domain for CRUSH.
The options available are **failure_domain**, **failure_domain_by_hostname**, **failure_domain_name** and **crush_rule**.
::
ceph-osd specific overrides
failure_domain: Set the CRUSH bucket type for your OSD to reside in. (DEFAULT: "host")
failure_domain_by_hostname: Specify the portion of the hostname to use for your failure domain bucket name. (DEFAULT: "false")
failure_domain_name: Manually name the failure domain bucket name. This configuration option should only be used when using host based overrides. (DEFAULT: "false")
::
ceph-client and cinder specific overrides
crush_rule: Set the CRUSH rule for a pool (DEFAULT: "replicated_rule")
An example lab environment had the following parameters set in the Ceph YAML override file to apply a rack-level failure domain within CRUSH.
::
endpoints:
identity:
namespace: openstack
object_store:
namespace: ceph
ceph_mon:
namespace: ceph
network:
public: 10.0.0.0/24
cluster: 10.0.0.0/24
deployment:
storage_secrets: true
ceph: true
csi_rbd_provisioner: true
rbd_provisioner: false
cephfs_provisioner: false
client_secrets: false
rgw_keystone_user_and_endpoints: false
bootstrap:
enabled: true
conf:
ceph:
global:
fsid: 6c12a986-148d-45a7-9120-0cf0522ca5e0
rgw_ks:
enabled: true
pool:
default:
crush_rule: rack_replicated_rule
crush:
tunables: null
target:
# NOTE(portdirect): 5 nodes, with one osd per node
osd: 18
pg_per_osd: 100
storage:
osd:
- data:
type: block-logical
location: /dev/vdb
journal:
type: block-logical
location: /dev/vde1
- data:
type: block-logical
location: /dev/vdc
journal:
type: block-logical
location: /dev/vde2
- data:
type: block-logical
location: /dev/vdd
journal:
type: block-logical
location: /dev/vde3
overrides:
ceph_osd:
hosts:
- name: osh-1
conf:
storage:
failure_domain: "rack"
failure_domain_name: "rack1"
- name: osh-2
conf:
storage:
failure_domain: "rack"
failure_domain_name: "rack1"
- name: osh-3
conf:
storage:
failure_domain: "rack"
failure_domain_name: "rack2"
- name: osh-4
conf:
storage:
failure_domain: "rack"
failure_domain_name: "rack2"
- name: osh-5
conf:
storage:
failure_domain: "rack"
failure_domain_name: "rack3"
- name: osh-6
conf:
storage:
failure_domain: "rack"
failure_domain_name: "rack3"
.. NOTE::
The cinder chart will need an override configured to ensure the cinder pools in Ceph use the correct **crush_rule**.
::
pod:
replicas:
api: 2
volume: 1
scheduler: 1
backup: 1
conf:
cinder:
DEFAULT:
backup_driver: cinder.backup.drivers.swift
ceph:
pools:
backup:
replicated: 3
crush_rule: rack_replicated_rule
chunk_size: 8
volume:
replicated: 3
crush_rule: rack_replicated_rule
chunk_size: 8
The charts can be updated with these overrides pre- or post-deployment. If this is a post-deployment change, then the following steps apply to a gate-based openstack-helm deployment.
::
cd /opt/openstack-helm
helm upgrade --install ceph-osd ../openstack-helm-infra/ceph-osd --namespace=ceph --values=/tmp/ceph.yaml
kubectl delete jobs/ceph-rbd-pool -n ceph
helm upgrade --install ceph-client ../openstack-helm-infra/ceph-client --namespace=ceph --values=/tmp/ceph.yaml
helm delete cinder --purge
helm upgrade --install cinder ./cinder --namespace=openstack --values=/tmp/cinder.yaml
.. NOTE::
There will be a brief interruption of I/O and a movement of placement group data in Ceph while these changes are
applied. The data movement operation can take from several minutes to several days to complete.
The utils-checkPGs.py Script
============================
The purpose of the **utils-checkPGs.py** script is to check whether a PG has OSDs
allocated from the same failure domain. The violating PGs with their
respective OSDs are printed to stdout.
In this example, a pool was created with the CRUSH rule set to the host
failure domain, while the Ceph cluster itself was configured with rack
buckets. The CRUSH algorithm allocated the OSDs in each PG from different
hosts, but the rack buckets were ignored, which results in the duplicate
racks reported by the script.
::
root@host5:/# /tmp/utils-checkPGs.py cmTestPool
Checking PGs in pool cmTestPool ... Failed
OSDs [44, 32, 53] in PG 20.a failed check in rack [u'rack2', u'rack2', u'rack2']
OSDs [61, 5, 12] in PG 20.19 failed check in rack [u'rack1', u'rack1', u'rack1']
OSDs [69, 9, 15] in PG 20.2a failed check in rack [u'rack1', u'rack1', u'rack1']
. . .
.. NOTE::
The **utils-checkPGs.py** utility is executed on-demand. It is intended to be executed on one of the ceph-mon pods.
If the **utils-checkPGs.py** script does not find any violations, it prints
Passed. In this example, the Ceph cluster was configured with the rack
buckets and the rbd pool was created with its CRUSH rule set to
rack. The **utils-checkPGs.py** script did not find duplicate racks in any PG.
::
root@host5:/# /tmp/utils-checkPGs.py rbd
Checking PGs in pool rbd ... Passed
Invoke the **utils-checkPGs.py** script with the --help option to get the
script's usage.
::
root@host5:/# /tmp/utils-checkPGs.py --help
usage: utils-checkPGs.py [-h] PoolName [PoolName ...]
Cross-check the OSDs assigned to the Placement Groups (PGs) of a ceph pool
with the CRUSH topology. The cross-check compares the OSDs in a PG and
verifies the OSDs reside in separate failure domains. PGs with OSDs in
the same failure domain are flagged as violation. The offending PGs are
printed to stdout.
This CLI is executed on-demand on a ceph-mon pod. To invoke the CLI, you
can specify one pool or list of pools to check. The special pool name
All (or all) checks all the pools in the ceph cluster.
positional arguments:
PoolName List of pools (or All) to validate the PGs and OSDs mapping
optional arguments:
-h, --help show this help message and exit
root@host5:/#
The source for the **utils-checkPGs.py** script is available
at **openstack-helm/ceph-mon/templates/bin/utils/_checkPGs.py.tpl**.
Ceph Deployments
================
Through testing and verification, you arrive at a CRUSH map with the buckets
that are deemed beneficial to your Ceph cluster. Standardize on the verified
CRUSH map to maintain consistency across all the Ceph deployments in your
data centers.
Mirroring the physical hardware layout in the hierarchy of your CRUSH map
provides the needed information on the topology. With a rack-based
layout, each rack can store a replica of your data.
To validate a Ceph cluster whose replica count is based on the number of
racks:
#. Both the number of physical racks and the number of replicas are 3. Create a Ceph pool with the replica count set to 3 and pg_num set to (number of OSDs * 50) / 3, rounded up to the next power of two; for example, if the calculation gives 240, round it up to 256 (a sketch of this calculation follows this list). In each PG of the pool, verify that the three OSDs are chosen from the three racks, respectively, using the **utils-checkPGs.py** script.
#. The number of physical racks is 2 and the number of replicas is 3. Create a Ceph pool as described in the previous step. In each PG of the pool, verify that two of the OSDs are chosen from the two racks, respectively. The third OSD can come from either rack, but not from the same host as either of the other two OSDs.
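A minimal sketch of the pg_num calculation from the first step, assuming 16
OSDs and 3 replicas (the pool name and numbers are only examples):
::
    # OSDS=16; REPLICAS=3
    # RAW=$(( OSDS * 50 / REPLICAS ))
    # PG_NUM=1; while [ ${PG_NUM} -lt ${RAW} ]; do PG_NUM=$(( PG_NUM * 2 )); done
    # echo ${PG_NUM}
    512
    # ceph osd pool create cmTestPool ${PG_NUM} ${PG_NUM} replicated rack_replicated_rule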
Data Movement
=============
Changes to the CRUSH Map always trigger data movement. It is prudent that
you plan accordingly when restructuring the CRUSH Map. Once started, the
CRUSH Map restructuring runs to completion and can neither be stopped nor
suspended. On a busy Ceph cluster with live transactions, it is always
safer to take a divide-and-conquer approach and complete small chunks of
work over multiple sessions.
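One way to control when the bulk of the resulting data movement starts is to
set the backfill and recovery flags before restructuring and unset them
during a convenient window; these are the same flags used elsewhere in this
documentation when re-deploying OSD charts (a sketch, run from a ceph-mon
pod):
::
    # ceph osd set nobackfill
    # ceph osd set norecover
    (apply the CRUSH Map changes here)
    # ceph osd unset nobackfill
    # ceph osd unset norecover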
Watch the progress of the data movement while the Ceph cluster re-balances
itself.
::
# watch ceph status
cluster:
id: 07c31d0f-bcc6-4db4-aadf-2d2a0f13edb8
health: HEALTH_WARN
137084/325509 objects misplaced (42.114%)
Degraded data redundancy: 28/325509 objects degraded (0.009%), 15 pgs degraded
services:
mon: 5 daemons, quorum host1,host2,host3,host4,host5
mgr: host6(active), standbys: host1
mds: cephfs-1/1/1 up {0=mds-ceph-mds-7cb4f57cc-prh87=up:active}, 1 up:standby
osd: 72 osds: 72 up, 72 in; 815 remapped pgs
rgw: 2 daemons active
data:
pools: 19 pools, 2920 pgs
objects: 105k objects, 408 GB
usage: 1609 GB used, 78819 GB / 80428 GB avail
pgs: 28/325509 objects degraded (0.009%)
137084/325509 objects misplaced (42.114%)
2085 active+clean
790 active+remapped+backfill_wait
22 active+remapped+backfilling
15 active+recovery_wait+degraded
4 active+recovery_wait+remapped
4 active+recovery_wait
io:
client: 11934 B/s rd, 3731 MB/s wr, 2 op/s rd, 228 kop/s wr
recovery: 636 MB/s, 163 objects/s
At the time this **ceph status** command was executed, its output
showed that the Ceph cluster was going through a re-balance. Of the
2920 PGs overall, 2085 were in the **active+clean** state, and the
remaining PGs were either being remapped or recovered. As the Ceph
cluster continues its re-balance, the number of PGs
in **active+clean** increases.
::
# ceph status
cluster:
id: 07c31d0f-bcc6-4db4-aadf-2d2a0f13edb8
health: HEALTH_OK
services:
mon: 5 daemons, quorum host1,host2,host3,host4,host5
mgr: host6(active), standbys: host1
mds: cephfs-1/1/1 up {0=mds-ceph-mds-7cc55c9695-lj22d=up:active}, 1 up:standby
osd: 72 osds: 72 up, 72 in
rgw: 2 daemons active
data:
pools: 19 pools, 2920 pgs
objects: 134k objects, 519 GB
usage: 1933 GB used, 78494 GB / 80428 GB avail
pgs: 2920 active+clean
io:
client: 1179 B/s rd, 971 kB/s wr, 1 op/s rd, 41 op/s wr
When the overall number of pgs is equal to the number
of **active+clean** pgs, the health of the ceph cluster changes
to **HEALTH_OK** (assuming there are no other warning conditions).

@ -10,3 +10,6 @@ Ceph Resiliency
osd-failure
disk-failure
host-failure
failure-domain
validate-object-replication
namespace-deletion

@ -0,0 +1,222 @@
===============================
3. Namespace deletion recovery
===============================
This document captures steps to bring Ceph back up after deleting its associated namespace.
3.1 Setup
==========
.. note::
Follow the OSH single node or multinode guide to bring up the OSH environment.
3.2 Setup the OSH environment and check ceph cluster health
=============================================================
.. note::
Ensure a healthy ceph cluster is running.
.. code-block:: console
kubectl exec -n ceph ceph-mon-dtw6m -- ceph -s
cluster:
id: fbaf9ce8-5408-4fce-9bfe-bf7fb938474c
health: HEALTH_OK
services:
mon: 5 daemons, quorum osh-1,osh-2,osh-5,osh-4,osh-3
mgr: osh-3(active), standbys: osh-4
mds: cephfs-1/1/1 up {0=mds-ceph-mds-77dc68f476-jb5th=up:active}, 1 up:standby
osd: 15 osds: 15 up, 15 in
data:
pools: 18 pools, 182 pgs
objects: 21 objects, 2246 bytes
usage: 3025 MB used, 1496 GB / 1499 GB avail
pgs: 182 active+clean
- Ceph cluster is in HEALTH_OK state with 5 MONs and 15 OSDs.
3.3 Delete Ceph namespace
==========================
.. note::
Removing the namespace will delete all pods and secrets associated to Ceph.
!! DO NOT PROCEED WITH DELETING THE CEPH NAMESPACES ON A PRODUCTION ENVIRONMENT !!
.. code-block:: console
CEPH_NAMESPACE="ceph"
MON_POD=$(kubectl get pods --namespace=${CEPH_NAMESPACE} \
--selector="application=ceph" --selector="component=mon" \
--no-headers | awk '{ print $1; exit }')
kubectl exec --namespace=${CEPH_NAMESPACE} ${MON_POD} -- ceph status \
| awk '/id:/{print $2}' | tee /tmp/ceph-fs-uuid.txt
.. code-block:: console
kubectl delete namespace ${CEPH_NAMESPACE}
.. code-block:: console
kubectl get pods --namespace ${CEPH_NAMESPACE} -o wide
No resources found.
kubectl get secrets --namespace ${CEPH_NAMESPACE}
No resources found.
- The Ceph namespace is now deleted and all associated resources are no longer found.
3.4 Reinstall Ceph charts
==========================
.. note::
Instructions are specific to a multinode environment.
For AIO environments follow the development guide for reinstalling Ceph.
.. code-block:: console
helm delete --purge ceph-openstack-config
for chart in $(helm list --namespace ${CEPH_NAMESPACE} | awk '/ceph-/{print $1}'); do
helm delete ${chart} --purge;
done
.. note::
It is normal not to see all pods come back online during this reinstall.
Only the ceph-mon helm chart is required at this point.
.. code-block:: console
cd /opt/openstack-helm-infra/
./tools/deployment/multinode/030-ceph.sh
3.5 Disable CephX authentication
=================================
.. note::
Wait until MON pods are running before proceeding here.
.. code-block:: console
mkdir -p /tmp/ceph/ceph-templates /tmp/ceph/extracted-keys
kubectl get -n ${CEPH_NAMESPACE} configmaps ceph-mon-etc -o=jsonpath='{.data.ceph\.conf}' > /tmp/ceph/ceph-mon.conf
sed '/\[global\]/a auth_client_required = none' /tmp/ceph/ceph-mon.conf | \
sed '/\[global\]/a auth_service_required = none' | \
sed '/\[global\]/a auth_cluster_required = none' > /tmp/ceph/ceph-mon-noauth.conf
kubectl --namespace ${CEPH_NAMESPACE} delete configmap ceph-mon-etc
kubectl --namespace ${CEPH_NAMESPACE} create configmap ceph-mon-etc --from-file=ceph.conf=/tmp/ceph/ceph-mon-noauth.conf
kubectl delete pod --namespace ${CEPH_NAMESPACE} -l application=ceph,component=mon
.. note::
Wait until the MON pods are running before proceeding here.
.. code-block:: console
MON_POD=$(kubectl get pods --namespace=${CEPH_NAMESPACE} \
--selector="application=ceph" --selector="component=mon" \
--no-headers | awk '{ print $1; exit }')
kubectl exec --namespace=${CEPH_NAMESPACE} ${MON_POD} -- ceph status
- The Ceph cluster will not be healthy and will be in a HEALTH_WARN or HEALTH_ERR state.
3.6 Replace key secrets with ones extracted from a Ceph MON
============================================================
.. code-block:: console
tee /tmp/ceph/ceph-templates/mon <<EOF
[mon.]
key = $(kubectl --namespace ${CEPH_NAMESPACE} exec ${MON_POD} -- bash -c "ceph-authtool -l \"/var/lib/ceph/mon/ceph-\$(hostname)/keyring\"" | awk '/key =/ {print $NF}')
caps mon = "allow *"
EOF
for KEY in mds osd rgw; do
tee /tmp/ceph/ceph-templates/${KEY} <<EOF
[client.bootstrap-${KEY}]
key = $(kubectl --namespace ${CEPH_NAMESPACE} exec ${MON_POD} -- ceph auth get-key client.bootstrap-${KEY})
caps mon = "allow profile bootstrap-${KEY}"
EOF
done
tee /tmp/ceph/ceph-templates/admin <<EOF
[client.admin]
key = $(kubectl --namespace ${CEPH_NAMESPACE} exec ${MON_POD} -- ceph auth get-key client.admin)
auid = 0
caps mds = "allow"
caps mon = "allow *"
caps osd = "allow *"
caps mgr = "allow *"
EOF
.. code-block:: console
tee /tmp/ceph/ceph-key-relationships <<EOF
mon ceph-mon-keyring ceph.mon.keyring mon.
mds ceph-bootstrap-mds-keyring ceph.keyring client.bootstrap-mds
osd ceph-bootstrap-osd-keyring ceph.keyring client.bootstrap-osd
rgw ceph-bootstrap-rgw-keyring ceph.keyring client.bootstrap-rgw
admin ceph-client-admin-keyring ceph.client.admin.keyring client.admin
EOF
.. code-block:: console
while read CEPH_KEY_RELATIONS; do
KEY_RELATIONS=($(echo ${CEPH_KEY_RELATIONS}))
COMPONENT=${KEY_RELATIONS[0]}
KUBE_SECRET_NAME=${KEY_RELATIONS[1]}
KUBE_SECRET_DATA_KEY=${KEY_RELATIONS[2]}
KEYRING_NAME=${KEY_RELATIONS[3]}
DATA_PATCH=$(cat /tmp/ceph/ceph-templates/${COMPONENT} | envsubst | base64 -w0)
kubectl --namespace ${CEPH_NAMESPACE} patch secret ${KUBE_SECRET_NAME} -p "{\"data\":{\"${KUBE_SECRET_DATA_KEY}\": \"${DATA_PATCH}\"}}"
done < /tmp/ceph/ceph-key-relationships
3.7 Re-enable CephX Authentication
===================================
.. code-block:: console
kubectl --namespace ${CEPH_NAMESPACE} delete configmap ceph-mon-etc
kubectl --namespace ${CEPH_NAMESPACE} create configmap ceph-mon-etc --from-file=ceph.conf=/tmp/ceph/ceph-mon.conf
3.8 Reinstall Ceph charts
==========================
.. note::
Instructions are specific to a multinode environment.
For AIO environments follow the development guide for reinstalling Ceph.
.. code-block:: console
for chart in $(helm list --namespace ${CEPH_NAMESPACE} | awk '/ceph-/{print $1}'); do
helm delete ${chart} --purge;
done
.. code-block:: console
cd /opt/openstack-helm-infra/
./tools/deployment/multinode/030-ceph.sh
./tools/deployment/multinode/040-ceph-ns-activate.sh
.. code-block:: console
MON_POD=$(kubectl get pods --namespace=${CEPH_NAMESPACE} \
--selector="application=ceph" --selector="component=mon" \
--no-headers | awk '{ print $1; exit }')
kubectl exec --namespace=${CEPH_NAMESPACE} ${MON_POD} -- ceph status
.. note::
AIO environments will need the following command to repair MDS standby failures.
.. code-block:: console
kubectl exec --namespace=${CEPH_NAMESPACE} ${MON_POD} -- ceph fs set cephfs standby_count_wanted 0
- Ceph pods are now running and the cluster is healthy (HEALTH_OK).

@ -0,0 +1,65 @@
===========================================
Ceph - Test object replication across hosts
===========================================
This document captures steps to validate whether object replication is
happening across hosts.
Setup:
======
- Follow the OSH single node or multinode guide to bring up the OSH environment.
Step 1: Setup the OSH environment and check ceph cluster health
=================================================================
.. note::
Make sure a healthy Ceph cluster is running.
``Ceph status:``
.. code-block:: console
ubuntu@mnode1:/opt/openstack-helm$ kubectl exec -n ceph ceph-mon-5qn68 -- ceph -s
cluster:
id: 54d9af7e-da6d-4980-9075-96bb145db65c
health: HEALTH_OK
services:
mon: 3 daemons, quorum mnode1,mnode2,mnode3
mgr: mnode2(active), standbys: mnode3
mds: cephfs-1/1/1 up {0=mds-ceph-mds-6f66956547-c25cx=up:active}, 1 up:standby
osd: 3 osds: 3 up, 3 in
rgw: 2 daemons active
data:
pools: 19 pools, 101 pgs
objects: 354 objects, 260 MB
usage: 77807 MB used, 70106 MB / 144 GB avail
pgs: 101 active+clean
io:
client: 48769 B/s wr, 0 op/s rd, 12 op/s wr
- Ceph cluster is in HEALTH_OK state with 3 MONs and 3 OSDs.
Step 2: Run validation script
=============================
.. note::
Exec into a ceph-mon pod and execute the validation script, giving the pool name as
the first argument; in the example below, rbd is the pool name.
.. code-block:: console
ubuntu@mnode1:/opt/openstack-helm$ /tmp/checkObjectReplication.py rbd
Test object got replicated on these osds: [1, 0, 2]
Test object got replicated on these hosts: [u'mnode1', u'mnode2', u'mnode3']
Hosts hosting multiple copies of a placement groups are:[]
- If any objects are replicated on the same host, they are listed in the last
line of the script output.
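As a manual cross-check, the host that a given OSD runs on can be looked up
with **ceph osd find** (the OSD id below is only an example):
.. code-block:: console
    ubuntu@mnode1:/opt/openstack-helm$ kubectl exec -n ceph ceph-mon-5qn68 -- ceph osd find 1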

@ -0,0 +1,9 @@
Upgrade
=======
Contents:
.. toctree::
:maxdepth: 2
multiple-osd-releases

@ -0,0 +1,246 @@
================================================================
Ceph - upgrade monolithic ceph-osd chart to multiple ceph charts
================================================================
This document captures the steps to move from an installed monolithic ceph-osd chart
to multiple ceph-osd charts.
This work brings flexibility for site updates, as we will have more control over individual OSDs.
Install single ceph-osd chart:
==============================
step 1: Setup:
==============
- Follow the OSH single node or multinode guide to bring up the OSH environment.
.. note::
We will have a single ceph-osd chart, with the following override values for the Ceph disks:
osd:
- data:
type: block-logical
location: /dev/vdb
journal:
type: block-logical
location: /dev/vda1
- data:
type: block-logical
location: /dev/vdc
journal:
type: block-logical
location: /dev/vda2
Step 2: Setup the OSH environment and check ceph cluster health
=================================================================
.. note::
Make sure a healthy Ceph cluster is running.
``Ceph status:``
.. code-block:: console
ubuntu@k8smaster:/opt/openstack-helm$ kubectl exec -n ceph ceph-mon-5qn68 -- ceph -s
cluster:
id: 61a4e07f-8b4a-4c47-8fc7-a0e7345ac0b0
health: HEALTH_OK
services:
mon: 3 daemons, quorum k8smaster,k8sslave1,k8sslave2
mgr: k8sslave2(active), standbys: k8sslave1
mds: cephfs-1/1/1 up {0=mds-ceph-mds-5bf9fdfc6b-8nq4p=up:active}, 1 up:standby
osd: 6 osds: 6 up, 6 in
data:
pools: 18 pools, 186 pgs
objects: 377 objects, 1.2 GiB
usage: 4.2 GiB used, 116 GiB / 120 GiB avail
pgs: 186 active+clean
- Ceph cluster is in HEALTH_OK state with 3 MONs and 6 OSDs.
.. note::
Make sure only a single ceph-osd chart is deployed.
``Helm status:``
.. code-block:: console
ubuntu@k8smaster:/opt/openstack-helm$ helm list | grep -i osd
ceph-osd 1 Tue Mar 26 03:21:07 2019 DEPLOYED ceph-osd-vdb-0.1.0
- The single OSD chart deployed successfully.
Upgrade to multiple ceph osd charts:
====================================
step 1: setup
=============
- Create multiple ceph-osd charts as required.
.. note::
Copy the ceph-osd folder to multiple ceph-osd chart folders in the openstack-helm-infra directory.
.. code-block:: console
ubuntu@k8smaster:/opt/openstack-helm-infra$ cp -r ceph-osd ceph-osd-vdb
ubuntu@k8smaster:/opt/openstack-helm-infra$ cp -r ceph-osd ceph-osd-vdc
.. note::
Make sure to correct the chart name in each OSD chart folder created above; it
needs to be updated in that chart's Chart.yaml (see the sketch below).
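A minimal sketch of that rename for the two copies made above (the sed
expression simply rewrites the chart name field):
.. code-block:: console
    ubuntu@k8smaster:/opt/openstack-helm-infra$ sed -i 's/^name: ceph-osd$/name: ceph-osd-vdb/' ceph-osd-vdb/Chart.yaml
    ubuntu@k8smaster:/opt/openstack-helm-infra$ sed -i 's/^name: ceph-osd$/name: ceph-osd-vdc/' ceph-osd-vdc/Chart.yaml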
- Create scripts to install the multiple ceph-osd charts.
.. note::
Create new installation scripts that reflect the new ceph-osd charts.
.. code-block:: console
ubuntu@k8smaster:/opt/openstack-helm$ cp ./tools/deployment/multinode/030-ceph.sh
./tools/deployment/multinode/030-ceph-osd-vdb.sh
ubuntu@k8smaster:/opt/openstack-helm$ cp ./tools/deployment/multinode/030-ceph.sh
./tools/deployment/multinode/030-ceph-osd-vdc.sh
.. note::
Make sure to delete all other Ceph charts from the scripts above so that each script installs only its new ceph-osd chart,
and use the correct overrides as shown below.
example1: for CHART in ceph-osd-vdb; do
helm upgrade --install ${CHART} ${OSH_INFRA_PATH}/${CHART} \
--namespace=ceph \
--values=/tmp/ceph.yaml \
${OSH_EXTRA_HELM_ARGS} \
${OSH_EXTRA_HELM_ARGS_CEPH_DEPLOY}
done
osd:
- data:
type: block-logical
location: /dev/vdb
journal:
type: block-logical
location: /dev/vda1
example2: for CHART in ceph-osd-vdc; do
helm upgrade --install ${CHART} ${OSH_INFRA_PATH}/${CHART} \
--namespace=ceph \
--values=/tmp/ceph.yaml \
${OSH_EXTRA_HELM_ARGS} \
${OSH_EXTRA_HELM_ARGS_CEPH_DEPLOY}
done
osd:
- data:
type: block-logical
location: /dev/vdc
journal:
type: block-logical
location: /dev/vda2
step 2: Scale down applications using ceph pvc
===============================================
.. note::
Scale down all the applications that are using PVCs so that there will be no
writes to the Ceph RBDs.
.. code-block:: console
ubuntu@k8smaster:/opt/openstack-helm$ sudo kubectl scale statefulsets -n openstack
mariadb-server --replicas=0
ubuntu@k8smaster:/opt/openstack-helm$ sudo kubectl scale statefulsets -n openstack
rabbitmq-rabbitmq --replicas=0
- These are just examples; this needs to be done for all the applications using PVCs (see the sketch below for finding them).
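To find the remaining consumers, listing the PVCs in the namespace is a
reasonable starting point (the namespace is only an example):
.. code-block:: console
    ubuntu@k8smaster:/opt/openstack-helm$ kubectl get pvc --namespace openstack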
step 3: Setup ceph cluster flags to prevent rebalance
=====================================================
.. note::
Set a few flags on the Ceph cluster to prevent rebalancing during this process.
.. code-block:: console
ubuntu@k8smaster:/opt/openstack-helm$ kubectl exec -n ceph ceph-mon-5qn68 -- ceph osd set
noout
ubuntu@k8smaster:/opt/openstack-helm$ kubectl exec -n ceph ceph-mon-5qn68 -- ceph osd set
nobackfill
ubuntu@k8smaster:/opt/openstack-helm$ kubectl exec -n ceph ceph-mon-5qn68 -- ceph osd set
norecover
ubuntu@k8smaster:/opt/openstack-helm$ kubectl exec -n ceph ceph-mon-5qn68 -- ceph osd set
pause
step 4: Delete single ceph-osd chart
====================================
.. note::
Delete the single ceph-osd chart.
.. code-block:: console
ubuntu@k8smaster:/opt/openstack-helm$ helm delete --purge ceph-osd
step 5: install new ceph-osd charts
===================================
.. note::
Now we can install multiple ceph osd releases.
.. code-block:: console
ubuntu@k8smaster:/opt/openstack-helm$ ./tools/deployment/multinode/030-ceph-osd-vdb.sh
ubuntu@k8smaster:/opt/openstack-helm$ ./tools/deployment/multinode/030-ceph-osd-vdc.sh
ubuntu@k8smaster:/opt/openstack-helm# helm list | grep -i osd
ceph-osd-vdb 1 Tue Mar 26 03:21:07 2019 DEPLOYED ceph-osd-vdb-0.1.0
ceph-osd-vdc 1 Tue Mar 26 03:22:13 2019 DEPLOYED ceph-osd-vdc-0.1.0
- Wait and check for a healthy Ceph cluster; if there are any issues, they need to be
sorted out before proceeding.
step 6: Unset ceph cluster flags
================================
.. note::
Unset the flags that were set on the Ceph cluster in the steps above.
.. code-block:: console
ubuntu@k8smaster:/opt/openstack-helm$ kubectl exec -n ceph ceph-mon-5qn68 -- ceph osd unset
noout
ubuntu@k8smaster:/opt/openstack-helm$ kubectl exec -n ceph ceph-mon-5qn68 -- ceph osd unset
nobackfill
ubuntu@k8smaster:/opt/openstack-helm$ kubectl exec -n ceph ceph-mon-5qn68 -- ceph osd unset
norecover
ubuntu@k8smaster:/opt/openstack-helm$ kubectl exec -n ceph ceph-mon-5qn68 -- ceph osd unset
pause
step 7: Scale up the applications using pvc
===========================================
.. note::
Since the Ceph cluster is back to a healthy status, scale up the applications.
.. code-block:: console
ubuntu@k8smaster:/opt/openstack-helm$ sudo kubectl scale statefulsets -n openstack
mariadb-server --replicas=3
ubuntu@k8smaster:/opt/openstack-helm$ sudo kubectl scale statefulsets -n openstack
rabbitmq-rabbitmq --replicas=3