From 79897566993c3244fe2504301ccdef8923180026 Mon Sep 17 00:00:00 2001
From: Michal Arbet
Date: Fri, 20 Sep 2024 18:40:36 +0200
Subject: [PATCH] Improvement of ProxySQL Monitoring Configuration

This update enhances the monitoring of the database cluster in
ProxySQL. The default monitoring intervals were insufficient for
reliably detecting failures in the Galera cluster environment.

A detailed configuration for monitoring intervals has been
introduced, providing better control over how quickly and
accurately ProxySQL can identify issues.

- Variables such as `mariadb_monitor_connect_interval`,
  `mariadb_monitor_galera_healthcheck_interval`, and
  `mariadb_monitor_ping_interval` significantly reduce the time
  between checks.
- Timeouts like `mariadb_monitor_galera_healthcheck_timeout` and
  `mariadb_monitor_ping_timeout` allow faster failure detection,
  while `mariadb_monitor_galera_healthcheck_max_timeout_count` sets
  the maximum number of allowed timeouts before a node is marked
  as down.

Calculation:

- Galera healthcheck:
  4 seconds (interval) + 1 second (timeout) +
  4 seconds (interval) + 1 second (timeout) = 10 seconds.
- Ping healthcheck:
  3 seconds (interval) + 2 seconds (timeout) +
  3 seconds (interval) + 2 seconds (timeout) = 10 seconds.

The Galera health check and the ping check each detect a node
failure within a maximum of 10 seconds. The two checks operate
independently, and a failure in either one marks the node as failed.

Health Check Failure Detection: Up to 10 seconds.
Ping Failure Detection: Up to 10 seconds.
Connect Attempts: ProxySQL also tries to connect every 2 seconds,
which helps monitor connectivity.

These changes ensure that ProxySQL detects issues within 10 seconds,
as haproxy does, significantly reducing downtime compared to the
default settings. This makes monitoring faster and more reliable
and improves stability in production environments.

Change-Id: Ic28801519cdb35ed2387a1468b9df661847a5476
---
 ansible/group_vars/all.yml                            |  8 ++++++++
 .../loadbalancer/templates/proxysql/proxysql.yaml.j2  |  7 +++++++
 .../notes/proxysql-monitor-0adc0594c06c7362.yaml      | 11 +++++++++++
 3 files changed, 26 insertions(+)
 create mode 100644 releasenotes/notes/proxysql-monitor-0adc0594c06c7362.yaml

diff --git a/ansible/group_vars/all.yml b/ansible/group_vars/all.yml
index 8458005656..8b2fa3c151 100644
--- a/ansible/group_vars/all.yml
+++ b/ansible/group_vars/all.yml
@@ -480,7 +480,15 @@ mariadb_wsrep_port: "4567"
 mariadb_ist_port: "4568"
 mariadb_sst_port: "4444"
 mariadb_clustercheck_port: "4569"
+
 mariadb_monitor_user: "{{ 'monitor' if enable_proxysql | bool else 'haproxy' }}"
+mariadb_monitor_connect_interval: "2000"
+mariadb_monitor_galera_healthcheck_interval: "4000"
+mariadb_monitor_galera_healthcheck_timeout: "1000"
+mariadb_monitor_galera_healthcheck_max_timeout_count: "2"
+mariadb_monitor_ping_interval: "3000"
+mariadb_monitor_ping_timeout: "2000"
+mariadb_monitor_ping_max_failures: "2"
 
 mariadb_datadir_volume: "mariadb"
 
diff --git a/ansible/roles/loadbalancer/templates/proxysql/proxysql.yaml.j2 b/ansible/roles/loadbalancer/templates/proxysql/proxysql.yaml.j2
index e752007068..4d596056e7 100644
--- a/ansible/roles/loadbalancer/templates/proxysql/proxysql.yaml.j2
+++ b/ansible/roles/loadbalancer/templates/proxysql/proxysql.yaml.j2
@@ -22,6 +22,13 @@ mysql_variables:
   interfaces: "{{ kolla_internal_vip_address | put_address_in_context('url') }}:{{ database_port }}"
   monitor_username: "{{ mariadb_monitor_user }}"
   monitor_password: "{{ mariadb_monitor_password }}"
+  monitor_connect_interval: "{{ mariadb_monitor_connect_interval }}"
+  monitor_galera_healthcheck_interval: "{{ mariadb_monitor_galera_healthcheck_interval }}"
+  monitor_galera_healthcheck_timeout: "{{ mariadb_monitor_galera_healthcheck_timeout }}"
+  monitor_galera_healthcheck_max_timeout_count: "{{ mariadb_monitor_galera_healthcheck_max_timeout_count }}"
+  monitor_ping_interval: "{{ mariadb_monitor_ping_interval }}"
+  monitor_ping_timeout: "{{ mariadb_monitor_ping_timeout }}"
+  monitor_ping_max_failures: "{{ mariadb_monitor_ping_max_failures }}"
 
 mysql_servers:
 {% for shard_id, shard in mariadb_shards_info.shards.items() %}
diff --git a/releasenotes/notes/proxysql-monitor-0adc0594c06c7362.yaml b/releasenotes/notes/proxysql-monitor-0adc0594c06c7362.yaml
new file mode 100644
index 0000000000..0d42f863b6
--- /dev/null
+++ b/releasenotes/notes/proxysql-monitor-0adc0594c06c7362.yaml
@@ -0,0 +1,11 @@
+---
+features:
+  - |
+    Introduces new variables ``mariadb_monitor_connect_interval``,
+    ``mariadb_monitor_galera_healthcheck_interval``,
+    ``mariadb_monitor_galera_healthcheck_timeout``,
+    ``mariadb_monitor_galera_healthcheck_max_timeout_count``,
+    ``mariadb_monitor_ping_interval``, ``mariadb_monitor_ping_timeout``,
+    and ``mariadb_monitor_ping_max_failures``.
+    These allow faster detection of issues in Galera clusters,
+    reducing downtime to 10 seconds.
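
Note on the "Calculation" in the commit message: the 10-second figures can be reproduced from the new defaults with the small sketch below. It is not part of the patch. The helper name `worst_case_detection_ms` is hypothetical, and the formula (each failed check costs one interval plus one timeout) is the simplified model the commit message uses, not a description of ProxySQL's actual monitor scheduling. Values are taken to be milliseconds, which matches the commit message reading "4000" as 4 seconds.

```python
# Defaults added to ansible/group_vars/all.yml by this patch (milliseconds).
GALERA_INTERVAL_MS = 4000
GALERA_TIMEOUT_MS = 1000
GALERA_MAX_TIMEOUT_COUNT = 2
PING_INTERVAL_MS = 3000
PING_TIMEOUT_MS = 2000
PING_MAX_FAILURES = 2


def worst_case_detection_ms(interval_ms: int, timeout_ms: int, failures: int) -> int:
    """Hypothetical helper: the commit message's simplified model, in which
    every failed check costs one full interval plus one full timeout."""
    return failures * (interval_ms + timeout_ms)


galera_ms = worst_case_detection_ms(
    GALERA_INTERVAL_MS, GALERA_TIMEOUT_MS, GALERA_MAX_TIMEOUT_COUNT
)
ping_ms = worst_case_detection_ms(
    PING_INTERVAL_MS, PING_TIMEOUT_MS, PING_MAX_FAILURES
)

print(f"Galera healthcheck: up to {galera_ms / 1000:g} s")
print(f"Ping check: up to {ping_ms / 1000:g} s")
```

Changing any of the variables in `globals.yml` changes the bound in the same way, for example a ping interval of 1000 ms would give 2 * (1000 + 2000) = 6000 ms under this model.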