
From b2c505a9b2a532974b5f69332bd87f03087d74a4 Mon Sep 17 00:00:00 2001
systemd: Prevent excessive /proc/1/mountinfo reparsing

Backport the patches for this issue:
https://bugzilla.redhat.com/show_bug.cgi?id=1819868

We hit the following issue:
When testing a large number of pods (> 230), we occasionally observed a
number of problems related to the systemd process:
  systemd ran continually at 90-100% cpu usage
  systemd memory usage started increasing rapidly (20GB/hour)
  systemctl commands would always time out
  (Failed to get properties: Connection timed out)
  sm services failed and could not recover:
  open-ldap, registry-token-server, docker-distribution, etcd
  new pods could not start and got stuck in state ContainerCreating

These patches prevent excessive /proc/1/mountinfo reparsing.
It has been verified that they greatly improve performance in this case.

16 commits are listed in sequence (from [1] to [16]) at the link below
for the issue:
https://github.com/systemd-rhel/rhel-8/pull/154/commits

[16] (10)        core: prevent excessive /proc/self/mountinfo parsing
[15] [Dropped-6] test: add ratelimiting test
[14] (9)         sd-event: add ability to ratelimit event sources
[13] (8)         sd-event: increase n_enabled_child_sources just once
[12] (7)         sd-event: update state at the end in event_source_enable
[11] (6)         sd-event: remove earliest_index/latest_index into common part of event source objects
[10] [Dropped-5] sd-event: follow coding style with naming return parameter
[9]  [Dropped-4] sd-event: ref event loop while in sd_event_prepare() ot sd_event_run()
[8]  (5)         sd-event: refuse running default event loops in any other thread than the one they are default for
[7]  [Dropped-3] sd-event: let's suffix last_run/last_log with "_usec"
[6]  [Dropped-2] sd-event: fix delays assert brain-o (#17790)
[5]  (4)         sd-event: split out code to add/remove timer event sources to earliest/latest prioq
[4]  (3)         sd-event: split clock data allocation out of sd_event_add_time()
[3]  [Dropped-1] sd-event: mention that two debug logged events are ignored
[2]  (2)         sd-event: split out enable and disable codepaths from sd_event_source_set_enabled()
[1]  (1)         sd-event: split out helper functions for reshuffling prioqs

I ported 10 of them back (from (1) to (10)) to fix this issue and dropped
the other 6 (from [Dropped-1] to [Dropped-6]) for the following reasons:
[Dropped-1] Only changes an error log.
[Dropped-2] Fixes a bug introduced in a commit which doesn't exist in
            this version.
[Dropped-3] Only changes variable names; there is no functional change.
[Dropped-4] More commits are needed to merge it, and I don't see that it
            helps in adding the rate-limiting ability.
[Dropped-5] Changes the coding style of a function which isn't really
            used by anyone.
[Dropped-6] Adds test cases.

Closes-Bug: #1924686
Signed-off-by: Li Zhou <li.zhou@windriver.com>
Change-Id: Ia4c8f162cb1a47b40d1b26cf4d604976b97e92d6
2021-04-12 02:15:25 -04:00
From: Li Zhou <li.zhou@windriver.com>
Date: Wed, 21 Apr 2021 14:41:27 +0800
Subject: [PATCH] Add STX patches
Signed-off-by: Jim Somerville <Jim.Somerville@windriver.com>
Signed-off-by: Li Zhou <li.zhou@windriver.com>
---
SPECS/systemd.spec | 68 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 68 insertions(+)
diff --git a/SPECS/systemd.spec b/SPECS/systemd.spec
index 07e022e..a9b6fe0 100644
--- a/SPECS/systemd.spec
+++ b/SPECS/systemd.spec
@@ -884,6 +884,74 @@ Patch0842: 0842-core-don-t-update-unit-description-if-it-is-already-.patch
Patch0843: 0843-unit-don-t-emit-PropertiesChanged-signal-if-adding-a.patch
Patch0844: 0844-core-fix-unnecessary-fallback-to-the-rescue-mode-cau.patch
Patch0845: 0845-core-Detect-initial-timer-state-from-serialized-data.patch
+
+# STX Patches
+Patch0851: 851-inject-millisec-in-syslog-date.patch
+Patch0852: 852-fix-build-error-for-unused-variable.patch
+Patch0853: 853-Fix-compile-failure-due-to-deprecated-value.patch
+
+# This cluster of patches relates to fixing redhat bug #1819868
+# "systemd excessively reads mountinfo and udev in dense container environments"
+
+# Below patches are added for merging patch (1)
+Patch0901: 901-sd-event-don-t-touch-fd-s-accross-forks.patch
+Patch0902: 902-sd-event-make-sure-RT-signals-are-not-dropped.patch
+# Patch (1) for solving #1819868
+Patch0903: 903-sd-event-split-out-helper-functions-for-reshuffling-.patch
+
+# Below patches are added for merging patch (2)
+Patch0904: 904-sd-event-drop-pending-events-when-we-turn-off-on-an-.patch
+Patch0905: 905-sd-event-fix-call-to-event_make_signal_data.patch
+Patch0906: 906-sd-event-make-sure-to-create-a-signal-queue-for-the-.patch
+# Patch (2) for solving #1819868
+Patch0907: 907-sd-event-split-out-enable-and-disable-codepaths-from.patch
+
+# Below patch is added for merging patch (3)
+Patch0908: 908-sd-event-use-prioq_ensure_allocated-where-possible.patch
+# Patch (3) for solving #1819868
+Patch0909: 909-sd-event-split-clock-data-allocation-out-of-sd_event.patch
+
+# Patch (4) for solving #1819868
+Patch0910: 910-sd-event-split-out-code-to-add-remove-timer-event-so.patch
+
+# Below patch is added for merging patch (5)
+Patch0911: 911-sd-event-rename-PASSIVE-PREPARED-to-INITIAL-ARMED.patch
+# Patch (5) for solving #1819868
+Patch0912: 912-sd-event-refuse-running-default-event-loops-in-any-o.patch
+
+# Patch (6) for solving #1819868
+Patch0913: 913-sd-event-remove-earliest_index-latest_index-into-com.patch
+
+# Patch (7) for solving #1819868
+Patch0914: 914-sd-event-update-state-at-the-end-in-event_source_ena.patch
+
+# Patch (8) for solving #1819868
+Patch0915: 915-sd-event-increase-n_enabled_child_sources-just-once.patch
+
+# Below patches are added for merging patch (9)
+Patch0916: 916-sd-event-don-t-provide-priority-stability.patch
+Patch0917: 917-sd-event-when-determining-the-last-allowed-time-a-ti.patch
+Patch0918: 918-sd-event-permit-a-USEC_INFINITY-timeout-as-an-altern.patch
+# Patch (9) for solving #1819868
+Patch0919: 919-sd-event-add-ability-to-ratelimit-event-sources.patch
+
+# Patch (10) for solving #1819868
+Patch0920: 920-core-prevent-excessive-proc-self-mountinfo-parsing.patch
+
+# This patch fixes build issues related to the above patches. Our goal is to keep
+# upstream patches as unmodified as possible to facilitate maintaining them, so instead
+# of individually changing them for compilation, we just have one patch at the end to do it.
+Patch0921: 921-systemd-Fix-compiling-errors-when-merging-1819868.patch
+
+# This cluster of patches relates to fixing redhat bug #1968528
+# "fix rate-limiting of mount events"
+Patch0922: 922-sd-event-change-ordering-of-pending-ratelimited-even.patch
+Patch0923: 923-sd-event-drop-unnecessary-else.patch
+Patch0924: 924-sd-event-use-CMP-macro.patch
+Patch0925: 925-sd-event-use-usec_add.patch
+Patch0926: 926-sd-event-make-event_source_time_prioq_reshuffle-acce.patch
+Patch0927: 927-sd-event-always-reshuffle-time-prioq-on-changing-onl.patch
+
Patch9999: 9999-Update-kernel-install-script-by-backporting-fedora-p.patch
%global num_patches %{lua: c=0; for i,p in ipairs(patches) do c=c+1; end; print(c);}
--
2.17.1
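
Note on the rate-limiting idea behind patches (9) and (10): the commit message describes the fix as adding the ability to ratelimit event sources and then using it to prevent excessive mountinfo parsing. The sketch below is a simplified, self-contained model of a windowed "interval plus burst" limiter, not the sd-event implementation. The names RateLimit and count_reparses, and the values of 1 second and 5 events, are assumptions chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class RateLimit:
    """Hypothetical windowed limiter: at most `burst` events per `interval_usec`."""

    interval_usec: int
    burst: int
    begin_usec: Optional[int] = None  # start of the current window
    count: int = 0                    # events admitted in the current window

    def below(self, now_usec: int) -> bool:
        """Return True if an event at `now_usec` is still within the limit."""
        # Open a new window on first use or once the old one has expired.
        if self.begin_usec is None or now_usec - self.begin_usec > self.interval_usec:
            self.begin_usec = now_usec
            self.count = 0
        if self.count < self.burst:
            self.count += 1
            return True
        return False


def count_reparses(event_times_usec: Iterable[int],
                   interval_usec: int = 1_000_000,  # assumed: 1 second window
                   burst: int = 5) -> int:          # assumed: 5 events per window
    """Count how many change notifications would trigger a full reparse."""
    limit = RateLimit(interval_usec, burst)
    return sum(1 for t in event_times_usec if limit.below(t))


if __name__ == "__main__":
    # One mount change notification every millisecond for three seconds.
    storm = range(0, 3_000_000, 1000)
    print(len(storm), "notifications,", count_reparses(storm), "reparses")
```

In this model a storm of 3000 notifications results in only a handful of reparses, which is the effect the commit message attributes to the backport. A real event loop also has to process the changes that arrived while the limit was in force once the window expires; this sketch leaves that part out.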