4 Commits

Author SHA1 Message Date
Li Zhou
bed1e46362 systemd: fix rate-limiting of mount events
Backport the patches for this issue:
https://bugzilla.redhat.com/show_bug.cgi?id=1968528
It reports:
The fix for Bug 1819868 has introduced a new issue related to its
implementation of rate limiting.
Rate limiting the mount_event_source can cause unmount events to be
missed, which leads to mount unit cgroups being leaked (not cleaned up
when the mount is gone).
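
The failure mode quoted above can be illustrated with a toy model. This is only a sketch under stated assumptions, not systemd's implementation: MountWatcher, its fields, the mount paths and the burst/interval values are all hypothetical. It shows how a watcher that skips notifications while rate-limited keeps a stale view of the mount table unless it re-reads the table when the limit expires.

```python
from dataclasses import dataclass, field


@dataclass
class MountWatcher:
    """Toy watcher: re-reads the mount table at most `burst` times per window."""
    interval_usec: int
    burst: int
    rescan_on_expiry: bool
    known_mounts: set = field(default_factory=set)
    window_start: int = 0
    handled: int = 0
    missed: bool = False

    def on_change(self, now_usec, actual_mounts):
        # Start a fresh window once the previous one has elapsed.
        if now_usec - self.window_start >= self.interval_usec:
            self.window_start = now_usec
            self.handled = 0
        if self.handled < self.burst:
            self.handled += 1
            self.known_mounts = set(actual_mounts)
            self.missed = False
        else:
            # Rate limit in effect: this notification is not processed.
            self.missed = True

    def on_window_expired(self, actual_mounts):
        # Hypothetical remedy: re-read the table if anything was skipped.
        if self.rescan_on_expiry and self.missed:
            self.known_mounts = set(actual_mounts)
            self.missed = False


def simulate(rescan_on_expiry):
    watcher = MountWatcher(interval_usec=1_000_000, burst=2,
                           rescan_on_expiry=rescan_on_expiry)
    watcher.on_change(0, {"/a", "/b"})
    watcher.on_change(100_000, {"/a", "/b", "/c"})
    watcher.on_change(200_000, {"/a", "/b"})  # /c unmounted while rate-limited
    watcher.on_window_expired({"/a", "/b"})
    return watcher.known_mounts
```

With rescan_on_expiry=False the watcher still believes /c is mounted after it is gone, which is the kind of stale state that would leave a mount unit behind. With rescan_on_expiry=True the view is corrected once the window ends.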

The fix for 1968528 also resolves the issue we encountered:
During the reboot process of subclouds (either lock-unlock or sudo
reboot), unmount failure messages repeat a few hundred times.

The patches are listed at:
https://github.com/redhat-plumbers/systemd-rhel8/pull/198/commits
They are picked from https://github.com/systemd-rhel/rhel-8/ (branch
rhel-8.4.0).

Verification:
  On an AIO-SX lab, the bug reproduces as follows: running
  "sudo reboot" on the controller prints unmount failure logs
  endlessly.
  Verified that the problem occurred during the shutdown
  phase of a reboot. Reinstalled with a fixed image and verified that
  the issue was gone across 5 reboots. Ran sanity on the lab
  and saw no new issues.

Closes-Bug: #1948899
Signed-off-by: Li Zhou <li.zhou@windriver.com>
Change-Id: If95932ceead1bea973f2219d3a8d6b04cf0fd5f8
2021-10-28 23:29:07 -04:00
Li Zhou
4850ab86da systemd: Upgrade to version 219-78.el7_9.3
This fixes the issue of systemd sending tons of useless
PropertiesChanged messages when a mount happens as described in:
https://bugzilla.redhat.com/show_bug.cgi?id=1793527

Depends-On: https://review.opendev.org/c/starlingx/tools/+/786601
Partial-Bug: #1924691
Signed-off-by: Li Zhou <li.zhou@windriver.com>
Change-Id: I3596303d77211a135e8559a05806395328725cde
2021-04-27 02:09:27 +00:00
Li Zhou
ccfeeef59d systemd: Prevent excessive /proc/1/mountinfo reparsing
Backport the patches for this issue:
https://bugzilla.redhat.com/show_bug.cgi?id=1819868

We encountered the following issue:
When testing a large number of pods (> 230), we occasionally observed
several problems related to the systemd process:
    systemd ran continually at 90-100% CPU usage
    systemd memory usage started increasing rapidly (20GB/hour)
    systemctl commands always timed out (Failed to get properties:
        Connection timed out)
    sm services failed and could not recover: open-ldap,
        registry-token-server, docker-distribution, etcd
    new pods could not start and got stuck in state ContainerCreating

These patches prevent excessive /proc/1/mountinfo reparsing.
Verification showed that they greatly improve performance in this
scenario.
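
As a rough illustration of why rate limiting helps here, the sketch below counts how often a mount table would be reparsed for a burst of change notifications, with and without a fixed-window limiter. The function name, the event timings and the interval/burst values are assumptions chosen for the example and are not taken from the patches.

```python
def count_reparses(event_times_usec, interval_usec=None, burst=None):
    """Count reparses triggered by a sequence of change notifications.

    Without a limiter every notification triggers a reparse. With a
    limiter, at most `burst` reparses happen per `interval_usec` window.
    """
    if interval_usec is None:
        return len(event_times_usec)
    reparses = 0
    window_start = None
    handled = 0
    for now in event_times_usec:
        # Open a new window when none exists or the current one elapsed.
        if window_start is None or now - window_start >= interval_usec:
            window_start = now
            handled = 0
        if handled < burst:
            handled += 1
            reparses += 1
    return reparses


# 1000 notifications spread over two seconds (one every 2 ms).
events = [i * 2_000 for i in range(1000)]
unlimited = count_reparses(events)
limited = count_reparses(events, interval_usec=1_000_000, burst=5)
```

In this model the unlimited case reparses once per notification, while the limited case reparses only a handful of times per second regardless of how many notifications arrive.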

16 commits are listed in sequence (from [1] to [16]) at the link below
for this issue:
https://github.com/systemd-rhel/rhel-8/pull/154/commits

[16](10)core: prevent excessive /proc/self/mountinfo parsing
[15][Dropped-6]test: add ratelimiting test
[14](9)sd-event: add ability to ratelimit event sources
[13](8)sd-event: increase n_enabled_child_sources just once
[12](7)sd-event: update state at the end in event_source_enable
[11](6)sd-event: remove earliest_index/latest_index into common part of
event source objects
[10][Dropped-5]sd-event: follow coding style with naming return
parameter
[9] [Dropped-4]sd-event: ref event loop while in sd_event_prepare() ot
sd_event_run()
[8] (5)sd-event: refuse running default event loops in any other thread
than the one they are default for
[7] [Dropped-3]sd-event: let's suffix last_run/last_log with "_usec"
[6] [Dropped-2]sd-event: fix delays assert brain-o (#17790)
[5] (4)sd-event: split out code to add/remove timer event sources to
earliest/latest prioq
[4] (3)sd-event: split clock data allocation out of sd_event_add_time()
[3] [Dropped-1]sd-event: mention that two debug logged events are
ignored
[2] (2)sd-event: split out enable and disable codepaths from
sd_event_source_set_enabled()
[1] (1)sd-event: split out helper functions for reshuffling prioqs

I backported 10 of them (from (1) to (10)) to fix this issue
and dropped the other 6 (from [Dropped-1] to [Dropped-6]) for the
following reasons:
[Dropped-1] Only changes error log.
[Dropped-2] Fixes a bug introduced in a commit which doesn't exist in
this version.
[Dropped-3] Only renames variables; there is no functional change.
[Dropped-4] Merging it would require more commits, and I don't see
that it helps in adding the rate-limiting ability.
[Dropped-5] Changes coding style for a function which isn't really used
by anyone.
[Dropped-6] Adds test cases.

Closes-Bug: #1924686
Signed-off-by: Li Zhou <li.zhou@windriver.com>
Change-Id: Ia4c8f162cb1a47b40d1b26cf4d604976b97e92d6
2021-04-22 22:09:33 -04:00
Jim Somerville
0231aba5cd Uprev systemd to version 219-67.el7
This solves:
systemd: line splitting via fgets() allows for state injection
during daemon-reexec (CVE-2018-15686)

along with some other less critical issues.  See the security
announcement link:

https://lists.centos.org/pipermail/centos-cr-announce/2019-August/006149.html

for more details.
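
To illustrate the class of problem named above, the sketch below mimics a loop of fgets() calls with a fixed-size buffer using Python's readline(size). It is not systemd's serialization format: the key names, the buffer size and the helper functions are hypothetical. The point is only that an over-long line is returned in pieces, so the tail of one value can be parsed as a separate key=value entry.

```python
import io


def read_lines_fixed_buffer(data: bytes, bufsize: int):
    """Mimic repeated fgets(buf, bufsize, f): at most bufsize-1 bytes per call."""
    stream = io.BytesIO(data)
    lines = []
    while True:
        chunk = stream.readline(bufsize - 1)
        if not chunk:
            break
        lines.append(chunk.rstrip(b"\n"))
    return lines


def parse_state(lines):
    """Parse key=value lines into a dict, ignoring lines without '='."""
    state = {}
    for line in lines:
        key, sep, value = line.partition(b"=")
        if sep:
            state[key.decode()] = value.decode()
    return state


# One serialized line whose value is long enough to cross the buffer
# boundary; the attacker-chosen tail "uid=0" looks like its own entry.
data = b"name=" + b"A" * 10 + b"uid=0\n"

split_state = parse_state(read_lines_fixed_buffer(data, 16))
whole_state = parse_state(data.splitlines())
```

Reading whole lines yields a single "name" entry, while the fixed-buffer reader yields an extra "uid" entry that was never written as a separate line.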

Here we rebase the patches, and fix the atrocious crime of "name of patch file
doesn't match what git format-patch generates".  We also squash down the
meta patches which add the patches to the spec file as part of
good housekeeping.

Change-Id: I01a3fa329bbad541a063cb604d1756892139967f
Closes-Bug: 1849200
Depends-On: https://review.opendev.org/#/c/695560
Signed-off-by: Jim Somerville <Jim.Somerville@windriver.com>
2019-11-21 16:48:47 -05:00