Peng Zhang 470193ffc9 kdump-tools: disable AER to fix kdump hang
This issue appeared after the kernel was updated from 5.10.112 to
5.10.152. The offending commit is d83d886e69bd (PCI/ERR: Recover from
RCEC AER errors), which comes from the linux-yocto 5.10 stable tree.
It causes the board to hang after kdump is triggered.

The issue can be reproduced on a Supermicro A2SDi-16C-TP8F board with
BIOS version 1.4 (build date 01/29/2021).

PCI AER functionality is not needed in the kdump kernel, and it causes
some boards to hang: as the kernel AER error log shows, the AER
interrupt thread hits a NULL pointer dereference in pci_walk_bus()
while looking up the error source device. So just disable AER there.
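
This message does not say how AER is disabled. One common way, assumed
here and not confirmed by the text above, is to pass the kernel
parameter pci=noaer on the kdump kernel command line, for example
through the KDUMP_CMDLINE_APPEND variable in /etc/default/kdump-tools.
The helper name add_noaer and the starting option list below are
hypothetical and only illustrate the idea:

```shell
# Hypothetical helper: append pci=noaer to a kernel command line string
# unless it is already present as a standalone token.
add_noaer() {
    case " $1 " in
        *" pci=noaer "*) printf '%s\n' "$1" ;;
        *)               printf '%s\n' "$1 pci=noaer" ;;
    esac
}

# Assumed example options; the real default list may differ.
KDUMP_CMDLINE_APPEND=$(add_noaer "reset_devices nr_cpus=1 irqpoll")
```

If this approach is used, pci=noaer should appear in /proc/cmdline of
the crash kernel once kdump-tools has reloaded it.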

KERNEL AER ERROR LOG:
[    7.409296] pcieport 0000:00:05.0: AER: Multiple Corrected error received: 0000:00:05.0
[    7.417311] BUG: kernel NULL pointer dereference, address: 0000000000000028
[    7.418296] #PF: supervisor read access in kernel mode
[    7.418296] #PF: error_code(0x0000) - not-present page
[    7.418296] PGD 0 P4D 0
[    7.418296] Oops: 0000 [#1] PREEMPT SMP NOPTI
[    7.418296] CPU: 0 PID: 93 Comm: irq/25-aerdrv Not tainted 5.10.0-6-amd64 #1 Debian 5.10.152-1.stx.25
[    7.418296] Hardware name: Supermicro SYS-E300-9A-16CN8TP/A2SDi-16C-TP8F, BIOS 1.4 01/29/2021
[    7.418296] RIP: 0010:pci_walk_bus+0x25/0x90
[    7.418296] Code: 00 00 00 00 00 0f 1f 44 00 00 41 56 41 55 49 89 fd 48 c7 c7 20 37 9a 99 41 54 49 89 f4 55 48 89 d5 53 4c 89 eb e8 2b 5a 56 00 <49> 8b 7d 28 eb 1f 48 8b 47 18 48 85 c0 74 31 4c 8b 70 28 48 89 c3
[    7.418296] RSP: 0000:ffffa60040173dc8 EFLAGS: 00010282
[    7.418296] RAX: ffff8b553fded001 RBX: 0000000000000000 RCX: 0000000000000000
[    7.418296] RDX: ffff8b553fded000 RSI: ffffffff9833c6e0 RDI: ffffffff999a3720
[    7.418296] RBP: ffffa60040173e10 R08: 0000000000000002 R09: ffffa60040173d74
[    7.418296] R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff9833c6e0
[    7.418296] R13: 0000000000000000 R14: 0000000000000028 R15: ffff8b555e206328
[    7.418296] FS:  0000000000000000(0000) GS:ffff8b55bec00000(0000) knlGS:0000000000000000
[    7.418296] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    7.418296] CR2: 0000000000000028 CR3: 000000087d80a000 CR4: 00000000003506f0
[    7.418296] Call Trace:
[    7.418296]  find_source_device+0x34/0x5a
[    7.418296]  aer_isr.cold+0x89/0x9e
[    7.418296]  ? __set_cpus_allowed_ptr+0xb6/0x220
[    7.418296]  ? disable_irq_nosync+0x10/0x10
[    7.418296]  irq_thread_fn+0x20/0x60
[    7.418296]  irq_thread+0x104/0x1b0
[    7.418296]  ? irq_finalize_oneshot.part.0+0xe0/0xe0
[    7.418296]  ? irq_thread_check_affinity+0xa0/0xa0
[    7.418296]  kthread+0x133/0x150
[    7.418296]  ? __kthread_bind_mask+0x60/0x60
[    7.418296]  ret_from_fork+0x22/0x30
[    7.418296] Modules linked in:
[    7.418296] CR2: 0000000000000028

TEST PLAN:
PASS: build-pkgs -c -p kdump-tools
PASS: build-pkgs -c -p kdump-tools-rt
PASS: boot
PASS: on both affected and unaffected platforms:
      systemctl enable kdump-tools.service
      systemctl start kdump-tools.service
      echo 1 >/proc/sysrq-trigger
      echo 'c' > /proc/sysrq-trigger
      vmcore has been created successfully
      system boots back up automatically

Closes-Bug: 1999646

Change-Id: I9ffc6e96d4b7fbd0b29a806d4d96dfc8e89dc4c6
Signed-off-by: Peng Zhang <Peng.Zhang2@windriver.com>
2022-12-17 08:38:58 +08:00