View Issue Details

ID: 0000487
Project: AlmaLinux-8
Category: kernel
View Status: public
Last Update: 2024-11-25 22:11
Reporter: Administratorte
Assigned To:
Priority: high
Severity: crash
Reproducibility: always
Status: new
Resolution: open
Summary: 0000487: .22 and .27 kernels crash on two disparate PowerEdge machines
Description:
Two machines, one old, one new:

PowerEdge R520, Intel(R) Xeon(R) CPU E5-2430 v2 @ 2.50GHz

PowerEdge R7525, AMD EPYC 7763 64-Core Processor

Both run the .16 kernel fine. When the .22 kernel came out,
I ran yum -y update on both machines and rebooted, but both machines
crashed on reboot.

The .22 and .27 kernels are fine on all other machines, including old HP
desktops as well as new PowerEdge machines.

One curious fact that may not be related is that on all other machines
under /boot, there is always a corresponding initramfs*kdump.img
to accompany every kernel, but on the failing machines these do not exist
for the .22 and .27 kernels. They do exist for the .16 kernel and previous
kernels.
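
The missing kdump images can be checked across machines with a short script. The following is a minimal sketch, not a definitive tool: it assumes the usual EL8 naming under /boot (vmlinuz-<version> for each kernel and initramfs-<version>kdump.img for its kdump image), and the function name is made up for illustration.

```python
from pathlib import Path


def kernels_missing_kdump_initramfs(boot_dir="/boot"):
    """Return kernel versions that have a vmlinuz-* image but no
    matching initramfs-<version>kdump.img in boot_dir.

    Assumption: EL8-style file naming; adjust the patterns if the
    layout on a given machine differs.
    """
    boot = Path(boot_dir)
    missing = []
    for image in sorted(boot.glob("vmlinuz-*")):
        version = image.name[len("vmlinuz-"):]
        if "rescue" in version:
            # Rescue images are assumed not to carry a kdump initramfs.
            continue
        if not (boot / f"initramfs-{version}kdump.img").exists():
            missing.append(version)
    return missing


if __name__ == "__main__":
    for version in kernels_missing_kdump_initramfs():
        print(f"no kdump initramfs for {version}")
```

Running it on a failing machine and on a working one gives a quick side-by-side of which kernels lack the kdump image.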

I tried the .27 kernel on the old machine with no change. I've yet to try the
.27 kernel on the new machine as I suspect the behavior will not change.

Capturing the crash turns out to be non-trivial, as it scrolls too fast and my attempts
with the serial port have not worked.

So far what I've been able to capture with a video camera:

RIP: 0010(?)xhci_irq+0x140/0x3e0 (?)
Code 00 0F 1F 40 00 48 03 e4 20 09 d0 5b 5d 41 5c 41 5d 41 5e 41 5b 01 00 00 00 0f 1f 44 00 00 0f ae e8 ? 4d 0c <89> c8 ...

Call Trace
<NMI>
? watchdog_overflow_callback.cold.7+0x1e/0x70
? __perf_event_overflow+0x52/0x100
? handle_pmi_common+0x200/0x2d0
? __get_pto_vaddr+0x32/0x58
? __native_set_fixmap+0x24/0x40
? ghea(?)_copy_tofrom_phys+0xf9/0x250?
? intel_pmu_handle_irq+0x119/0x450
? perf_event_nmi_handler+0x24/0x50
? nmi_handle+0x63/0x110
? default_do_nmi+0x19c/0x210
? do_nmi+0x19c/0x69
? end_repeat_nmi+0x16/0x69
? xhci_irq+0x140/0x3e8
? xhci_irq+0x140/0x3e8
</NMI>

Later there is a kernel panic but the screen is blurry. I'll try to get better logs.

Steps To Reproduce: yum -y update and reboot. .22 and .27 kernels crash, .16 is OK.
Additional Information: no corresponding initramfs*kdump.img under /boot on the two machines that fail
Tags: alma8, almalinux8, boot, kernel
abrt_hash
URL

Activities

Administratorte

2024-11-25 22:11

reporter   ~0001091

After much ado:

[  OK  ] Started Show Plymouth Boot Screen.
[  OK  ] Started Forward Password Requests to Plymouth Directory Watch.
[  OK  ] Reached target Paths.
[  OK  ] Started Journal Service.
[ 19.651718] NMI watchdog: Watchdog detected hard LOCKUP on cpu 5
Modules linked in: sd_mod t10_pi sg uas usb_storage fuse
[ 19.651722] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.18.0-553.27.1.el8_10.x86_64 #1
[ 19.651723] Hardware name: Dell Inc. PowerEdge R520/03P5P3, BIOS 2.9.0 01/09/2020
[ 19.651723] RIP: 0010:__radix_tree_lookup+0x6e/0xa0
[ 19.651724] Code: fd 0f b6 08 49 89 c0 48 89 f0 48 d3 e8 83 e0 3f 4c 8d 0c c5 28 00 00 00 4b 8d 04 08 4d 01 c1 48 8b 00 48 3d 02 04 00 00 74 9f <84> c9 74 0c 48 89 c1 83 e1 03 48 83 f9 02 74 c3 48 85 d2 74 03 4c
[ 19.651724] RSP: 0018:ffff9a7d4655ce28 EFLAGS: 00000086
[ 19.651725] RAX: ffff89e718039b62 RBX: 0000000000000040 RCX: 0000000000000018
[ 19.651726] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff89e38a4065c8
[ 19.651726] RBP: 0000000000000002 R08: ffff89e71803fd98 R09: ffff89e71803fdc0
[ 19.651727] R10: 0000000000000000 R11: ffff89e38a4065d0 R12: ffff8a026bca8140
[ 19.651727] R13: ffff89e479957700 R14: ffff89e38765b2c0 R15: ffff89e3d14506b0
[ 19.651728] FS: 0000000000000000(0000) GS:ffff8a02bf340000(0000) knlGS:0000000000000000
[ 19.651728] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 19.651729] CR2: 000055cf2bab66c8 CR3: 0000001febe10002 CR4: 00000000001706e0
[ 19.651729] Call Trace:
[ 19.651730] <NMI>
[ 19.651730] ? watchdog_overflow_callback.cold.7+0x1e/0x70
[ 19.651730] ? __perf_event_overflow+0x52/0x100
[ 19.651731] ? handle_pmi_common+0x200/0x2d0
[ 19.651731] ? __set_pte_vaddr+0x32/0x50
[ 19.651732] ? __native_set_fixmap+0x24/0x40
[ 19.651732] ? ghes_copy_tofrom_phys+0xf9/0x250
[ 19.651732] ? intel_pmu_handle_irq+0x119/0x450
[ 19.651733] ? perf_event_nmi_handler+0x2d/0x50
[ 19.651733] ? nmi_handle+0x63/0x110
[ 19.651734] ? default_do_nmi+0x49/0x110
[ 19.651734] ? do_nmi+0x19c/0x210
[ 19.651734] ? end_repeat_nmi+0x16/0x69
[ 19.651735] ? __radix_tree_lookup+0x6e/0xa0
[ 19.651735] ? __radix_tree_lookup+0x6e/0xa0
[ 19.651735] ? __radix_tree_lookup+0x6e/0xa0
[ 19.651736] </NMI>
[ 19.651736] <IRQ>
[ 19.651736] handle_tx_event.isra.58+0x5d/0x1290
[ 19.651737] ? usb_giveback_urb_bh+0xb0/0x140
[ 19.651737] xhci_irq+0x1c5/0x3e0
[ 19.651738] __handle_irq_event_percpu+0x40/0x190
[ 19.651738] handle_irq_event_percpu+0x30/0x80
[ 19.651738] handle_irq_event+0x36/0x57
[ 19.651739] handle_edge_irq+0x82/0x190
[ 19.651739] handle_irq+0x1c/0x30
[ 19.651739] do_IRQ+0x49/0xd0
[ 19.651740] common_interrupt+0xf/0xf
[ 19.651740] </IRQ>
[ 19.651740] RIP: 0010:native_safe_halt+0xe/0x20
[ 19.651741] Code: 00 a8 08 75 be e9 23 ff ff ff 31 ff e9 6a ff ff ff 90 90 90 90 90 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d 16 41 5e 00 fb f4 <c3> cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 e9 07 00 00
[ 19.651742] RSP: 0018:ffff9a7d462ffe28 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd
[ 19.651743] RAX: 0000000080004000 RBX: ffff89e387458464 RCX: 000000000000001f
[ 19.651743] RDX: ffffffffa59c6b80 RSI: ffffffffa72d1ce0 RDI: 0000000000000001
[ 19.651744] RBP: ffff89e387458464 R08: 0000000000000001 R09: ffff89e387458400
[ 19.651744] R10: 00000355e97d9cb7 R11: ffff8a02bf372484 R12: 0000000000000001
[ 19.651745] R13: ffffffffa72d1ce0 R14: 0000000000000001 R15: 0000000000000001
[ 19.651745] ? acpi_processor_thermal_init.cold.6+0x66/0x66
[ 19.651746] ? acpi_processor_thermal_init.cold.6+0x66/0x66
[ 19.651746] acpi_idle_do_entry+0x93/0xa0
[ 19.651746] acpi_idle_enter+0x5f/0xd0
[ 19.651747] cpuidle_enter_state+0x86/0x470
[ 19.651747] cpuidle_enter+0x2c/0x40
[ 19.651748] do_idle+0x26f/0x2d0
[ 19.651748] cpu_startup_entry+0x6f/0x80
[ 19.651748] start_secondary+0x187/0x1d0
[ 19.651749] secondary_startup_64_no_verify+0xd1/0xdb
[ 19.651749] Kernel panic - not syncing: Hard LOCKUP
[ 19.651750] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.18.0-553.27.1.el8_10.x86_64 #1
[ 19.651750] Hardware name: Dell Inc. PowerEdge R520/03P5P3, BIOS 2.9.0 01/09/2020
[ 19.651751] Call Trace:
[ 19.651751] <NMI>
[ 19.651751] dump_stack+0x41/0x60
[ 19.651752] panic+0xe7/0x2ac
[ 19.651752] ? secondary_startup_64_no_verify+0x8c/0xdb
[ 19.651752] nmi_panic.cold.11+0xc/0xc
[ 19.651753] watchdog_overflow_callback.cold.7+0x5c/0x70
[ 19.651753] __perf_event_overflow+0x52/0x100
[ 19.651754] handle_pmi_common+0x200/0x2d0
[ 19.651754] ? __set_pte_vaddr+0x32/0x50
[ 19.651754] ? __native_set_fixmap+0x24/0x40
[ 19.651755] ? ghes_copy_tofrom_phys+0xf9/0x250
[ 19.651755] intel_pmu_handle_irq+0x119/0x450
[ 19.651756] perf_event_nmi_handler+0x2d/0x50
[ 19.651756] nmi_handle+0x63/0x110
[ 19.651756] default_do_nmi+0x49/0x110
[ 19.651757] do_nmi+0x19c/0x210
[ 19.651757] end_repeat_nmi+0x16/0x69
[ 19.651757] RIP: 0010:__radix_tree_lookup+0x6e/0xa0
[ 19.651758] Code: fd 0f b6 08 49 89 c0 48 89 f0 48 d3 e8 83 e0 3f 4c 8d 0c c5 28 00 00 00 4b 8d 04 08 4d 01 c1 48 8b 00 48 3d 02 04 00 00 74 9f <84> c9 74 0c 48 89 c1 83 e1 03 48 83 f9 02 74 c3 48 85 d2 74 03 4c
[ 19.651759] RSP: 0018:ffff9a7d4655ce28 EFLAGS: 00000086
[ 19.651759] RAX: ffff89e718039b62 RBX: 0000000000000040 RCX: 0000000000000018
[ 19.651760] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff89e38a4065c8
[ 19.651760] RBP: 0000000000000002 R08: ffff89e71803fd98 R09: ffff89e71803fdc0
[ 19.651761] R10: 0000000000000000 R11: ffff89e38a4065d0 R12: ffff8a026bca8140
[ 19.651761] R13: ffff89e479957700 R14: ffff89e38765b2c0 R15: ffff89e3d14506b0
[ 19.651762] ? __radix_tree_lookup+0x6e/0xa0
[ 19.651762] ? __radix_tree_lookup+0x6e/0xa0
[ 19.651763] </NMI>
[ 19.651763] <IRQ>
[ 19.651763] handle_tx_event.isra.58+0x5d/0x1290
[ 19.651764] ? usb_giveback_urb_bh+0xb0/0x140
[ 19.651764] xhci_irq+0x1c5/0x3e0
[ 19.651764] __handle_irq_event_percpu+0x40/0x190
[ 19.651765] handle_irq_event_percpu+0x30/0x80
[ 19.651765] handle_irq_event+0x36/0x57
[ 19.651766] handle_edge_irq+0x82/0x190
[ 19.651766] handle_irq+0x1c/0x30
[ 19.651766] do_IRQ+0x49/0xd0
[ 19.651767] common_interrupt+0xf/0xf
[ 19.651767] </IRQ>
[ 19.651767] RIP: 0010:native_safe_halt+0xe/0x20
[ 19.651768] Code: 00 a8 08 75 be e9 23 ff ff ff 31 ff e9 6a ff ff ff 90 90 90 90 90 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d 16 41 5e 00 fb f4 <c3> cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 e9 07 00 00
[ 19.651769] RSP: 0018:ffff9a7d462ffe28 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd
[ 19.651769] RAX: 0000000080004000 RBX: ffff89e387458464 RCX: 000000000000001f
[ 19.651770] RDX: ffffffffa59c6b80 RSI: ffffffffa72d1ce0 RDI: 0000000000000001
[ 19.651770] RBP: ffff89e387458464 R08: 0000000000000001 R09: ffff89e387458400
[ 19.651771] R10: 00000355e97d9cb7 R11: ffff8a02bf372484 R12: 0000000000000001
[ 19.651771] R13: ffffffffa72d1ce0 R14: 0000000000000001 R15: 0000000000000001
[ 19.651772] ? acpi_processor_thermal_init.cold.6+0x66/0x66
[ 19.651772] ? acpi_processor_thermal_init.cold.6+0x66/0x66
[ 19.651773] acpi_idle_do_entry+0x93/0xa0
[ 19.651773] acpi_idle_enter+0x5f/0xd0
[ 19.651774] cpuidle_enter_state+0x86/0x470
[ 19.651774] cpuidle_enter+0x2c/0x40
[ 19.651774] do_idle+0x26f/0x2d0
[ 19.651775] cpu_startup_entry+0x6f/0x80
[ 19.651775] start_secondary+0x187/0x1d0
[ 19.651775] secondary_startup_64_no_verify+0xd1/0xdb
[ 20.678937] Shutting down cpus with NMI
[ 20.678937] Kernel Offset: 0x24400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

Issue History

Date Modified Username Field Change
2024-11-18 22:54 Administratorte New Issue
2024-11-18 22:54 Administratorte Tag Attached: alma8
2024-11-18 22:54 Administratorte Tag Attached: almalinux8
2024-11-18 22:54 Administratorte Tag Attached: boot
2024-11-18 22:54 Administratorte Tag Attached: kernel
2024-11-25 22:11 Administratorte Note Added: 0001091