View Issue Details

ID: 0000487
Project: AlmaLinux-8
Category: kernel
View Status: public
Last Update: 2024-12-18 20:54
Reporter: Administratorte
Assigned To:
Priority: high
Severity: crash
Reproducibility: always
Status: new
Resolution: open
Summary: 0000487: .22 and .27 kernels crash on two disparate PowerEdge machines
Description: Two machines, one old, one new:

PowerEdge R520, Intel(R) Xeon(R) CPU E5-2430 v2 @ 2.50GHz

PowerEdge R7525, AMD EPYC 7763 64-Core Processor

Both run the .16 kernel fine. When the .22 kernel came out,
I ran yum -y update on both machines and rebooted, but both machines
crashed on reboot.

The .22 and .27 kernels are fine on all other machines, including old HP
desktops as well as new PowerEdge machines.

One curious fact that may not be related is that on all other machines
under /boot, there is always a corresponding initramfs*kdump.img
to accompany every kernel, but on the failing machines these do not exist
for the .22 and .27 kernels. They do exist for the .16 kernel and previous
kernels.
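
The missing-image observation above can be checked mechanically. The following is a minimal sketch, assuming the usual AlmaLinux 8 naming under /boot (vmlinuz-<release> for the kernel and initramfs-<release>kdump.img for the kdump image); the function name is hypothetical and not part of any tool mentioned in this report.

```python
import re
from typing import Iterable, List

# Assumed naming: vmlinuz-<release> and initramfs-<release>kdump.img
KERNEL_RE = re.compile(r"^vmlinuz-(?P<release>\d.+)$")


def kernels_missing_kdump_image(boot_files: Iterable[str]) -> List[str]:
    """Return kernel releases that have a vmlinuz file but no kdump initramfs."""
    names = set(boot_files)
    missing = []
    for name in sorted(names):
        match = KERNEL_RE.match(name)
        if match is None:
            continue
        release = match.group("release")
        # The rescue kernel never gets a kdump image, so skip it.
        if "rescue" in release:
            continue
        if f"initramfs-{release}kdump.img" not in names:
            missing.append(release)
    return missing
```

On a live system it could be called as kernels_missing_kdump_image(os.listdir("/boot")); an empty list would mean every installed kernel has its kdump image.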

I tried the .27 kernel on the old machine with no change. I've yet to try the
.27 kernel on the new machine as I suspect the behavior will not change.

Capturing the crash turns out to be non-trivial, as it scrolls too fast and my attempts
with the serial port have not worked.

So far what I've been able to capture with a video camera:

RIP: 0010(?)xhci_irq+0x140/0x3e0 (?)
Code 00 0F 1F 40 00 48 03 e4 20 09 d0 5b 5d 41 5c 41 5d 41 5e 41 5b 01 00 00 00 0f 1f 44 00 00 0f ae e8 ? 4d 0c <89> c8 ...

Call Trace
<NMI>
? watchdog_overflow_callback.cold.7+0x1e/0x70
? __perf_event_overflow+0x52/0x100
? handle_pmi_common+0x200/0x2d0
? __get_pto_vaddr+0x32/0x58
?__native_set_fixmap+0x24/0x40
? ghea(?)_copy_tofrom_phys+0xf9/0x250?
? intel_pmu_handle_irq+0x119/0x450
? perf_event _nmi_handler+0x24/0x50
? nmi_handle+0x63/0x110
? default_do_nmi+0x19c/0x210
? do_nmi+0x19c/0x69
? end_repeat_nmi+0x16/0x69
? xhci_irq+0x140/0x3e8
? xhci_irq+0x140/0x3e8
</NMI>

Later there is a kernel panic but the screen is blurry. I'll try to get better logs.

Steps To Reproduce: yum -y update and reboot. .22 and .27 kernels crash, .16 is OK.
Additional Information: no corresponding initramfs*kdump.img under /boot on the two machines that fail
Tags: alma8, almalinux8, boot, kernel
abrt_hash
URL

Activities

Administratorte

2024-11-25 22:11

reporter   ~0001091

After much ado:

[  OK  ] Started Show Plymouth Boot Screen.
[  OK  ] Started Forward Password Requests to Plymouth Directory Watch.
[  OK  ] Reached target Paths.
[  OK  ] Started Journal Service.
[ 19.651718] NMI watchdog: Watchdog detected hard LOCKUP on cpu 5Modules linked in: sd_mod t10_pi sg uas usb_storage fuse
[ 19.651722] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.18.0-553.27.1.el8_10.x86_64 #1
[ 19.651723] Hardware name: Dell Inc. PowerEdge R520/03P5P3, BIOS 2.9.0 01/09/2020
[ 19.651723] RIP: 0010:__radix_tree_lookup+0x6e/0xa0
[ 19.651724] Code: fd 0f b6 08 49 89 c0 48 89 f0 48 d3 e8 83 e0 3f 4c 8d 0c c5 28 00 00 00 4b 8d 04 08 4d 01 c1 48 8b 00 48 3d 02 04 00 00 74 9f <84> c9 74 0c 48 89 c1 83 e1 03 48 83 f9 02 74 c3 48 85 d2 74 03 4c
[ 19.651724] RSP: 0018:ffff9a7d4655ce28 EFLAGS: 00000086
[ 19.651725] RAX: ffff89e718039b62 RBX: 0000000000000040 RCX: 0000000000000018
[ 19.651726] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff89e38a4065c8
[ 19.651726] RBP: 0000000000000002 R08: ffff89e71803fd98 R09: ffff89e71803fdc0
[ 19.651727] R10: 0000000000000000 R11: ffff89e38a4065d0 R12: ffff8a026bca8140
[ 19.651727] R13: ffff89e479957700 R14: ffff89e38765b2c0 R15: ffff89e3d14506b0
[ 19.651728] FS: 0000000000000000(0000) GS:ffff8a02bf340000(0000) knlGS:0000000000000000
[ 19.651728] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 19.651729] CR2: 000055cf2bab66c8 CR3: 0000001febe10002 CR4: 00000000001706e0
[ 19.651729] Call Trace:
[ 19.651730] <NMI>
[ 19.651730] ? watchdog_overflow_callback.cold.7+0x1e/0x70
[ 19.651730] ? __perf_event_overflow+0x52/0x100
[ 19.651731] ? handle_pmi_common+0x200/0x2d0
[ 19.651731] ? __set_pte_vaddr+0x32/0x50
[ 19.651732] ? __native_set_fixmap+0x24/0x40
[ 19.651732] ? ghes_copy_tofrom_phys+0xf9/0x250
[ 19.651732] ? intel_pmu_handle_irq+0x119/0x450
[ 19.651733] ? perf_event_nmi_handler+0x2d/0x50
[ 19.651733] ? nmi_handle+0x63/0x110
[ 19.651734] ? default_do_nmi+0x49/0x110
[ 19.651734] ? do_nmi+0x19c/0x210
[ 19.651734] ? end_repeat_nmi+0x16/0x69
[ 19.651735] ? __radix_tree_lookup+0x6e/0xa0
[ 19.651735] ? __radix_tree_lookup+0x6e/0xa0
[ 19.651735] ? __radix_tree_lookup+0x6e/0xa0
[ 19.651736] </NMI>
[ 19.651736] <IRQ>
[ 19.651736] handle_tx_event.isra.58+0x5d/0x1290
[ 19.651737] ? usb_giveback_urb_bh+0xb0/0x140
[ 19.651737] xhci_irq+0x1c5/0x3e0
[ 19.651738] __handle_irq_event_percpu+0x40/0x190
[ 19.651738] handle_irq_event_percpu+0x30/0x80
[ 19.651738] handle_irq_event+0x36/0x57
[ 19.651739] handle_edge_irq+0x82/0x190
[ 19.651739] handle_irq+0x1c/0x30
[ 19.651739] do_IRQ+0x49/0xd0
[ 19.651740] common_interrupt+0xf/0xf
[ 19.651740] </IRQ>
[ 19.651740] RIP: 0010:native_safe_halt+0xe/0x20
[ 19.651741] Code: 00 a8 08 75 be e9 23 ff ff ff 31 ff e9 6a ff ff ff 90 90 90 90 90 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d 16 41 5e 00 fb f4 <c3> cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 e9 07 00 00
[ 19.651742] RSP: 0018:ffff9a7d462ffe28 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd
[ 19.651743] RAX: 0000000080004000 RBX: ffff89e387458464 RCX: 000000000000001f
[ 19.651743] RDX: ffffffffa59c6b80 RSI: ffffffffa72d1ce0 RDI: 0000000000000001
[ 19.651744] RBP: ffff89e387458464 R08: 0000000000000001 R09: ffff89e387458400
[ 19.651744] R10: 00000355e97d9cb7 R11: ffff8a02bf372484 R12: 0000000000000001
[ 19.651745] R13: ffffffffa72d1ce0 R14: 0000000000000001 R15: 0000000000000001
[ 19.651745] ? acpi_processor_thermal_init.cold.6+0x66/0x66
[ 19.651746] ? acpi_processor_thermal_init.cold.6+0x66/0x66
[ 19.651746] acpi_idle_do_entry+0x93/0xa0
[ 19.651746] acpi_idle_enter+0x5f/0xd0
[ 19.651747] cpuidle_enter_state+0x86/0x470
[ 19.651747] cpuidle_enter+0x2c/0x40
[ 19.651748] do_idle+0x26f/0x2d0
[ 19.651748] cpu_startup_entry+0x6f/0x80
[ 19.651748] start_secondary+0x187/0x1d0
[ 19.651749] secondary_startup_64_no_verify+0xd1/0xdb
[ 19.651749] Kernel panic - not syncing: Hard LOCKUP
[ 19.651750] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.18.0-553.27.1.el8_10.x86_64 #1
[ 19.651750] Hardware name: Dell Inc. PowerEdge R520/03P5P3, BIOS 2.9.0 01/09/2020
[ 19.651751] Call Trace:
[ 19.651751] <NMI>
[ 19.651751] dump_stack+0x41/0x60
[ 19.651752] panic+0xe7/0x2ac
[ 19.651752] ? secondary_startup_64_no_verify+0x8c/0xdb
[ 19.651752] nmi_panic.cold.11+0xc/0xc
[ 19.651753] watchdog_overflow_callback.cold.7+0x5c/0x70
[ 19.651753] __perf_event_overflow+0x52/0x100
[ 19.651754] handle_pmi_common+0x200/0x2d0
[ 19.651754] ? __set_pte_vaddr+0x32/0x50
[ 19.651754] ? __native_set_fixmap+0x24/0x40
[ 19.651755] ? ghes_copy_tofrom_phys+0xf9/0x250
[ 19.651755] intel_pmu_handle_irq+0x119/0x450
[ 19.651756] perf_event_nmi_handler+0x2d/0x50
[ 19.651756] nmi_handle+0x63/0x110
[ 19.651756] default_do_nmi+0x49/0x110
[ 19.651757] do_nmi+0x19c/0x210
[ 19.651757] end_repeat_nmi+0x16/0x69
[ 19.651757] RIP: 0010:__radix_tree_lookup+0x6e/0xa0
[ 19.651758] Code: fd 0f b6 08 49 89 c0 48 89 f0 48 d3 e8 83 e0 3f 4c 8d 0c c5 28 00 00 00 4b 8d 04 08 4d 01 c1 48 8b 00 48 3d 02 04 00 00 74 9f <84> c9 74 0c 48 89 c1 83 e1 03 48 83 f9 02 74 c3 48 85 d2 74 03 4c
[ 19.651759] RSP: 0018:ffff9a7d4655ce28 EFLAGS: 00000086
[ 19.651759] RAX: ffff89e718039b62 RBX: 0000000000000040 RCX: 0000000000000018
[ 19.651760] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff89e38a4065c8
[ 19.651760] RBP: 0000000000000002 R08: ffff89e71803fd98 R09: ffff89e71803fdc0
[ 19.651761] R10: 0000000000000000 R11: ffff89e38a4065d0 R12: ffff8a026bca8140
[ 19.651761] R13: ffff89e479957700 R14: ffff89e38765b2c0 R15: ffff89e3d14506b0
[ 19.651762] ? __radix_tree_lookup+0x6e/0xa0
[ 19.651762] ? __radix_tree_lookup+0x6e/0xa0
[ 19.651763] </NMI>
[ 19.651763] <IRQ>
[ 19.651763] handle_tx_event.isra.58+0x5d/0x1290
[ 19.651764] ? usb_giveback_urb_bh+0xb0/0x140
[ 19.651764] xhci_irq+0x1c5/0x3e0
[ 19.651764] __handle_irq_event_percpu+0x40/0x190
[ 19.651765] handle_irq_event_percpu+0x30/0x80
[ 19.651765] handle_irq_event+0x36/0x57
[ 19.651766] handle_edge_irq+0x82/0x190
[ 19.651766] handle_irq+0x1c/0x30
[ 19.651766] do_IRQ+0x49/0xd0
[ 19.651767] common_interrupt+0xf/0xf
[ 19.651767] </IRQ>
[ 19.651767] RIP: 0010:native_safe_halt+0xe/0x20
[ 19.651768] Code: 00 a8 08 75 be e9 23 ff ff ff 31 ff e9 6a ff ff ff 90 90 90 90 90 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d 16 41 5e 00 fb f4 <c3> cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 e9 07 00 00
[ 19.651769] RSP: 0018:ffff9a7d462ffe28 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd
[ 19.651769] RAX: 0000000080004000 RBX: ffff89e387458464 RCX: 000000000000001f
[ 19.651770] RDX: ffffffffa59c6b80 RSI: ffffffffa72d1ce0 RDI: 0000000000000001
[ 19.651770] RBP: ffff89e387458464 R08: 0000000000000001 R09: ffff89e387458400
[ 19.651771] R10: 00000355e97d9cb7 R11: ffff8a02bf372484 R12: 0000000000000001
[ 19.651771] R13: ffffffffa72d1ce0 R14: 0000000000000001 R15: 0000000000000001
[ 19.651772] ? acpi_processor_thermal_init.cold.6+0x66/0x66
[ 19.651772] ? acpi_processor_thermal_init.cold.6+0x66/0x66
[ 19.651773] acpi_idle_do_entry+0x93/0xa0
[ 19.651773] acpi_idle_enter+0x5f/0xd0
[ 19.651774] cpuidle_enter_state+0x86/0x470
[ 19.651774] cpuidle_enter+0x2c/0x40
[ 19.651774] do_idle+0x26f/0x2d0
[ 19.651775] cpu_startup_entry+0x6f/0x80
[ 19.651775] start_secondary+0x187/0x1d0
[ 19.651775] secondary_startup_64_no_verify+0xd1/0xdb
[ 20.678937] Shutting down cpus with NMI
[ 20.678937] Kernel Offset: 0x24400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
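
For comparing captures like the one above across machines and kernels, the three most useful fields are the CPU that locked up, the kernel release, and the function the CPU was executing when the NMI fired. The sketch below pulls those out of a console log; it assumes the line formats shown in this note, and the function name is hypothetical.

```python
import re
from typing import Dict, Optional

LOCKUP_RE = re.compile(r"Watchdog detected hard LOCKUP on cpu (\d+)")
# Kernel release as printed on the "CPU: ... <release> #1" line.
KERNEL_RE = re.compile(r"(\d+\.\d+\.\d+-\S+) #\d+")
# Function name from the first kernel-mode RIP line after the lockup message.
RIP_RE = re.compile(r"RIP: 0010:([\w.]+)\+0x")


def summarize_hard_lockup(log_text: str) -> Optional[Dict[str, str]]:
    """Return cpu, kernel and rip for the first hard lockup, or None if absent."""
    lockup = LOCKUP_RE.search(log_text)
    if lockup is None:
        return None
    tail = log_text[lockup.end():]
    kernel = KERNEL_RE.search(tail)
    rip = RIP_RE.search(tail)
    return {
        "cpu": lockup.group(1),
        "kernel": kernel.group(1) if kernel else "unknown",
        "rip": rip.group(1) if rip else "unknown",
    }
```

The RIP only says where the CPU was when the watchdog fired, not necessarily what caused the lockup, so the surrounding <IRQ> frames still need to be read by hand.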

Administratorte

2024-12-16 23:09

reporter   ~0001098

The .30 kernels behave the same way as the .22/.27 kernels: they crash, whereas kernels <= .16 do not.
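
The shorthand used in this issue (.16, .22, .27, .30, .32) refers to the update number that follows "553." in the kernel release string, e.g. 4.18.0-553.27.1.el8_10.x86_64. As a rough sketch, the observations reported here can be encoded as follows; the function name and the status labels are hypothetical, and "crashes" only reflects what was seen on the affected machines in this report.

```python
import re

RELEASE_RE = re.compile(r"^\d+\.\d+\.\d+-(?P<series>\d+)\.(?P<update>\d+)\.")

# Update numbers reported as crashing in this issue (affected machines only).
REPORTED_CRASHING = {22, 27, 30, 32}


def reported_status(release: str) -> str:
    """Classify a kernel release string according to the reports in this issue."""
    match = RELEASE_RE.match(release)
    # The update numbering is only meaningful within the 553 series.
    if match is None or match.group("series") != "553":
        return "unknown"
    update = int(match.group("update"))
    if update <= 16:
        return "boots"
    if update in REPORTED_CRASHING:
        return "crashes"
    return "untested"
```

For example, reported_status("4.18.0-553.27.1.el8_10.x86_64") gives "crashes", while a release from another series gives "unknown".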

The issue is with USB3. The crash on machine 2 (the R7525) is caused by having an external USB3 disk attached
to the USB3 port. If the disk is not attached, the .30 kernel boots. If a USB2 device is connected to
the USB3 port (such as a USB to serial port adapter), the machine boots. If the USB3 device is connected
to a USB2 port, the machine boots.
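
When repeating the port/device combinations above, it helps to confirm the speed each device actually negotiated rather than relying on the port label. The sketch below reads the per-device speed attribute (in Mbit/s) that the kernel exposes under /sys/bus/usb/devices; the function names and the USB1/USB2/USB3 labels are hypothetical.

```python
import os
from typing import Dict


def link_generation(speed_mbps: str) -> str:
    """Map a sysfs speed value (Mbit/s, e.g. "480" or "5000") to a USB generation."""
    speed = float(speed_mbps.strip())
    if speed >= 5000:
        return "USB3"  # SuperSpeed or faster
    if speed >= 480:
        return "USB2"  # high speed
    return "USB1"      # low or full speed


def connected_device_speeds(sysfs_root: str = "/sys/bus/usb/devices") -> Dict[str, str]:
    """Return {device name: generation} for every entry that has a speed file."""
    speeds: Dict[str, str] = {}
    if not os.path.isdir(sysfs_root):
        return speeds
    for name in sorted(os.listdir(sysfs_root)):
        try:
            with open(os.path.join(sysfs_root, name, "speed")) as handle:
                speeds[name] = link_generation(handle.read())
        except (OSError, ValueError):
            # Interface entries have no speed file; skip them.
            continue
    return speeds
```

A device reported as "USB3" here corresponds to the "new SuperSpeed USB device" message in the kernel log; root hubs appear in the result as well.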

After all the tests on the R7525, I upgraded to the latest BIOS, 2.17.4, with no change.
The lockup below references upowerd, but disabling upowerd showed no change in behavior.

Once the machine is booted, if the external USB3 disk is plugged into the USB3 port, the machine crashes.
So far I've tried two different disks (8TB and 2TB) and two different cases with the same result.
One case is a NexStar 6, the other is a Startech dock.

[ 323.846031] usb 1-2.1: USB disconnect, device number 4
[ 325.012979] usb 2-1: USB disconnect, device number 2
[ 338.297029] usb 4-1: new SuperSpeed USB device number 2 using xhci_hcd
[ 338.309674] usb 4-1: New USB device found, idVendor=174c, idProduct=1153, bcdDevice= 1.00
[ 338.309682] usb 4-1: New USB device strings: Mfr=2, Product=3, SerialNumber=1
[ 338.309687] usb 4-1: Product: USB3.0 Device
[ 338.309692] usb 4-1: Manufacturer: Generic
[ 338.309695] usb 4-1: SerialNumber: WD-WXE1E14MVJ21
[ 338.311701] scsi host12: uas
[ 338.312814] scsi 12:0:0:0: Direct-Access HGST HUS 728T8TALE6L4 0 PQ: 0 ANSI: 6
[ 338.313819] sd 12:0:0:0: Attached scsi generic sg3 type 0
[ 338.314210] sd 12:0:0:0: [sdc] 15628053168 512-byte logical blocks: (8.00 TB/7.28 TiB)
[ 338.314217] sd 12:0:0:0: [sdc] 4096-byte physical blocks
[ 338.314324] sd 12:0:0:0: [sdc] Write Protect is off
[ 338.314327] sd 12:0:0:0: [sdc] Mode Sense: 43 00 00 00
[ 338.314484] sd 12:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 364.248981] watchdog: BUG: soft lockup - CPU#50 stuck for 23s! [upowerd:4892]
[ 364.248989] Modules linked in: nft_counter nf_tables libcrc32c nfnetlink sunrpc intel_rapl_msr intel_rapl_common amd64_edac_mod edac_mce_amd amd_energy dell_smbios dcdbas wmi_bmof dell_wmi_descriptor ipmi_ssif kvm_amd kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel bnxt_re ib_uverbs cdc_ether usbnet joydev mii ib_core rapl sp5100_tco pcspkr acpi_ipmi ccp ptdma i2c_piix4 k10temp ipmi_si wmi ipmi_devintf ipmi_msghandler acpi_power_meter ext4 mbcache jbd2 uas usb_storage sd_mod t10_pi sg mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_shmem_helper crc32c_intel drm tg3 ahci bnxt_en libahci libata megaraid_sas fuse
[ 364.249070] CPU: 50 PID: 4892 Comm: upowerd Kdump: loaded Not tainted 4.18.0-553.30.1.el8_10.x86_64 #1
[ 364.249074] Hardware name: Dell Inc. PowerEdge R7525/0590KW, BIOS 2.17.4 10/04/2024
[ 364.249076] RIP: 0010:smp_call_function_many_cond+0x268/0x2f0
[ 364.249083] Code: e8 9d be 84 00 3b 05 8b 50 e1 01 0f 83 27 fe ff ff 48 63 d0 49 8b 0e 48 03 0c d5 80 68 3c a3 8b 11 83 e2 01 74 09 f3 90 8b 11 <83> e2 01 75 f7 eb c9 48 c7 c2 60 49 fc a3 4c 89 ee 44 89 e7 e8 3f
[ 364.249086] RSP: 0018:ffffb01a73c9fb58 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
[ 364.249089] RAX: 0000000000000092 RBX: 0000000000000000 RCX: ffff8aea3e6bb240
[ 364.249091] RDX: 0000000000000011 RSI: 0000000000000002 RDI: ffff8a6ccdbfd180
[ 364.249093] RBP: ffff8a6ccdbfd0a0 R08: 0000000080000000 R09: 0000000000000000
[ 364.249094] R10: 0000000000000003 R11: 0000000000000004 R12: ffffffffa21b03fb
[ 364.249096] R13: 0000000000000100 R14: ffff8aea3deb4a00 R15: 00000000000000ff
[ 364.249098] FS: 00007f83b1708b00(0000) GS:ffff8aea3de80000(0000) knlGS:0000000000000000
[ 364.249100] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 364.249101] CR2: 000055967715eec8 CR3: 0000000190f1e002 CR4: 0000000000770ee0
[ 364.249103] PKRU: 55555554
[ 364.249104] Call Trace:
[ 364.249107] <IRQ>
[ 364.249110] ? watchdog_timer_fn.cold.10+0x46/0x9e
[ 364.249114] ? watchdog+0x30/0x30
[ 364.249117] ? __hrtimer_run_queues+0x101/0x280
[ 364.249122] ? hrtimer_interrupt+0x100/0x220
[ 364.249124] ? srso_alias_return_thunk+0x5/0xfcdfd
[ 364.249129] ? smp_apic_timer_interrupt+0x6a/0x130
[ 364.249131] ? apic_timer_interrupt+0xf/0x20
[ 364.249134] </IRQ>
[ 364.249135] ? on_each_cpu+0x2b/0x60
[ 364.249139] ? smp_call_function_many_cond+0x268/0x2f0
[ 364.249142] ? smp_call_function_many_cond+0x243/0x2f0
[ 364.249144] ? load_new_mm_cr3+0xe0/0xe0
[ 364.249148] ? load_new_mm_cr3+0xe0/0xe0
[ 364.249150] on_each_cpu+0x2b/0x60
[ 364.249153] flush_tlb_kernel_range+0x48/0x90
[ 364.249156] ? do_jit+0x74e/0x2320
[ 364.249160] ? srso_alias_return_thunk+0x5/0xfcdfd
[ 364.249162] ? srso_alias_return_thunk+0x5/0xfcdfd
[ 364.249164] __purge_vmap_area_lazy+0x70/0x730
[ 364.249170] _vm_unmap_aliases+0xe5/0x120
[ 364.249174] change_page_attr_set_clr+0xa5/0x1a0
[ 364.249179] set_memory_ro+0x26/0x30
[ 364.249182] bpf_int_jit_compile+0x486/0x4c0
[ 364.249186] bpf_prog_select_runtime+0xb0/0xf0
[ 364.249190] bpf_prepare_filter+0x52c/0x5a0
[ 364.249196] sk_attach_filter+0x13/0x60
[ 364.249198] ? lock_sock_nested+0x1e/0x50
[ 364.249202] sock_setsockopt+0x701/0xc00
[ 364.249205] ? alloc_file_pseudo+0xa7/0x100
[ 364.249210] __sys_setsockopt+0x1a3/0x1d0
[ 364.249214] __x64_sys_setsockopt+0x20/0x30
[ 364.249217] do_syscall_64+0x5b/0x1a0
[ 364.249221] entry_SYSCALL_64_after_hwframe+0x66/0xcb
[ 364.249226] RIP: 0033:0x7f83afdc329e
[ 364.249229] Code: 48 8b 0d ed 5b 39 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 36 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ba 5b 39 00 f7 d8 64 89 01 48
[ 364.249231] RSP: 002b:00007ffcae082a28 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
[ 364.249233] RAX: ffffffffffffffda RBX: 000000000000000a RCX: 00007f83afdc329e
[ 364.249235] RDX: 000000000000001a RSI: 0000000000000001 RDI: 000000000000000a
[ 364.249236] RBP: 00007ffcae082a50 R08: 0000000000000010 R09: 00000000ffffffff
[ 364.249237] R10: 00007ffcae082a40 R11: 0000000000000246 R12: 0000000000000008
[ 364.249239] R13: 00007ffcae082a98 R14: 0000564562e433d0 R15: 0000000000000000
[ 367.936458] NMI watchdog: Watchdog detected hard LOCKUP on cpu 146Modules linked in: nft_counter nf_tables libcrc32c nfnetlink sunrpc intel_rapl_msr intel_rapl_common amd64_edac_mod edac_mce_amd amd_energy dell_smbios dcdbas wmi_bmof dell_wmi_descriptor ipmi_ssif kvm_amd kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel bnxt_re ib_uverbs cdc_ether usbnet joydev mii ib_core rapl sp5100_tco pcspkr acpi_ipmi ccp ptdma i2c_piix4 k10temp ipmi_si wmi ipmi_devintf ipmi_msghandler acpi_power_meter ext4 mbcache jbd2 uas usb_storage sd_mod t10_pi sg mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_shmem_helper crc32c_intel drm tg3 ahci bnxt_en libahci libata megaraid_sas fuse
[ 367.936491] CPU: 146 PID: 0 Comm: swapper/146 Kdump: loaded Tainted: G L -------- - - 4.18.0-553.30.1.el8_10.x86_64 #1
[ 367.936492] Hardware name: Dell Inc. PowerEdge R7525/0590KW, BIOS 2.17.4 10/04/2024
[ 367.936493] RIP: 0010:xhci_update_erst_dequeue.isra.43+0x13/0xa0
[ 367.936494] Code: c0 74 06 81 23 ff fe ff ff 5b e9 33 04 5e 00 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 83 ec 10 48 8b 02 44 8b 40 18 8b 40 1c <45> 89 c0 48 c1 e0 20 4c 01 c0 4c 8b 06 49 8b 70 20 48 39 ce 74 55
[ 367.936494] RSP: 0018:ffffb01a5b3bcea8 EFLAGS: 00000082
[ 367.936496] RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffff8a6cdad5c860
[ 367.936496] RDX: ffff8a6ccf8b81a8 RSI: ffff8a6ccf8b8180 RDI: ffff8a6cc006e2c0
[ 367.936497] RBP: ffff8a6cdad5c860 R08: 000000001ad5c868 R09: ffff8a6d2d5b24a8
[ 367.936497] R10: 0000000000000000 R11: ffff8a6cdd58b8d0 R12: 0000000000000000
[ 367.936498] R13: ffff8a6cd235ca00 R14: ffff8a6cc006e000 R15: ffff8a6ccf8b8180
[ 367.936498] FS: 0000000000000000(0000) GS:ffff8aea3e680000(0000) knlGS:0000000000000000
[ 367.936499] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 367.936500] CR2: 0000559677164178 CR3: 0000000f74410005 CR4: 0000000000770ee0
[ 367.936500] PKRU: 55555554
[ 367.936500] Call Trace:
[ 367.936501] <NMI>
[ 367.936501] ? watchdog_overflow_callback.cold.7+0x1e/0x70
[ 367.936502] ? __perf_event_overflow+0x52/0x100
[ 367.936502] ? x86_pmu_handle_irq+0x12f/0x190
[ 367.936503] ? srso_alias_return_thunk+0x5/0xfcdfd
[ 367.936503] ? __set_pte_vaddr+0x32/0x50
[ 367.936504] ? srso_alias_return_thunk+0x5/0xfcdfd
[ 367.936504] ? srso_alias_return_thunk+0x5/0xfcdfd
[ 367.936505] ? ghes_copy_tofrom_phys+0xf9/0x250
[ 367.936505] ? srso_alias_return_thunk+0x5/0xfcdfd
[ 367.936506] ? amd_pmu_handle_irq+0x46/0xc0
[ 367.936506] ? perf_event_nmi_handler+0x2d/0x50
[ 367.936507] ? nmi_handle+0x63/0x110
[ 367.936507] ? default_do_nmi+0x49/0x110
[ 367.936508] ? do_nmi+0x19c/0x210
[ 367.936508] ? end_repeat_nmi+0x16/0x69
[ 367.936509] ? xhci_update_erst_dequeue.isra.43+0x13/0xa0
[ 367.936509] ? xhci_update_erst_dequeue.isra.43+0x13/0xa0
[ 367.936510] ? xhci_update_erst_dequeue.isra.43+0x13/0xa0
[ 367.936510] </NMI>
[ 367.936511] <IRQ>
[ 367.936511] ? srso_alias_return_thunk+0x5/0xfcdfd
[ 367.936512] xhci_irq+0x22f/0x3e0
[ 367.936512] __handle_irq_event_percpu+0x40/0x190
[ 367.936513] handle_irq_event_percpu+0x30/0x80
[ 367.936513] handle_irq_event+0x36/0x57
[ 367.936514] handle_edge_irq+0x82/0x190
[ 367.936514] handle_irq+0x1c/0x30
[ 367.936515] do_IRQ+0x49/0xd0
[ 367.936515] common_interrupt+0xf/0xf
[ 367.936516] </IRQ>
[ 367.936516] RIP: 0010:mwait_idle+0x61/0x80
[ 367.936517] Code: 48 8b 04 25 40 dc 01 00 48 89 d1 0f 01 c8 48 8b 00 a8 08 75 17 e9 07 00 00 00 0f 00 2d ba d1 5d 00 31 c0 48 89 c1 fb 0f 01 c9 <eb> 07 fb 66 0f 1f 44 00 00 65 48 8b 04 25 40 dc 01 00 f0 80 60 02
[ 367.936517] RSP: 0018:ffffb01a592bbea0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd
[ 367.936519] RAX: 0000000000000000 RBX: 0000000000000092 RCX: 0000000000000000
[ 367.936519] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000004ec515f680
[ 367.936520] RBP: 0000000000000092 R08: 0000000000000002 R09: 0000000000033000
[ 367.936520] R10: 000001a96ec0922b R11: 0000000000000067 R12: 0000000000000000
[ 367.936521] R13: 0000000000000000 R14: ffffffffffffffff R15: ffff8a6cccc12840
[ 367.936521] default_idle_call+0x44/0xf0
[ 367.936522] do_idle+0x21a/0x2d0
[ 367.936522] cpu_startup_entry+0x6f/0x80
[ 367.936523] start_secondary+0x187/0x1d0
[ 367.936523] secondary_startup_64_no_verify+0xd1/0xdb
[ 367.936524] Kernel panic - not syncing: Hard LOCKUP
[ 367.936525] CPU: 146 PID: 0 Comm: swapper/146 Kdump: loaded Tainted: G L -------- - - 4.18.0-553.30.1.el8_10.x86_64 #1
[ 367.936525] Hardware name: Dell Inc. PowerEdge R7525/0590KW, BIOS 2.17.4 10/04/2024
[ 367.936526] Call Trace:
[ 367.936526] <NMI>
[ 367.936527] dump_stack+0x41/0x60
[ 367.936527] panic+0xe7/0x2ac
[ 367.936528] ? secondary_startup_64_no_verify+0x8c/0xdb
[ 367.936528] nmi_panic.cold.11+0xc/0xc
[ 367.936529] watchdog_overflow_callback.cold.7+0x5c/0x70
[ 367.936529] __perf_event_overflow+0x52/0x100
[ 367.936530] x86_pmu_handle_irq+0x12f/0x190
[ 367.936530] ? srso_alias_return_thunk+0x5/0xfcdfd
[ 367.936531] ? __set_pte_vaddr+0x32/0x50
[ 367.936531] ? srso_alias_return_thunk+0x5/0xfcdfd
[ 367.936532] ? srso_alias_return_thunk+0x5/0xfcdfd
[ 367.936532] ? ghes_copy_tofrom_phys+0xf9/0x250
[ 367.936533] ? srso_alias_return_thunk+0x5/0xfcdfd
[ 367.936533] amd_pmu_handle_irq+0x46/0xc0
[ 367.936534] perf_event_nmi_handler+0x2d/0x50
[ 367.936534] nmi_handle+0x63/0x110
[ 367.936534] default_do_nmi+0x49/0x110
[ 367.936535] do_nmi+0x19c/0x210
[ 367.936535] end_repeat_nmi+0x16/0x69
[ 367.936536] RIP: 0010:xhci_update_erst_dequeue.isra.43+0x13/0xa0
[ 367.936537] Code: c0 74 06 81 23 ff fe ff ff 5b e9 33 04 5e 00 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 83 ec 10 48 8b 02 44 8b 40 18 8b 40 1c <45> 89 c0 48 c1 e0 20 4c 01 c0 4c 8b 06 49 8b 70 20 48 39 ce 74 55
[ 367.936537] RSP: 0018:ffffb01a5b3bcea8 EFLAGS: 00000082
[ 367.936538] RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffff8a6cdad5c860
[ 367.936539] RDX: ffff8a6ccf8b81a8 RSI: ffff8a6ccf8b8180 RDI: ffff8a6cc006e2c0
[ 367.936539] RBP: ffff8a6cdad5c860 R08: 000000001ad5c868 R09: ffff8a6d2d5b24a8
[ 367.936540] R10: 0000000000000000 R11: ffff8a6cdd58b8d0 R12: 0000000000000000
[ 367.936541] R13: ffff8a6cd235ca00 R14: ffff8a6cc006e000 R15: ffff8a6ccf8b8180
[ 367.936541] ? xhci_update_erst_dequeue.isra.43+0x13/0xa0
[ 367.936542] ? xhci_update_erst_dequeue.isra.43+0x13/0xa0
[ 367.936542] </NMI>
[ 367.936543] <IRQ>
[ 367.936543] ? srso_alias_return_thunk+0x5/0xfcdfd
[ 367.936543] xhci_irq+0x22f/0x3e0
[ 367.936544] __handle_irq_event_percpu+0x40/0x190
[ 367.936544] handle_irq_event_percpu+0x30/0x80
[ 367.936545] handle_irq_event+0x36/0x57
[ 367.936545] handle_edge_irq+0x82/0x190
[ 367.936546] handle_irq+0x1c/0x30
[ 367.936546] do_IRQ+0x49/0xd0
[ 367.936547] common_interrupt+0xf/0xf
[ 367.936547] </IRQ>
[ 367.936548] RIP: 0010:mwait_idle+0x61/0x80
[ 367.936548] Code: 48 8b 04 25 40 dc 01 00 48 89 d1 0f 01 c8 48 8b 00 a8 08 75 17 e9 07 00 00 00 0f 00 2d ba d1 5d 00 31 c0 48 89 c1 fb 0f 01 c9 <eb> 07 fb 66 0f 1f 44 00 00 65 48 8b 04 25 40 dc 01 00 f0 80 60 02
[ 367.936549] RSP: 0018:ffffb01a592bbea0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd
[ 367.936550] RAX: 0000000000000000 RBX: 0000000000000092 RCX: 0000000000000000
[ 367.936551] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000004ec515f680
[ 367.936551] RBP: 0000000000000092 R08: 0000000000000002 R09: 0000000000033000
[ 367.936552] R10: 000001a96ec0922b R11: 0000000000000067 R12: 0000000000000000
[ 367.936552] R13: 0000000000000000 R14: ffffffffffffffff R15: ffff8a6cccc12840
[ 367.936553] default_idle_call+0x44/0xf0
[ 367.936553] do_idle+0x21a/0x2d0
[ 367.936554] cpu_startup_entry+0x6f/0x80
[ 367.936554] start_secondary+0x187/0x1d0
[ 367.936555] secondary_startup_64_no_verify+0xd1/0xdb

Administratorte

2024-12-17 22:46

reporter   ~0001099

The case and the machine matter; the disk does not seem to.

On the PowerEdge R7525, I was able to attach a Verbatim 4TB external USB3 disk to the USB3 port without a crash.
I was also able to take the disk out of the "bad" case, put it in an old Verbatim case, and attach it to the
USB3 port without a crash.

The "bad" case is a Vantec NexStar 6G Model NST-366S3.

A Startech SDOCKU313 also crashes the machine.

On other machines, with the possible exception of the PowerEdge R520 (not yet tested),
the Vantec is fine.

Administratorte

2024-12-18 20:54

reporter   ~0001101

Updated the R520; the update pulled in the .32 kernel.

The machine boots fine without the external disk connected.

Plugging in the external disk in the "bad" case crashes the machine:

[ 8821.925536] scsi 7:0:0:0: Direct-Access HGST HUS 728T8TALE6L4 0 PQ: 0 ANSI: 6
[ 8821.926513] sd 7:0:0:0: Attached scsi generic sg5 type 0
[ 8821.926887] sd 7:0:0:0: [sde] 15628053168 512-byte logical blocks: (8.00 TB/7.28 TiB)
[ 8821.926889] sd 7:0:0:0: [sde] 4096-byte physical blocks
[ 8821.926965] sd 7:0:0:0: [sde] Write Protect is off
[ 8821.926967] sd 7:0:0:0: [sde] Mode Sense: 43 00 00 00
[ 8821.927126] sd 7:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 8841.635719] NMI watchdog: Watchdog detected hard LOCKUP on cpu 5

Issue History

Date Modified Username Field Change
2024-11-18 22:54 Administratorte New Issue
2024-11-18 22:54 Administratorte Tag Attached: alma8
2024-11-18 22:54 Administratorte Tag Attached: almalinux8
2024-11-18 22:54 Administratorte Tag Attached: boot
2024-11-18 22:54 Administratorte Tag Attached: kernel
2024-11-25 22:11 Administratorte Note Added: 0001091
2024-12-16 23:09 Administratorte Note Added: 0001098
2024-12-17 22:46 Administratorte Note Added: 0001099
2024-12-18 20:54 Administratorte Note Added: 0001101