View Issue Details

ID: 0000462
Project: AlmaLinux-8
Category: kernel
View Status: public
Last Update: 2024-07-16 02:45
Reporter: zpalmerlw
Assigned To: (none)
Priority: urgent
Severity: crash
Reproducibility: always
Status: new
Resolution: open
Platform: Alma Linux 8
OS Version: 8.9
Summary: 0000462: aacraid driver causes SCSI Hang, followed by I/O Spike, and usually a server reboot is triggered
Description: The aacraid driver is faulty. The server runs fine for a while, then hangs; the load and I/O spike, and the server either crashes and reboots or freezes solid for several minutes before recovering. Every time this happens, /var/log/messages contains a "SCSI Hang" message:
"host kernel: aacraid: Host adapter reset request. SCSI hang ?"


So far the only fix is either to update to the latest mainline kernel, which breaks some backup software (Acronis) because the mainline kernel is too new to be supported, or to revert to the following kernel: 4.18.0-477.27.1.el8_8.x86_64
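
To confirm that a server is hitting this symptom, the reset message quoted above can be counted in the system log. The following is a minimal sketch, not part of the original report: the function name count_aacraid_resets is hypothetical, and it assumes the message text matches the line quoted above and that the log lives in /var/log/messages unless another path is given.

```shell
#!/bin/sh
# Hypothetical helper (name chosen for illustration only).
# Prints how many lines of a log file contain the aacraid reset message.
count_aacraid_resets() {
    # Default to the log file named in the report; a different path may be passed.
    log_file="${1:-/var/log/messages}"
    # -c prints the number of matching lines; -F treats the pattern as a fixed string.
    grep -cF 'aacraid: Host adapter reset request' "$log_file"
}
```

For example, running count_aacraid_resets with no argument as root prints the number of reset messages logged so far; a non-zero count on an affected kernel is consistent with the behaviour described here.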

Steps To Reproduce: Run an affected kernel with an affected Adaptec card, then wait for the bug to occur, usually within 15-20 minutes of system boot.
Additional Information: An update to the aacraid driver in kernel 6.4.0 has been backported and is causing this breakage.

From what I have found, these are the affected kernels:
4.18.0-513.11.1.el8_9.x86_64
4.18.0-513.9.1.el8_9.x86_64
(Possibly more, unknown to me if so)
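
To check whether a given machine is running one of the kernels listed above, the release string can be compared against that list. This is only a sketch: the function name is_affected_kernel is hypothetical, and the list contains just the two releases named in this report, so a "not affected" result does not rule out other affected kernels.

```shell
#!/bin/sh
# Hypothetical helper (name chosen for illustration only).
# Returns 0 if the kernel release is one of the two releases listed in this
# report, 1 otherwise. With no argument it checks the running kernel.
is_affected_kernel() {
    release="${1:-$(uname -r)}"
    case "$release" in
        4.18.0-513.11.1.el8_9.x86_64|4.18.0-513.9.1.el8_9.x86_64)
            return 0 ;;
        *)
            # Not on the reported list; other kernels may still be affected.
            return 1 ;;
    esac
}
```

A typical use would be: is_affected_kernel && echo "running a kernel reported as affected".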

I found that this issue was also reported on Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1059624
And a kernel Bugzilla for this issue: https://bugzilla.kernel.org/show_bug.cgi?id=217599

Here are a couple of affected cards:
Controller Model : Adaptec ASR8805E
Controller Model : Adaptec ASR8405E
Tags: No tags attached.
abrt_hash:
URL:

Activities

toracat

2024-04-12 23:06

reporter   ~0001024

> So far the only fix is to either update to the latest mainline kernel, which breaks some backup software (Acronis) due the mainline kernel being too new to be supported, or ...

If the mainline kernel is too new, you might want to give elrepo's kernel-lt (1) a try. It is currently at 5.4.273.el8.

(1) https://elrepo.org/wiki/doku.php?id=kernel-lt

toracat

2024-06-25 17:17

reporter   ~0001047

I have built the kmod-aacraid package using the patch referenced in https://bugzilla.kernel.org/show_bug.cgi?id=217599 (comment c63) and released it to the elrepo testing repository.

If you have elrepo enabled, you can install it by running:

sudo dnf --enablerepo=elrepo-testing install kmod-aacraid

Or you can download the kmod rpm:

https://elrepo.org/linux/testing/el8/x86_64/RPMS/kmod-aacraid-1.2.1-11.1.el8_10.elrepo.x86_64.rpm
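
After installing the package by either route, it may be worth confirming that it is present and seeing which aacraid module file would be loaded. The sketch below is an assumption about how one might check this and is not taken from the note above; the function name check_kmod_aacraid is hypothetical, and it relies only on the standard rpm -q and modinfo -n commands.

```shell
#!/bin/sh
# Hypothetical helper (name chosen for illustration only).
# Reports whether kmod-aacraid is installed and which aacraid module file
# would be loaded for the running kernel.
check_kmod_aacraid() {
    # rpm -q prints the installed package name and exits non-zero if it is absent.
    if ! rpm -q kmod-aacraid; then
        echo "kmod-aacraid is not installed" >&2
        return 1
    fi
    # modinfo -n prints the path of the module file that would be loaded.
    modinfo -n aacraid
}
```

The path printed by modinfo should show whether the packaged module or the one shipped with the kernel is the one in use. A reboot, or reloading the module, may be needed before the new driver takes effect.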

toracat

2024-06-27 18:48

reporter   ~0001049

@zpalmerlw

Once I get a positive response, I will move the kmod package to the main repository.

toracat

2024-07-16 02:45

reporter   ~0001054

The kmod-aacraid package has now been moved to the elrepo main repository.

Issue History

Date Modified     Username   Field                Change
2024-04-12 20:57  zpalmerlw  New Issue
2024-04-12 23:06  toracat    Note Added: 0001024
2024-06-25 17:17  toracat    Note Added: 0001047
2024-06-27 18:48  toracat    Note Added: 0001049
2024-07-16 02:45  toracat    Note Added: 0001054