View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000371 | AlmaLinux-9 | kernel | public | 2023-02-21 16:30 | 2023-03-11 21:10 |
Reporter | John Gong | Assigned To | |||
Priority | high | Severity | minor | Reproducibility | always |
Status | new | Resolution | open | ||
Summary | 0000371: kernel complains "hardware error" when boots up | ||||
Description | With the 5.14.0-162.6.1.el9 and 5.14.0-162.12.1.el9 kernels that ship with AlmaLinux 9.1, an Ampere Altra machine prints the "[Hardware Error]" messages shown in the boot log reproduced after this table. The error was introduced in https://kojihub.stream.centos.org/koji/buildinfo?buildID=23939 and still exists in the latest CentOS kernel. | ||||
Steps To Reproduce | Boot an Ampere Altra machine with 5.14.0-162.6.1.el9 or 5.14.0-162.12.1.el9; the error messages are printed during boot. | ||||
Tags | kernel | ||||

Boot log referenced in the Description:

    -------------------------------------------------------------------------------
    Tue Feb 7 09:45:21 CST 2023 [ 16.412276] {1}[Hardware Error]: event severity: recoverable
    Tue Feb 7 09:45:21 CST 2023 [ 16.417922] {1}[Hardware Error]: Error 0, type: recoverable
    Tue Feb 7 09:45:21 CST 2023 [ 16.423569] {1}[Hardware Error]: section type: unknown, e8ed898d-df16-43cc-8ecc-54f060ef157f
    Tue Feb 7 09:45:21 CST 2023 [ 16.432169] {1}[Hardware Error]: section length: 0x30
    Tue Feb 7 09:45:21 CST 2023 [ 16.437384] {1}[Hardware Error]: 00000000: 00000005 ec30000e 00080110 80001001 ......0.........
    Tue Feb 7 09:45:21 CST 2023 [ 16.446330] {1}[Hardware Error]: 00000010: 00000300 00000000 00000000 00000000 ................
    Tue Feb 7 09:45:21 CST 2023 [ 16.455274] {1}[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
    -------------------------------------------------------------------------------
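The hex dump in the Description's boot log is easier to compare across kernels once it is turned back into raw bytes. The following is a minimal Python sketch, assuming each 8-digit group is a 32-bit word printed in little-endian CPU order. That assumption is consistent with the ASCII column: the `0` at offset 6 is byte 0x30 from the word `ec30000e`. The sketch extracts the severity, section type, and section length, then rebuilds the 0x30-byte payload. It does not interpret the vendor-specific section `e8ed898d-df16-43cc-8ecc-54f060ef157f`, whose layout is not described in this report. `parse_hardware_error_log` and `SAMPLE_LOG` are hypothetical names used only for illustration.

```python
import re

# Sample lines copied from the boot log in this report.
SAMPLE_LOG = [
    "Tue Feb 7 09:45:21 CST 2023 [ 16.412276] {1}[Hardware Error]: event severity: recoverable",
    "Tue Feb 7 09:45:21 CST 2023 [ 16.417922] {1}[Hardware Error]: Error 0, type: recoverable",
    "Tue Feb 7 09:45:21 CST 2023 [ 16.423569] {1}[Hardware Error]: section type: unknown, e8ed898d-df16-43cc-8ecc-54f060ef157f",
    "Tue Feb 7 09:45:21 CST 2023 [ 16.432169] {1}[Hardware Error]: section length: 0x30",
    "Tue Feb 7 09:45:21 CST 2023 [ 16.437384] {1}[Hardware Error]: 00000000: 00000005 ec30000e 00080110 80001001 ......0.........",
    "Tue Feb 7 09:45:21 CST 2023 [ 16.446330] {1}[Hardware Error]: 00000010: 00000300 00000000 00000000 00000000 ................",
    "Tue Feb 7 09:45:21 CST 2023 [ 16.455274] {1}[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................",
]

# Named fields we care about, e.g. "[Hardware Error]: section length: 0x30".
_FIELD_RE = re.compile(
    r"\[Hardware Error\]: (event severity|section type|section length): (.+)$"
)
# Hex dump rows: an 8-digit offset followed by up to four 8-digit words.
_HEX_RE = re.compile(r"\[Hardware Error\]: ([0-9a-f]{8}): ((?:[0-9a-f]{8} ?){1,4})")


def parse_hardware_error_log(lines):
    """Return (fields, raw_bytes) for one '[Hardware Error]' record.

    Assumption: each hex word is a 32-bit value printed in little-endian
    order; the trailing ASCII column is ignored.
    """
    fields = {}
    raw = bytearray()
    for line in lines:
        hex_match = _HEX_RE.search(line)
        if hex_match:
            offset = int(hex_match.group(1), 16)
            # Rows must be contiguous, otherwise the dump was truncated or mixed.
            if offset != len(raw):
                raise ValueError(f"unexpected dump offset {offset:#x}")
            for word in hex_match.group(2).split():
                raw += int(word, 16).to_bytes(4, "little")
            continue
        field_match = _FIELD_RE.search(line)
        if field_match:
            fields[field_match.group(1)] = field_match.group(2).strip()
    return fields, bytes(raw)


if __name__ == "__main__":
    fields, raw = parse_hardware_error_log(SAMPLE_LOG)
    print(fields)
    # Cross-check the rebuilt payload against the reported section length.
    print(len(raw) == int(fields["section length"], 16))
    print(raw.hex(" "))
```

Feeding it `dmesg` output from two kernels (for example kernel-5.14.0-135.el9 and 5.14.0-162.12.1.el9) gives a quick, scriptable way to check whether the record appears and whether its payload changes between boots.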
|
@John Gong This is just to confirm ... That link points to kernel-5.14.0-136.el9. So kernel-5.14.0-135.el9 did not have the problem reported here? |
|
Yes, kernel-5.14.0-135.el9 does not have this issue. |
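The thread narrows the regression window to kernel-5.14.0-135.el9 (clean) versus kernel-5.14.0-136.el9 (first affected build), with 162.6.1 and 162.12.1 also affected. A small Python sketch can check whether a given EL9 release string falls in that window. It rests on one assumption that this report does not confirm: every 5.14.0 el9 build from -136 onward carries upstream commit c733ebb7c, and none reverts it. `parse_release`, `likely_affected`, and `FIRST_AFFECTED_BUILD` are hypothetical helper names.

```python
import re

# Assumption (not confirmed in this report): every 5.14.0 el9 build from -136
# onward carries upstream commit c733ebb7c and none reverts it. The thread only
# confirms -135 (clean), -136 (introduced), 162.6.1 and 162.12.1 (affected).
_RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-([\d.]+)\.el9")

FIRST_AFFECTED_BUILD = (136,)  # kernel-5.14.0-136.el9


def parse_release(release: str) -> tuple[tuple[int, ...], tuple[int, ...]]:
    """Split e.g. '5.14.0-162.6.1.el9' into ((5, 14, 0), (162, 6, 1))."""
    m = _RELEASE_RE.match(release)
    if m is None:
        raise ValueError(f"not an el9 kernel release string: {release!r}")
    base = tuple(int(x) for x in m.group(1, 2, 3))
    build = tuple(int(x) for x in m.group(4).split("."))
    return base, build


def likely_affected(release: str) -> bool:
    """True if the el9 build number is at or past the first affected build."""
    base, build = parse_release(release)
    if base != (5, 14, 0):
        raise ValueError(f"only 5.14.0 el9 kernels are covered: {release!r}")
    # Tuples compare element-wise: (162, 6, 1) >= (136,) but (135,) < (136,).
    return build >= FIRST_AFFECTED_BUILD


if __name__ == "__main__":
    import platform

    release = platform.release()
    try:
        verdict = "likely affected" if likely_affected(release) else "predates -136"
        print(f"{release}: {verdict}")
    except ValueError as err:
        print(err)
```

Comparing build numbers as integer tuples rather than strings avoids treating "162" as smaller than "136.x" lexically. Non-el9 kernels such as kernel-ml from elrepo are rejected with a `ValueError` rather than guessed at.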
|
@John Gong To test whether the fix for the bug has been added to the upstream kernel from kernel.org, can you do a test install of kernel-ml from ELRepo?

    # dnf install https://www.elrepo.org/elrepo-release-9.el9.elrepo.noarch.rpm
    # dnf --enablerepo=elrepo-kernel install kernel-ml

That will install the latest kernel (6.2.1 at the moment). Please note that kernel-ml does not work if Secure Boot is enabled. |
|
Sorry for the late reply. Red Hat has located the upstream commit that was backported to 5.14.0-162.6.1.el9 and causes this issue: c733ebb7cb67dfb146a07c0ae329a0de9ec52f36. That commit is still present in the latest upstream kernel, so the next step is to work with the Linux kernel developers on a fix. I will report this bug to LKML. Thanks! |
|
Thank you for sharing the info. Glad to learn the problematic commit has been identified. Hope the kernel devs can fix the issue quickly. |
|
Ref: https://lkml.org/lkml/2023/3/2/892 "Error reports at boot time in Ampere Altra machines since c733ebb7c" |
|
Some additional notes from Darren Hart (Ampere): "Just to give a bit more detail here, these messages look scary, but they are benign as the error is managed by hardware and has no adverse effects to software other than the severe looking messages reported." |
Date Modified | Username | Field | Change |
---|---|---|---|
2023-02-21 16:30 | John Gong | New Issue | |
2023-02-21 16:30 | John Gong | Tag Attached: kernel | |
2023-02-22 18:53 | toracat | Note Added: 0000823 | |
2023-02-27 03:14 | John Gong | Note Added: 0000826 | |
2023-02-28 19:47 | alukoshko | Description Updated | |
2023-03-02 02:23 | toracat | Note Added: 0000828 | |
2023-03-03 02:23 | John Gong | Note Added: 0000830 | |
2023-03-03 02:41 | toracat | Note Added: 0000831 | |
2023-03-04 19:51 | toracat | Note Added: 0000832 | |
2023-03-11 21:10 | toracat | Note Added: 0000839 |