View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000442 | AlmaLinux-9 | -OTHER | public | 2023-11-14 18:34 | 2023-11-20 22:04 |
Reporter | aaack | Assigned To | alukoshko | ||
Priority | normal | Severity | minor | Reproducibility | always |
Status | feedback | Resolution | open | ||
Platform | Dell PowerEdge R740xd | OS | AlmaLinux | OS Version | 9.3 |
Summary | 0000442: Getting Dell hw error (from iDRAC and IPMI sel log) after patching into Alma 9.3 (from Alma 9.2) | ||||
Description | iDRAC reports "A fatal error was detected on a component at bus 2 device 0 function 0.", and the IPMI sel log entry has "8 | 11/14/2023 | 14:28:36 | Critical Interrupt #0x38 | | Asserted". With this logged, the system reports "System has critical issues". So far we have not encountered any issue impacting operation of the systems or hosted apps. | ||||
Steps To Reproduce | Patch/upgrade to Alma 9.3 and reboot system. The event is logged on boot. | ||||
Tags | No tags attached. | ||||
|
Cannot confirm completely, but this may be the error in boot logging...
[ 0.696048] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4
[ 0.696049] {1}[Hardware Error]: It has been corrected by h/w and requires no further action
[ 0.696051] {1}[Hardware Error]: event severity: corrected
[ 0.696052] {1}[Hardware Error]: Error 0, type: corrected
[ 0.696053] {1}[Hardware Error]: section_type: PCIe error
[ 0.696054] {1}[Hardware Error]: port_type: 7, PCIe to PCI/PCI-X bridge
[ 0.696055] {1}[Hardware Error]: version: 3.0
[ 0.696056] {1}[Hardware Error]: command: 0x0007, status: 0x0010
[ 0.696057] {1}[Hardware Error]: device_id: 0000:02:00.0
[ 0.696058] {1}[Hardware Error]: slot: 0
[ 0.696058] {1}[Hardware Error]: secondary_bus: 0x03
[ 0.696059] {1}[Hardware Error]: vendor_id: 0x1556, device_id: 0xbe00
[ 0.696060] {1}[Hardware Error]: class_code: 060400
[ 0.696060] {1}[Hardware Error]: bridge: secondary_status: 0x2000, control: 0x001b
[ 0.696489] GHES: APEI firmware first mode is enabled by APEI bit and WHEA _OSC. |
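For anyone comparing several hosts, here is a minimal Python sketch that pulls the severity and the PCI address out of `[Hardware Error]` lines like the ones quoted in this note. The helper name `summarize_hw_error` and the regular expressions are assumptions made for illustration, not part of any kernel or Dell tooling, and the sample text is a subset of the log above.

```python
import re

# Sample lines copied from the boot log quoted in this note.
BOOT_LOG = """\
[ 0.696051] {1}[Hardware Error]: event severity: corrected
[ 0.696053] {1}[Hardware Error]: section_type: PCIe error
[ 0.696057] {1}[Hardware Error]: device_id: 0000:02:00.0
[ 0.696059] {1}[Hardware Error]: vendor_id: 0x1556, device_id: 0xbe00
"""

# Payload of one "[Hardware Error]" line (everything after the tag).
HW_ERROR = re.compile(r"\[Hardware Error\]:\s*(.+)$")
# PCI address in domain:bus:device.function form, e.g. 0000:02:00.0.
# The numeric "device_id: 0xbe00" on the vendor line does not match this.
PCI_ADDR = re.compile(r"device_id:\s*([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])")
SEVERITY = re.compile(r"event severity:\s*(\w+)")


def summarize_hw_error(log_text):
    """Return (severity, pci_address); either is None if not found.

    Hypothetical helper: keeps the first match of each field.
    """
    severity = None
    address = None
    for line in log_text.splitlines():
        match = HW_ERROR.search(line)
        if not match:
            continue  # not a hardware error line
        payload = match.group(1)
        sev = SEVERITY.search(payload)
        if sev and severity is None:
            severity = sev.group(1)
        addr = PCI_ADDR.search(payload)
        if addr and address is None:
            address = addr.group(1)
    return severity, address


print(summarize_hw_error(BOOT_LOG))
```

On the sample this reports severity `corrected` for `0000:02:00.0`, which is the same bus/device/function that iDRAC flags as fatal.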
|
Hello. Could it be a coincidence? Or does it happen every time you boot into the new kernel and never happen when you boot the previous 9.2 kernel? |
|
Shut down and powered off the system, then cleared the hw log; the DRAC state returned to green (Healthy). Started the system and selected the Alma 9.2 kernel; the system came up and appears operational. Rebooted back to the default Alma 9.3 kernel, and DRAC alerts with "A fatal error was detected on a component at bus 2 device 0 function 0." and the IPMI sel log has entries as above. |
|
I forgot to state that the Alma 9.2 kernel did not trigger the DRAC-reported hw event, and the system was still healthy according to the DRAC state. |
|
From other systems that have run for a few days (since Nov 13), the hw log entries only occur during boot. It does not appear to be a recurring event. This is the sel log from one of these systems...
[root@idx-prod-12 ~]# ipmitool sel list
1 | 07/02/2021 | 15:37:40 | Event Logging Disabled #0x72 | Log area reset/cleared | Asserted
2 | 08/06/2021 | 01:59:18 | OS Boot | Installation started | Asserted
3 | 08/06/2021 | 02:03:36 | OS Boot | Installation completed | Asserted
4 | 08/23/2021 | 09:33:54 | OS Boot | Installation started | Asserted
5 | 08/23/2021 | 09:38:24 | OS Boot | Installation completed | Asserted
6 | 11/13/2023 | 11:18:13 | OS Boot | Installation started | Asserted
7 | 11/13/2023 | 11:33:46 | OS Boot | Installation completed | Asserted
8 | 11/13/2023 | 16:02:11 | Critical Interrupt #0x38 | | Asserted
9 | 11/13/2023 | 16:02:11 | Unknown #0x1a | | Asserted
a | 11/13/2023 | 16:02:12 | Unknown #0x1a | | Asserted
b | 11/13/2023 | 16:02:12 | Critical Interrupt #0x38 | | Asserted
c | 11/13/2023 | 16:02:12 | Unknown #0x1a | | Asserted
d | 11/13/2023 | 16:02:13 | Unknown #0x1a | | Asserted
e | 11/13/2023 | 16:02:13 | Critical Interrupt #0x38 | | Asserted
f | 11/13/2023 | 16:02:13 | Unknown #0x1a | | Asserted
10 | 11/13/2023 | 16:02:13 | Unknown #0x1a | | Asserted
11 | 11/13/2023 | 16:02:13 | Critical Interrupt #0x38 | | Asserted
12 | 11/13/2023 | 16:02:14 | Unknown #0x1a | | Asserted
13 | 11/13/2023 | 16:02:14 | Unknown #0x1a | | Asserted
14 | 11/13/2023 | 16:02:14 | Critical Interrupt #0x38 | | Asserted
15 | 11/13/2023 | 16:02:14 | Unknown #0x1a | | Asserted
16 | 11/13/2023 | 16:02:14 | Unknown #0x1a | | Asserted
17 | 11/13/2023 | 16:02:14 | Critical Interrupt #0x38 | | Asserted
18 | 11/13/2023 | 16:02:14 | Unknown #0x1a | | Asserted
19 | 11/13/2023 | 16:02:14 | Unknown #0x1a | | Asserted
1a | 11/13/2023 | 16:02:15 | Critical Interrupt #0x38 | | Asserted
1b | 11/13/2023 | 16:02:15 | Unknown #0x1a | | Asserted
1c | 11/13/2023 | 16:02:15 | Unknown #0x1a | | Asserted
1d | 11/13/2023 | 16:02:15 | Critical Interrupt #0x38 | | Asserted
1e | 11/13/2023 | 16:02:15 | Unknown #0x1a | | Asserted
1f | 11/13/2023 | 16:02:15 | Unknown #0x1a | | Asserted
20 | 11/13/2023 | 16:02:16 | Critical Interrupt #0x38 | | Asserted
21 | 11/13/2023 | 16:02:16 | Unknown #0x1a | | Asserted
22 | 11/13/2023 | 16:02:16 | Unknown #0x1a | | Asserted
23 | 11/13/2023 | 16:02:16 | Critical Interrupt #0x38 | | Asserted
24 | 11/13/2023 | 16:02:16 | Unknown #0x1a | | Asserted
25 | 11/13/2023 | 16:02:16 | Unknown #0x1a | | Asserted |
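To check on more hosts whether the entries really cluster at boot, a rough Python sketch like the one below could tally `ipmitool sel list` output by sensor and measure the time span covered by the Critical Interrupt entries. The function name `parse_sel` and the field names are hypothetical, and the sample is only a few lines from the log above. It assumes six pipe-separated fields per line, a hexadecimal record ID, and at least one Critical Interrupt entry in the input.

```python
from collections import Counter
from datetime import datetime

# A few lines copied from the sel log quoted in this note.
SEL_SAMPLE = """\
7 | 11/13/2023 | 11:33:46 | OS Boot | Installation completed | Asserted
8 | 11/13/2023 | 16:02:11 | Critical Interrupt #0x38 | | Asserted
9 | 11/13/2023 | 16:02:11 | Unknown #0x1a | | Asserted
a | 11/13/2023 | 16:02:12 | Unknown #0x1a | | Asserted
b | 11/13/2023 | 16:02:12 | Critical Interrupt #0x38 | | Asserted
"""


def parse_sel(text):
    """Parse 'ipmitool sel list' style lines into dicts (hypothetical helper)."""
    entries = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != 6:
            continue  # skip blank or unexpected lines
        record_id, date, time, sensor, event, state = fields
        entries.append({
            "id": int(record_id, 16),  # record IDs are printed in hex
            "when": datetime.strptime(f"{date} {time}", "%m/%d/%Y %H:%M:%S"),
            "sensor": sensor,
            "event": event,
            "state": state,
        })
    return entries


entries = parse_sel(SEL_SAMPLE)
by_sensor = Counter(e["sensor"] for e in entries)

# Time between the first and last Critical Interrupt entry.
critical = [e["when"] for e in entries
            if e["sensor"].startswith("Critical Interrupt")]
span_seconds = (max(critical) - min(critical)).total_seconds()

print(dict(by_sensor))
print(f"Critical Interrupt entries span {span_seconds:.0f} s")
```

A short span that lines up with the boot time would be consistent with the events being logged only at boot; a span of hours or days would suggest otherwise.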
|
If the bus reference is PCI slot...
[root@idx-prod-12 ~]# lspci | grep "^02"
02:00.0 PCI bridge: PLDA PCI Express Bridge (rev 02) |
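The mapping guessed at here can be sketched in Python as follows. It assumes the numbers in the iDRAC message are decimal PCI bus/device/function values, which this thread does not confirm, and the helper name `idrac_to_lspci_address` is made up for illustration. `lspci` prints the bus and device as two hex digits each and the function as one digit.

```python
import re

# Message and lspci line as quoted in this issue.
IDRAC_MSG = "A fatal error was detected on a component at bus 2 device 0 function 0."
LSPCI_LINE = "02:00.0 PCI bridge: PLDA PCI Express Bridge (rev 02)"


def idrac_to_lspci_address(message):
    """Turn 'bus B device D function F' into lspci's 'bb:dd.f' form.

    Hypothetical helper. Assumes the iDRAC numbers are decimal.
    Returns None if the message has no such phrase.
    """
    match = re.search(r"bus (\d+) device (\d+) function (\d+)", message)
    if match is None:
        return None
    bus, device, function = (int(g) for g in match.groups())
    # lspci shows bus and device in two-digit hex, function as one digit.
    return f"{bus:02x}:{device:02x}.{function:x}"


address = idrac_to_lspci_address(IDRAC_MSG)
print(address, LSPCI_LINE.startswith(address))
```

Under that assumption the iDRAC message points at `02:00.0`, the PLDA PCI Express bridge shown by `lspci`.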
|
Tracked down a Dell R740xd that is not currently in use and installed RHEL 9.3; the same error is triggered. One of my teammates is opening a case with Red Hat support. |
|
Thanks a lot! |
Date Modified | Username | Field | Change |
---|---|---|---|
2023-11-14 18:34 | aaack | New Issue | |
2023-11-14 18:46 | aaack | Note Added: 0000992 | |
2023-11-14 19:28 | alukoshko | Note Added: 0000993 | |
2023-11-15 14:32 | aaack | Note Added: 0000994 | |
2023-11-15 14:33 | aaack | Note Added: 0000995 | |
2023-11-15 14:36 | aaack | Note Added: 0000996 | |
2023-11-15 14:41 | aaack | Note Added: 0000997 | |
2023-11-16 17:25 | aaack | Note Added: 0000998 | |
2023-11-20 22:03 | alukoshko | Note Added: 0000999 | |
2023-11-20 22:04 | alukoshko | Assigned To | => alukoshko |
2023-11-20 22:04 | alukoshko | Status | new => feedback |