View Issue Details

ID: 0000442
Project: AlmaLinux-9
Category: -OTHER
View Status: public
Last Update: 2023-11-20 22:04
Reporter: aaack
Assigned To: alukoshko
Priority: normal
Severity: minor
Reproducibility: always
Status: feedback
Resolution: open
Platform: Dell PowerEdge R740xd
OS: AlmaLinux
OS Version: 9.3
Summary: 0000442: Getting Dell hw error (from iDRAC and IPMI sel log) after patching into Alma 9.3 (from Alma 9.2)
Description: iDRAC reports "A fatal error was detected on a component at bus 2 device 0 function 0." The IPMI SEL has the entry "8 | 11/14/2023 | 14:28:36 | Critical Interrupt #0x38 | | Asserted". With this event logged, the system reports "System has critical issues".

So far we have not encountered any issue impacting operation of the systems or hosted apps.
Steps To Reproduce: Patch/upgrade to Alma 9.3 and reboot the system. The event is logged on boot.
Tags: No tags attached.

Activities

aaack

2023-11-14 18:46

reporter   ~0000992

I cannot fully confirm it, but this may be the corresponding error in the boot log:

[ 0.696048] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4
[ 0.696049] {1}[Hardware Error]: It has been corrected by h/w and requires no further action
[ 0.696051] {1}[Hardware Error]: event severity: corrected
[ 0.696052] {1}[Hardware Error]: Error 0, type: corrected
[ 0.696053] {1}[Hardware Error]: section_type: PCIe error
[ 0.696054] {1}[Hardware Error]: port_type: 7, PCIe to PCI/PCI-X bridge
[ 0.696055] {1}[Hardware Error]: version: 3.0
[ 0.696056] {1}[Hardware Error]: command: 0x0007, status: 0x0010
[ 0.696057] {1}[Hardware Error]: device_id: 0000:02:00.0
[ 0.696058] {1}[Hardware Error]: slot: 0
[ 0.696058] {1}[Hardware Error]: secondary_bus: 0x03
[ 0.696059] {1}[Hardware Error]: vendor_id: 0x1556, device_id: 0xbe00
[ 0.696060] {1}[Hardware Error]: class_code: 060400
[ 0.696060] {1}[Hardware Error]: bridge: secondary_status: 0x2000, control: 0x001b
[ 0.696489] GHES: APEI firmware first mode is enabled by APEI bit and WHEA _OSC.
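
The register values in the log above can be read bit by bit. The following is a minimal sketch, assuming the standard PCI configuration-space layout for the command and status registers; the bit tables and the helper names `decode` and `split_class_code` are illustrative and do not come from the ticket.

```python
# Bit names follow the standard PCI configuration-space layout. This is an
# assumption about how to read the fields; the ticket does not decode them.
COMMAND_BITS = {
    0: "I/O space enable",
    1: "memory space enable",
    2: "bus master enable",
    6: "parity error response",
    8: "SERR# enable",
    10: "interrupt disable",
}

# The secondary status register of a bridge uses the same bit positions.
STATUS_BITS = {
    3: "interrupt status",
    4: "capabilities list",
    8: "master data parity error",
    11: "signaled target abort",
    12: "received target abort",
    13: "received master abort",
    14: "signaled/received system error",
    15: "detected parity error",
}


def decode(value, names):
    """Return the names of the set bits; unnamed bits are reported by index."""
    return [names.get(bit, f"bit {bit}") for bit in range(16) if (value >> bit) & 1]


def split_class_code(code):
    """Split a 6-digit class code string into (base class, subclass, prog-if)."""
    return code[0:2], code[2:4], code[4:6]


print("command:         ", decode(0x0007, COMMAND_BITS))
print("status:          ", decode(0x0010, STATUS_BITS))
print("secondary_status:", decode(0x2000, STATUS_BITS))
print("class_code:      ", split_class_code("060400"))
```

Under this reading, the only flag set in secondary_status is bit 13, and class code 06/04 corresponds to a PCI-to-PCI bridge, which matches the port_type line in the log.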

alukoshko

2023-11-14 19:28

administrator   ~0000993

Hello. Could it be a coincidence? Or does it happen every time you boot into the new kernel and never when you boot the previous 9.2 kernel?

aaack

2023-11-15 14:32

reporter   ~0000994

I shut down and powered off the system and cleared the hw log; the DRAC state returned to green (Healthy). I then started the system and selected the Alma 9.2 kernel; the system came up and appears operational. After rebooting back into the default Alma 9.3 kernel, DRAC alerts with "A fatal error was detected on a component at bus 2 device 0 function 0." and the IPMI sel log has entries as above.

aaack

2023-11-15 14:33

reporter   ~0000995

I missed stating that the Alma 9.2 kernel did not trigger the DRAC-reported hw event; the system was still healthy according to DRAC.

aaack

2023-11-15 14:36

reporter   ~0000996

On other systems that have run for a few days (since Nov 13), the hw log entries occur only during boot; it does not appear to be a recurring event. This is the sel log from one of these systems:

[root@idx-prod-12 ~]# ipmitool sel list
   1 | 07/02/2021 | 15:37:40 | Event Logging Disabled #0x72 | Log area reset/cleared | Asserted
   2 | 08/06/2021 | 01:59:18 | OS Boot | Installation started | Asserted
   3 | 08/06/2021 | 02:03:36 | OS Boot | Installation completed | Asserted
   4 | 08/23/2021 | 09:33:54 | OS Boot | Installation started | Asserted
   5 | 08/23/2021 | 09:38:24 | OS Boot | Installation completed | Asserted
   6 | 11/13/2023 | 11:18:13 | OS Boot | Installation started | Asserted
   7 | 11/13/2023 | 11:33:46 | OS Boot | Installation completed | Asserted
   8 | 11/13/2023 | 16:02:11 | Critical Interrupt #0x38 | | Asserted
   9 | 11/13/2023 | 16:02:11 | Unknown #0x1a | | Asserted
   a | 11/13/2023 | 16:02:12 | Unknown #0x1a | | Asserted
   b | 11/13/2023 | 16:02:12 | Critical Interrupt #0x38 | | Asserted
   c | 11/13/2023 | 16:02:12 | Unknown #0x1a | | Asserted
   d | 11/13/2023 | 16:02:13 | Unknown #0x1a | | Asserted
   e | 11/13/2023 | 16:02:13 | Critical Interrupt #0x38 | | Asserted
   f | 11/13/2023 | 16:02:13 | Unknown #0x1a | | Asserted
  10 | 11/13/2023 | 16:02:13 | Unknown #0x1a | | Asserted
  11 | 11/13/2023 | 16:02:13 | Critical Interrupt #0x38 | | Asserted
  12 | 11/13/2023 | 16:02:14 | Unknown #0x1a | | Asserted
  13 | 11/13/2023 | 16:02:14 | Unknown #0x1a | | Asserted
  14 | 11/13/2023 | 16:02:14 | Critical Interrupt #0x38 | | Asserted
  15 | 11/13/2023 | 16:02:14 | Unknown #0x1a | | Asserted
  16 | 11/13/2023 | 16:02:14 | Unknown #0x1a | | Asserted
  17 | 11/13/2023 | 16:02:14 | Critical Interrupt #0x38 | | Asserted
  18 | 11/13/2023 | 16:02:14 | Unknown #0x1a | | Asserted
  19 | 11/13/2023 | 16:02:14 | Unknown #0x1a | | Asserted
  1a | 11/13/2023 | 16:02:15 | Critical Interrupt #0x38 | | Asserted
  1b | 11/13/2023 | 16:02:15 | Unknown #0x1a | | Asserted
  1c | 11/13/2023 | 16:02:15 | Unknown #0x1a | | Asserted
  1d | 11/13/2023 | 16:02:15 | Critical Interrupt #0x38 | | Asserted
  1e | 11/13/2023 | 16:02:15 | Unknown #0x1a | | Asserted
  1f | 11/13/2023 | 16:02:15 | Unknown #0x1a | | Asserted
  20 | 11/13/2023 | 16:02:16 | Critical Interrupt #0x38 | | Asserted
  21 | 11/13/2023 | 16:02:16 | Unknown #0x1a | | Asserted
  22 | 11/13/2023 | 16:02:16 | Unknown #0x1a | | Asserted
  23 | 11/13/2023 | 16:02:16 | Critical Interrupt #0x38 | | Asserted
  24 | 11/13/2023 | 16:02:16 | Unknown #0x1a | | Asserted
  25 | 11/13/2023 | 16:02:16 | Unknown #0x1a | | Asserted
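
A listing like the one above can be summarized per sensor to confirm that the entries cluster around a single boot. The following is a minimal sketch, assuming the pipe-separated six-column layout shown in this output; `parse_sel` and `count_by_sensor` are hypothetical helper names, and the sample holds only a few lines copied from the listing.

```python
from collections import Counter

# A few lines copied from the listing above. A real run would read the
# output of `ipmitool sel list` instead of this hard-coded sample.
SAMPLE = """\
   7 | 11/13/2023 | 11:33:46 | OS Boot | Installation completed | Asserted
   8 | 11/13/2023 | 16:02:11 | Critical Interrupt #0x38 | | Asserted
   9 | 11/13/2023 | 16:02:11 | Unknown #0x1a | | Asserted
   a | 11/13/2023 | 16:02:12 | Unknown #0x1a | | Asserted
   b | 11/13/2023 | 16:02:12 | Critical Interrupt #0x38 | | Asserted
"""


def parse_sel(text):
    """Parse pipe-separated SEL lines into dicts; record IDs are hexadecimal."""
    records = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) != 6:
            continue  # skip blank lines or anything not in the expected layout
        rec_id, date, time, sensor, event, state = fields
        records.append({
            "id": int(rec_id, 16),
            "date": date,
            "time": time,
            "sensor": sensor,
            "event": event,
            "state": state,
        })
    return records


def count_by_sensor(records, date):
    """Count entries per sensor for one date (MM/DD/YYYY, as printed)."""
    return Counter(r["sensor"] for r in records if r["date"] == date)


records = parse_sel(SAMPLE)
for sensor, count in count_by_sensor(records, "11/13/2023").items():
    print(f"{count:3d}  {sensor}")
```

Parsing the record ID as hexadecimal matters here because the listing counts 9, a, b, ... rather than 9, 10, 11.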

aaack

2023-11-15 14:41

reporter   ~0000997

If the bus reference is PCI slot...

[root@idx-prod-12 ~]# lspci | grep "^02"
02:00.0 PCI bridge: PLDA PCI Express Bridge (rev 02)
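
The mapping from the iDRAC wording ("bus 2 device 0 function 0") to the address printed by lspci and the kernel log can be sketched as below. This assumes the numbers in the iDRAC message are plain integers; the ticket does not say whether they are decimal or hexadecimal, which makes no difference for 2/0/0. The function name `to_bdf` is hypothetical.

```python
def to_bdf(bus, device, function, domain=0):
    """Format PCI coordinates as domain:bus:device.function in hexadecimal."""
    return f"{domain:04x}:{bus:02x}:{device:02x}.{function:x}"


full = to_bdf(2, 0, 0)
print(full)      # full form, as in the kernel log's device_id field
print(full[5:])  # short form without the domain, as lspci prints by default
```

For the values in this ticket, the full form matches the device_id line in the boot log and the short form matches the lspci output.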

aaack

2023-11-16 17:25

reporter   ~0000998

Tracked down a Dell R740xd that is not currently in use and installed RHEL 9.3; the same error is triggered. One of my teammates is opening a case with Red Hat support.

alukoshko

2023-11-20 22:03

administrator   ~0000999

Thanks a lot!

Issue History

Date Modified Username Field Change
2023-11-14 18:34 aaack New Issue
2023-11-14 18:46 aaack Note Added: 0000992
2023-11-14 19:28 alukoshko Note Added: 0000993
2023-11-15 14:32 aaack Note Added: 0000994
2023-11-15 14:33 aaack Note Added: 0000995
2023-11-15 14:36 aaack Note Added: 0000996
2023-11-15 14:41 aaack Note Added: 0000997
2023-11-16 17:25 aaack Note Added: 0000998
2023-11-20 22:03 alukoshko Note Added: 0000999
2023-11-20 22:04 alukoshko Assigned To => alukoshko
2023-11-20 22:04 alukoshko Status new => feedback