View Issue Details

ID: 0000142
Project: AlmaLinux-8
Category: kernel
View Status: public
Last Update: 2022-02-17 21:17
Reporter: sbhat
Assigned To:
Priority: high
Severity: major
Reproducibility: always
Status: new
Resolution: open
Summary: 0000142: Intermittent I/O block, timeout of disks in ZFS on Almalinux8
Description: I have two servers, A and B. Both intermittently reject I/O and abort I/O commands, which causes hiccups in I/O. zpool status turns to a warning state as a result of these incidents.
Server A:
[root@server-A]$ /opt/MegaRAID/storcli/storcli64 /c0 show
CLI Version = 007.1907.0000.0000 Sep 13, 2021
Operating system = Linux 4.18.0-305.el8.x86_64
Controller = 0
Status = Success
Description = None

Product Name = LSI3008-IR
Serial Number = 500304801cca7901
SAS Address = 500304801cca7901
PCI Address = 00:01:00:00
System Time = 11/03/2021 10:43:37
FW Package Build = 00.00.00.00
FW Version = 13.00.00.00
BIOS Version = 08.31.00.00
NVDATA Version = 11.02.49.39
Driver Name = mpt3sas
Driver Version = 35.101.00.00
Bus Number = 1
Device Number = 0
Function Number = 0
Domain ID = 0
Vendor Id = 0x1000
Device Id = 0x97
SubVendor Id = 0x15D9
SubDevice Id = 0x808
Board Name = LSI3008-IR
Board Assembly = N/A
Board Tracer Number = N/A
Security Protocol = None
Physical Drives = 14


Server A Intermittent Errors:
[Wed Nov 3 03:25:16 2021] mpt3sas_cm0: log_info(0x31120b10): originator(PL), code(0x12), sub_code(0x0b10)
[Wed Nov 3 03:25:16 2021] mpt3sas_cm0: log_info(0x31120b10): originator(PL), code(0x12), sub_code(0x0b10)
[Wed Nov 3 03:25:16 2021] mpt3sas_cm0: log_info(0x31120b10): originator(PL), code(0x12), sub_code(0x0b10)
[Wed Nov 3 03:25:16 2021] sd 0:0:10:0: [sdk] tag#2497 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK cmd_age=0s
[Wed Nov 3 03:25:16 2021] sd 0:0:10:0: [sdk] tag#2498 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK cmd_age=0s
[Wed Nov 3 03:25:16 2021] sd 0:0:10:0: [sdk] tag#2547 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK cmd_age=0s
[Wed Nov 3 03:25:16 2021] sd 0:0:10:0: [sdk] tag#2498 CDB: Write(16) 8a 00 00 00 00 01 52 7f 81 a2 00 00 00 11 00 00
[Wed Nov 3 03:25:16 2021] sd 0:0:10:0: [sdk] tag#2547 CDB: Write(16) 8a 00 00 00 00 01 52 7f 81 91 00 00 00 11 00 00
[Wed Nov 3 03:25:16 2021] sd 0:0:10:0: [sdk] tag#2497 CDB: Write(16) 8a 00 00 00 00 01 52 7f 81 b3 00 00 00 11 00 00
[Wed Nov 3 03:25:16 2021] blk_update_request: I/O error, dev sdk, sector 5679055249 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
[Wed Nov 3 03:25:16 2021] blk_update_request: I/O error, dev sdk, sector 5679055283 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
[Wed Nov 3 03:25:16 2021] blk_update_request: I/O error, dev sdk, sector 5679055266 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
[Wed Nov 3 03:25:16 2021] zio pool=backups vdev=/dev/sdk1 error=5 type=2 offset=2907675256320 size=8704 flags=180880
[Wed Nov 3 03:25:16 2021] zio pool=backups vdev=/dev/sdk1 error=5 type=2 offset=2907675247616 size=8704 flags=180880
[Wed Nov 3 03:25:16 2021] zio pool=backups vdev=/dev/sdk1 error=5 type=2 offset=2907675238912 size=8704 flags=180880
[Wed Nov 3 03:25:17 2021] sd 0:0:10:0: Power-on or device reset occurred
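The log_info value in the first three lines packs several fields into one 32-bit word. The Python sketch below unpacks it. It assumes the bit layout of the loginfo_type union in the mpt3sas driver: sub_code in bits 0-15, code in bits 16-23, originator in bits 24-27 (0/1/2 printed as IOP/PL/IR), and bus type in bits 28-31. decode_log_info is a hypothetical helper name. The sketch only reproduces the decoding the driver already printed. It does not say what PL code 0x12 with sub_code 0x0b10 means; that would need Broadcom's log_info documentation.

```python
# Sketch: unpack an mpt3sas log_info word into its fields.
# Assumed bit layout, modelled on the driver's loginfo_type union:
#   bits 0-15 sub_code, bits 16-23 code, bits 24-27 originator, bits 28-31 bus_type.
ORIGINATORS = {0: "IOP", 1: "PL", 2: "IR"}


def decode_log_info(log_info: int) -> dict:
    """Return the bus type, originator, code and sub_code packed into log_info."""
    return {
        "bus_type": (log_info >> 28) & 0xF,
        "originator": ORIGINATORS.get((log_info >> 24) & 0xF, "unknown"),
        "code": (log_info >> 16) & 0xFF,
        "sub_code": log_info & 0xFFFF,
    }


if __name__ == "__main__":
    # Value taken from the Server A log above.
    f = decode_log_info(0x31120B10)
    print(f"bus_type={f['bus_type']} originator({f['originator']}), "
          f"code(0x{f['code']:02x}), sub_code(0x{f['sub_code']:04x})")
```

Running it prints bus_type=3 originator(PL), code(0x12), sub_code(0x0b10), which matches the driver's own line. Bus type 3 is the value the driver treats as SAS.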

Server B:
[root@server-B]$ /opt/MegaRAID/storcli/storcli64 /c0 show
CLI Version = 007.1907.0000.0000 Sep 13, 2021
Operating system = Linux 4.18.0-305.10.2.el8_4.x86_64
Controller = 0
Status = Success
Description = None

Product Name = SAS9300-4i
Serial Number = SP83910736
SAS Address = 500605b00e807090
PCI Address = 00:01:00:00
System Time = 11/03/2021 04:31:03
FW Package Build = 00.00.00.00
FW Version = 16.00.01.00 <--------<
BIOS Version = 08.37.00.00_18.00.00.00
NVDATA Version = 14.01.00.07
Driver Name = mpt3sas
Driver Version = 35.101.00.00
Bus Number = 1
Device Number = 0
Function Number = 0
Domain ID = 0
Vendor Id = 0x1000
Device Id = 0x96
SubVendor Id = 0x1000
SubDevice Id = 0x3110
Board Name = SAS9300-4i
Board Assembly = H3-25473-00G
Board Tracer Number = SP83910736
Security Protocol = None
Physical Drives = 12
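The two storcli outputs are easier to compare field by field. The following is a minimal Python sketch that assumes the plain "Key = Value" layout shown above. parse_storcli and diff_controllers are hypothetical helper names, and the inputs are trimmed excerpts of the Server A and Server B outputs.

```python
# Sketch: compare selected fields of two `storcli64 /c0 show` outputs.


def parse_storcli(text: str) -> dict:
    """Map every 'Key = Value' line to {Key: Value}; other lines are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:
            # Drop hand-added markers such as '<--------<' after the value.
            fields[key.strip()] = value.split("<")[0].strip()
    return fields


def diff_controllers(a: dict, b: dict, keys: list) -> dict:
    """Return {key: (value_a, value_b)} for each key whose values differ."""
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    # Trimmed excerpts of the Server A and Server B outputs above.
    server_a = parse_storcli(
        "Product Name = LSI3008-IR\nFW Version = 13.00.00.00\n"
        "BIOS Version = 08.31.00.00\nDriver Version = 35.101.00.00"
    )
    server_b = parse_storcli(
        "Product Name = SAS9300-4i\nFW Version = 16.00.01.00 <--------<\n"
        "BIOS Version = 08.37.00.00_18.00.00.00\nDriver Version = 35.101.00.00"
    )
    keys = ["Product Name", "FW Version", "BIOS Version", "Driver Version"]
    for key, (va, vb) in diff_controllers(server_a, server_b, keys).items():
        print(f"{key}: A={va} B={vb}")
```

On these excerpts, the sketch reports Product Name, FW Version and BIOS Version as different. Driver Version (35.101.00.00) is identical on both servers, so the driver version alone does not distinguish them.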

Server B Intermittent Errors:
[Wed Nov 3 06:41:07 2021] sd 0:0:0:0: [sdc] tag#5085 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
[Wed Nov 3 06:41:07 2021] scsi target0:0:0: handle(0x0006), sas_address(0x50030480180ec340), phy(0)
[Wed Nov 3 06:41:07 2021] scsi target0:0:0: enclosure logical id(0x50030480180ec37f), slot(0)
[Wed Nov 3 06:41:07 2021] scsi target0:0:0: enclosure level(0x0000), connector name( )
[Wed Nov 3 06:41:07 2021] sd 0:0:0:0: task abort: SUCCESS scmd(0x000000004e58e27c)
[Wed Nov 3 06:41:07 2021] sd 0:0:0:0: attempting task abort!scmd(0x00000000e6d2bb01), outstanding for 30408 ms & timeout 30000 ms
[Wed Nov 3 06:41:07 2021] sd 0:0:0:0: [sdc] tag#5117 CDB: Write(16) 8a 00 00 00 00 02 a2 c0 e1 f8 00 00 00 40 00 00
[Wed Nov 3 06:41:07 2021] scsi target0:0:0: handle(0x0006), sas_address(0x50030480180ec340), phy(0)
[Wed Nov 3 06:41:07 2021] scsi target0:0:0: enclosure logical id(0x50030480180ec37f), slot(0)
[Wed Nov 3 06:41:07 2021] scsi target0:0:0: enclosure level(0x0000), connector name( )
[Wed Nov 3 06:41:07 2021] sd 0:0:0:0: No reference found at driver, assuming scmd(0x00000000e6d2bb01) might have completed
[Wed Nov 3 06:41:07 2021] sd 0:0:0:0: task abort: SUCCESS scmd(0x00000000e6d2bb01)
[Wed Nov 3 06:41:07 2021] sd 0:0:0:0: [sdc] tag#5117 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=30s
[Wed Nov 3 06:41:07 2021] sd 0:0:0:0: [sdc] tag#5117 CDB: Write(16) 8a 00 00 00 00 02 a2 c0 e1 f8 00 00 00 40 00 00
[Wed Nov 3 06:41:07 2021] blk_update_request: I/O error, dev sdc, sector 11320484344 op 0x1:(WRITE) flags 0x700 phys_seg 8 prio class 0
[Wed Nov 3 06:41:07 2021] zio pool=backups vdev=/dev/sdc1 error=5 type=2 offset=5796086935552 size=32768 flags=180880
[Wed Nov 3 06:41:07 2021] sd 0:0:0:0: attempting task abort!scmd(0x00000000c1d8d569), outstanding for 30408 ms & timeout 30000 ms
[Wed Nov 3 06:41:07 2021] sd 0:0:0:0: [sdc] tag#5116 CDB: Write(16) 8a 00 00 00 00 02 a2 c0 e2 78 00 00 00 40 00 00
[Wed Nov 3 06:41:07 2021] scsi target0:0:0: handle(0x0006), sas_address(0x50030480180ec340), phy(0)
[Wed Nov 3 06:41:07 2021] scsi target0:0:0: enclosure logical id(0x50030480180ec37f), slot(0)
[Wed Nov 3 06:41:07 2021] scsi target0:0:0: enclosure level(0x0000), connector name( )
[Wed Nov 3 06:41:07 2021] sd 0:0:0:0: No reference found at driver, assuming scmd(0x00000000c1d8d569) might have completed
[Wed Nov 3 06:41:07 2021] sd 0:0:0:0: task abort: SUCCESS scmd(0x00000000c1d8d569)
[Wed Nov 3 06:41:07 2021] sd 0:0:0:0: [sdc] tag#5116 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=30s
[Wed Nov 3 06:41:07 2021] sd 0:0:0:0: [sdc] tag#5116 CDB: Write(16) 8a 00 00 00 00 02 a2 c0 e2 78 00 00 00 40 00 00
[Wed Nov 3 06:41:07 2021] blk_update_request: I/O error, dev sdc, sector 11320484472 op 0x1:(WRITE) flags 0x700 phys_seg 8 prio class 0
[Wed Nov 3 06:41:07 2021] zio pool=backups vdev=/dev/sdc1 error=5 type=2 offset=5796087001088 size=32768 flags=180880
[Wed Nov 3 06:41:07 2021] sd 0:0:0:0: attempting task abort!scmd(0x000000000a3a05e3), outstanding for 30533 ms & timeout 30000 ms
[Wed Nov 3 06:41:07 2021] sd 0:0:0:0: [sdc] tag#5115 CDB: Write(16) 8a 00 00 00 00 02 a2 c0 e2 38 00 00 00 40 00 00
[Wed Nov 3 06:41:07 2021] scsi target0:0:0: handle(0x0006), sas_address(0x50030480180ec340), phy(0)
[Wed Nov 3 06:41:07 2021] scsi target0:0:0: enclosure logical id(0x50030480180ec37f), slot(0)
[Wed Nov 3 06:41:07 2021] scsi target0:0:0: enclosure level(0x0000), connector name( )
[Wed Nov 3 06:41:07 2021] sd 0:0:0:0: No reference found at driver, assuming scmd(0x000000000a3a05e3) might have completed
[Wed Nov 3 06:41:07 2021] sd 0:0:0:0: task abort: SUCCESS scmd(0x000000000a3a05e3)
[Wed Nov 3 06:41:07 2021] sd 0:0:0:0: [sdc] tag#5115 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=30s
[Wed Nov 3 06:41:07 2021] sd 0:0:0:0: [sdc] tag#5115 CDB: Write(16) 8a 00 00 00 00 02 a2 c0 e2 38 00 00 00 40 00 00
[Wed Nov 3 06:41:07 2021] blk_update_request: I/O error, dev sdc, sector 11320484408 op 0x1:(WRITE) flags 0x700 phys_seg 8 prio class 0
[Wed Nov 3 06:41:07 2021] zio pool=backups vdev=/dev/sdc1 error=5 type=2 offset=5796086968320 size=32768 flags=180880
[Wed Nov 3 06:41:08 2021] sd 0:0:0:0: Power-on or device reset occurred
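Each failed write can be traced from its CDB, through blk_update_request, to the zio line. The following Python sketch makes two assumptions: logical blocks are 512 bytes, and sdX1 starts at sector 2048 (typical when ZFS partitions a whole disk; check this against the actual partition table). parse_write16 and zio_offset_to_sector are hypothetical helpers. The Write(16) field positions follow the SCSI block-command layout: opcode 0x8a, an 8-byte LBA in bytes 2-9, and a 4-byte transfer length in bytes 10-13.

```python
# Sketch: decode a Write(16) CDB from the kernel log and map a zio offset
# back to an absolute disk sector so the three log lines can be matched up.
SECTOR_BYTES = 512             # assumption: 512-byte logical blocks
PARTITION_START_SECTOR = 2048  # assumption: sdX1 starts at sector 2048 (1 MiB)


def parse_write16(cdb_hex: str) -> tuple:
    """Return (lba, blocks) from a Write(16) CDB written as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)  # spaces between bytes are ignored
    if len(cdb) != 16 or cdb[0] != 0x8A:
        raise ValueError("not a Write(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")      # bytes 2-9: logical block address
    blocks = int.from_bytes(cdb[10:14], "big")  # bytes 10-13: transfer length
    return lba, blocks


def zio_offset_to_sector(offset: int) -> int:
    """Map a byte offset within sdX1 (as printed by zio) to a sector on sdX."""
    return offset // SECTOR_BYTES + PARTITION_START_SECTOR


if __name__ == "__main__":
    # Server A, tag#2547, and the zio line with the lowest offset.
    lba, blocks = parse_write16("8a 00 00 00 00 01 52 7f 81 91 00 00 00 11 00 00")
    print(lba, blocks * SECTOR_BYTES, zio_offset_to_sector(2907675238912))
    # Server B, tag#5117, and its zio line.
    lba, blocks = parse_write16("8a 00 00 00 00 02 a2 c0 e1 f8 00 00 00 40 00 00")
    print(lba, blocks * SECTOR_BYTES, zio_offset_to_sector(5796086935552))
```

Under these assumptions:

- Server A's tag#2547 decodes to LBA 5679055249 with 17 blocks (8704 bytes).
- Server B's tag#5117 decodes to LBA 11320484344 with 64 blocks (32768 bytes).

Both agree with the sectors in blk_update_request and with the sizes and offsets in the zio lines. This suggests the ZFS errors are these same failed writes surfacing at the pool level, not a separate fault.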
Steps To Reproduce: Supermicro server running AlmaLinux 8 with ZFS
Additional Information:
1. Both servers A and B use Broadcom HBA controllers, which communicate with the kernel via the mpt3sas driver.
2. Another server, C, uses the megaraid_sas driver and does not show this issue.
3. We have 5 OmniOS (Solaris-based) servers with both mpt3sas and megaraid_sas drivers, and none of them show any I/O or disk hiccups.
4. smartctl reports all drives as healthy.
5. The I/O errors are not confined to one disk; they hit all disks at random.
6. The servers have no special I/O write workload, and the timing is random too, as if the disks were waking up from sleep.
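Point 5 can be backed with numbers by counting failed commands per disk and per host byte in the kernel log. The following is a minimal Python sketch that assumes dmesg -T style lines like those quoted above. tally_failures and FAILED_RE are hypothetical names. The demo uses two lines from the excerpts. In practice, the input could be a saved copy of dmesg -T output passed in as an open file.

```python
import re
from collections import Counter

# Sketch: count "FAILED Result" lines per (device, hostbyte) to see whether
# errors cluster on one disk or spread across many.
FAILED_RE = re.compile(
    r"\[(?P<dev>sd[a-z]+)\] tag#\d+ FAILED Result: hostbyte=(?P<host>\w+)"
)


def tally_failures(lines) -> Counter:
    """Return a Counter keyed by (device, hostbyte) for each failed command."""
    counts = Counter()
    for line in lines:
        m = FAILED_RE.search(line)
        if m:
            counts[(m["dev"], m["host"])] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "[Wed Nov 3 03:25:16 2021] sd 0:0:10:0: [sdk] tag#2497 FAILED Result: "
        "hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK cmd_age=0s",
        "[Wed Nov 3 06:41:07 2021] sd 0:0:0:0: [sdc] tag#5117 FAILED Result: "
        "hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=30s",
    ]
    for (dev, host), n in sorted(tally_failures(sample).items()):
        print(f"{dev:6} {host:16} {n}")
```

Grouping by host byte also separates the two symptoms in this report. Server A's failures are DID_SOFT_ERROR with cmd_age=0s. Server B's are DID_TIME_OUT after the commands were outstanding just over the 30000 ms timeout.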

Any suggestions would be greatly appreciated.
Tags: almalinux8, disk, io-abort, megaraidsas, mpt3sas, smartctl, zfs

Activities

sbhat (reporter), 2021-11-09 14:13, ~0000377

Can anyone help?

alukoshko (administrator), 2021-12-01 08:32, ~0000435

Hello. Have you tried AlmaLinux 8.5? It has an updated kernel.
It would also help to try to reproduce the same issue on CentOS or RHEL, to determine whether it is AlmaLinux-specific or an upstream issue.

sbhat (reporter), 2021-12-01 12:01, ~0000437

The current OS is AlmaLinux 8.4. We haven't tried to reproduce it on CentOS/RHEL, since this happens on ZFS storage with large data sets. It doesn't happen on OmniOS, though. Anyway, thanks!

toracat (reporter), 2021-12-12 18:22, ~0000444

Another thing you can do is to test-install ELRepo's kernel-ml. With this, you'll be testing the latest stable kernel from kernel.org.

Issue History

Date Modified Username Field Change
2021-11-04 05:43 sbhat New Issue
2021-11-04 05:43 sbhat Tag Attached: almalinux8
2021-11-04 05:43 sbhat Tag Attached: disk
2021-11-04 05:43 sbhat Tag Attached: io-abort
2021-11-04 05:43 sbhat Tag Attached: megaraidsas
2021-11-04 05:43 sbhat Tag Attached: mpt3sas
2021-11-04 05:43 sbhat Tag Attached: smartctl
2021-11-04 05:43 sbhat Tag Attached: zfs
2021-11-09 14:13 sbhat Note Added: 0000377
2021-12-01 08:32 alukoshko Note Added: 0000435
2021-12-01 12:01 sbhat Note Added: 0000437
2021-12-12 18:22 toracat Note Added: 0000444