View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000142 | AlmaLinux-8 | kernel | public | 2021-11-04 05:43 | 2022-02-17 21:17 |
Reporter | sbhat | Assigned To | |||
Priority | high | Severity | major | Reproducibility | always |
Status | new | Resolution | open | ||
Summary | 0000142: Intermittent I/O blocking and disk timeouts with ZFS on AlmaLinux 8 | ||||
Description | I have two servers, A and B. Both intermittently reject I/O and abort I/O commands, causing hiccups in I/O. As a result, zpool status shows a warning. | ||||

Server A:

```
[root@server-A]$ /opt/MegaRAID/storcli/storcli64 /c0 show
CLI Version = 007.1907.0000.0000 Sep 13, 2021
Operating system = Linux 4.18.0-305.el8.x86_64
Controller = 0
Status = Success
Description = None
Product Name = LSI3008-IR
Serial Number = 500304801cca7901
SAS Address = 500304801cca7901
PCI Address = 00:01:00:00
System Time = 11/03/2021 10:43:37
FW Package Build = 00.00.00.00
FW Version = 13.00.00.00
BIOS Version = 08.31.00.00
NVDATA Version = 11.02.49.39
Driver Name = mpt3sas
Driver Version = 35.101.00.00
Bus Number = 1
Device Number = 0
Function Number = 0
Domain ID = 0
Vendor Id = 0x1000
Device Id = 0x97
SubVendor Id = 0x15D9
SubDevice Id = 0x808
Board Name = LSI3008-IR
Board Assembly = N/A
Board Tracer Number = N/A
Security Protocol = None
Physical Drives = 14
```

Server A intermittent errors:

```
[Wed Nov 3 03:25:16 2021] mpt3sas_cm0: log_info(0x31120b10): originator(PL), code(0x12), sub_code(0x0b10)
[Wed Nov 3 03:25:16 2021] mpt3sas_cm0: log_info(0x31120b10): originator(PL), code(0x12), sub_code(0x0b10)
[Wed Nov 3 03:25:16 2021] mpt3sas_cm0: log_info(0x31120b10): originator(PL), code(0x12), sub_code(0x0b10)
[Wed Nov 3 03:25:16 2021] sd 0:0:10:0: [sdk] tag#2497 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK cmd_age=0s
[Wed Nov 3 03:25:16 2021] sd 0:0:10:0: [sdk] tag#2498 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK cmd_age=0s
[Wed Nov 3 03:25:16 2021] sd 0:0:10:0: [sdk] tag#2547 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK cmd_age=0s
[Wed Nov 3 03:25:16 2021] sd 0:0:10:0: [sdk] tag#2498 CDB: Write(16) 8a 00 00 00 00 01 52 7f 81 a2 00 00 00 11 00 00
[Wed Nov 3 03:25:16 2021] sd 0:0:10:0: [sdk] tag#2547 CDB: Write(16) 8a 00 00 00 00 01 52 7f 81 91 00 00 00 11 00 00
[Wed Nov 3 03:25:16 2021] sd 0:0:10:0: [sdk] tag#2497 CDB: Write(16) 8a 00 00 00 00 01 52 7f 81 b3 00 00 00 11 00 00
[Wed Nov 3 03:25:16 2021] blk_update_request: I/O error, dev sdk, sector 5679055249 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
[Wed Nov 3 03:25:16 2021] blk_update_request: I/O error, dev sdk, sector 5679055283 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
[Wed Nov 3 03:25:16 2021] blk_update_request: I/O error, dev sdk, sector 5679055266 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
[Wed Nov 3 03:25:16 2021] zio pool=backups vdev=/dev/sdk1 error=5 type=2 offset=2907675256320 size=8704 flags=180880
[Wed Nov 3 03:25:16 2021] zio pool=backups vdev=/dev/sdk1 error=5 type=2 offset=2907675247616 size=8704 flags=180880
[Wed Nov 3 03:25:16 2021] zio pool=backups vdev=/dev/sdk1 error=5 type=2 offset=2907675238912 size=8704 flags=180880
[Wed Nov 3 03:25:17 2021] sd 0:0:10:0: Power-on or device reset occurred
```

Server B:

```
[root@server-B]$ /opt/MegaRAID/storcli/storcli64 /c0 show
CLI Version = 007.1907.0000.0000 Sep 13, 2021
Operating system = Linux 4.18.0-305.10.2.el8_4.x86_64
Controller = 0
Status = Success
Description = None
Product Name = SAS9300-4i
Serial Number = SP83910736
SAS Address = 500605b00e807090
PCI Address = 00:01:00:00
System Time = 11/03/2021 04:31:03
FW Package Build = 00.00.00.00
FW Version = 16.00.01.00 <--------<
BIOS Version = 08.37.00.00_18.00.00.00
NVDATA Version = 14.01.00.07
Driver Name = mpt3sas
Driver Version = 35.101.00.00
Bus Number = 1
Device Number = 0
Function Number = 0
Domain ID = 0
Vendor Id = 0x1000
Device Id = 0x96
SubVendor Id = 0x1000
SubDevice Id = 0x3110
Board Name = SAS9300-4i
Board Assembly = H3-25473-00G
Board Tracer Number = SP83910736
Security Protocol = None
Physical Drives = 12
```

Server B intermittent errors:

```
[Wed Nov 3 06:41:07 2021] sd 0:0:0:0: [sdc] tag#5085 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
[Wed Nov 3 06:41:07 2021] scsi target0:0:0: handle(0x0006), sas_address(0x50030480180ec340), phy(0)
[Wed Nov 3 06:41:07 2021] scsi target0:0:0: enclosure logical id(0x50030480180ec37f), slot(0)
[Wed Nov 3 06:41:07 2021] scsi target0:0:0: enclosure level(0x0000), connector name( )
[Wed Nov 3 06:41:07 2021] sd 0:0:0:0: task abort: SUCCESS scmd(0x000000004e58e27c)
[Wed Nov 3 06:41:07 2021] sd 0:0:0:0: attempting task abort!scmd(0x00000000e6d2bb01), outstanding for 30408 ms & timeout 30000 ms
[Wed Nov 3 06:41:07 2021] sd 0:0:0:0: [sdc] tag#5117 CDB: Write(16) 8a 00 00 00 00 02 a2 c0 e1 f8 00 00 00 40 00 00
[Wed Nov 3 06:41:07 2021] scsi target0:0:0: handle(0x0006), sas_address(0x50030480180ec340), phy(0)
[Wed Nov 3 06:41:07 2021] scsi target0:0:0: enclosure logical id(0x50030480180ec37f), slot(0)
[Wed Nov 3 06:41:07 2021] scsi target0:0:0: enclosure level(0x0000), connector name( )
[Wed Nov 3 06:41:07 2021] sd 0:0:0:0: No reference found at driver, assuming scmd(0x00000000e6d2bb01) might have completed
[Wed Nov 3 06:41:07 2021] sd 0:0:0:0: task abort: SUCCESS scmd(0x00000000e6d2bb01)
[Wed Nov 3 06:41:07 2021] sd 0:0:0:0: [sdc] tag#5117 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=30s
[Wed Nov 3 06:41:07 2021] sd 0:0:0:0: [sdc] tag#5117 CDB: Write(16) 8a 00 00 00 00 02 a2 c0 e1 f8 00 00 00 40 00 00
[Wed Nov 3 06:41:07 2021] blk_update_request: I/O error, dev sdc, sector 11320484344 op 0x1:(WRITE) flags 0x700 phys_seg 8 prio class 0
[Wed Nov 3 06:41:07 2021] zio pool=backups vdev=/dev/sdc1 error=5 type=2 offset=5796086935552 size=32768 flags=180880
[Wed Nov 3 06:41:07 2021] sd 0:0:0:0: attempting task abort!scmd(0x00000000c1d8d569), outstanding for 30408 ms & timeout 30000 ms
[Wed Nov 3 06:41:07 2021] sd 0:0:0:0: [sdc] tag#5116 CDB: Write(16) 8a 00 00 00 00 02 a2 c0 e2 78 00 00 00 40 00 00
[Wed Nov 3 06:41:07 2021] scsi target0:0:0: handle(0x0006), sas_address(0x50030480180ec340), phy(0)
[Wed Nov 3 06:41:07 2021] scsi target0:0:0: enclosure logical id(0x50030480180ec37f), slot(0)
[Wed Nov 3 06:41:07 2021] scsi target0:0:0: enclosure level(0x0000), connector name( )
[Wed Nov 3 06:41:07 2021] sd 0:0:0:0: No reference found at driver, assuming scmd(0x00000000c1d8d569) might have completed
[Wed Nov 3 06:41:07 2021] sd 0:0:0:0: task abort: SUCCESS scmd(0x00000000c1d8d569)
[Wed Nov 3 06:41:07 2021] sd 0:0:0:0: [sdc] tag#5116 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=30s
[Wed Nov 3 06:41:07 2021] sd 0:0:0:0: [sdc] tag#5116 CDB: Write(16) 8a 00 00 00 00 02 a2 c0 e2 78 00 00 00 40 00 00
[Wed Nov 3 06:41:07 2021] blk_update_request: I/O error, dev sdc, sector 11320484472 op 0x1:(WRITE) flags 0x700 phys_seg 8 prio class 0
[Wed Nov 3 06:41:07 2021] zio pool=backups vdev=/dev/sdc1 error=5 type=2 offset=5796087001088 size=32768 flags=180880
[Wed Nov 3 06:41:07 2021] sd 0:0:0:0: attempting task abort!scmd(0x000000000a3a05e3), outstanding for 30533 ms & timeout 30000 ms
[Wed Nov 3 06:41:07 2021] sd 0:0:0:0: [sdc] tag#5115 CDB: Write(16) 8a 00 00 00 00 02 a2 c0 e2 38 00 00 00 40 00 00
[Wed Nov 3 06:41:07 2021] scsi target0:0:0: handle(0x0006), sas_address(0x50030480180ec340), phy(0)
[Wed Nov 3 06:41:07 2021] scsi target0:0:0: enclosure logical id(0x50030480180ec37f), slot(0)
[Wed Nov 3 06:41:07 2021] scsi target0:0:0: enclosure level(0x0000), connector name( )
[Wed Nov 3 06:41:07 2021] sd 0:0:0:0: No reference found at driver, assuming scmd(0x000000000a3a05e3) might have completed
[Wed Nov 3 06:41:07 2021] sd 0:0:0:0: task abort: SUCCESS scmd(0x000000000a3a05e3)
[Wed Nov 3 06:41:07 2021] sd 0:0:0:0: [sdc] tag#5115 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=30s
[Wed Nov 3 06:41:07 2021] sd 0:0:0:0: [sdc] tag#5115 CDB: Write(16) 8a 00 00 00 00 02 a2 c0 e2 38 00 00 00 40 00 00
[Wed Nov 3 06:41:07 2021] blk_update_request: I/O error, dev sdc, sector 11320484408 op 0x1:(WRITE) flags 0x700 phys_seg 8 prio class 0
[Wed Nov 3 06:41:07 2021] zio pool=backups vdev=/dev/sdc1 error=5 type=2 offset=5796086968320 size=32768 flags=180880
[Wed Nov 3 06:41:08 2021] sd 0:0:0:0: Power-on or device reset occurred
```
Steps To Reproduce | Supermicro server running AlmaLinux 8 with ZFS | ||||
Additional Information | 1. Both servers A and B use Broadcom HBA controllers, which communicate with the kernel via the mpt3sas driver. 2. We have another server, C, which uses the megaraid_sas driver and does not show this issue. 3. We have 5 OmniOS (Solaris-based) servers with both mpt3sas and megaraid_sas drivers, and none of them shows any I/O or disk hiccups. 4. smartctl reports that all drives are healthy. 5. The I/O errors are not limited to one disk; they hit all disks at random. 6. The servers have no unusual write workload, and the timing is random too, as if the disks were waking up from sleep. Any suggestions would be greatly appreciated. | ||||
Tags | almalinux8, disk, io-abort, megaraidsas, mpt3sas, smartctl, zfs | ||||
abrt_hash | |||||
URL | |||||
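The numbers in the Server A and Server B excerpts can be cross-checked against each other: the LBA inside each Write(16) CDB should equal the sector in the matching blk_update_request line, and the transfer length should equal the zio size. Below is a minimal Python sketch that decodes those fields and also splits the mpt3sas log_info word into the originator/code/sub_code fields the driver prints. It assumes 512-byte logical blocks and a /dev/sdX1 partition that starts at sector 2048 (1 MiB); both are inferred only because the quoted numbers line up, not stated in the report, and the helper names are hypothetical.

```python
# Decode numeric fields from the dmesg lines quoted in this report.
# Assumptions (not stated in the report): 512-byte logical blocks and a
# /dev/sdX1 partition starting at sector 2048, inferred because the numbers match.

BLOCK_SIZE = 512        # assumed logical block size, in bytes
PARTITION_START = 2048  # assumed first sector of /dev/sdX1 (1 MiB alignment)


def decode_write16(cdb_hex: str) -> tuple[int, int]:
    """Return (lba, transfer_length_in_blocks) for a SCSI WRITE(16) CDB."""
    cdb = bytes.fromhex(cdb_hex)  # whitespace between hex pairs is ignored
    if len(cdb) != 16 or cdb[0] != 0x8A:
        raise ValueError("not a WRITE(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")      # bytes 2-9: logical block address
    blocks = int.from_bytes(cdb[10:14], "big")  # bytes 10-13: transfer length
    return lba, blocks


def zio_offset(lba: int) -> int:
    """Byte offset within the partition, as printed in 'zio ... offset='."""
    return (lba - PARTITION_START) * BLOCK_SIZE


def decode_log_info(value: int) -> dict[str, int]:
    """Split an mpt3sas log_info word into the fields the driver prints."""
    return {
        "bus_type": (value >> 28) & 0xF,    # 3 for 0x31120b10
        "originator": (value >> 24) & 0xF,  # the driver prints 1 as "PL"
        "code": (value >> 16) & 0xFF,
        "sub_code": value & 0xFFFF,
    }


if __name__ == "__main__":
    # Server A, tag#2498
    lba, blocks = decode_write16("8a 00 00 00 00 01 52 7f 81 a2 00 00 00 11 00 00")
    print(lba, blocks * BLOCK_SIZE, zio_offset(lba))
    # Server B, tag#5117
    lba, blocks = decode_write16("8a 00 00 00 00 02 a2 c0 e1 f8 00 00 00 40 00 00")
    print(lba, blocks * BLOCK_SIZE, zio_offset(lba))
    # Server A, mpt3sas log_info
    print({k: hex(v) for k, v in decode_log_info(0x31120b10).items()})
```

Under these assumptions, tag#2498 on Server A decodes to LBA 5679055266 and 17 blocks (8704 bytes), matching its blk_update_request sector and the zio line with offset=2907675247616; tag#5117 on Server B decodes to LBA 11320484344 and 64 blocks (32768 bytes), matching offset=5796086935552. This suggests the zio errors are the same failed writes reported by the SCSI layer rather than separate ZFS-level failures.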
|
Anyone? Any help, please? |
|
Hello. Have you tried AlmaLinux 8.5? It has an updated kernel. It would also be great to try to reproduce the issue on CentOS or RHEL, to determine whether it is AlmaLinux-specific or an upstream issue. |
|
The current OS is AlmaLinux 8.4. We haven't tried to reproduce it on CentOS/RHEL, since it happens on ZFS storage with large data sets. It doesn't happen on OmniOS, though. Anyway, thanks! |
|
Another thing you can do is to test-install ELRepo's kernel-ml. With this, you'll be testing the latest stable kernel from kernel.org. |
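Whichever kernel is tried next (the AlmaLinux 8.5 kernel or ELRepo's kernel-ml), it helps to compare the same counts before and after the change, broken down per disk, since the errors reportedly hit all disks at random. Below is a minimal Python sketch, assuming the message formats quoted in this report (4.18.0-305 kernels). The regular expressions were written only against those lines, so messages from other kernel versions may need adjusted patterns; the I/O error pattern deliberately skips the "blk_update_request:" prefix in case it differs. The script name and input file are placeholders.

```python
import re
import sys
from collections import Counter, defaultdict

# Patterns written against the dmesg lines quoted in this report (4.18.0-305 kernels).
FAILED_RE = re.compile(r"\[(sd[a-z]+)\] tag#\d+ FAILED Result: hostbyte=(\w+)")
IO_ERROR_RE = re.compile(r"I/O error, dev (sd[a-z]+),")  # prefix-agnostic on purpose
RESET_RE = re.compile(r"sd (\d+:\d+:\d+:\d+): Power-on or device reset occurred")


def tally(lines):
    """Count failed commands (by hostbyte), block-layer I/O errors and resets."""
    failed = defaultdict(Counter)  # disk name -> Counter of hostbyte values
    io_errors = Counter()          # disk name -> number of block-layer I/O errors
    resets = Counter()             # SCSI address H:C:T:L -> number of resets
    for line in lines:
        if m := FAILED_RE.search(line):
            failed[m.group(1)][m.group(2)] += 1
        elif m := IO_ERROR_RE.search(line):
            io_errors[m.group(1)] += 1
        elif m := RESET_RE.search(line):
            resets[m.group(1)] += 1
    return failed, io_errors, resets


def main(path):
    with open(path, encoding="utf-8", errors="replace") as fh:
        failed, io_errors, resets = tally(fh)
    for disk in sorted(set(failed) | set(io_errors)):
        print(f"{disk}: failed={dict(failed[disk])} io_errors={io_errors[disk]}")
    for addr, count in sorted(resets.items()):
        print(f"{addr}: resets={count}")


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

For example, save the output of dmesg -T from each kernel to a file and run the script once per file; an even spread of DID_SOFT_ERROR or DID_TIME_OUT across many disks, rather than a cluster on one disk, is consistent with the reporter's observation.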
Date Modified | Username | Field | Change |
---|---|---|---|
2021-11-04 05:43 | sbhat | New Issue | |
2021-11-04 05:43 | sbhat | Tag Attached: almalinux8 | |
2021-11-04 05:43 | sbhat | Tag Attached: disk | |
2021-11-04 05:43 | sbhat | Tag Attached: io-abort | |
2021-11-04 05:43 | sbhat | Tag Attached: megaraidsas | |
2021-11-04 05:43 | sbhat | Tag Attached: mpt3sas | |
2021-11-04 05:43 | sbhat | Tag Attached: smartctl | |
2021-11-04 05:43 | sbhat | Tag Attached: zfs | |
2021-11-09 14:13 | sbhat | Note Added: 0000377 | |
2021-12-01 08:32 | alukoshko | Note Added: 0000435 | |
2021-12-01 12:01 | sbhat | Note Added: 0000437 | |
2021-12-12 18:22 | toracat | Note Added: 0000444 |