View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000293 | AlmaLinux-8 | kernel | public | 2022-08-10 15:46 | 2022-09-16 13:01 |
Reporter | jmadeira | Assigned To | |||
Priority | high | Severity | crash | Reproducibility | sometimes |
Status | new | Resolution | open | ||
Platform | VMware ESXi, 7.0.1 | OS | AlmaLinux 8.6 | OS Version | 4.18.0-372.19.1 |
Summary | 0000293: receiving /proc/sys/kernel/hung message OS locks up | ||||
Description | We recently provisioned a number of AlmaLinux 8.6 servers in our VMware virtual environment, and after a number of days the servers start hanging and throw the message: Not tainted 4.18.0-372.16.1.el8_6.x86_64 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. INFO: task pool: 22704 blocked for more than 120 seconds. Once the message appears, the servers become unresponsive and we have to power cycle them to bring them back online. These servers are provisioned in our VMware ESXi 7.0.1 environment, and we have not seen this issue with any of our other Linux-based servers. We have upgraded the servers to 4.18.0-372.19.1, but the issue is still happening. We cannot determine the exact time frame in which the issue occurs, but most servers appear to lock up within a few days to weeks. | ||||
Tags | No tags attached. | ||||
Attached Files | |||||
abrt_hash | |||||
URL | |||||
|
I found this post https://forums.centos.org/viewtopic.php?t=79275 and the issue we are having is very similar. The issue seems to appear when updates for kernel 4.18.0-32.19 are installed. |
|
We are having the same problem. Please find the log below. Do you have any solution?
Sep 15 19:02:49 kernel: INFO: task php-fpm73:164951 blocked for more than 120 seconds.
Sep 15 19:02:49 kernel: Not tainted 4.18.0-372.26.1.el8_6.x86_64 #1
Sep 15 19:02:49 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 15 19:02:49 kernel: task:php-fpm73 state:D stack: 0 pid:164951 ppid:108420 flags:0x000003a0
Sep 15 19:02:49 kernel: Call Trace:
Sep 15 19:02:49 kernel: __schedule+0x2d1/0x830
Sep 15 19:02:49 kernel: schedule+0x35/0xa0
Sep 15 19:02:49 kernel: io_schedule+0x12/0x40
Sep 15 19:02:49 kernel: wait_on_page_bit+0x123/0x220
Sep 15 19:02:49 kernel: ? file_fdatawait_range+0x20/0x20
Sep 15 19:02:49 kernel: shmem_swapin_page+0x23b/0x630
Sep 15 19:02:49 kernel: shmem_getpage_gfp+0x1f8/0x8a0
Sep 15 19:02:49 kernel: shmem_fault+0x78/0x210
Sep 15 19:02:49 kernel: ? filemap_map_pages+0x271/0x410
Sep 15 19:02:49 kernel: __do_fault+0x38/0xb0
Sep 15 19:02:49 kernel: do_fault+0x1a0/0x3c0
Sep 15 19:02:49 kernel: __handle_mm_fault+0x4a3/0x7e0
Sep 15 19:02:49 kernel: handle_mm_fault+0xc1/0x1e0
Sep 15 19:02:49 kernel: do_user_addr_fault+0x1b5/0x440
Sep 15 19:02:49 kernel: do_page_fault+0x37/0x130
Sep 15 19:02:49 kernel: ? page_fault+0x8/0x30
Sep 15 19:02:49 kernel: page_fault+0x1e/0x30
Sep 15 19:02:49 kernel: RIP: 0033:0x7faf3eb48e98
Sep 15 19:02:49 kernel: Code: Unable to access opcode bytes at RIP 0x7faf3eb48e6e.
Sep 15 19:02:49 kernel: RSP: 002b:00007ffd142932a0 EFLAGS: 00010212
Sep 15 19:02:49 kernel: RAX: 00000000002f29e8 RBX: 00007faf36e2bad0 RCX: 000000000110ec5c
Sep 15 19:02:49 kernel: RDX: 0000000000159ee4 RSI: 0000000000000000 RDI: 00007faf32159220
Sep 15 19:02:49 kernel: RBP: 00007faf32159180 R08: d600eeaf2779d000 R09: 00007faf41274800
Sep 15 19:02:49 kernel: R10: 0000000000000000 R11: 00007faf41274880 R12: ccb0f624a0b59ee6
Sep 15 19:02:49 kernel: R13: 00007ffd142932c0 R14: 00007faf32153280 R15: 00007faf32166038 |
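For anyone triaging similar logs, here is a minimal sketch of how the hung-task reports above could be pulled out of a kernel log. It only assumes the message format shown in this issue ("INFO: task NAME:PID blocked for more than N seconds"). The function name `find_hung_tasks` and the script name used in the usage note are made up for illustration.

```python
import re
import sys

# Matches the hung-task header line as it appears in this issue, e.g.
# "INFO: task php-fpm73:164951 blocked for more than 120 seconds."
# Optional whitespace after the colon is tolerated ("task pool: 22704").
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+?):\s*(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(lines):
    """Return (task name, pid, seconds) for every hung-task report found."""
    hits = []
    for line in lines:
        match = HUNG_RE.search(line)
        if match:
            hits.append(
                (match.group("comm"), int(match.group("pid")), int(match.group("secs")))
            )
    return hits


if __name__ == "__main__":
    # Reads log text from standard input and prints one line per report.
    for comm, pid, secs in find_hung_tasks(sys.stdin):
        print(f"{comm} (pid {pid}) blocked for more than {secs}s")
```

It could be fed kernel messages on standard input, for example `journalctl -k | python3 hung_tasks.py`, where `hung_tasks.py` is a hypothetical file name for the script above.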
|
Solution: There was an EXT4 hard disk mounted on the server, holding an external /Home2 partition, which was rather slow and was disrupting the DirectAdmin (control panel) quota and Brute Force Monitor. Alternatively, the disk may have had a hardware failure. After the disk and partition were removed, the server is running as it should. |
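Since the trace above shows the task in state D (uninterruptible sleep, typically waiting on I/O), one way to check whether a slow or failing disk is holding processes up is to list every task currently in that state. The following is a rough sketch rather than a diagnostic tool. It relies on the documented `/proc/<pid>/stat` layout (pid, command name in parentheses, then the state letter). The helper names `parse_stat` and `d_state_tasks` are hypothetical.

```python
import os


def parse_stat(stat_text):
    """Parse the start of a /proc/<pid>/stat line into (pid, comm, state)."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the last ')'.
    lparen = stat_text.index("(")
    rparen = stat_text.rindex(")")
    pid = int(stat_text[:lparen].strip())
    comm = stat_text[lparen + 1:rparen]
    state = stat_text[rparen + 1:].split()[0]
    return pid, comm, state


def d_state_tasks(proc_root="/proc"):
    """Return (pid, comm) for every process currently in state D."""
    tasks = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as stat_file:
                pid, comm, state = parse_stat(stat_file.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or the line was unreadable
        if state == "D":
            tasks.append((pid, comm))
    return tasks


if __name__ == "__main__":
    for pid, comm in d_state_tasks():
        print(f"{pid}\t{comm}")
```

Processes that stay in this list across repeated runs are worth a closer look. A single snapshot proves little, because tasks pass through state D briefly during normal I/O.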