0000293: receiving /proc/sys/kernel/hung message OS locks up - MantisBT

ID	Project	Category	View Status	Date Submitted	Last Update

0000293	AlmaLinux-8	kernel	public	2022-08-10 15:46	2022-09-16 13:01

Reporter	jmadeira	Assigned To
Priority	high	Severity	crash	Reproducibility	sometimes
Status	new	Resolution	open
Platform	VMware ESXi, 7.0.1	OS	AlmaLinux 8.6	OS Version	4.18.0-372.19.1

Summary	0000293: receiving /proc/sys/kernel/hung message OS locks up
Description	We recently provisioned a number of AlmaLinux 8.6 servers in our VMware virtual environment. After a number of days the servers start hanging and throw the following message:
INFO: task pool:22704 blocked for more than 120 seconds.
Not tainted 4.18.0-372.16.1.el8_6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Once the message appears, the servers become unresponsive and we have to power cycle them to bring them back online. The servers are provisioned in our VMware ESXi 7.0.1 environment, and we have not seen this issue with any of our other Linux-based servers. We have upgraded the servers to 4.18.0-372.19.1, but the issue is still happening. We cannot determine exactly when the issue occurs, but most servers appear to lock up within a few days to weeks.
Tags	No tags attached.
Attached Files	procsys_kernelhung_message.png (217,407 bytes)

abrt_hash
URL
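
The message quoted in the description is printed when a task stays in uninterruptible sleep (state D) for longer than the value in /proc/sys/kernel/hung_task_timeout_secs. As a rough diagnostic sketch, not something taken from this report, the following Python reads that timeout and lists processes currently in state D. The function names and the proc_root parameter are hypothetical helpers introduced here for illustration, and only the main thread of each process is inspected.

```python
import os


def read_hung_task_timeout(proc_root="/proc"):
    """Return hung_task_timeout_secs as an int, or None if it cannot be read."""
    path = os.path.join(proc_root, "sys", "kernel", "hung_task_timeout_secs")
    try:
        with open(path) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None


def parse_stat(text):
    """Split the start of a /proc/<pid>/stat line into (pid, comm, state).

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so the last ')' marks where it ends.
    """
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    pid = int(text[:open_paren].strip())
    comm = text[open_paren + 1:close_paren]
    state = text[close_paren + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc_root="/proc"):
    """Return sorted (pid, comm) pairs for processes in state D.

    Assumption: only /proc/<pid>/stat is read, so a blocked worker
    thread inside an otherwise healthy process is not reported.
    """
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # the task exited meanwhile or the line was malformed
        if state == "D":
            found.append((pid, comm))
    return sorted(found)
```

Calling read_hung_task_timeout() and blocked_tasks() with the default proc_root on an affected server, shortly before it becomes unresponsive, may show which processes are stuck waiting on I/O.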

jmadeira 2022-08-12 14:29 reporter ~0000666	I found this post https://forums.centos.org/viewtopic.php?t=79275 and the issue we are having is very similar. The issue seems to appear when updates for kernel 4.18.0-32.19 are installed.

Devi 2022-09-15 19:04 reporter ~0000691	We are having the same problem. Please find the log below. Do you have any solution?
Sep 15 19:02:49 kernel: INFO: task php-fpm73:164951 blocked for more than 120 seconds.
Sep 15 19:02:49 kernel: Not tainted 4.18.0-372.26.1.el8_6.x86_64 #1
Sep 15 19:02:49 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 15 19:02:49 kernel: task:php-fpm73 state:D stack: 0 pid:164951 ppid:108420 flags:0x000003a0
Sep 15 19:02:49 kernel: Call Trace:
Sep 15 19:02:49 kernel: __schedule+0x2d1/0x830
Sep 15 19:02:49 kernel: schedule+0x35/0xa0
Sep 15 19:02:49 kernel: io_schedule+0x12/0x40
Sep 15 19:02:49 kernel: wait_on_page_bit+0x123/0x220
Sep 15 19:02:49 kernel: ? file_fdatawait_range+0x20/0x20
Sep 15 19:02:49 kernel: shmem_swapin_page+0x23b/0x630
Sep 15 19:02:49 kernel: shmem_getpage_gfp+0x1f8/0x8a0
Sep 15 19:02:49 kernel: shmem_fault+0x78/0x210
Sep 15 19:02:49 kernel: ? filemap_map_pages+0x271/0x410
Sep 15 19:02:49 kernel: __do_fault+0x38/0xb0
Sep 15 19:02:49 kernel: do_fault+0x1a0/0x3c0
Sep 15 19:02:49 kernel: __handle_mm_fault+0x4a3/0x7e0
Sep 15 19:02:49 kernel: handle_mm_fault+0xc1/0x1e0
Sep 15 19:02:49 kernel: do_user_addr_fault+0x1b5/0x440
Sep 15 19:02:49 kernel: do_page_fault+0x37/0x130
Sep 15 19:02:49 kernel: ? page_fault+0x8/0x30
Sep 15 19:02:49 kernel: page_fault+0x1e/0x30
Sep 15 19:02:49 kernel: RIP: 0033:0x7faf3eb48e98
Sep 15 19:02:49 kernel: Code: Unable to access opcode bytes at RIP 0x7faf3eb48e6e.
Sep 15 19:02:49 kernel: RSP: 002b:00007ffd142932a0 EFLAGS: 00010212
Sep 15 19:02:49 kernel: RAX: 00000000002f29e8 RBX: 00007faf36e2bad0 RCX: 000000000110ec5c
Sep 15 19:02:49 kernel: RDX: 0000000000159ee4 RSI: 0000000000000000 RDI: 00007faf32159220
Sep 15 19:02:49 kernel: RBP: 00007faf32159180 R08: d600eeaf2779d000 R09: 00007faf41274800
Sep 15 19:02:49 kernel: R10: 0000000000000000 R11: 00007faf41274880 R12: ccb0f624a0b59ee6
Sep 15 19:02:49 kernel: R13: 00007ffd142932c0 R14: 00007faf32153280 R15: 00007faf32166038
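
The top frames of the trace above (io_schedule, wait_on_page_bit, shmem_swapin_page) suggest the task was waiting for a shared-memory page to be read back from swap, which would point at storage I/O and not at php-fpm itself. When many such reports pile up in a log, it can help to pull out just the task, the PID, and the call chain. The following Python is a minimal sketch of that, assuming the log format shown in the note above. The regular expressions and the parse_hung_reports name are illustrative and not part of any existing tool.

```python
import re

# "INFO: task php-fpm73:164951 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+?):\s*(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)

# "__schedule+0x2d1/0x830" or "? page_fault+0x8/0x30"
# A leading "? " marks a frame the kernel unwinder considers unreliable.
FRAME_RE = re.compile(
    r"(?P<unsure>\? )?(?P<func>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_hung_reports(lines):
    """Group hung-task reports into dicts with comm, pid, seconds, frames.

    Frames marked with "?" are skipped so that only the reliable call
    chain remains, innermost function first.
    """
    reports = []
    current = None
    for line in lines:
        header = HUNG_RE.search(line)
        if header:
            current = {
                "comm": header["comm"],
                "pid": int(header["pid"]),
                "seconds": int(header["secs"]),
                "frames": [],
            }
            reports.append(current)
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None and not frame["unsure"]:
            current["frames"].append(frame["func"])
    return reports
```

Feeding it the lines of /var/log/messages (or the output of journalctl -k) returns one dictionary per report. Grouping the reports by their first few frames shows whether every hang is stuck in the same I/O path.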

Devi 2022-09-16 13:01 reporter ~0000692	Solution: An EXT4 hard disk was mounted on the server as an external /Home2 partition. It was rather slow and was interfering with the DirectAdmin (control panel) quota and brute-force monitor, or the disk had a hardware failure. After the disk and partition were removed, the server is running as it should.
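
For anyone wanting to check whether a slow disk is involved before removing hardware, one option is to look at the per-device counters in /proc/diskstats. The Python below is a sketch under stated assumptions: it computes the average time per completed read and write since boot from the standard diskstats fields, and flags devices above a threshold. The 100 ms threshold and the function names are arbitrary choices made for illustration and do not come from this report.

```python
def parse_diskstats(text):
    """Parse /proc/diskstats content into per-device average latencies.

    Columns used (0-based after splitting on whitespace):
      2 device name, 3 reads completed, 6 ms spent reading,
      7 writes completed, 10 ms spent writing, 11 I/Os in progress.
    The averages cover the whole time since boot, not recent activity.
    """
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 14:
            continue  # not a complete diskstats row
        reads, read_ms = int(parts[3]), int(parts[6])
        writes, write_ms = int(parts[7]), int(parts[10])
        stats[parts[2]] = {
            "avg_read_ms": read_ms / reads if reads else 0.0,
            "avg_write_ms": write_ms / writes if writes else 0.0,
            "in_flight": int(parts[11]),
        }
    return stats


def slow_devices(stats, threshold_ms=100.0):
    """Return sorted device names whose average read or write time exceeds
    threshold_ms (an arbitrary illustrative cut-off)."""
    return sorted(
        name
        for name, s in stats.items()
        if s["avg_read_ms"] > threshold_ms or s["avg_write_ms"] > threshold_ms
    )
```

Passing the content of /proc/diskstats to parse_diskstats and the result to slow_devices gives a first hint. Because the counters are cumulative, comparing two samples taken some seconds apart would give a more current picture than a single reading.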

Date Modified	Username	Field	Change
2022-08-10 15:46	jmadeira	New Issue
2022-08-10 15:46	jmadeira	File Added: procsys_kernelhung_message.png
2022-08-12 14:29	jmadeira	Note Added: 0000666
2022-09-15 19:04	Devi	Note Added: 0000691
2022-09-16 13:01	Devi	Note Added: 0000692