0000258: qemu-kvm process goes to 100%, VM becomes unresponsive, Triggered by Creating a new VM - MantisBT

ID	Project	Category	View Status	Date Submitted	Last Update

0000258	AlmaLinux-8	qemu-kvm	public	2022-06-03 05:25	2024-03-05 20:15

Reporter	jpbennett	Assigned To
Priority	high	Severity	crash	Reproducibility	sometimes
Status	new	Resolution	open
Platform	Amd Server	OS	AlmaLinux	OS Version	8

Summary	0000258: qemu-kvm process goes to 100%, VM becomes unresponsive, Triggered by Creating a new VM
Description	Multiple times on two separate servers, qemu-kvm processes have suddenly spiked to 100% CPU usage, and the VMs backed by those processes have become unresponsive to the point of needing to be force-power-cycled. Multiple VMs enter this state at the exact same moment. In two cases, the VM recovered after several minutes, and in both cases the VM's system clock had jumped into the future. It seems that the longer the hang lasts, the further into the future the clock lands; one machine thought it was the year 2217. Dates were correct before the hangs. These servers were running CentOS 8 before being converted to Alma, and didn't exhibit this bug on CentOS. The hanging VMs have also all been running Alma 8, though that's likely coincidence, as that's most of my VMs now. I haven't seen anything notable in the host machine's dmesg or logs.
Steps To Reproduce	Run multiple Alma 8 VMs on a server. Migrate an additional VM to that server using a command like virsh migrate --live --persistent --undefinesource --copy-storage-all --verbose Gitlab qemu+ssh://10.10.3.2/system. During the migration, there is a chance that several of the running VMs will hang in the manner described. Oddly, not every VM hangs, nor does it happen on every migration, but when it does occur, all the affected VMs hang simultaneously. This has also been reproduced when creating a new VM on a server: one of the other Alma VMs already running on that server entered the same failure state.
Additional Information	Both servers are SuperMicro builds, one with an AMD Opteron(TM) Processor 6272, the other with an AMD EPYC 7302P 16-Core Processor. I'm using the Opteron_G3 processor type to host, as this allows live migration.
Tags	almalinux8, QEMU-KVM
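
For anyone trying to confirm the clock symptom from inside a guest, here is a minimal sketch in Python. It is not part of the original report: the function names, the sampling interval, and the tolerance are all illustrative assumptions. The idea is to record the wall clock at a fixed nominal interval and flag any gap that is much larger than that interval, which is roughly what a forward time jump looks like from user space.

```python
import time
from typing import List, Tuple


def find_forward_jumps(
    samples: List[float], interval: float, tolerance: float = 5.0
) -> List[Tuple[int, float]]:
    """Return (index, excess_seconds) for every sample whose gap from the
    previous sample exceeds the nominal interval by more than `tolerance`.

    `samples` are wall-clock readings (seconds since the epoch) that were
    meant to be taken `interval` seconds apart. The names and the default
    tolerance are hypothetical, chosen only for this illustration.
    """
    jumps = []
    for i in range(1, len(samples)):
        gap = samples[i] - samples[i - 1]
        excess = gap - interval
        if excess > tolerance:
            jumps.append((i, excess))
    return jumps


def sample_wall_clock(count: int, interval: float) -> List[float]:
    """Collect `count` wall-clock readings, sleeping `interval` seconds
    between them. Intended to run inside the guest while a migration or
    VM creation is in progress on the host."""
    samples = []
    for _ in range(count):
        samples.append(time.time())
        time.sleep(interval)
    return samples
```

Running something like find_forward_jumps(sample_wall_clock(600, 1.0), 1.0) in a guest during a migration would list any sample where the clock ran far ahead of the expected one-second step. A guest that is completely hung will not take samples during the hang, so the jump would show up only on the first sample after it recovers.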

jpbennett 2022-06-03 05:40 reporter ~0000585	Just reproduced this issue by rebooting a running VM; a sister VM jumped to 100% CPU and became unresponsive.

jpbennett 2022-06-03 06:29 reporter ~0000586	Dmesg from a VM as it hit this bug. Attachments: log2.png (85,648 bytes), log1.png (80,747 bytes)

bgf 2022-12-09 07:37 reporter ~0000757	I believe I am seeing the same issue. I'm using Rocky, but I'm reporting here because the situation seems very similar to this issue and it could be rare. The environment is a little complex. Nested virtualization is being used on a Mac. The outer host is a VirtualBox VM with Rocky 8 installed. The outer VM hosts inner VMs using KVM. The inner VMs are running Rocky 8 as well. The first two inner VMs work. When starting the third or fourth, all CPUs allocated to inner VMs go to 100% and the inner VMs become almost unresponsive. One inner VM has been seen to write the following message to /var/log/messages: chronyd[7233]: Forward time jump detected! Stopping one of the inner VMs returns the system to a relatively normal state.
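
To check other guests for the same chronyd message, a small Python sketch along these lines could scan a log file. Only the text "chronyd[7233]: Forward time jump detected!" comes from the note above. The function name and the assumption that each log entry sits on a single line are illustrative.

```python
import re
from typing import Iterable, List

# Matches the chronyd message quoted in the note above, with any PID.
CHRONYD_JUMP = re.compile(r"chronyd\[\d+\]: Forward time jump detected!")


def find_time_jump_lines(lines: Iterable[str]) -> List[str]:
    """Return the log lines (without trailing newline) that contain the
    chronyd forward-time-jump message. `find_time_jump_lines` is a
    hypothetical helper name used only for this sketch."""
    return [line.rstrip("\n") for line in lines if CHRONYD_JUMP.search(line)]
```

A typical use would be to open /var/log/messages inside the guest (this usually needs root) and pass the file object to find_time_jump_lines, then compare the timestamps of the matching lines with the times of migrations or VM starts on the host.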

mutts 2023-05-08 16:04 reporter ~0000883	Are you both still seeing this issue? I'm encountering an issue which I believe is similar, although it may not be exactly the same. My issue is an unexplained freeze of the VM. The qemu-kvm process sits at or near 100% of its given CPU, but the VM itself (running top inside the VM) doesn't show a high load or much going on. Both nodes (and VMs) are running AlmaLinux 8. There's no real rhyme or reason as to why or when this happens, and nothing in any logs that I can find indicates a problem. It's happened twice within a couple of weeks on one VM. Another VM has had this happen once in the past couple of weeks. A third AlmaLinux node with an AlmaLinux 8 VM has yet to experience this issue. The node where it has happened twice is an older Intel E3-1270v2. The node where it has happened once is an Intel i9-11900K. The node that has yet to see it is an AMD Ryzen 9 3900X. When these freezes happen, my only recourse is to destroy the VM and start it back up.

Date Modified	Username	Field	Change
2022-06-03 05:25	jpbennett	New Issue
2022-06-03 05:25	jpbennett	Tag Attached: almalinux8
2022-06-03 05:25	jpbennett	Tag Attached: QEMU-KVM
2022-06-03 05:40	jpbennett	Note Added: 0000585
2022-06-03 06:29	jpbennett	Note Added: 0000586
2022-06-03 06:29	jpbennett	File Added: log2.png
2022-06-03 06:29	jpbennett	File Added: log1.png
2022-12-09 07:37	bgf	Note Added: 0000757
2023-05-08 16:04	mutts	Note Added: 0000883