View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000445 | AlmaLinux-8 | qemu-kvm | public | 2023-11-27 19:21 | 2023-12-04 02:51 |
Reporter | mutts | Assigned To | alukoshko | ||
Priority | high | Severity | crash | Reproducibility | always |
Status | resolved | Resolution | fixed | ||
Platform | x86_64 | OS | Almalinux | OS Version | 8 |
Summary | 0000445: Fix max integer mmu_invalidate_seq hanging vCPUs | ||||
Description | I'm not sure which upstream kernel AlmaLinux 8's kernel is based on, but kernels from 4.18.0-425.3.1.el8.x86_64 through the current kernel 4.18.0-513.5.1.el8_9.x86_64 are susceptible to this bug. In mainline kernel 6.1 this was addressed in commit 82d811ff566594de3676f35808e8a9e19c5c864c, which effectively changes mmu_seq from an int to an unsigned long: https://lore.kernel.org/lkml/2023082606-viper-accuracy-b0fd@gregkh/T/ In mainline kernel 6.3 it was fixed through a complete overhaul of the system in commit ba6e3fe25543: https://lore.kernel.org/lkml/2023082644-vaporizer-stuffy-b8bc@gregkh/T/ Either way, the AlmaLinux 8 kernel needs to be updated to resolve this issue in the is_page_fault_stale() function. Kernel 4.18.0-372.26.1.el8_6.x86_64 (and presumably 4.18.0-372.32.1.el8_6.x86_64) is not affected, because it does not have the is_page_fault_stale() function. | ||||
Steps To Reproduce | 1. Install AlmaLinux 8 using any kernel between 4.18.0-425.3.1.el8.x86_64 and 4.18.0-513.5.1.el8_9.x86_64. 2. Spin up a KVM guest on that AlmaLinux node. 3. Inside the KVM guest, run a workload that uses a lot of memory over and over again. Eventually mmu_notifier_seq will hit the maximum integer value, 2,147,483,647, at which point the KVM guest will freeze up and become unresponsive. | ||||
Additional Information | You can monitor the mmu_notifier_seq count from the host node by running this bpftrace script:
--SNIP--
#if defined(CONFIG_FUNCTION_TRACER)
#define CC_USING_FENTRY
#endif
#include <linux/kvm_host.h>

// On every direct page fault, record the VM's current mmu_notifier_seq, keyed by process ID.
kprobe:direct_page_fault
{
    $ctr = ((struct kvm_vcpu*)arg0)->kvm->mmu_notifier_seq;
    @counts[pid] = $ctr;
}

// Every 60 seconds, print a timestamp followed by the latest counter for each process ID.
interval:s:60
{
    $ts = nsecs + 300000;
    printf("%s\n", strftime("%m-%d-%y %H:%M:%S", $ts));
    print(@counts);
    print("---\n");
}
--SNIP--
Once this hits 2,147,483,647 (max integer) the guest will become unresponsive. Depending on how much memory the guest uses and how often its memory pages are cleared, this may take a while: guests that use very little memory may take months or years to reach 2,147,483,647. Running a program on the KVM guest that continuously consumes and releases memory may make the issue easier to reproduce. Looking at the kernel source packages, AlmaLinux 9 also appears to have been susceptible up to kernel 5.14.0-284.11.1, but the latest AlmaLinux 9 kernel, 5.14.0-362.8.1, appears to have been refactored based on mainline kernel 6.3, which fixes this issue. | ||||
Tags | almalinux8, kernel, QEMU-KVM | ||||
abrt_hash | |||||
URL | |||||
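The Description says the 6.1 fix effectively changes mmu_seq from an int to an unsigned long. The following is a minimal Python model of why the narrower type could matter. It is not kernel code: the function names are hypothetical, and it assumes (the report does not spell this out) that the int snapshot is widened back to an unsigned long before it is compared with the live counter.

```python
import ctypes

INT_MAX = 2**31 - 1  # 2,147,483,647, the value named in the report


def snapshot_as_int(seq: int) -> int:
    """Model storing the 64-bit counter in a signed 32-bit C int."""
    return ctypes.c_int32(seq).value


def widen_to_ulong(value: int) -> int:
    """Model passing that int where a 64-bit unsigned long is expected.

    A negative int is sign-extended, so the upper 32 bits become all ones.
    """
    return ctypes.c_uint64(value).value


def fault_looks_stale(current_seq: int, snapshot: int) -> bool:
    """Hypothetical stand-in for the staleness check: the fault is treated
    as stale whenever the widened snapshot differs from the live counter."""
    return widen_to_ulong(snapshot) != current_seq


if __name__ == "__main__":
    for seq in (INT_MAX - 1, INT_MAX, INT_MAX + 1):
        snap = snapshot_as_int(seq)
        print(seq, hex(widen_to_ulong(snap)), fault_looks_stale(seq, snap))
```

In this model, a snapshot taken after the counter has passed 2,147,483,647 never matches the live counter, even though nothing changed in between, so every retry would look stale again. That would be consistent with the permanent guest freeze described above.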
|
Kernel build with patch https://lore.kernel.org/lkml/2023082606-viper-accuracy-b0fd@gregkh/T/ included: https://build.almalinux.org/build/8033 |
|
Released https://errata.almalinux.org/8/ALSA-2023-7549.html |
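To check whether a host is still running a kernel inside the range the reporter listed as susceptible, a small comparison like the following may help. It is only a sketch: the helper names are hypothetical, the bounds are copied from the Description, and a result of False does not by itself prove a kernel is fixed (the errata above is the authoritative source for the fixed package).

```python
import re
from typing import Tuple

# Bounds taken from the report: 4.18.0-425.3.1.el8 through 4.18.0-513.5.1.el8_9.
AFFECTED_FIRST = (425, 3, 1)
AFFECTED_LAST = (513, 5, 1)


def release_tuple(kernel: str) -> Tuple[int, ...]:
    """Extract the numeric release, e.g. '4.18.0-425.3.1.el8.x86_64' -> (425, 3, 1)."""
    version, _, release = kernel.partition("-")
    if version != "4.18.0":
        raise ValueError(f"not an EL8 4.18.0 kernel string: {kernel!r}")
    match = re.match(r"\d+(?:\.\d+)*", release)
    if match is None:
        raise ValueError(f"no numeric release in: {kernel!r}")
    return tuple(int(part) for part in match.group().split("."))


def in_reported_range(kernel: str) -> bool:
    """True if the kernel falls inside the range the reporter listed as susceptible."""
    return AFFECTED_FIRST <= release_tuple(kernel) <= AFFECTED_LAST


if __name__ == "__main__":
    import platform

    try:
        print(platform.release(), in_reported_range(platform.release()))
    except ValueError as err:
        print(err)
```

The string to pass in is the output of `uname -r` on the host; the `__main__` block reads the same value through `platform.release()`.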
Date Modified | Username | Field | Change |
---|---|---|---|
2023-11-27 19:21 | mutts | New Issue | |
2023-11-27 19:21 | mutts | Tag Attached: almalinux8 | |
2023-11-27 19:21 | mutts | Tag Attached: kernel | |
2023-11-27 19:21 | mutts | Tag Attached: QEMU-KVM | |
2023-11-30 13:03 | alukoshko | Assigned To | => alukoshko |
2023-11-30 13:03 | alukoshko | Status | new => confirmed |
2023-12-03 23:55 | alukoshko | Note Added: 0001003 | |
2023-12-04 02:51 | alukoshko | Note Added: 0001004 | |
2023-12-04 02:51 | alukoshko | Status | confirmed => resolved |
2023-12-04 02:51 | alukoshko | Resolution | open => fixed |
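The Additional Information notes that reaching 2,147,483,647 may take anywhere from a short time to months or years. Two counter readings from the bpftrace output can be turned into a rough estimate with simple linear extrapolation. The function name and the sample readings below are hypothetical, and the estimate assumes the guest's workload stays roughly constant.

```python
INT_MAX = 2_147_483_647  # the limit named in the report


def seconds_until_int_max(seq_then: int, seq_now: int, interval_s: float) -> float:
    """Estimate seconds until the counter reaches INT_MAX.

    seq_then and seq_now are two readings of the counter for the same
    process, taken interval_s seconds apart (60 for the script above).
    """
    rate = (seq_now - seq_then) / interval_s  # increments per second
    if rate <= 0:
        return float("inf")  # counter is not advancing
    return max(INT_MAX - seq_now, 0) / rate


if __name__ == "__main__":
    # Made-up readings one minute apart, for illustration only.
    remaining = seconds_until_int_max(1_000_000, 1_060_000, 60)
    print(f"about {remaining / 86_400:.1f} days at the current rate")
```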