Message ID | 20240117064441.2633784-1-tao1.su@linux.intel.com (mailing list archive)
---|---
State | New, archived
Series | KVM: selftests: Add a requirement for disabling numa balancing
On Wed, Jan 17, 2024, Tao Su wrote:
> In dirty_log_page_splitting_test, vm_get_stat(vm, "pages_4k") can
> gradually drop to 0 after vm exit. The reason is that the host triggers
> numa balancing and unmaps the related SPTEs, so the number of pages
> currently mapped in EPT (kvm->stat.pages) is not equal to the number of
> pages touched by the guest. This causes stats_populated.pages_4k and
> stats_repopulated.pages_4k in this test to differ, resulting in failure.

...

> dirty_log_page_splitting_test assumes that kvm->stat.pages and the pages
> touched by the guest are the same, but the assumption is no longer true
> if numa balancing is enabled. Add a requirement for disabling
> numa_balancing to avoid confusion due to test failure.
>
> Actually, any page migration (including numa balancing) will trigger
> this issue, e.g. when running this script:
>
>   ./x86_64/dirty_log_page_splitting_test &
>   PID=$!
>   sleep 1
>   migratepages $PID 0 1
>
> It is unusual to create the above test environment intentionally, but
> numa balancing initiated by the kernel will most likely be triggered, at
> least in dirty_log_page_splitting_test.
>
> Reported-by: Yi Lai <yi1.lai@intel.com>
> Signed-off-by: Tao Su <tao1.su@linux.intel.com>
> Tested-by: Yi Lai <yi1.lai@intel.com>
> ---
>  .../kvm/x86_64/dirty_log_page_splitting_test.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c b/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
> index 634c6bfcd572..f2c796111d83 100644
> --- a/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
> +++ b/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
> @@ -212,10 +212,21 @@ static void help(char *name)
>
>  int main(int argc, char *argv[])
>  {
> +	FILE *f;
>  	int opt;
> +	int ret, numa_balancing;
>
>  	TEST_REQUIRE(get_kvm_param_bool("eager_page_split"));
>  	TEST_REQUIRE(get_kvm_param_bool("tdp_mmu"));
> +	f = fopen("/proc/sys/kernel/numa_balancing", "r");
> +	if (f) {
> +		ret = fscanf(f, "%d", &numa_balancing);
> +		TEST_ASSERT(ret == 1, "Error reading numa_balancing");
> +		TEST_ASSERT(!numa_balancing, "please run "
> +			    "'echo 0 > /proc/sys/kernel/numa_balancing'");

If we go this route, this should be a TEST_REQUIRE(), not a TEST_ASSERT().
The test hasn't failed, rather it has detected an incompatible setup.

Something isn't right though. The test defaults to HugeTLB, and the
invocation in the changelog doesn't override the backing source. That
suggests that NUMA auto-balancing is zapping HugeTLB VMAs, which AFAIK
shouldn't happen, e.g. this code in task_numa_work() should cause such
VMAs to be skipped:

	if (!vma_migratable(vma) || !vma_policy_mof(vma) ||
	    is_vm_hugetlb_page(vma) || (vma->vm_flags & VM_MIXEDMAP)) {
		trace_sched_skip_vma_numa(mm, vma, NUMAB_SKIP_UNSUITABLE);
		continue;
	}

And the test already warns the user if they opt to use something other
than HugeTLB:

	if (!is_backing_src_hugetlb(backing_src)) {
		pr_info("This test will only work reliably with HugeTLB memory. "
			"It can work with THP, but that is best effort.\n");
	}

If the test is defaulting to something other than HugeTLB, then we should
fix that in the test. If the kernel is doing NUMA balancing on HugeTLB
VMAs, then we should fix that in the kernel.
On Wed, Jan 17, 2024 at 07:12:08AM -0800, Sean Christopherson wrote:

[...]

> > diff --git a/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c b/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
> > index 634c6bfcd572..f2c796111d83 100644
> > --- a/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
> > +++ b/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
> > @@ -212,10 +212,21 @@ static void help(char *name)
> >
> >  int main(int argc, char *argv[])
> >  {
> > +	FILE *f;
> >  	int opt;
> > +	int ret, numa_balancing;
> >
> >  	TEST_REQUIRE(get_kvm_param_bool("eager_page_split"));
> >  	TEST_REQUIRE(get_kvm_param_bool("tdp_mmu"));
> > +	f = fopen("/proc/sys/kernel/numa_balancing", "r");
> > +	if (f) {
> > +		ret = fscanf(f, "%d", &numa_balancing);
> > +		TEST_ASSERT(ret == 1, "Error reading numa_balancing");
> > +		TEST_ASSERT(!numa_balancing, "please run "
> > +			    "'echo 0 > /proc/sys/kernel/numa_balancing'");
>
> If we go this route, this should be a TEST_REQUIRE(), not a TEST_ASSERT().
> The test hasn't failed, rather it has detected an incompatible setup.

Yes, previously I wanted to print a more user-friendly prompt, but
TEST_REQUIRE() can't customize the output...

> Something isn't right though. The test defaults to HugeTLB, and the
> invocation in the changelog doesn't override the backing source. That
> suggests that NUMA auto-balancing is zapping HugeTLB VMAs, which AFAIK
> shouldn't happen, e.g. this code in task_numa_work() should cause such
> VMAs to be skipped:
>
> 	if (!vma_migratable(vma) || !vma_policy_mof(vma) ||
> 	    is_vm_hugetlb_page(vma) || (vma->vm_flags & VM_MIXEDMAP)) {
> 		trace_sched_skip_vma_numa(mm, vma, NUMAB_SKIP_UNSUITABLE);
> 		continue;
> 	}
>
> And the test already warns the user if they opt to use something other
> than HugeTLB:
>
> 	if (!is_backing_src_hugetlb(backing_src)) {
> 		pr_info("This test will only work reliably with HugeTLB memory. "
> 			"It can work with THP, but that is best effort.\n");
> 	}
>
> If the test is defaulting to something other than HugeTLB, then we should
> fix that in the test. If the kernel is doing NUMA balancing on HugeTLB
> VMAs, then we should fix that in the kernel.

Based on my observation, HugeTLB VMAs are not affected by NUMA
auto-balancing, but the backing sources of the test code and per-vCPU
stacks are not HugeTLB, e.g. __vm_create() invokes:

	vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, 0, 0, nr_pages, 0);

So, some pages can be migrated.
In dirty_log_page_splitting_test, if we sleep 3s before enabling dirty
logging:

--- a/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
+++ b/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
@@ -124,6 +124,7 @@ static void run_test(enum vm_guest_mode mode, void *unused)
 	memstress_start_vcpu_threads(VCPUS, vcpu_worker);
 
 	run_vcpu_iteration(vm);
+	sleep(3);
 	get_page_stats(vm, &stats_populated, "populating memory");
 
 	/* Enable dirty logging */

I got these logs:

# ./x86_64/dirty_log_page_splitting_test
Testing guest mode: PA-bits:ANY, VA-bits:48, 4K pages
__vm_create: mode='PA-bits:ANY, VA-bits:48, 4K pages' pages='2710'
Guest physical address width detected: 52
guest physical test memory: [0xfffff7fe00000, 0xfffffffe00000)
Added VCPU 0 with test mem gpa [fffff7fe00000, fffffffe00000)
Added VCPU 1 with test mem gpa [fffff7fe00000, fffffffe00000)
Page stats after populating memory: 4K: 0 2M: 1024 1G: 0 huge: 1024
Page stats after enabling dirty logging: 4K: 524288 2M: 0 1G: 0 huge: 0
Page stats after dirtying memory: 4K: 525334 2M: 0 1G: 0 huge: 0
Page stats after dirtying memory: 4K: 525334 2M: 0 1G: 0 huge: 0
Page stats after disabling dirty logging: 4K: 1046 2M: 0 1G: 0 huge: 0
Page stats after repopulating memory: 4K: 1046 2M: 1024 1G: 0 huge: 1024
==== Test Assertion Failure ====
  x86_64/dirty_log_page_splitting_test.c:196: stats_populated.pages_4k == stats_repopulated.pages_4k
  pid=2660413 tid=2660413 errno=0 - Success
     1	0x0000000000402d4b: run_test at dirty_log_page_splitting_test.c:196 (discriminator 1)
     2	0x0000000000403724: for_each_guest_mode at guest_modes.c:100
     3	0x00000000004024ef: main at dirty_log_page_splitting_test.c:246
     4	0x00007fd72c229d8f: ?? ??:0
     5	0x00007fd72c229e3f: ?? ??:0
     6	0x00000000004025b4: _start at ??:?
  0 != 0x416 (stats_populated.pages_4k != stats_repopulated.pages_4k)

Thanks,
Tao
+David

On Thu, Jan 18, 2024, Tao Su wrote:
> On Wed, Jan 17, 2024 at 07:12:08AM -0800, Sean Christopherson wrote:
>
> [...]
>
> > > diff --git a/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c b/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
> > > index 634c6bfcd572..f2c796111d83 100644
> > > --- a/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
> > > +++ b/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
> > > @@ -212,10 +212,21 @@ static void help(char *name)
> > >
> > >  int main(int argc, char *argv[])
> > >  {
> > > +	FILE *f;
> > >  	int opt;
> > > +	int ret, numa_balancing;
> > >
> > >  	TEST_REQUIRE(get_kvm_param_bool("eager_page_split"));
> > >  	TEST_REQUIRE(get_kvm_param_bool("tdp_mmu"));
> > > +	f = fopen("/proc/sys/kernel/numa_balancing", "r");
> > > +	if (f) {
> > > +		ret = fscanf(f, "%d", &numa_balancing);
> > > +		TEST_ASSERT(ret == 1, "Error reading numa_balancing");
> > > +		TEST_ASSERT(!numa_balancing, "please run "
> > > +			    "'echo 0 > /proc/sys/kernel/numa_balancing'");
> >
> > If we go this route, this should be a TEST_REQUIRE(), not a TEST_ASSERT().
> > The test hasn't failed, rather it has detected an incompatible setup.
>
> Yes, previously I wanted to print a more user-friendly prompt, but
> TEST_REQUIRE() can't customize the output...

__TEST_REQUIRE()

> > Something isn't right though. The test defaults to HugeTLB, and the
> > invocation in the changelog doesn't override the backing source. That
> > suggests that NUMA auto-balancing is zapping HugeTLB VMAs, which AFAIK
> > shouldn't happen, e.g. this code in task_numa_work() should cause such
> > VMAs to be skipped:
> >
> > 	if (!vma_migratable(vma) || !vma_policy_mof(vma) ||
> > 	    is_vm_hugetlb_page(vma) || (vma->vm_flags & VM_MIXEDMAP)) {
> > 		trace_sched_skip_vma_numa(mm, vma, NUMAB_SKIP_UNSUITABLE);
> > 		continue;
> > 	}
> >
> > And the test already warns the user if they opt to use something other
> > than HugeTLB:
> >
> > 	if (!is_backing_src_hugetlb(backing_src)) {
> > 		pr_info("This test will only work reliably with HugeTLB memory. "
> > 			"It can work with THP, but that is best effort.\n");
> > 	}
> >
> > If the test is defaulting to something other than HugeTLB, then we should
> > fix that in the test. If the kernel is doing NUMA balancing on HugeTLB
> > VMAs, then we should fix that in the kernel.
>
> Based on my observation, HugeTLB VMAs are not affected by NUMA
> auto-balancing, but the backing sources of the test code and per-vCPU
> stacks are not HugeTLB, e.g. __vm_create() invokes:
>
> 	vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, 0, 0, nr_pages, 0);
>
> So, some pages can be migrated.

Ah, hmm. Requiring NUMA balancing be disabled isn't going to fix the
underlying issue, it's just guarding against one of the more likely
culprits. The best fix is likely to have the test precisely validate
_only_ the test data pages. E.g. if we double down on requiring HugeTLB,
then the test should be able to assert that it has at least N hugepages
when dirty logging is disabled, and at least M 4KiB pages when dirty
logging is enabled.
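[For reference, a minimal sketch of the check rewritten around
__TEST_REQUIRE(), which unlike TEST_REQUIRE() takes a printf-style
message, so an incompatible setup is reported as a skip with a custom
prompt rather than as a failure. The variables match the patch above;
the exact placement in main() is illustrative:

	/* Skip, rather than fail, if NUMA balancing is enabled. */
	f = fopen("/proc/sys/kernel/numa_balancing", "r");
	if (f) {
		ret = fscanf(f, "%d", &numa_balancing);
		fclose(f);
		TEST_ASSERT(ret == 1, "Error reading numa_balancing");
		__TEST_REQUIRE(!numa_balancing, "please run "
			       "'echo 0 > /proc/sys/kernel/numa_balancing'");
	}
]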
On Thu, Jan 18, 2024 at 09:33:29AM -0800, Sean Christopherson wrote:
> +David
>
> On Thu, Jan 18, 2024, Tao Su wrote:
> > On Wed, Jan 17, 2024 at 07:12:08AM -0800, Sean Christopherson wrote:
> >
> > [...]
> >
> > > > diff --git a/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c b/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
> > > > index 634c6bfcd572..f2c796111d83 100644
> > > > --- a/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
> > > > +++ b/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
> > > > @@ -212,10 +212,21 @@ static void help(char *name)
> > > >
> > > >  int main(int argc, char *argv[])
> > > >  {
> > > > +	FILE *f;
> > > >  	int opt;
> > > > +	int ret, numa_balancing;
> > > >
> > > >  	TEST_REQUIRE(get_kvm_param_bool("eager_page_split"));
> > > >  	TEST_REQUIRE(get_kvm_param_bool("tdp_mmu"));
> > > > +	f = fopen("/proc/sys/kernel/numa_balancing", "r");
> > > > +	if (f) {
> > > > +		ret = fscanf(f, "%d", &numa_balancing);
> > > > +		TEST_ASSERT(ret == 1, "Error reading numa_balancing");
> > > > +		TEST_ASSERT(!numa_balancing, "please run "
> > > > +			    "'echo 0 > /proc/sys/kernel/numa_balancing'");
> > >
> > > If we go this route, this should be a TEST_REQUIRE(), not a TEST_ASSERT().
> > > The test hasn't failed, rather it has detected an incompatible setup.
> >
> > Yes, previously I wanted to print a more user-friendly prompt, but
> > TEST_REQUIRE() can't customize the output...
>
> __TEST_REQUIRE()

Got it.

> > > Something isn't right though. The test defaults to HugeTLB, and the
> > > invocation in the changelog doesn't override the backing source. That
> > > suggests that NUMA auto-balancing is zapping HugeTLB VMAs, which AFAIK
> > > shouldn't happen, e.g. this code in task_numa_work() should cause such
> > > VMAs to be skipped:
> > >
> > > 	if (!vma_migratable(vma) || !vma_policy_mof(vma) ||
> > > 	    is_vm_hugetlb_page(vma) || (vma->vm_flags & VM_MIXEDMAP)) {
> > > 		trace_sched_skip_vma_numa(mm, vma, NUMAB_SKIP_UNSUITABLE);
> > > 		continue;
> > > 	}
> > >
> > > And the test already warns the user if they opt to use something other
> > > than HugeTLB:
> > >
> > > 	if (!is_backing_src_hugetlb(backing_src)) {
> > > 		pr_info("This test will only work reliably with HugeTLB memory. "
> > > 			"It can work with THP, but that is best effort.\n");
> > > 	}
> > >
> > > If the test is defaulting to something other than HugeTLB, then we should
> > > fix that in the test. If the kernel is doing NUMA balancing on HugeTLB
> > > VMAs, then we should fix that in the kernel.
> >
> > Based on my observation, HugeTLB VMAs are not affected by NUMA
> > auto-balancing, but the backing sources of the test code and per-vCPU
> > stacks are not HugeTLB, e.g. __vm_create() invokes:
> >
> > 	vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, 0, 0, nr_pages, 0);
> >
> > So, some pages can be migrated.
>
> Ah, hmm. Requiring NUMA balancing be disabled isn't going to fix the
> underlying issue, it's just guarding against one of the more likely
> culprits. The best fix is likely to have the test precisely validate
> _only_ the test data pages. E.g. if we double down on requiring HugeTLB,
> then the test should be able to assert that it has at least N hugepages
> when dirty logging is disabled, and at least M 4KiB pages when dirty
> logging is enabled.

I see, I will update the ASSERT conditions, thanks for your guidance.

Thanks,
Tao
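[A rough sketch of the lower-bound validation Sean suggests, assuming the
test's existing get_page_stats() helper and kvm_page_stats struct (the
pages_4k/hugepages fields appear in the assertion output above); the
helper name and thresholds are hypothetical, not from the actual test:

	/*
	 * Sketch only: assert lower bounds on the pages backing the test
	 * data region instead of exact before/after equality, so zaps of
	 * non-test memory (e.g. NUMA balancing migrating the anonymous
	 * regions backing guest code and per-vCPU stacks) cannot cause
	 * spurious failures.
	 */
	static void assert_min_page_counts(struct kvm_vm *vm, const char *stage,
					   uint64_t min_hugepages, uint64_t min_4k)
	{
		struct kvm_page_stats stats;

		get_page_stats(vm, &stats, stage);
		TEST_ASSERT(stats.hugepages >= min_hugepages,
			    "Expected at least %lu hugepages after %s, got %lu",
			    min_hugepages, stage, stats.hugepages);
		TEST_ASSERT(stats.pages_4k >= min_4k,
			    "Expected at least %lu 4KiB pages after %s, got %lu",
			    min_4k, stage, stats.pages_4k);
	}

The populated/repopulated stages would then pass a min_hugepages derived
from the test memory size (with min_4k = 0), while the dirty-logging
stages would require at least one 4KiB page per test data page (with
min_hugepages = 0).]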
diff --git a/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c b/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
index 634c6bfcd572..f2c796111d83 100644
--- a/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
+++ b/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
@@ -212,10 +212,21 @@ static void help(char *name)
 
 int main(int argc, char *argv[])
 {
+	FILE *f;
 	int opt;
+	int ret, numa_balancing;
 
 	TEST_REQUIRE(get_kvm_param_bool("eager_page_split"));
 	TEST_REQUIRE(get_kvm_param_bool("tdp_mmu"));
+	f = fopen("/proc/sys/kernel/numa_balancing", "r");
+	if (f) {
+		ret = fscanf(f, "%d", &numa_balancing);
+		TEST_ASSERT(ret == 1, "Error reading numa_balancing");
+		TEST_ASSERT(!numa_balancing, "please run "
+			    "'echo 0 > /proc/sys/kernel/numa_balancing'");
+		fclose(f);
+		f = NULL;
+	}
 
 	while ((opt = getopt(argc, argv, "b:hs:")) != -1) {
 		switch (opt) {