Message ID | 20220915101049.187325-6-shivam.kumar1@nutanix.com (mailing list archive)
---|---
State | New, archived
Series | KVM: Dirty quota-based throttling
On Thu, Sep 15, 2022, Shivam Kumar wrote:
> #endif /* SELFTEST_KVM_UTIL_BASE_H */
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> index 9889fe0d8919..4c7bd9807d0b 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -18,6 +18,7 @@
>  #include <linux/kernel.h>
>
>  #define KVM_UTIL_MIN_PFN 2
> +#define PML_BUFFER_SIZE 512

...

> +	/*
> +	 * Due to PML, number of pages dirtied by the vcpu can exceed its dirty
> +	 * quota by PML buffer size.
> +	 */

Buffering for PML is very Intel centric, and even then it's not guaranteed.  In
x86, PML can and should be detected programmatically:

  bool kvm_is_pml_enabled(void)
  {
	return is_intel_cpu() && get_kvm_intel_param_bool("pml");
  }

Not sure if it's worth a generic arch hook to get the CPU buffer size, e.g. the
test could do:

	/*
	 * Allow ??? pages of overrun, KVM is allowed to dirty multiple pages
	 * before exiting to userspace, e.g. when emulating an instruction that
	 * performs multiple memory accesses.
	 */
	uint64_t buffer = ???;

	/*
	 * When Intel's Page-Modification Logging (PML) is enabled, the CPU may
	 * dirty up to 512 pages (number of entries in the PML buffer) without
	 * exiting, thus KVM may effectively dirty that many pages before
	 * enforcing the dirty quota.
	 */
#ifdef __x86_64__
	if (kvm_is_pml_enabled())
		buffer = PML_BUFFER_SIZE;
#endif

> +	TEST_ASSERT(count <= quota + PML_BUFFER_SIZE, "Invalid number of pages

Clarify _why_ the count is invalid.

> +		dirtied: count=%"PRIu64", quota=%"PRIu64"\n", count, quota);

Don't split strings unless it's truly necessary.  This is one of those cases where
running over the 80 char soft limit is preferable to wrapping.  And no need to use
PRIu64, selftests are 64-bit only.  E.g.

	TEST_ASSERT(count <= (quota + buffer),
		    "KVM dirtied too many pages: count = %lx, quota = %lx, buffer = %lx",
		    count, quota, buffer);

> +
> +	TEST_ASSERT(count >= quota, "Dirty quota exit happened with quota yet to
> +		be exhausted: count=%"PRIu64", quota=%"PRIu64"\n", count, quota);

Similar comments here.

> +
> +	if (count > quota)
> +		pr_info("Dirty quota exit with unequal quota and count:
> +			count=%"PRIu64", quota=%"PRIu64"\n", count, quota);

Not sure I'd bother with this.  Realistically, is anyone ever going to be able to
do anything useful with this information?  E.g. is this just going to be a "PML is
enabled!" message?

> +
> +	run->dirty_quota = count + test_dirty_quota_increment;
> +}
> --
> 2.22.3
>
On 07/10/22 11:59 pm, Sean Christopherson wrote:
> On Thu, Sep 15, 2022, Shivam Kumar wrote:
>> #endif /* SELFTEST_KVM_UTIL_BASE_H */
>> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
>> index 9889fe0d8919..4c7bd9807d0b 100644
>> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
>> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
>> @@ -18,6 +18,7 @@
>>  #include <linux/kernel.h>
>>
>>  #define KVM_UTIL_MIN_PFN 2
>> +#define PML_BUFFER_SIZE 512
>
> ...
>
>> +	/*
>> +	 * Due to PML, number of pages dirtied by the vcpu can exceed its dirty
>> +	 * quota by PML buffer size.
>> +	 */
>
> Buffering for PML is very Intel centric, and even then it's not guaranteed.  In
> x86, PML can and should be detected programmatically:
>
>   bool kvm_is_pml_enabled(void)
>   {
> 	return is_intel_cpu() && get_kvm_intel_param_bool("pml");
>   }
>
Yes, I was looking for something like this. Thanks.

> Not sure if it's worth a generic arch hook to get the CPU buffer size, e.g. the
> test could do:
>
I can't think of any usecase for a custom buffer as of now. We are working
on a proposal for dynamically adjusting the PML buffer size in order to
throttle effectively with dirty quota, so that could be a potential
usecase.

> 	/*
> 	 * Allow ??? pages of overrun, KVM is allowed to dirty multiple pages
> 	 * before exiting to userspace, e.g. when emulating an instruction that
> 	 * performs multiple memory accesses.
> 	 */
> 	uint64_t buffer = ???;
>
> 	/*
> 	 * When Intel's Page-Modification Logging (PML) is enabled, the CPU may
> 	 * dirty up to 512 pages (number of entries in the PML buffer) without
> 	 * exiting, thus KVM may effectively dirty that many pages before
> 	 * enforcing the dirty quota.
> 	 */
> #ifdef __x86_64__
> 	if (kvm_is_pml_enabled())
> 		buffer = PML_BUFFER_SIZE;
> #endif
>
>> +	TEST_ASSERT(count <= quota + PML_BUFFER_SIZE, "Invalid number of pages
>
> Clarify _why_ the count is invalid.
In the worst case, the vcpu will be able to dirty 512 more pages than its
dirty quota due to PML.

>> +		dirtied: count=%"PRIu64", quota=%"PRIu64"\n", count, quota);
>
> Don't split strings unless it's truly necessary.  This is one of those cases where
> running over the 80 char soft limit is preferable to wrapping.  And no need to use
> PRIu64, selftests are 64-bit only.  E.g.
Got it, thanks.

> 	TEST_ASSERT(count <= (quota + buffer),
> 		    "KVM dirtied too many pages: count = %lx, quota = %lx, buffer = %lx",
> 		    count, quota, buffer);
>
>> +
>> +	TEST_ASSERT(count >= quota, "Dirty quota exit happened with quota yet to
>> +		be exhausted: count=%"PRIu64", quota=%"PRIu64"\n", count, quota);
>
> Similar comments here.
>
>> +
>> +	if (count > quota)
>> +		pr_info("Dirty quota exit with unequal quota and count:
>> +			count=%"PRIu64", quota=%"PRIu64"\n", count, quota);
>
> Not sure I'd bother with this.  Realistically, is anyone ever going to be able to
> do anything useful with this information?  E.g. is this just going to be a "PML is
> enabled!" message?
I thought this information could help detect anomalies when PML is
disabled: if PML is disabled, I am expecting that the exit would happen
only when count equals quota, and count wouldn't ever exceed quota.

Now that I think about it, I should move this statement inside an 'if'
block and make it an assertion too:

	if (!kvm_is_pml_enabled())
		TEST_ASSERT(count == quota,
			    "Dirty quota exit with unequal quota and count.");

>
>> +
>> +	run->dirty_quota = count + test_dirty_quota_increment;
>> +}
>> --
>> 2.22.3
>>
On Mon, Oct 10, 2022, Shivam Kumar wrote:
>
> On 07/10/22 11:59 pm, Sean Christopherson wrote:
> > On Thu, Sep 15, 2022, Shivam Kumar wrote:
> > > +	TEST_ASSERT(count <= quota + PML_BUFFER_SIZE, "Invalid number of pages
> >
> > Clarify _why_ the count is invalid.
> In the worst case, the vcpu will be able to dirty 512 more pages than its
> dirty quota due to PML.

Heh, I know that, I wasn't asking a question.  I was saying that the assert should
tell the user/developer running the test why the count is "invalid", i.e. why the
assert failed.

Put yourself in the shoes of a random developer that's new to KVM and knows almost
nothing about dirty logging and isn't even aware PML is a thing.  Write the assert
message for _that_ person.
diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
index 9c883c94d478..81c46eff7e1d 100644
--- a/tools/testing/selftests/kvm/dirty_log_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_test.c
@@ -63,6 +63,8 @@
 
 #define SIG_IPI SIGUSR1
 
+#define TEST_DIRTY_QUOTA_INCREMENT 8
+
 /*
  * Guest/Host shared variables. Ensure addr_gva2hva() and/or
  * sync_global_to/from_guest() are used when accessing from
@@ -189,6 +191,7 @@ static enum log_mode_t host_log_mode_option = LOG_MODE_ALL;
 static enum log_mode_t host_log_mode;
 static pthread_t vcpu_thread;
 static uint32_t test_dirty_ring_count = TEST_DIRTY_RING_COUNT;
+static uint64_t test_dirty_quota_increment = TEST_DIRTY_QUOTA_INCREMENT;
 
 static void vcpu_kick(void)
 {
@@ -208,6 +211,13 @@ static void sem_wait_until(sem_t *sem)
 	while (ret == -1 && errno == EINTR);
 }
 
+static void set_dirty_quota(struct kvm_vm *vm, uint64_t dirty_quota)
+{
+	struct kvm_run *run = vcpu_state(vm, VCPU_ID);
+
+	vcpu_set_dirty_quota(run, dirty_quota);
+}
+
 static bool clear_log_supported(void)
 {
 	return kvm_has_cap(KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2);
@@ -255,7 +265,11 @@ static void default_after_vcpu_run(struct kvm_vcpu *vcpu, int ret, int err)
 	TEST_ASSERT(ret == 0 || (ret == -1 && err == EINTR),
 		    "vcpu run failed: errno=%d", err);
 
-	TEST_ASSERT(get_ucall(vcpu, NULL) == UCALL_SYNC,
+	if (test_dirty_quota_increment &&
+	    run->exit_reason == KVM_EXIT_DIRTY_QUOTA_EXHAUSTED)
+		vcpu_handle_dirty_quota_exit(run, test_dirty_quota_increment);
+	else
+		TEST_ASSERT(get_ucall(vcpu, NULL) == UCALL_SYNC,
 		    "Invalid guest sync status: exit_reason=%s\n",
 		    exit_reason_str(run->exit_reason));
 
@@ -372,6 +386,9 @@ static void dirty_ring_after_vcpu_run(struct kvm_vcpu *vcpu, int ret, int err)
 	if (get_ucall(vcpu, NULL) == UCALL_SYNC) {
 		/* We should allow this to continue */
 		;
+	} else if (test_dirty_quota_increment &&
+		   run->exit_reason == KVM_EXIT_DIRTY_QUOTA_EXHAUSTED) {
+		vcpu_handle_dirty_quota_exit(run, test_dirty_quota_increment);
 	} else if (run->exit_reason == KVM_EXIT_DIRTY_RING_FULL ||
 		   (ret == -1 && err == EINTR)) {
 		/* Update the flag first before pause */
@@ -762,6 +779,10 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 	sync_global_to_guest(vm, guest_test_virt_mem);
 	sync_global_to_guest(vm, guest_num_pages);
 
+	/* Initialise dirty quota */
+	if (test_dirty_quota_increment)
+		set_dirty_quota(vm, test_dirty_quota_increment);
+
 	/* Start the iterations */
 	iteration = 1;
 	sync_global_to_guest(vm, iteration);
@@ -803,6 +824,9 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 	/* Tell the vcpu thread to quit */
 	host_quit = true;
 	log_mode_before_vcpu_join();
+	/* Terminate dirty quota throttling */
+	if (test_dirty_quota_increment)
+		set_dirty_quota(vm, 0);
 	pthread_join(vcpu_thread, NULL);
 
 	pr_info("Total bits checked: dirty (%"PRIu64"), clear (%"PRIu64"), "
@@ -824,6 +848,8 @@ static void help(char *name)
 	printf(" -c: specify dirty ring size, in number of entries\n");
 	printf("     (only useful for dirty-ring test; default: %"PRIu32")\n",
 	       TEST_DIRTY_RING_COUNT);
+	printf(" -q: specify incremental dirty quota (default: %"PRIu32")\n",
+	       TEST_DIRTY_QUOTA_INCREMENT);
 	printf(" -i: specify iteration counts (default: %"PRIu64")\n",
 	       TEST_HOST_LOOP_N);
 	printf(" -I: specify interval in ms (default: %"PRIu64" ms)\n",
@@ -852,11 +878,14 @@ int main(int argc, char *argv[])
 
 	guest_modes_append_default();
 
-	while ((opt = getopt(argc, argv, "c:hi:I:p:m:M:")) != -1) {
+	while ((opt = getopt(argc, argv, "c:q:hi:I:p:m:M:")) != -1) {
 		switch (opt) {
 		case 'c':
 			test_dirty_ring_count = strtol(optarg, NULL, 10);
 			break;
+		case 'q':
+			test_dirty_quota_increment = strtol(optarg, NULL, 10);
+			break;
 		case 'i':
 			p.iterations = strtol(optarg, NULL, 10);
 			break;
diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index 24fde97f6121..68be66b9ac67 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -834,4 +834,8 @@ static inline int __vm_disable_nx_huge_pages(struct kvm_vm *vm)
 	return __vm_enable_cap(vm, KVM_CAP_VM_DISABLE_NX_HUGE_PAGES, 0);
 }
 
+void vcpu_set_dirty_quota(struct kvm_run *run, uint64_t dirty_quota);
+void vcpu_handle_dirty_quota_exit(struct kvm_run *run,
+				  uint64_t test_dirty_quota_increment);
+
 #endif /* SELFTEST_KVM_UTIL_BASE_H */
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 9889fe0d8919..4c7bd9807d0b 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -18,6 +18,7 @@
 #include <linux/kernel.h>
 
 #define KVM_UTIL_MIN_PFN 2
+#define PML_BUFFER_SIZE 512
 
 static int vcpu_mmap_sz(void);
 
@@ -1703,6 +1704,7 @@ static struct exit_reason {
 	{KVM_EXIT_X86_RDMSR, "RDMSR"},
 	{KVM_EXIT_X86_WRMSR, "WRMSR"},
 	{KVM_EXIT_XEN, "XEN"},
+	{KVM_EXIT_DIRTY_QUOTA_EXHAUSTED, "DIRTY_QUOTA_EXHAUSTED"},
 #ifdef KVM_EXIT_MEMORY_NOT_PRESENT
 	{KVM_EXIT_MEMORY_NOT_PRESENT, "MEMORY_NOT_PRESENT"},
 #endif
@@ -1979,3 +1981,37 @@ void __vm_get_stat(struct kvm_vm *vm, const char *stat_name, uint64_t *data,
 		break;
 	}
 }
+
+void vcpu_set_dirty_quota(struct kvm_run *run, uint64_t dirty_quota)
+{
+	run->dirty_quota = dirty_quota;
+
+	if (dirty_quota)
+		pr_info("Dirty quota throttling enabled with initial quota %"PRIu64"\n",
+			dirty_quota);
+	else
+		pr_info("Dirty quota throttling disabled\n");
+}
+
+void vcpu_handle_dirty_quota_exit(struct kvm_run *run,
+				  uint64_t test_dirty_quota_increment)
+{
+	uint64_t quota = run->dirty_quota_exit.quota;
+	uint64_t count = run->dirty_quota_exit.count;
+
+	/*
+	 * Due to PML, number of pages dirtied by the vcpu can exceed its dirty
+	 * quota by PML buffer size.
+	 */
+	TEST_ASSERT(count <= quota + PML_BUFFER_SIZE, "Invalid number of pages
+		dirtied: count=%"PRIu64", quota=%"PRIu64"\n", count, quota);
+
+	TEST_ASSERT(count >= quota, "Dirty quota exit happened with quota yet to
+		be exhausted: count=%"PRIu64", quota=%"PRIu64"\n", count, quota);
+
+	if (count > quota)
+		pr_info("Dirty quota exit with unequal quota and count:
+			count=%"PRIu64", quota=%"PRIu64"\n", count, quota);
+
+	run->dirty_quota = count + test_dirty_quota_increment;
+}