Message ID | 20240221195125.102479-2-shivam.kumar1@nutanix.com (mailing list archive)
---|---
State | New, archived
Series | Per-vCPU dirty quota-based throttling
I just saw this on the mailing list and had a couple of minor thoughts;
apologies if I'm contradicting any of the feedback you've received on
previous versions.

On Wed, Feb 21, 2024 at 12:01 PM Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
>
> Define a dirty_quota_bytes variable to track and throttle memory
> dirtying for every vcpu. This variable stores the number of bytes the
> vcpu is allowed to dirty. To dirty more, the vcpu needs to request
> more quota by exiting to userspace.
>
> Implement the update_dirty_quota function which

Tiny nit, but can we just rename this to "reduce_dirty_quota"? It's
easy enough to see what the "update" is, but we might as well make it
even clearer.

> +#ifdef CONFIG_HAVE_KVM_DIRTY_QUOTA
> +void update_dirty_quota(struct kvm *kvm, unsigned long page_size_bytes);
> +#else
> +static inline void update_dirty_quota(struct kvm *kvm, unsigned long page_size_bytes)
> +{
> +}
> +#endif

Is there a reason to #ifdef like this instead of just having a single
definition and doing

> void update_dirty_quota(...)
> {
>         if (!IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_QUOTA))
>                 return;
>         // actual body here
> }

in the body? I figure the compiler elides the no-op call, though I've
never bothered to check...

> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 10bfc88a69f7..9a1e67187735 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3626,6 +3626,19 @@ int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len)
>  }
>  EXPORT_SYMBOL_GPL(kvm_clear_guest);
>
> +void update_dirty_quota(struct kvm *kvm, unsigned long page_size_bytes)
> +{
> +	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();

Can we just make update_dirty_quota() take a kvm_vcpu* instead of a
kvm* as its first parameter? Since the quota is per-vcpu, that seems
to make sense, and most of the callers of this function look like

> update_dirty_quota(vcpu->kvm, some_size_here);

anyway. The only one that doesn't is the addition in mark_page_dirty():

> void mark_page_dirty_in_slot(struct kvm *kvm,
>                              const struct kvm_memory_slot *memslot,
>                              gfn_t gfn)
> @@ -3656,6 +3669,7 @@ void mark_page_dirty(struct kvm *kvm, gfn_t gfn)
>  	struct kvm_memory_slot *memslot;
>
>  	memslot = gfn_to_memslot(kvm, gfn);
> +	update_dirty_quota(kvm, PAGE_SIZE);
>  	mark_page_dirty_in_slot(kvm, memslot, gfn);
>  }

Is mark_page_dirty() allowed to be used outside of a vCPU context? The
lack of a vcpu* makes me think it is, and I assume we don't want to
charge vCPUs for accesses they're not making.

Unfortunately we do seem to use it *in* vCPU contexts (see
kvm_update_stolen_time() on arm64?), although not on x86 AFAICT.
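For concreteness, a rough sketch of what these suggestions might look like
combined (the rename, IS_ENABLED() instead of the #ifdef stubs, and the
kvm_vcpu* parameter). This is illustrative only, not from the posted series;
the name reduce_dirty_quota and the exact guard ordering are assumptions:

void reduce_dirty_quota(struct kvm_vcpu *vcpu, unsigned long page_size_bytes)
{
	/*
	 * Single definition, no #else stub: when the config is off,
	 * the compiler can drop the entire body and elide the call.
	 */
	if (!IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_QUOTA))
		return;

	if (!READ_ONCE(vcpu->kvm->dirty_quota_enabled))
		return;

	/* Charge the dirtied page against this vCPU's quota. */
	vcpu->run->dirty_quota_bytes -= page_size_bytes;
	if (vcpu->run->dirty_quota_bytes <= 0)
		kvm_make_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu);
}

Taking the vcpu directly also removes the need for the kvm_get_running_vcpu()
lookup and the vcpu->kvm != kvm sanity check in the posted version.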
On Wed, Feb 21, 2024, Anish Moorthy wrote:
> > @@ -3656,6 +3669,7 @@ void mark_page_dirty(struct kvm *kvm, gfn_t gfn)
> >  	struct kvm_memory_slot *memslot;
> >
> >  	memslot = gfn_to_memslot(kvm, gfn);
> > +	update_dirty_quota(kvm, PAGE_SIZE);
> >  	mark_page_dirty_in_slot(kvm, memslot, gfn);
> >  }
>
> Is mark_page_dirty() allowed to be used outside of a vCPU context?

It's allowed, but only because we don't have a better option, i.e. it's
more tolerated than allowed. :-)

> The lack of a vcpu* makes me think it is - I assume we don't want to
> charge vCPUs for accesses they're not making.
>
> Unfortunately we do seem to use it *in* vCPU contexts (see
> kvm_update_stolen_time() on arm64?), although not on x86 AFAICT.

Use what? mark_page_dirty_in_slot()? x86 _only_ uses it from vCPU context.
On Wed, Feb 21, 2024, Shivam Kumar wrote:
> @@ -1291,6 +1293,13 @@ struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn);
>  bool kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn);
>  bool kvm_vcpu_is_visible_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
>  unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn);
> +#ifdef CONFIG_HAVE_KVM_DIRTY_QUOTA
> +void update_dirty_quota(struct kvm *kvm, unsigned long page_size_bytes);
> +#else
> +static inline void update_dirty_quota(struct kvm *kvm, unsigned long page_size_bytes)
> +{
> +}
> +#endif
>  void mark_page_dirty_in_slot(struct kvm *kvm, const struct kvm_memory_slot *memslot, gfn_t gfn);
>  void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
>
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index c3308536482b..217f19100003 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -210,6 +210,7 @@ struct kvm_xen_exit {
>  #define KVM_EXIT_NOTIFY           37
>  #define KVM_EXIT_LOONGARCH_IOCSR  38
>  #define KVM_EXIT_MEMORY_FAULT     39
> +#define KVM_EXIT_DIRTY_QUOTA_EXHAUSTED 40
>
>  /* For KVM_EXIT_INTERNAL_ERROR */
>  /* Emulate instruction failed. */
> @@ -491,6 +492,12 @@ struct kvm_run {
>  		struct kvm_sync_regs regs;
>  		char padding[SYNC_REGS_SIZE_BYTES];
>  	} s;
> +	/*
> +	 * Number of bytes the vCPU is allowed to dirty if KVM_CAP_DIRTY_QUOTA is
> +	 * enabled. KVM_RUN exits with KVM_EXIT_DIRTY_QUOTA_EXHAUSTED if this quota
> +	 * is exhausted, i.e. dirty_quota_bytes <= 0.
> +	 */
> +	long dirty_quota_bytes;

This needs to be a u64 so that the size is consistent for 32-bit and 64-bit
userspace vs. kernel.
> On 16-Apr-2024, at 10:29 PM, Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, Feb 21, 2024, Shivam Kumar wrote:
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index c3308536482b..217f19100003 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -210,6 +210,7 @@ struct kvm_xen_exit {
>>  #define KVM_EXIT_NOTIFY           37
>>  #define KVM_EXIT_LOONGARCH_IOCSR  38
>>  #define KVM_EXIT_MEMORY_FAULT     39
>> +#define KVM_EXIT_DIRTY_QUOTA_EXHAUSTED 40
>>
>>  /* For KVM_EXIT_INTERNAL_ERROR */
>>  /* Emulate instruction failed. */
>> @@ -491,6 +492,12 @@ struct kvm_run {
>>  		struct kvm_sync_regs regs;
>>  		char padding[SYNC_REGS_SIZE_BYTES];
>>  	} s;
>> +	/*
>> +	 * Number of bytes the vCPU is allowed to dirty if KVM_CAP_DIRTY_QUOTA is
>> +	 * enabled. KVM_RUN exits with KVM_EXIT_DIRTY_QUOTA_EXHAUSTED if this quota
>> +	 * is exhausted, i.e. dirty_quota_bytes <= 0.
>> +	 */
>> +	long dirty_quota_bytes;
>
> This needs to be a u64 so that the size is consistent for 32-bit and 64-bit
> userspace vs. kernel.

Ack.

Thanks,
Shivam
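The ABI issue is easy to demonstrate: kvm_run is shared between the kernel
and userspace, and sizeof(long) is 4 under ILP32 (32-bit userspace) but 8
under LP64 (a 64-bit kernel), so a 'long' field changes size, and shifts the
offset of every later field, depending on which side looks at it. A
fixed-width type is the same on both. A minimal standalone demo (not part of
the patch), built with -m32 vs -m64 to compare:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/*
	 * LP64 (64-bit Linux): sizeof(long) == 8.
	 * ILP32 (32-bit userspace): sizeof(long) == 4.
	 * int64_t (__u64/__s64 in the UAPI headers) is 8 bytes on both,
	 * so the struct layout matches between kernel and userspace.
	 */
	printf("sizeof(long)    = %zu\n", sizeof(long));
	printf("sizeof(int64_t) = %zu\n", sizeof(int64_t));
	return 0;
}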
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 3ec0b7a455a0..1858db8b0698 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7031,6 +7031,23 @@ Please note that the kernel is allowed to use the kvm_run structure as the
 primary storage for certain register types. Therefore, the kernel may use the
 values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
 
+::
+
+  /*
+   * Number of bytes the vCPU is allowed to dirty if KVM_CAP_DIRTY_QUOTA is
+   * enabled. KVM_RUN exits with KVM_EXIT_DIRTY_QUOTA_EXHAUSTED if this quota
+   * is exhausted, i.e. dirty_quota_bytes <= 0.
+   */
+  long dirty_quota_bytes;
+
+Please note that enforcing the quota is best effort. Dirty quota is reduced by
+arch-specific page size when any guest page is dirtied. Also, the guest may dirty
+multiple pages before KVM can recheck the quota, e.g. when PML is enabled.
+
+::
+
+  };
+
 6. Capabilities that can be enabled on vCPUs
 ============================================
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7e7fd25b09b3..994ecc4e5194 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -167,6 +167,7 @@ static inline bool is_error_page(struct page *page)
 #define KVM_REQ_VM_DEAD           (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_UNBLOCK           2
 #define KVM_REQ_DIRTY_RING_SOFT_FULL	3
+#define KVM_REQ_DIRTY_QUOTA_EXIT  4
 #define KVM_REQUEST_ARCH_BASE     8
 
 /*
@@ -831,6 +832,7 @@ struct kvm {
 	bool dirty_ring_with_bitmap;
 	bool vm_bugged;
 	bool vm_dead;
+	bool dirty_quota_enabled;
 
 #ifdef CONFIG_HAVE_KVM_PM_NOTIFIER
 	struct notifier_block pm_notifier;
@@ -1291,6 +1293,13 @@ struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn);
 bool kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn);
 bool kvm_vcpu_is_visible_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
 unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn);
+#ifdef CONFIG_HAVE_KVM_DIRTY_QUOTA
+void update_dirty_quota(struct kvm *kvm, unsigned long page_size_bytes);
+#else
+static inline void update_dirty_quota(struct kvm *kvm, unsigned long page_size_bytes)
+{
+}
+#endif
 void mark_page_dirty_in_slot(struct kvm *kvm, const struct kvm_memory_slot *memslot, gfn_t gfn);
 void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index c3308536482b..217f19100003 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -210,6 +210,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_NOTIFY           37
 #define KVM_EXIT_LOONGARCH_IOCSR  38
 #define KVM_EXIT_MEMORY_FAULT     39
+#define KVM_EXIT_DIRTY_QUOTA_EXHAUSTED 40
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -491,6 +492,12 @@ struct kvm_run {
 		struct kvm_sync_regs regs;
 		char padding[SYNC_REGS_SIZE_BYTES];
 	} s;
+	/*
+	 * Number of bytes the vCPU is allowed to dirty if KVM_CAP_DIRTY_QUOTA is
+	 * enabled. KVM_RUN exits with KVM_EXIT_DIRTY_QUOTA_EXHAUSTED if this quota
+	 * is exhausted, i.e. dirty_quota_bytes <= 0.
+	 */
+	long dirty_quota_bytes;
 };
 
 /* for KVM_REGISTER_COALESCED_MMIO / KVM_UNREGISTER_COALESCED_MMIO */
@@ -1155,6 +1162,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_MEMORY_ATTRIBUTES 233
 #define KVM_CAP_GUEST_MEMFD 234
 #define KVM_CAP_VM_TYPES 235
+#define KVM_CAP_DIRTY_QUOTA 236
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h
index c3308536482b..cf880e26f55f 100644
--- a/tools/include/uapi/linux/kvm.h
+++ b/tools/include/uapi/linux/kvm.h
@@ -1155,6 +1155,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_MEMORY_ATTRIBUTES 233
 #define KVM_CAP_GUEST_MEMFD 234
 #define KVM_CAP_VM_TYPES 235
+#define KVM_CAP_DIRTY_QUOTA 236
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 184dab4ee871..c4071cb14d15 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -22,6 +22,9 @@ config HAVE_KVM_IRQ_ROUTING
 config HAVE_KVM_DIRTY_RING
 	bool
 
+config HAVE_KVM_DIRTY_QUOTA
+	bool
+
 # Only strongly ordered architectures can select this, as it doesn't
 # put any explicit constraint on userspace ordering. They can also
 # select the _ACQ_REL version.
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 10bfc88a69f7..9a1e67187735 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3626,6 +3626,19 @@ int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len)
 }
 EXPORT_SYMBOL_GPL(kvm_clear_guest);
 
+void update_dirty_quota(struct kvm *kvm, unsigned long page_size_bytes)
+{
+	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
+
+	if (!vcpu || (vcpu->kvm != kvm) || !READ_ONCE(kvm->dirty_quota_enabled))
+		return;
+
+	vcpu->run->dirty_quota_bytes -= page_size_bytes;
+	if (vcpu->run->dirty_quota_bytes <= 0)
+		kvm_make_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu);
+}
+EXPORT_SYMBOL_GPL(update_dirty_quota);
+
 void mark_page_dirty_in_slot(struct kvm *kvm,
 			     const struct kvm_memory_slot *memslot,
 			     gfn_t gfn)
@@ -3656,6 +3669,7 @@ void mark_page_dirty(struct kvm *kvm, gfn_t gfn)
 	struct kvm_memory_slot *memslot;
 
 	memslot = gfn_to_memslot(kvm, gfn);
+	update_dirty_quota(kvm, PAGE_SIZE);
 	mark_page_dirty_in_slot(kvm, memslot, gfn);
 }
 EXPORT_SYMBOL_GPL(mark_page_dirty);
@@ -3665,6 +3679,7 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn)
 	struct kvm_memory_slot *memslot;
 
 	memslot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
+	update_dirty_quota(vcpu->kvm, PAGE_SIZE);
 	mark_page_dirty_in_slot(vcpu->kvm, memslot, gfn);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_mark_page_dirty);
@@ -4877,6 +4892,8 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 	case KVM_CAP_GUEST_MEMFD:
 		return !kvm || kvm_arch_has_private_mem(kvm);
 #endif
+	case KVM_CAP_DIRTY_QUOTA:
+		return !!IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_QUOTA);
 	default:
 		break;
 	}
@@ -5027,6 +5044,16 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
 
 		return r;
 	}
+	case KVM_CAP_DIRTY_QUOTA: {
+		int r = -EINVAL;
+
+		if (IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_QUOTA)) {
+			WRITE_ONCE(kvm->dirty_quota_enabled, cap->args[0]);
+			r = 0;
+		}
+
+		return r;
+	}
 	default:
 		return kvm_vm_ioctl_enable_cap(kvm, cap);
 	}
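To make the intended flow concrete, here is a hedged sketch of how a VMM
might drive this API, assuming the series as posted: enable
KVM_CAP_DIRTY_QUOTA on the VM fd, seed dirty_quota_bytes before entering the
guest, and refill it on each KVM_EXIT_DIRTY_QUOTA_EXHAUSTED. Error handling,
vCPU setup, and the kvm_run mmap are omitted; the refill size and the sleep
used for throttling are illustrative choices, not part of the patch.

#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define DIRTY_QUOTA_BYTES	(64UL << 20)	/* illustrative: 64 MiB per refill */

static void run_vcpu_with_dirty_quota(int vm_fd, int vcpu_fd,
				      struct kvm_run *run)
{
	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_DIRTY_QUOTA,
		.args = { 1 },	/* non-zero args[0] enables throttling */
	};

	ioctl(vm_fd, KVM_ENABLE_CAP, &cap);

	/* Grant the initial quota before entering the guest. */
	run->dirty_quota_bytes = DIRTY_QUOTA_BYTES;

	for (;;) {
		ioctl(vcpu_fd, KVM_RUN, NULL);

		switch (run->exit_reason) {
		case KVM_EXIT_DIRTY_QUOTA_EXHAUSTED:
			/*
			 * Throttle point: delay to cap the dirty rate,
			 * then top the quota back up and re-enter.
			 */
			usleep(1000);
			run->dirty_quota_bytes = DIRTY_QUOTA_BYTES;
			break;
		default:
			/* Handle other exits (MMIO, HLT, ...) as usual. */
			return;
		}
	}
}

The per-vCPU quota lives in kvm_run itself, so throttling is purely a
userspace policy decision: the kernel only decrements the count and forces
the exit when it reaches zero.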