Message ID: 20211114145721.209219-4-shivam.kumar1@nutanix.com
State:      New, archived
Series:     KVM: Dirty Quota-Based VM Live Migration Auto-Converge
On Sun, Nov 14, 2021, Shivam Kumar wrote:
> +static int kvm_vm_ioctl_enable_dirty_quota_migration(struct kvm *kvm,
> +		bool enabled)
> +{
> +	if (!KVM_DIRTY_LOG_PAGE_OFFSET)

I don't think we should force architectures to opt in.  It would be trivial to
add

	if (kvm_dirty_quota_is_full(vcpu)) {
		vcpu->run->exit_reason = KVM_EXIT_DIRTY_QUOTA_FULL;
		r = 0;
		break;
	}

in the run loops of each architecture.  And we can do that in incremental
patches without #ifdeffery since it's only the exiting aspect that requires
arch help.

> +		return -EINVAL;
> +
> +	/*
> +	 * For now, dirty quota migration works with dirty bitmap so don't
> +	 * enable it if dirty ring interface is enabled. In future, dirty
> +	 * quota migration may work with dirty ring interface as well.
> +	 */

Why does KVM care?  This is a very simple concept.  QEMU not using it for the
dirty ring doesn't mean KVM can't support it.

> +	if (kvm->dirty_ring_size)
> +		return -EINVAL;
> +
> +	/* Return if no change */
> +	if (kvm->dirty_quota_migration_enabled == enabled)

Needs to be checked under lock.

> +		return -EINVAL;

Probably more idiomatic to return 0 if the desired value is the current value.

> +	mutex_lock(&kvm->lock);
> +	kvm->dirty_quota_migration_enabled = enabled;

Needs to check vCPU creation.

> +	mutex_unlock(&kvm->lock);
> +
> +	return 0;
> +}
> +
>  int __attribute__((weak)) kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>  					       struct kvm_enable_cap *cap)
>  {
> @@ -4305,6 +4339,9 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
>  	}
>  	case KVM_CAP_DIRTY_LOG_RING:
>  		return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
> +	case KVM_CAP_DIRTY_QUOTA_MIGRATION:
> +		return kvm_vm_ioctl_enable_dirty_quota_migration(kvm,
> +				cap->args[0]);
>  	default:
>  		return kvm_vm_ioctl_enable_cap(kvm, cap);
>  	}
> --
> 2.22.3
>
On 18/11/21 11:27 pm, Sean Christopherson wrote:
> On Sun, Nov 14, 2021, Shivam Kumar wrote:
>> +static int kvm_vm_ioctl_enable_dirty_quota_migration(struct kvm *kvm,
>> +		bool enabled)
>> +{
>> +	if (!KVM_DIRTY_LOG_PAGE_OFFSET)
> I don't think we should force architectures to opt in.  It would be trivial to
> add
>
>	if (kvm_dirty_quota_is_full(vcpu)) {
>		vcpu->run->exit_reason = KVM_EXIT_DIRTY_QUOTA_FULL;
>		r = 0;
>		break;
>	}
>
> in the run loops of each architecture.  And we can do that in incremental
> patches without #ifdeffery since it's only the exiting aspect that requires
> arch help.
Noted. Thanks.
>
>> +		return -EINVAL;
>> +
>> +	/*
>> +	 * For now, dirty quota migration works with dirty bitmap so don't
>> +	 * enable it if dirty ring interface is enabled. In future, dirty
>> +	 * quota migration may work with dirty ring interface as well.
>> +	 */
> Why does KVM care?  This is a very simple concept.  QEMU not using it for the
> dirty ring doesn't mean KVM can't support it.
>
The dirty ring interface, if enabled, blocks the path that updates the
dirty bitmap, and our current implementation depends on that path. We
were planning to make the changes required for dirty quota migration to
work with the dirty ring in upcoming patches. We will explore the
possibility of doing it in the next patchset itself.
>> +	if (kvm->dirty_ring_size)
>> +		return -EINVAL;
>> +
>> +	/* Return if no change */
>> +	if (kvm->dirty_quota_migration_enabled == enabled)
> Needs to be checked under lock.
Noted. Thanks.
>
>> +		return -EINVAL;
> Probably more idiomatic to return 0 if the desired value is the current value.
Keeping in mind the case where userspace tries to enable it while a
migration is already going on (which shouldn't happen), we are returning
-EINVAL. Please let me know if 0 still makes more sense.
>
>> +	mutex_lock(&kvm->lock);
>> +	kvm->dirty_quota_migration_enabled = enabled;
> Needs to check vCPU creation.
In our current implementation, we are using the
KVM_CAP_DIRTY_QUOTA_MIGRATION ioctl to start dirty logging (through the
dirty counter) on the kernel side. This ioctl is called each time a new
migration starts and ends. The dirty quota context of each vCPU is
stored in two variables, the dirty counter and the dirty quota, which we
allocate at vCPU creation and free at vCPU destruction.
>
>> +	mutex_unlock(&kvm->lock);
>> +
>> +	return 0;
>> +}
>> +
>>  int __attribute__((weak)) kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>>  					       struct kvm_enable_cap *cap)
>>  {
>> @@ -4305,6 +4339,9 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
>>  	}
>>  	case KVM_CAP_DIRTY_LOG_RING:
>>  		return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
>> +	case KVM_CAP_DIRTY_QUOTA_MIGRATION:
>> +		return kvm_vm_ioctl_enable_dirty_quota_migration(kvm,
>> +				cap->args[0]);
>>  	default:
>>  		return kvm_vm_ioctl_enable_cap(kvm, cap);
>>  	}
>> --
>> 2.22.3
>>
Thank you very much for your feedback.
On 20/11/21 1:36 am, Sean Christopherson wrote:
> On Sat, Nov 20, 2021, Shivam Kumar wrote:
>> On 18/11/21 11:27 pm, Sean Christopherson wrote:
>>>> +		return -EINVAL;
>>> Probably more idiomatic to return 0 if the desired value is the current value.
>> Keeping in mind the case where userspace tries to enable it while a
>> migration is already going on (which shouldn't happen), we are returning
>> -EINVAL. Please let me know if 0 still makes more sense.
> If the semantics are not "enable/disable", but rather "(re)set the quota",
> then it makes sense to allow changing the quota arbitrarily.
I agree that the semantics are not apt. Will modify it. Thanks.
>
>>>> +	mutex_lock(&kvm->lock);
>>>> +	kvm->dirty_quota_migration_enabled = enabled;
>>> Needs to check vCPU creation.
>> In our current implementation, we are using the
>> KVM_CAP_DIRTY_QUOTA_MIGRATION ioctl to start dirty logging (through the
>> dirty counter) on the kernel side. This ioctl is called each time a new
>> migration starts and ends.
> Ah, and from the cover letter discussion, you want the count and quota to be
> reset when a new migration occurs.  That makes sense.
>
> Actually, if we go the route of using kvm_run to report and update the count/quota,
> we don't even need a capability.  Userspace can signal each vCPU to induce an exit
> to userspace, e.g. at the start of migration, then set the desired quota/count in
> vcpu->kvm_run and stuff exit_reason so that KVM updates the quota/count on the
> subsequent KVM_RUN.  No locking or requests needed, and userspace can reset the
> count at will, it just requires a signal.
>
> It's a little weird to overload exit_reason like that, but if that's a sticking
> point we could add a flag in kvm_run somewhere.  Requiring an exit to userspace
> at the start of migration doesn't seem too onerous.
Yes, this approach looks flaw-free. We will explore its complexity and
how we can simplify the implementation.
On 20/11/21 1:51 am, Shivam Kumar wrote:
>
> On 20/11/21 1:36 am, Sean Christopherson wrote:
>> On Sat, Nov 20, 2021, Shivam Kumar wrote:
>>> On 18/11/21 11:27 pm, Sean Christopherson wrote:
>>>>> +		return -EINVAL;
>>>> Probably more idiomatic to return 0 if the desired value is the
>>>> current value.
>>> Keeping in mind the case where userspace tries to enable it while a
>>> migration is already going on (which shouldn't happen), we are
>>> returning -EINVAL. Please let me know if 0 still makes more sense.
>> If the semantics are not "enable/disable", but rather "(re)set the
>> quota", then it makes sense to allow changing the quota arbitrarily.
> I agree that the semantics are not apt. Will modify it. Thanks.
>>
>>>>> +	mutex_lock(&kvm->lock);
>>>>> +	kvm->dirty_quota_migration_enabled = enabled;
>>>> Needs to check vCPU creation.
>>> In our current implementation, we are using the
>>> KVM_CAP_DIRTY_QUOTA_MIGRATION ioctl to start dirty logging (through
>>> the dirty counter) on the kernel side. This ioctl is called each time
>>> a new migration starts and ends.
>> Ah, and from the cover letter discussion, you want the count and
>> quota to be reset when a new migration occurs.  That makes sense.
>>
>> Actually, if we go the route of using kvm_run to report and update
>> the count/quota, we don't even need a capability.  Userspace can
>> signal each vCPU to induce an exit to userspace, e.g. at the start of
>> migration, then set the desired quota/count in vcpu->kvm_run and
>> stuff exit_reason so that KVM updates the quota/count on the
>> subsequent KVM_RUN.  No locking or requests needed, and userspace can
>> reset the count at will, it just requires a signal.
>>
>> It's a little weird to overload exit_reason like that, but if that's
>> a sticking point we could add a flag in kvm_run somewhere.  Requiring
>> an exit to userspace at the start of migration doesn't seem too
>> onerous.
> Yes, this approach looks flaw-free. We will explore its complexity and
> how we can simplify the implementation.
Is it okay to define the per-vCPU dirty quota and dirty count in the
kvm_run structure itself? It can save space and reduce the complexity of
the implementation by a large margin.
On Thu, Nov 25, 2021, Shivam Kumar wrote:
> On 20/11/21 1:51 am, Shivam Kumar wrote:
> > On 20/11/21 1:36 am, Sean Christopherson wrote:
> > > Actually, if we go the route of using kvm_run to report and update the
> > > count/quota, we don't even need a capability.  Userspace can signal each
> > > vCPU to induce an exit to userspace, e.g. at the start of migration, then
> > > set the desired quota/count in vcpu->kvm_run and stuff exit_reason so
> > > that KVM updates the quota/count on the subsequent KVM_RUN.  No locking
> > > or requests needed, and userspace can reset the count at will, it just
> > > requires a signal.
> > >
> > > It's a little weird to overload exit_reason like that, but if that's a
> > > sticking point we could add a flag in kvm_run somewhere.  Requiring an
> > > exit to userspace at the start of migration doesn't seem too onerous.
> >
> > Yes, this approach looks flaw-free. We will explore its complexity and
> > how we can simplify the implementation.
>
> Is it okay to define the per-vCPU dirty quota and dirty count in the
> kvm_run structure itself? It can save space and reduce the complexity of
> the implementation by a large margin.

Paolo, I'm guessing this question is directed at you since I made the
suggestion :-)
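[For readers following the thread, here is a rough userspace sketch of the
kvm_run-based flow discussed above. It is built entirely on assumptions
drawn from the discussion: KVM_EXIT_DIRTY_QUOTA_FULL and its value, the
quota fields living in the shared kvm_run page, and the next_dirty_quota()
helper are all hypothetical at this point, not merged KVM API.]

#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

#define KVM_EXIT_DIRTY_QUOTA_FULL 35	/* hypothetical exit reason */

/* Assumed layout of the quota context inside the mmap-ed kvm_run page. */
struct vcpu_dirty_quota {
	uint64_t dirty_counter;	/* pages dirtied so far; updated by KVM */
	uint64_t dirty_quota;	/* allowed dirties; set by userspace */
};

/* Supplied by the migration code: quota for the next interval. */
extern uint64_t next_dirty_quota(void);

static void vcpu_run_loop(int vcpu_fd, struct kvm_run *run,
			  struct vcpu_dirty_quota *dq)
{
	for (;;) {
		if (ioctl(vcpu_fd, KVM_RUN, 0) < 0 && errno != EINTR)
			break;

		if (run->exit_reason == KVM_EXIT_DIRTY_QUOTA_FULL) {
			/* Grant a fresh quota and re-enter the guest. */
			dq->dirty_counter = 0;
			dq->dirty_quota = next_dirty_quota();
			continue;
		}
		/* ... handle other exit reasons ... */
	}
}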
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index aeeb071c7688..6679bceee649 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -277,6 +277,10 @@ the VCPU file descriptor can be mmap-ed, including:
   KVM_DIRTY_LOG_PAGE_OFFSET * PAGE_SIZE.  For more information on
   KVM_CAP_DIRTY_LOG_RING, see section 8.3.

+- if KVM_CAP_DIRTY_QUOTA_MIGRATION is available, a number of pages at
+  KVM_DIRTY_QUOTA_PAGE_OFFSET * PAGE_SIZE. For more information on
+  KVM_CAP_DIRTY_QUOTA_MIGRATION, see section 8.35.
+
 4.6 KVM_SET_MEMORY_REGION
 -------------------------

@@ -7484,3 +7488,38 @@ The argument to KVM_ENABLE_CAP is also a bitmask, and must be a subset
 of the result of KVM_CHECK_EXTENSION.  KVM will forward to userspace
 the hypercalls whose corresponding bit is in the argument, and return
 ENOSYS for the others.
+
+8.35 KVM_CAP_DIRTY_QUOTA_MIGRATION
+----------------------------------
+
+:Architectures: x86
+:Parameters: args[0] - boolean value specifying whether to enable or
+             disable dirty quota migration (true and false respectively)
+
+With dirty quota migration, memory dirtying is throttled by setting a
+limit on the number of pages a vCPU can dirty in fixed, microscopic
+time intervals. This limit depends on the network throughput calculated
+over the last few intervals, so that the vCPUs are throttled based on
+the available network bandwidth. We refer to this limit as the "dirty
+quota" of a vCPU and to the fixed-size intervals as the "dirty quota
+intervals".
+
+vCPUDirtyQuotaContext keeps the dirty quota context for each vCPU. It
+keeps the number of pages the vCPU has dirtied (dirty_counter) in the
+ongoing dirty quota interval and the maximum number of dirties allowed
+for the vCPU (dirty_quota) in the ongoing dirty quota interval.
+
+	struct vCPUDirtyQuotaContext {
+		u64 dirty_counter;
+		u64 dirty_quota;
+	};
+
+The flag dirty_quota_migration_enabled determines whether dirty
+quota-based throttling is enabled for an ongoing migration.
+
+When the guest tries to dirty a page, it leads to a vmexit as each page
+is write-protected. In the vmexit path, we increment the dirty_counter
+for the corresponding vCPU. Then, we check if the vCPU has exceeded its
+quota. If yes, we exit to userspace with a new exit reason,
+KVM_EXIT_DIRTY_QUOTA_FULL. This "quota full" event is further handled
+on the userspace side.
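[As a concrete illustration of the quota arithmetic described in the
documentation above, a rough userspace-side sketch follows. The function
name, the sampling scheme, and the even split across vCPUs are invented
for the example; the series leaves the actual quota-setting policy to
userspace.]

#include <stdint.h>

#define PAGE_SIZE 4096	/* assumed guest page size for this example */

/*
 * Hypothetical helper: derive a per-vCPU dirty quota for the next
 * interval from the bytes migrated over the last few intervals, i.e.
 * allow each vCPU to dirty no more than the network can ship.
 */
static uint64_t compute_dirty_quota(const uint64_t bytes_sent[],
				    int nr_samples, unsigned int nr_vcpus)
{
	uint64_t total = 0;

	for (int i = 0; i < nr_samples; i++)
		total += bytes_sent[i];

	/* Pages the network absorbed per interval, on average. */
	uint64_t pages_per_interval = (total / nr_samples) / PAGE_SIZE;

	/* Split the budget evenly across vCPUs; never hand out zero. */
	uint64_t quota = pages_per_interval / nr_vcpus;
	return quota ? quota : 1;
}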
diff --git a/include/linux/dirty_quota_migration.h b/include/linux/dirty_quota_migration.h
index 8c12fa428436..b6c6f5f896dd 100644
--- a/include/linux/dirty_quota_migration.h
+++ b/include/linux/dirty_quota_migration.h
@@ -24,9 +24,17 @@ static inline int kvm_vcpu_dirty_quota_alloc(struct vCPUDirtyQuotaContext **vCPU
 	return 0;
 }

+static inline struct page *kvm_dirty_quota_context_get_page(
+		struct vCPUDirtyQuotaContext *vCPUdqctx, u32 offset)
+{
+	return NULL;
+}
+
 #else /* KVM_DIRTY_QUOTA_PAGE_OFFSET == 0 */

 int kvm_vcpu_dirty_quota_alloc(struct vCPUDirtyQuotaContext **vCPUdqctx);
+struct page *kvm_dirty_quota_context_get_page(
+		struct vCPUDirtyQuotaContext *vCPUdqctx, u32 offset);

 #endif /* KVM_DIRTY_QUOTA_PAGE_OFFSET == 0 */

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 647f7e1a04dc..a6785644bf47 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1131,6 +1131,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
 #define KVM_CAP_ARM_MTE 205
 #define KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM 206
+#define KVM_CAP_DIRTY_QUOTA_MIGRATION 207

 #ifdef KVM_CAP_IRQ_ROUTING

diff --git a/virt/kvm/dirty_quota_migration.c b/virt/kvm/dirty_quota_migration.c
index 262f071aac0c..7e9ace760939 100644
--- a/virt/kvm/dirty_quota_migration.c
+++ b/virt/kvm/dirty_quota_migration.c
@@ -12,3 +12,9 @@ int kvm_vcpu_dirty_quota_alloc(struct vCPUDirtyQuotaContext **vCPUdqctx)
 	memset((*vCPUdqctx), 0, size);
 	return 0;
 }
+
+struct page *kvm_dirty_quota_context_get_page(
+		struct vCPUDirtyQuotaContext *vCPUdqctx, u32 offset)
+{
+	return vmalloc_to_page((void *)vCPUdqctx + offset * PAGE_SIZE);
+}

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 5626ae1b92ce..1564d3a3f608 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3519,6 +3519,9 @@ static vm_fault_t kvm_vcpu_fault(struct vm_fault *vmf)
 		page = kvm_dirty_ring_get_page(
 		    &vcpu->dirty_ring,
 		    vmf->pgoff - KVM_DIRTY_LOG_PAGE_OFFSET);
+	else if (vmf->pgoff == KVM_DIRTY_QUOTA_PAGE_OFFSET)
+		page = kvm_dirty_quota_context_get_page(vcpu->vCPUdqctx,
+				vmf->pgoff - KVM_DIRTY_QUOTA_PAGE_OFFSET);
 	else
 		return kvm_arch_vcpu_fault(vcpu, vmf);
 	get_page(page);
@@ -4207,6 +4210,12 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 #endif
 	case KVM_CAP_BINARY_STATS_FD:
 		return 1;
+	case KVM_CAP_DIRTY_QUOTA_MIGRATION:
+#if KVM_DIRTY_QUOTA_PAGE_OFFSET > 0
+		return 1;
+#else
+		return 0;
+#endif
 	default:
 		break;
 	}
@@ -4273,6 +4282,31 @@ static int kvm_vm_ioctl_reset_dirty_pages(struct kvm *kvm)
 	return cleared;
 }

+static int kvm_vm_ioctl_enable_dirty_quota_migration(struct kvm *kvm,
+		bool enabled)
+{
+	if (!KVM_DIRTY_LOG_PAGE_OFFSET)
+		return -EINVAL;
+
+	/*
+	 * For now, dirty quota migration works with dirty bitmap so don't
+	 * enable it if dirty ring interface is enabled. In future, dirty
+	 * quota migration may work with dirty ring interface as well.
+	 */
+	if (kvm->dirty_ring_size)
+		return -EINVAL;
+
+	/* Return if no change */
+	if (kvm->dirty_quota_migration_enabled == enabled)
+		return -EINVAL;
+
+	mutex_lock(&kvm->lock);
+	kvm->dirty_quota_migration_enabled = enabled;
+	mutex_unlock(&kvm->lock);
+
+	return 0;
+}
+
 int __attribute__((weak)) kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 					       struct kvm_enable_cap *cap)
 {
@@ -4305,6 +4339,9 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
 	}
 	case KVM_CAP_DIRTY_LOG_RING:
 		return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
+	case KVM_CAP_DIRTY_QUOTA_MIGRATION:
+		return kvm_vm_ioctl_enable_dirty_quota_migration(kvm,
+				cap->args[0]);
 	default:
 		return kvm_vm_ioctl_enable_cap(kvm, cap);
 	}
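[To round out the series as posted, a userspace sketch of driving the new
interface: enable the capability on the VM file descriptor, then mmap the
per-vCPU quota context. The capability number matches this (unmerged)
series, while the KVM_DIRTY_QUOTA_PAGE_OFFSET value below is a placeholder
assumption, since the series defines that offset per architecture.]

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/kvm.h>

/* From this series; not present in mainline headers at this point. */
#ifndef KVM_CAP_DIRTY_QUOTA_MIGRATION
#define KVM_CAP_DIRTY_QUOTA_MIGRATION 207
#endif
#ifndef KVM_DIRTY_QUOTA_PAGE_OFFSET
#define KVM_DIRTY_QUOTA_PAGE_OFFSET 64	/* placeholder value */
#endif

/* Mirrors the series' vCPUDirtyQuotaContext, as seen from userspace. */
struct vcpu_dirty_quota_ctx {
	uint64_t dirty_counter;
	uint64_t dirty_quota;
};

/* Enable dirty quota migration on the VM, then map one vCPU's context. */
static struct vcpu_dirty_quota_ctx *enable_dirty_quota(int vm_fd, int vcpu_fd)
{
	struct kvm_enable_cap cap;
	long page_size = sysconf(_SC_PAGESIZE);
	void *ctx;

	memset(&cap, 0, sizeof(cap));
	cap.cap = KVM_CAP_DIRTY_QUOTA_MIGRATION;
	cap.args[0] = 1;	/* true: enable */
	if (ioctl(vm_fd, KVM_ENABLE_CAP, &cap) < 0)
		return NULL;

	/* The per-vCPU context is exposed via the vCPU mmap window. */
	ctx = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
		   vcpu_fd, KVM_DIRTY_QUOTA_PAGE_OFFSET * page_size);
	return ctx == MAP_FAILED ? NULL : ctx;
}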