Message ID | 20230602161921.208564-6-amoorthy@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Improve scalability of KVM + userfaultfd live migration via annotated memory faults. | expand |
On Fri, Jun 02, 2023, Anish Moorthy wrote: > Implement KVM_CAP_MEMORY_FAULT_INFO for uaccess failures in > kvm_vcpu_write_guest_page() > > Signed-off-by: Anish Moorthy <amoorthy@google.com> > --- > virt/kvm/kvm_main.c | 17 ++++++++++++----- > 1 file changed, 12 insertions(+), 5 deletions(-) > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index d9c0fa7c907f..ea27a8178f1a 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -3090,8 +3090,10 @@ EXPORT_SYMBOL_GPL(kvm_vcpu_read_guest_atomic); > > /* > * Copy 'len' bytes from 'data' into guest memory at '(gfn * PAGE_SIZE) + offset' > + * If 'vcpu' is non-null, then may fill its run struct for a > + * KVM_EXIT_MEMORY_FAULT on uaccess failure. > */ > -static int __kvm_write_guest_page(struct kvm *kvm, > +static int __kvm_write_guest_page(struct kvm *kvm, struct kvm_vcpu *vcpu, > struct kvm_memory_slot *memslot, gfn_t gfn, > const void *data, int offset, int len) > { > @@ -3102,8 +3104,13 @@ static int __kvm_write_guest_page(struct kvm *kvm, > if (kvm_is_error_hva(addr)) > return -EFAULT; > r = __copy_to_user((void __user *)addr + offset, data, len); > - if (r) > + if (r) { > + if (vcpu) As mentioned in a previous mail, put this in the (one) caller. If more callers come along, which is highly unlikely, we can revisit that decision. Right now, it just adds noise, both here and in the helper. > + kvm_populate_efault_info(vcpu, gfn * PAGE_SIZE + offset, > + len, > + KVM_MEMORY_FAULT_FLAG_WRITE); For future reference, the 80 char limit is a soft limit, and with a lot of subjectivity, can be breached. In this case, this kvm_populate_efault_info(vcpu, gfn * PAGE_SIZE + offset, len, KVM_MEMORY_FAULT_FLAG_WRITE); is subjectively more readable than kvm_populate_efault_info(vcpu, gfn * PAGE_SIZE + offset, len, KVM_MEMORY_FAULT_FLAG_WRITE); > return -EFAULT; > + } > mark_page_dirty_in_slot(kvm, memslot, gfn); > return 0; > } > @@ -3113,7 +3120,7 @@ int kvm_write_guest_page(struct kvm *kvm, gfn_t gfn, > { > struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn); > > - return __kvm_write_guest_page(kvm, slot, gfn, data, offset, len); > + return __kvm_write_guest_page(kvm, NULL, slot, gfn, data, offset, len); > } > EXPORT_SYMBOL_GPL(kvm_write_guest_page); > > @@ -3121,8 +3128,8 @@ int kvm_vcpu_write_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn, > const void *data, int offset, int len) > { > struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn); > - Newline after variable declarations. Double demerits for breaking what was originally correct :-) > - return __kvm_write_guest_page(vcpu->kvm, slot, gfn, data, offset, len); > + return __kvm_write_guest_page(vcpu->kvm, vcpu, slot, gfn, data, > + offset, len); With my various suggestions, something like struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn); int r; r = __kvm_write_guest_page(vcpu->kvm, slot, gfn, data, offset, len); if (r) kvm_handle_guest_uaccess_fault(...); return r; Side topic, "uaccess", and thus any "userfault" variants, is probably a bad name for the API, because that will fall apart when guest private memory comes along.
On Wed, Jun 14, 2023 at 12:10 PM Sean Christopherson <seanjc@google.com> wrote: > > For future reference, the 80 char limit is a soft limit, and with a lot of > subjectivity, can be breached. In this case, this... Oh neat, I thought it looked pretty ugly too: done. > Newline after variable declarations. Double demerits for breaking what was > originally correct :-) :( > > As mentioned in a previous mail, put this in the (one) caller. If more callers > come along, which is highly unlikely, we can revisit that decision. Right now, > it just adds noise, both here and in the helper. > > ... > > With my various suggestions, something like > > struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn); > int r; > > r = __kvm_write_guest_page(vcpu->kvm, slot, gfn, data, offset, len); > if (r) > kvm_handle_guest_uaccess_fault(...); > return r; > So, the reason I had the logic within the helper was that it also returns -EFAULT on gfn_to_hva_memslot() failures. > static int __kvm_write_guest_page(struct kvm *kvm, > struct kvm_memory_slot *memslot, gfn_t gfn, > const void *data, int offset, int len) > { > int r; > unsigned long addr; > > addr = gfn_to_hva_memslot(memslot, gfn); > if (kvm_is_error_hva(addr)) > return -EFAULT; > ... Is it ok to catch these with an annotated efault? Strictly speaking they don't seem to arise from a failed user access (perhaps my definition is wrong: I'm looking at uaccess.h) so I'm not sure that annotating them is valid.
On Thu, Jul 06, 2023, Anish Moorthy wrote: > On Wed, Jun 14, 2023 at 12:10 PM Sean Christopherson <seanjc@google.com> wrote: > > static int __kvm_write_guest_page(struct kvm *kvm, > > struct kvm_memory_slot *memslot, gfn_t gfn, > > const void *data, int offset, int len) > > { > > int r; > > unsigned long addr; > > > > addr = gfn_to_hva_memslot(memslot, gfn); > > if (kvm_is_error_hva(addr)) > > return -EFAULT; > > ... > > Is it ok to catch these with an annotated efault? Strictly speaking > they don't seem to arise from a failed user access (perhaps my > definition is wrong: I'm looking at uaccess.h) so I'm not sure that > annotating them is valid. IMO it's ok, and even desirable, to annotate them. This is a judgment call we need to make as gfn=>hva translations are a KVM concept, i.e. aren't covered by uaccess or anything else in the kernel. Userspace would need to be aware that an annotated -EFAULT may have failed due to the memslot lookup, but I don't see that as being problematic, e.g. userspace will likely need to do its own memslot lookup anyways. In an ideal world, KVM would flag such "faults" as failing the hva lookup, and provide the hva when it's a uaccess (or gup()) that fails, i.e. provide userspace with precise details on the unresolved fault. But I don't think I want to take that on in KVM unless it proves to be absolutely necessary. Userspace *should* be able to derive the same information, and I'm concerned that providing precise *and accurate* reporting in KVM would be a maintenance burden.
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index d9c0fa7c907f..ea27a8178f1a 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -3090,8 +3090,10 @@ EXPORT_SYMBOL_GPL(kvm_vcpu_read_guest_atomic); /* * Copy 'len' bytes from 'data' into guest memory at '(gfn * PAGE_SIZE) + offset' + * If 'vcpu' is non-null, then may fill its run struct for a + * KVM_EXIT_MEMORY_FAULT on uaccess failure. */ -static int __kvm_write_guest_page(struct kvm *kvm, +static int __kvm_write_guest_page(struct kvm *kvm, struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot, gfn_t gfn, const void *data, int offset, int len) { @@ -3102,8 +3104,13 @@ static int __kvm_write_guest_page(struct kvm *kvm, if (kvm_is_error_hva(addr)) return -EFAULT; r = __copy_to_user((void __user *)addr + offset, data, len); - if (r) + if (r) { + if (vcpu) + kvm_populate_efault_info(vcpu, gfn * PAGE_SIZE + offset, + len, + KVM_MEMORY_FAULT_FLAG_WRITE); return -EFAULT; + } mark_page_dirty_in_slot(kvm, memslot, gfn); return 0; } @@ -3113,7 +3120,7 @@ int kvm_write_guest_page(struct kvm *kvm, gfn_t gfn, { struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn); - return __kvm_write_guest_page(kvm, slot, gfn, data, offset, len); + return __kvm_write_guest_page(kvm, NULL, slot, gfn, data, offset, len); } EXPORT_SYMBOL_GPL(kvm_write_guest_page); @@ -3121,8 +3128,8 @@ int kvm_vcpu_write_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn, const void *data, int offset, int len) { struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn); - - return __kvm_write_guest_page(vcpu->kvm, slot, gfn, data, offset, len); + return __kvm_write_guest_page(vcpu->kvm, vcpu, slot, gfn, data, + offset, len); } EXPORT_SYMBOL_GPL(kvm_vcpu_write_guest_page);
Implement KVM_CAP_MEMORY_FAULT_INFO for uaccess failures in kvm_vcpu_write_guest_page() Signed-off-by: Anish Moorthy <amoorthy@google.com> --- virt/kvm/kvm_main.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-)