
[v4,05/16] KVM: Annotate -EFAULTs from kvm_vcpu_write_guest_page()

Message ID 20230602161921.208564-6-amoorthy@google.com (mailing list archive)
State New, archived
Series: Improve scalability of KVM + userfaultfd live migration via annotated memory faults.

Commit Message

Anish Moorthy June 2, 2023, 4:19 p.m. UTC
Implement KVM_CAP_MEMORY_FAULT_INFO for uaccess failures in
kvm_vcpu_write_guest_page()

Signed-off-by: Anish Moorthy <amoorthy@google.com>
---
 virt/kvm/kvm_main.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

Comments

Sean Christopherson June 14, 2023, 7:10 p.m. UTC | #1
On Fri, Jun 02, 2023, Anish Moorthy wrote:
> Implement KVM_CAP_MEMORY_FAULT_INFO for uaccess failures in
> kvm_vcpu_write_guest_page()
> 
> Signed-off-by: Anish Moorthy <amoorthy@google.com>
> ---
>  virt/kvm/kvm_main.c | 17 ++++++++++++-----
>  1 file changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index d9c0fa7c907f..ea27a8178f1a 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3090,8 +3090,10 @@ EXPORT_SYMBOL_GPL(kvm_vcpu_read_guest_atomic);
>  
>  /*
>   * Copy 'len' bytes from 'data' into guest memory at '(gfn * PAGE_SIZE) + offset'
> + * If 'vcpu' is non-null, then may fill its run struct for a
> + * KVM_EXIT_MEMORY_FAULT on uaccess failure.
>   */
> -static int __kvm_write_guest_page(struct kvm *kvm,
> +static int __kvm_write_guest_page(struct kvm *kvm, struct kvm_vcpu *vcpu,
>  				  struct kvm_memory_slot *memslot, gfn_t gfn,
>  			          const void *data, int offset, int len)
>  {
> @@ -3102,8 +3104,13 @@ static int __kvm_write_guest_page(struct kvm *kvm,
>  	if (kvm_is_error_hva(addr))
>  		return -EFAULT;
>  	r = __copy_to_user((void __user *)addr + offset, data, len);
> -	if (r)
> +	if (r) {
> +		if (vcpu)

As mentioned in a previous mail, put this in the (one) caller.  If more callers
come along, which is highly unlikely, we can revisit that decision.  Right now,
it just adds noise, both here and in the helper.

> +			kvm_populate_efault_info(vcpu, gfn * PAGE_SIZE + offset,
> +						 len,
> +						 KVM_MEMORY_FAULT_FLAG_WRITE);

For future reference, the 80 char limit is a soft limit, and with a lot of
subjectivity, can be breached.  In this case, this

			kvm_populate_efault_info(vcpu, gfn * PAGE_SIZE + offset,
						 len, KVM_MEMORY_FAULT_FLAG_WRITE);

is subjectively more readable than

			kvm_populate_efault_info(vcpu, gfn * PAGE_SIZE + offset,
						 len,
						 KVM_MEMORY_FAULT_FLAG_WRITE);
>  		return -EFAULT;
> +	}
>  	mark_page_dirty_in_slot(kvm, memslot, gfn);
>  	return 0;
>  }
> @@ -3113,7 +3120,7 @@ int kvm_write_guest_page(struct kvm *kvm, gfn_t gfn,
>  {
>  	struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
>  
> -	return __kvm_write_guest_page(kvm, slot, gfn, data, offset, len);
> +	return __kvm_write_guest_page(kvm, NULL, slot, gfn, data, offset, len);
>  }
>  EXPORT_SYMBOL_GPL(kvm_write_guest_page);
>  
> @@ -3121,8 +3128,8 @@ int kvm_vcpu_write_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn,
>  			      const void *data, int offset, int len)
>  {
>  	struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
> -

Newline after variable declarations.  Double demerits for breaking what was
originally correct :-)

> -	return __kvm_write_guest_page(vcpu->kvm, slot, gfn, data, offset, len);
> +	return __kvm_write_guest_page(vcpu->kvm, vcpu, slot, gfn, data,
> +				      offset, len);

With my various suggestions, something like

	struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
	int r;

	r = __kvm_write_guest_page(vcpu->kvm, slot, gfn, data, offset, len);
	if (r)
		kvm_handle_guest_uaccess_fault(...);
	return r;

Side topic, "uaccess", and thus any "userfault" variants, is probably a bad name
for the API, because that will fall apart when guest private memory comes along.
Anish Moorthy July 6, 2023, 10:51 p.m. UTC | #2
On Wed, Jun 14, 2023 at 12:10 PM Sean Christopherson <seanjc@google.com> wrote:
>
> For future reference, the 80 char limit is a soft limit, and with a lot of
> subjectivity, can be breached.  In this case, this...

Oh neat, I thought it looked pretty ugly too: done.

> Newline after variable declarations.  Double demerits for breaking what was
> originally correct :-)

:(

>
> As mentioned in a previous mail, put this in the (one) caller.  If more callers
> come along, which is highly unlikely, we can revisit that decision.  Right now,
> it just adds noise, both here and in the helper.
>
> ...
>
> With my various suggestions, something like
>
>         struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
>         int r;
>
>         r = __kvm_write_guest_page(vcpu->kvm, slot, gfn, data, offset, len);
>         if (r)
>                 kvm_handle_guest_uaccess_fault(...);
>         return r;
>

So, the reason I had the logic within the helper was that it also
returns -EFAULT on gfn_to_hva_memslot() failures.

> static int __kvm_write_guest_page(struct kvm *kvm,
>     struct kvm_memory_slot *memslot, gfn_t gfn,
>     const void *data, int offset, int len)
> {
>     int r;
>     unsigned long addr;
>
>     addr = gfn_to_hva_memslot(memslot, gfn);
>     if (kvm_is_error_hva(addr))
>         return -EFAULT;
> ...

Is it ok to catch these with an annotated efault? Strictly speaking
they don't seem to arise from a failed user access (perhaps my
definition is wrong: I'm looking at uaccess.h) so I'm not sure that
annotating them is valid.
Sean Christopherson July 12, 2023, 2:08 p.m. UTC | #3
On Thu, Jul 06, 2023, Anish Moorthy wrote:
> On Wed, Jun 14, 2023 at 12:10 PM Sean Christopherson <seanjc@google.com> wrote:
> > static int __kvm_write_guest_page(struct kvm *kvm,
> >     struct kvm_memory_slot *memslot, gfn_t gfn,
> >     const void *data, int offset, int len)
> > {
> >     int r;
> >     unsigned long addr;
> >
> >     addr = gfn_to_hva_memslot(memslot, gfn);
> >     if (kvm_is_error_hva(addr))
> >         return -EFAULT;
> > ...
> 
> Is it ok to catch these with an annotated efault? Strictly speaking
> they don't seem to arise from a failed user access (perhaps my
> definition is wrong: I'm looking at uaccess.h) so I'm not sure that
> annotating them is valid.

IMO it's ok, and even desirable, to annotate them.  This is a judgment call we
need to make as gfn=>hva translations are a KVM concept, i.e. aren't covered by
uaccess or anything else in the kernel.  Userspace would need to be aware that
an annotated -EFAULT may have failed due to the memslot lookup, but I don't see
that as being problematic, e.g. userspace will likely need to do its own memslot
lookup anyways.

In an ideal world, KVM would flag such "faults" as failing the hva lookup, and
provide the hva when it's a uaccess (or gup()) that fails, i.e. provide userspace
with precise details on the unresolved fault.  But I don't think I want to take
that on in KVM unless it proves to be absolutely necessary.  Userspace *should*
be able to derive the same information, and I'm concerned that providing precise
*and accurate* reporting in KVM would be a maintenance burden.

Patch

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d9c0fa7c907f..ea27a8178f1a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3090,8 +3090,10 @@  EXPORT_SYMBOL_GPL(kvm_vcpu_read_guest_atomic);
 
 /*
  * Copy 'len' bytes from 'data' into guest memory at '(gfn * PAGE_SIZE) + offset'
+ * If 'vcpu' is non-null, then may fill its run struct for a
+ * KVM_EXIT_MEMORY_FAULT on uaccess failure.
  */
-static int __kvm_write_guest_page(struct kvm *kvm,
+static int __kvm_write_guest_page(struct kvm *kvm, struct kvm_vcpu *vcpu,
 				  struct kvm_memory_slot *memslot, gfn_t gfn,
 			          const void *data, int offset, int len)
 {
@@ -3102,8 +3104,13 @@  static int __kvm_write_guest_page(struct kvm *kvm,
 	if (kvm_is_error_hva(addr))
 		return -EFAULT;
 	r = __copy_to_user((void __user *)addr + offset, data, len);
-	if (r)
+	if (r) {
+		if (vcpu)
+			kvm_populate_efault_info(vcpu, gfn * PAGE_SIZE + offset,
+						 len,
+						 KVM_MEMORY_FAULT_FLAG_WRITE);
 		return -EFAULT;
+	}
 	mark_page_dirty_in_slot(kvm, memslot, gfn);
 	return 0;
 }
@@ -3113,7 +3120,7 @@  int kvm_write_guest_page(struct kvm *kvm, gfn_t gfn,
 {
 	struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
 
-	return __kvm_write_guest_page(kvm, slot, gfn, data, offset, len);
+	return __kvm_write_guest_page(kvm, NULL, slot, gfn, data, offset, len);
 }
 EXPORT_SYMBOL_GPL(kvm_write_guest_page);
 
@@ -3121,8 +3128,8 @@  int kvm_vcpu_write_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn,
 			      const void *data, int offset, int len)
 {
 	struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
-
-	return __kvm_write_guest_page(vcpu->kvm, slot, gfn, data, offset, len);
+	return __kvm_write_guest_page(vcpu->kvm, vcpu, slot, gfn, data,
+				      offset, len);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_write_guest_page);