Message ID | 20230412213510.1220557-6-amoorthy@google.com (mailing list archive)
---|---
State | New, archived
Series | Improve scalability of KVM + userfaultfd live migration via annotated memory faults.
On 4/13/2023 5:34 AM, Anish Moorthy wrote: > KVM_CAP_MEMORY_FAULT_INFO allows kvm_run to return useful information > besides a return value of -1 and errno of EFAULT when a vCPU fails an > access to guest memory. > > Add documentation, updates to the KVM headers, and a helper function > (kvm_populate_efault_info) for implementing the capability. kvm_populate_efault_info(), function name. > > Besides simply filling the run struct, kvm_populate_efault_info takes Ditto > two safety measures > > a. It tries to prevent concurrent fills on a single vCPU run struct > by checking that the run struct being modified corresponds to the > currently loaded vCPU. > b. It tries to avoid filling an already-populated run struct by > checking whether the exit reason has been modified since entry > into KVM_RUN. > > Finally, mark KVM_CAP_MEMORY_FAULT_INFO as available on arm64 and x86, > even though EFAULT annotation are currently totally absent. Picking a > point to declare the implementation "done" is difficult because > > 1. Annotations will be performed incrementally in subsequent commits > across both core and arch-specific KVM. > 2. The initial series will very likely miss some cases which need > annotation. Although these omissions are to be fixed in the future, > userspace thus still needs to expect and be able to handle > unannotated EFAULTs. > > Given these qualifications, just marking it available here seems the > least arbitrary thing to do. > > Suggested-by: Sean Christopherson <seanjc@google.com> > Signed-off-by: Anish Moorthy <amoorthy@google.com> > --- > Documentation/virt/kvm/api.rst | 35 +++++++++++++++++++++++++++ > arch/arm64/kvm/arm.c | 1 + > arch/x86/kvm/x86.c | 1 + > include/linux/kvm_host.h | 12 ++++++++++ > include/uapi/linux/kvm.h | 16 +++++++++++++ > tools/include/uapi/linux/kvm.h | 11 +++++++++ > virt/kvm/kvm_main.c | 44 ++++++++++++++++++++++++++++++++++ > 7 files changed, 120 insertions(+) > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst > index 48fad65568227..f174f43c38d45 100644 > --- a/Documentation/virt/kvm/api.rst > +++ b/Documentation/virt/kvm/api.rst > @@ -6637,6 +6637,18 @@ array field represents return values. The userspace should update the return > values of SBI call before resuming the VCPU. For more details on RISC-V SBI > spec refer, https://github.com/riscv/riscv-sbi-doc. > > +:: > + > + /* KVM_EXIT_MEMORY_FAULT */ > + struct { > + __u64 flags; > + __u64 gpa; > + __u64 len; /* in bytes */ > + } memory_fault; > + > +Indicates a vCPU memory fault on the guest physical address range > +[gpa, gpa + len). See KVM_CAP_MEMORY_FAULT_INFO for more details. > + > :: > > /* KVM_EXIT_NOTIFY */ > @@ -7670,6 +7682,29 @@ This capability is aimed to mitigate the threat that malicious VMs can > cause CPU stuck (due to event windows don't open up) and make the CPU > unavailable to host or other VMs. > > +7.34 KVM_CAP_MEMORY_FAULT_INFO > +------------------------------ > + > +:Architectures: x86, arm64 > +:Parameters: args[0] - KVM_MEMORY_FAULT_INFO_ENABLE|DISABLE to enable/disable > + the capability. > +:Returns: 0 on success, or -EINVAL if unsupported or invalid args[0]. > + > +When enabled, EFAULTs "returned" by KVM_RUN in response to failed vCPU guest > +memory accesses may be annotated with additional information. When KVM_RUN > +returns an error with errno=EFAULT, userspace may check the exit reason: if it > +is KVM_EXIT_MEMORY_FAULT, userspace is then permitted to read the 'memory_fault' > +member of the run struct. 
> + > +The 'gpa' and 'len' (in bytes) fields describe the range of guest > +physical memory to which access failed, i.e. [gpa, gpa + len). 'flags' is > +currently always zero. > + > +NOTE: The implementation of this capability is incomplete. Even with it enabled, > +userspace may receive "bare" EFAULTs (i.e. exit reason != > +KVM_EXIT_MEMORY_FAULT) from KVM_RUN. These should be considered bugs and > +reported to the maintainers. > + > 8. Other capabilities. > ====================== > > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c > index a43e1cb3b7e97..a932346b59f61 100644 > --- a/arch/arm64/kvm/arm.c > +++ b/arch/arm64/kvm/arm.c > @@ -220,6 +220,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) > case KVM_CAP_VCPU_ATTRIBUTES: > case KVM_CAP_PTP_KVM: > case KVM_CAP_ARM_SYSTEM_SUSPEND: > + case KVM_CAP_MEMORY_FAULT_INFO: > r = 1; > break; > case KVM_CAP_SET_GUEST_DEBUG2: > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index ca73eb066af81..0925678e741de 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -4432,6 +4432,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) > case KVM_CAP_VAPIC: > case KVM_CAP_ENABLE_CAP: > case KVM_CAP_VM_DISABLE_NX_HUGE_PAGES: > + case KVM_CAP_MEMORY_FAULT_INFO: > r = 1; > break; > case KVM_CAP_EXIT_HYPERCALL: > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > index 90edc16d37e59..776f9713f3921 100644 > --- a/include/linux/kvm_host.h > +++ b/include/linux/kvm_host.h > @@ -805,6 +805,8 @@ struct kvm { > struct notifier_block pm_notifier; > #endif > char stats_id[KVM_STATS_NAME_SIZE]; > + > + bool fill_efault_info; > }; > > #define kvm_err(fmt, ...) \ > @@ -2277,4 +2279,14 @@ static inline void kvm_account_pgtable_pages(void *virt, int nr) > /* Max number of entries allowed for each kvm dirty ring */ > #define KVM_DIRTY_RING_MAX_ENTRIES 65536 > > +/* > + * Attempts to set the run struct's exit reason to KVM_EXIT_MEMORY_FAULT and > + * populate the memory_fault field with the given information. > + * > + * Does nothing if KVM_CAP_MEMORY_FAULT_INFO is not enabled. WARNs and does > + * nothing if the exit reason is not KVM_EXIT_UNKNOWN, or if 'vcpu' is not > + * the current running vcpu. > + */ > +inline void kvm_populate_efault_info(struct kvm_vcpu *vcpu, > + uint64_t gpa, uint64_t len); > #endif > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > index 4003a166328cc..bc73e8381a2bb 100644 > --- a/include/uapi/linux/kvm.h > +++ b/include/uapi/linux/kvm.h > @@ -264,6 +264,7 @@ struct kvm_xen_exit { > #define KVM_EXIT_RISCV_SBI 35 > #define KVM_EXIT_RISCV_CSR 36 > #define KVM_EXIT_NOTIFY 37 > +#define KVM_EXIT_MEMORY_FAULT 38 struct exit_reason[] string for KVM_EXIT_MEMORY_FAULT can be added as well. > > /* For KVM_EXIT_INTERNAL_ERROR */ > /* Emulate instruction failed. */ > @@ -505,6 +506,16 @@ struct kvm_run { > #define KVM_NOTIFY_CONTEXT_INVALID (1 << 0) > __u32 flags; > } notify; > + /* KVM_EXIT_MEMORY_FAULT */ > + struct { > + /* > + * Indicates a memory fault on the guest physical address range > + * [gpa, gpa + len). flags is always zero for now. > + */ > + __u64 flags; > + __u64 gpa; > + __u64 len; /* in bytes */ > + } memory_fault; > /* Fix the size of the union. 
*/ > char padding[256]; > }; > @@ -1184,6 +1195,7 @@ struct kvm_ppc_resize_hpt { > #define KVM_CAP_S390_PROTECTED_ASYNC_DISABLE 224 > #define KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP 225 > #define KVM_CAP_PMU_EVENT_MASKED_EVENTS 226 > +#define KVM_CAP_MEMORY_FAULT_INFO 227 > > #ifdef KVM_CAP_IRQ_ROUTING > > @@ -2237,4 +2249,8 @@ struct kvm_s390_zpci_op { > /* flags for kvm_s390_zpci_op->u.reg_aen.flags */ > #define KVM_S390_ZPCIOP_REGAEN_HOST (1 << 0) > > +/* flags for KVM_CAP_MEMORY_FAULT_INFO */ > +#define KVM_MEMORY_FAULT_INFO_DISABLE 0 > +#define KVM_MEMORY_FAULT_INFO_ENABLE 1 > + > #endif /* __LINUX_KVM_H */ > diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h > index 4003a166328cc..5c57796364d65 100644 > --- a/tools/include/uapi/linux/kvm.h > +++ b/tools/include/uapi/linux/kvm.h > @@ -264,6 +264,7 @@ struct kvm_xen_exit { > #define KVM_EXIT_RISCV_SBI 35 > #define KVM_EXIT_RISCV_CSR 36 > #define KVM_EXIT_NOTIFY 37 > +#define KVM_EXIT_MEMORY_FAULT 38 > > /* For KVM_EXIT_INTERNAL_ERROR */ > /* Emulate instruction failed. */ > @@ -505,6 +506,16 @@ struct kvm_run { > #define KVM_NOTIFY_CONTEXT_INVALID (1 << 0) > __u32 flags; > } notify; > + /* KVM_EXIT_MEMORY_FAULT */ > + struct { > + /* > + * Indicates a memory fault on the guest physical address range > + * [gpa, gpa + len). flags is always zero for now. > + */ > + __u64 flags; > + __u64 gpa; > + __u64 len; /* in bytes */ > + } memory_fault; > /* Fix the size of the union. */ > char padding[256]; > }; > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index cf7d3de6f3689..f3effc93cbef3 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -1142,6 +1142,7 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname) > spin_lock_init(&kvm->mn_invalidate_lock); > rcuwait_init(&kvm->mn_memslots_update_rcuwait); > xa_init(&kvm->vcpu_array); > + kvm->fill_efault_info = false; > > INIT_LIST_HEAD(&kvm->gpc_list); > spin_lock_init(&kvm->gpc_lock); > @@ -4096,6 +4097,8 @@ static long kvm_vcpu_ioctl(struct file *filp, > put_pid(oldpid); > } > r = kvm_arch_vcpu_ioctl_run(vcpu); > + WARN_ON_ONCE(r == -EFAULT && > + vcpu->run->exit_reason != KVM_EXIT_MEMORY_FAULT); > trace_kvm_userspace_exit(vcpu->run->exit_reason, r); > break; > } > @@ -4672,6 +4675,15 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm, > > return r; > } > + case KVM_CAP_MEMORY_FAULT_INFO: { > + if (!kvm_vm_ioctl_check_extension_generic(kvm, cap->cap) > + || (cap->args[0] != KVM_MEMORY_FAULT_INFO_ENABLE > + && cap->args[0] != KVM_MEMORY_FAULT_INFO_DISABLE)) { > + return -EINVAL; > + } > + kvm->fill_efault_info = cap->args[0] == KVM_MEMORY_FAULT_INFO_ENABLE; > + return 0; > + } > default: > return kvm_vm_ioctl_enable_cap(kvm, cap); > } > @@ -6173,3 +6185,35 @@ int kvm_vm_create_worker_thread(struct kvm *kvm, kvm_vm_thread_fn_t thread_fn, > > return init_context.err; > } > + > +inline void kvm_populate_efault_info(struct kvm_vcpu *vcpu, > + uint64_t gpa, uint64_t len) > +{ > + if (!vcpu->kvm->fill_efault_info) > + return; > + > + preempt_disable(); > + /* > + * Ensure the this vCPU isn't modifying another vCPU's run struct, which > + * would open the door for races between concurrent calls to this > + * function. > + */ > + if (WARN_ON_ONCE(vcpu != __this_cpu_read(kvm_running_vcpu))) > + goto out; > + /* > + * Try not to overwrite an already-populated run struct. 
> + * This isn't a perfect solution, as there's no guarantee that the exit > + * reason is set before the run struct is populated, but it should prevent > + * at least some bugs. > + */ > + else if (WARN_ON_ONCE(vcpu->run->exit_reason != KVM_EXIT_UNKNOWN)) > + goto out; > + > + vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT; > + vcpu->run->memory_fault.gpa = gpa; > + vcpu->run->memory_fault.len = len; > + vcpu->run->memory_fault.flags = 0; > + > +out: > + preempt_enable(); > +}
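For readers following along, a minimal userspace sketch of the flow the quoted documentation describes (check for errno == EFAULT from KVM_RUN, then read kvm_run::memory_fault) might look like the following. It assumes headers built with this series applied; the resolve_fault() helper is hypothetical, e.g. something that services the range via UFFDIO_COPY and then lets the caller retry.

#include <errno.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Hypothetical helper, e.g. backed by UFFDIO_COPY; not part of this series. */
int resolve_fault(__u64 gpa, __u64 len);

/*
 * Run the vCPU once. On an annotated EFAULT, hand the faulting guest
 * physical range [gpa, gpa + len) to resolve_fault() so the caller can
 * retry KVM_RUN afterwards.
 */
static int run_vcpu_once(int vcpu_fd, struct kvm_run *run)
{
	int ret = ioctl(vcpu_fd, KVM_RUN, 0);

	if (ret == -1 && errno == EFAULT &&
	    run->exit_reason == KVM_EXIT_MEMORY_FAULT) {
		/* 'flags' is documented as always zero for now. */
		return resolve_fault(run->memory_fault.gpa,
				     run->memory_fault.len);
	}
	return ret;
}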
On Wed, Apr 19, 2023 at 6:57 AM Hoo Robert <robert.hoo.linux@gmail.com> wrote:
>
> kvm_populate_efault_info(), function name.
> ...
> Ditto

Done

> struct exit_reason[] string for KVM_EXIT_MEMORY_FAULT can be added as
> well.

Done, assuming you mean the exit_reasons_known definition in kvm_util.c
Anish Moorthy <amoorthy@google.com> wrote on Fri, Apr 21, 2023 at 02:10:
> > struct exit_reason[] string for KVM_EXIT_MEMORY_FAULT can be added as
> > well.
>
> Done, assuming you mean the exit_reasons_known definition in kvm_util.c

Yes.
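For completeness, the selftests change agreed on here would presumably be a one-line addition along these lines (a sketch against tools/testing/selftests/kvm/lib/kvm_util.c, assuming exit_reasons_known still uses plain {reason, name} pairs; surrounding entries abbreviated):

static const struct exit_reason {
	unsigned int reason;
	const char *name;
} exit_reasons_known[] = {
	/* ... existing entries ... */
	{KVM_EXIT_MEMORY_FAULT, "MEMORY_FAULT"},
};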
On Wed, Apr 12, 2023 at 09:34:53PM +0000, Anish Moorthy wrote: [...] > +7.34 KVM_CAP_MEMORY_FAULT_INFO > +------------------------------ > + > +:Architectures: x86, arm64 > +:Parameters: args[0] - KVM_MEMORY_FAULT_INFO_ENABLE|DISABLE to enable/disable > + the capability. > +:Returns: 0 on success, or -EINVAL if unsupported or invalid args[0]. > + > +When enabled, EFAULTs "returned" by KVM_RUN in response to failed vCPU guest > +memory accesses may be annotated with additional information. When KVM_RUN > +returns an error with errno=EFAULT, userspace may check the exit reason: if it > +is KVM_EXIT_MEMORY_FAULT, userspace is then permitted to read the 'memory_fault' > +member of the run struct. So the other angle of my concern w.r.t. NOWAIT exits is the fact that userspace gets to decide whether or not we annotate such an exit. We all agree that a NOWAIT exit w/o context isn't actionable, right? Sean is suggesting that we abuse the fact that kvm_run already contains junk for EFAULT exits and populate kvm_run::memory_fault unconditionally [*]. I agree with him, and it eliminates the odd quirk of 'bare' NOWAIT exits too. Old userspace will still see 'garbage' in kvm_run struct, but one man's trash is another man's treasure after all :) So, based on that, could you: - Unconditionally prepare MEMORY_FAULT exits everywhere you're converting here - Redefine KVM_CAP_MEMORY_FAULT_INFO as an informational cap, and do not accept an attempt to enable it. Instead, have calls to KVM_CHECK_EXTENSION return a set of flags describing the supported feature set. Eventually, you can stuff a bit in there to advertise that all EFAULTs are reliable. [*] https://lore.kernel.org/kvmarm/ZHjqkdEOVUiazj5d@google.com/ > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index cf7d3de6f3689..f3effc93cbef3 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -1142,6 +1142,7 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname) > spin_lock_init(&kvm->mn_invalidate_lock); > rcuwait_init(&kvm->mn_memslots_update_rcuwait); > xa_init(&kvm->vcpu_array); > + kvm->fill_efault_info = false; > > INIT_LIST_HEAD(&kvm->gpc_list); > spin_lock_init(&kvm->gpc_lock); > @@ -4096,6 +4097,8 @@ static long kvm_vcpu_ioctl(struct file *filp, > put_pid(oldpid); > } > r = kvm_arch_vcpu_ioctl_run(vcpu); > + WARN_ON_ONCE(r == -EFAULT && > + vcpu->run->exit_reason != KVM_EXIT_MEMORY_FAULT); This might be a bit overkill, as it will definitely fire on unsupported architectures. Instead you may want to condition this on an architecture actually selecting support for MEMORY_FAULT_INFO. 
> trace_kvm_userspace_exit(vcpu->run->exit_reason, r); > break; > } > @@ -4672,6 +4675,15 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm, > > return r; > } > + case KVM_CAP_MEMORY_FAULT_INFO: { > + if (!kvm_vm_ioctl_check_extension_generic(kvm, cap->cap) > + || (cap->args[0] != KVM_MEMORY_FAULT_INFO_ENABLE > + && cap->args[0] != KVM_MEMORY_FAULT_INFO_DISABLE)) { > + return -EINVAL; > + } > + kvm->fill_efault_info = cap->args[0] == KVM_MEMORY_FAULT_INFO_ENABLE; > + return 0; > + } > default: > return kvm_vm_ioctl_enable_cap(kvm, cap); > } > @@ -6173,3 +6185,35 @@ int kvm_vm_create_worker_thread(struct kvm *kvm, kvm_vm_thread_fn_t thread_fn, > > return init_context.err; > } > + > +inline void kvm_populate_efault_info(struct kvm_vcpu *vcpu, > + uint64_t gpa, uint64_t len) > +{ > + if (!vcpu->kvm->fill_efault_info) > + return; > + > + preempt_disable(); > + /* > + * Ensure the this vCPU isn't modifying another vCPU's run struct, which > + * would open the door for races between concurrent calls to this > + * function. > + */ > + if (WARN_ON_ONCE(vcpu != __this_cpu_read(kvm_running_vcpu))) > + goto out; > + /* > + * Try not to overwrite an already-populated run struct. > + * This isn't a perfect solution, as there's no guarantee that the exit > + * reason is set before the run struct is populated, but it should prevent > + * at least some bugs. > + */ > + else if (WARN_ON_ONCE(vcpu->run->exit_reason != KVM_EXIT_UNKNOWN)) > + goto out; > + > + vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT; > + vcpu->run->memory_fault.gpa = gpa; > + vcpu->run->memory_fault.len = len; > + vcpu->run->memory_fault.flags = 0; > + > +out: > + preempt_enable(); > +} > -- > 2.40.0.577.gac1e443424-goog >
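To make the suggested direction concrete: with an informational capability there is nothing for userspace to enable, it simply interprets the KVM_CHECK_EXTENSION return value as a feature bitmask. A rough sketch, with made-up flag names that are not part of this series:

#include <stdbool.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Illustrative feature bits; not defined by this series. */
#define KVM_MEMORY_FAULT_INFO_VCPU_EXITS	(1 << 0)
#define KVM_MEMORY_FAULT_INFO_RELIABLE		(1 << 1)	/* possible future bit */

static bool memory_fault_info_supported(int kvm_fd)
{
	int flags = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_MEMORY_FAULT_INFO);

	/* Zero or negative means the kernel reports no support at all. */
	return flags > 0 && (flags & KVM_MEMORY_FAULT_INFO_VCPU_EXITS);
}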
On Thu, Jun 1, 2023 at 12:52 PM Oliver Upton <oliver.upton@linux.dev> wrote: > > So the other angle of my concern w.r.t. NOWAIT exits is the fact that > userspace gets to decide whether or not we annotate such an exit. We all > agree that a NOWAIT exit w/o context isn't actionable, right? Yup > Sean is suggesting that we abuse the fact that kvm_run already contains > junk for EFAULT exits and populate kvm_run::memory_fault unconditionally > [*]. I agree with him, and it eliminates the odd quirk of 'bare' NOWAIT > exits too. Old userspace will still see 'garbage' in kvm_run struct, > but one man's trash is another man's treasure after all :) > > So, based on that, could you: > > - Unconditionally prepare MEMORY_FAULT exits everywhere you're > converting here > > - Redefine KVM_CAP_MEMORY_FAULT_INFO as an informational cap, and do > not accept an attempt to enable it. Instead, have calls to > KVM_CHECK_EXTENSION return a set of flags describing the supported > feature set. Sure. I've been collecting feedback as it comes in, so I can send up a v4 with everything up to now soon. The major thing left to resolve is that the exact set of annotations is still waiting on feedback: I've already gone ahead and dropped everything I wasn't sure of in [1], so the next version will be quite a bit smaller. If it turns out that I've dropped too much, then I can add things back in based on the feedback. [1] https://lore.kernel.org/kvm/20230412213510.1220557-1-amoorthy@google.com/T/#mfe28e6a5015b7cd8c5ea1c351b0ca194aeb33daf > Eventually, you can stuff a bit in there to advertise that all > EFAULTs are reliable. I don't think this is an objective: the idea is to annotate efaults tracing back to user accesses (see [2]). Although the idea of annotating with some "unrecoverable" flag set for other efaults has been tossed around, so we may end up with that. [2] https://lore.kernel.org/kvm/20230412213510.1220557-1-amoorthy@google.com/T/#m5715f3a14a6a9ff9a4188918ec105592f0bfc69a > [*] https://lore.kernel.org/kvmarm/ZHjqkdEOVUiazj5d@google.com/ > > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > > index cf7d3de6f3689..f3effc93cbef3 100644 > > --- a/virt/kvm/kvm_main.c > > +++ b/virt/kvm/kvm_main.c > > @@ -1142,6 +1142,7 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname) > > spin_lock_init(&kvm->mn_invalidate_lock); > > rcuwait_init(&kvm->mn_memslots_update_rcuwait); > > xa_init(&kvm->vcpu_array); > > + kvm->fill_efault_info = false; > > > > INIT_LIST_HEAD(&kvm->gpc_list); > > spin_lock_init(&kvm->gpc_lock); > > @@ -4096,6 +4097,8 @@ static long kvm_vcpu_ioctl(struct file *filp, > > put_pid(oldpid); > > } > > r = kvm_arch_vcpu_ioctl_run(vcpu); > > + WARN_ON_ONCE(r == -EFAULT && > > + vcpu->run->exit_reason != KVM_EXIT_MEMORY_FAULT); > > This might be a bit overkill, as it will definitely fire on unsupported > architectures. Instead you may want to condition this on an architecture > actually selecting support for MEMORY_FAULT_INFO. Ah, that's embarrassing. Thanks for the catch.
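For reference, the narrower check Oliver is suggesting might look roughly like the following, assuming a Kconfig symbol (the name CONFIG_KVM_GENERIC_MEMORY_FAULT_INFO is invented here) that opted-in architectures select:

/* virt/kvm/kvm_main.c, kvm_vcpu_ioctl(), KVM_RUN case (sketch) */
r = kvm_arch_vcpu_ioctl_run(vcpu);
/* Only warn on architectures that have actually opted in. */
if (IS_ENABLED(CONFIG_KVM_GENERIC_MEMORY_FAULT_INFO))
	WARN_ON_ONCE(r == -EFAULT &&
		     vcpu->run->exit_reason != KVM_EXIT_MEMORY_FAULT);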
On Thu, Jun 01, 2023 at 01:30:58PM -0700, Anish Moorthy wrote: > On Thu, Jun 1, 2023 at 12:52 PM Oliver Upton <oliver.upton@linux.dev> wrote: > > Eventually, you can stuff a bit in there to advertise that all > > EFAULTs are reliable. > > I don't think this is an objective: the idea is to annotate efaults > tracing back to user accesses (see [2]). Although the idea of > annotating with some "unrecoverable" flag set for other efaults has > been tossed around, so we may end up with that. Right, there's quite a bit of detail entailed by what such a bit means... In any case, the idea would be to have a forward-looking stance with the UAPI where we can bolt on more things to the existing CAP in the future. > [2] https://lore.kernel.org/kvm/20230412213510.1220557-1-amoorthy@google.com/T/#m5715f3a14a6a9ff9a4188918ec105592f0bfc69a > > > [*] https://lore.kernel.org/kvmarm/ZHjqkdEOVUiazj5d@google.com/ > > > > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > > > index cf7d3de6f3689..f3effc93cbef3 100644 > > > --- a/virt/kvm/kvm_main.c > > > +++ b/virt/kvm/kvm_main.c > > > @@ -1142,6 +1142,7 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname) > > > spin_lock_init(&kvm->mn_invalidate_lock); > > > rcuwait_init(&kvm->mn_memslots_update_rcuwait); > > > xa_init(&kvm->vcpu_array); > > > + kvm->fill_efault_info = false; > > > > > > INIT_LIST_HEAD(&kvm->gpc_list); > > > spin_lock_init(&kvm->gpc_lock); > > > @@ -4096,6 +4097,8 @@ static long kvm_vcpu_ioctl(struct file *filp, > > > put_pid(oldpid); > > > } > > > r = kvm_arch_vcpu_ioctl_run(vcpu); > > > + WARN_ON_ONCE(r == -EFAULT && > > > + vcpu->run->exit_reason != KVM_EXIT_MEMORY_FAULT); > > > > This might be a bit overkill, as it will definitely fire on unsupported > > architectures. Instead you may want to condition this on an architecture > > actually selecting support for MEMORY_FAULT_INFO. > > Ah, that's embarrassing. Thanks for the catch. No problem at all. Pretty sure I've done a lot more actually egregious changes than you have ;) While we're here, forgot to mention it before but please clean up that indentation too. I think you may've gotten in a fight with the Google3 styling of your editor and lost :)
On 2023-04-12 21:34:53, Anish Moorthy wrote: > KVM_CAP_MEMORY_FAULT_INFO allows kvm_run to return useful information > besides a return value of -1 and errno of EFAULT when a vCPU fails an > access to guest memory. > > Add documentation, updates to the KVM headers, and a helper function > (kvm_populate_efault_info) for implementing the capability. > > Besides simply filling the run struct, kvm_populate_efault_info takes > two safety measures > > a. It tries to prevent concurrent fills on a single vCPU run struct > by checking that the run struct being modified corresponds to the > currently loaded vCPU. > b. It tries to avoid filling an already-populated run struct by > checking whether the exit reason has been modified since entry > into KVM_RUN. > > Finally, mark KVM_CAP_MEMORY_FAULT_INFO as available on arm64 and x86, > even though EFAULT annotation are currently totally absent. Picking a > point to declare the implementation "done" is difficult because > > 1. Annotations will be performed incrementally in subsequent commits > across both core and arch-specific KVM. > 2. The initial series will very likely miss some cases which need > annotation. Although these omissions are to be fixed in the future, > userspace thus still needs to expect and be able to handle > unannotated EFAULTs. > > Given these qualifications, just marking it available here seems the > least arbitrary thing to do. > > Suggested-by: Sean Christopherson <seanjc@google.com> > Signed-off-by: Anish Moorthy <amoorthy@google.com> > --- > Documentation/virt/kvm/api.rst | 35 +++++++++++++++++++++++++++ > arch/arm64/kvm/arm.c | 1 + > arch/x86/kvm/x86.c | 1 + > include/linux/kvm_host.h | 12 ++++++++++ > include/uapi/linux/kvm.h | 16 +++++++++++++ > tools/include/uapi/linux/kvm.h | 11 +++++++++ > virt/kvm/kvm_main.c | 44 ++++++++++++++++++++++++++++++++++ > 7 files changed, 120 insertions(+) > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst > index 48fad65568227..f174f43c38d45 100644 > --- a/Documentation/virt/kvm/api.rst > +++ b/Documentation/virt/kvm/api.rst > @@ -6637,6 +6637,18 @@ array field represents return values. The userspace should update the return > values of SBI call before resuming the VCPU. For more details on RISC-V SBI > spec refer, https://github.com/riscv/riscv-sbi-doc. > > +:: > + > + /* KVM_EXIT_MEMORY_FAULT */ > + struct { > + __u64 flags; > + __u64 gpa; > + __u64 len; /* in bytes */ > + } memory_fault; > + > +Indicates a vCPU memory fault on the guest physical address range > +[gpa, gpa + len). See KVM_CAP_MEMORY_FAULT_INFO for more details. > + > :: > > /* KVM_EXIT_NOTIFY */ > @@ -7670,6 +7682,29 @@ This capability is aimed to mitigate the threat that malicious VMs can > cause CPU stuck (due to event windows don't open up) and make the CPU > unavailable to host or other VMs. > > +7.34 KVM_CAP_MEMORY_FAULT_INFO > +------------------------------ > + > +:Architectures: x86, arm64 > +:Parameters: args[0] - KVM_MEMORY_FAULT_INFO_ENABLE|DISABLE to enable/disable > + the capability. > +:Returns: 0 on success, or -EINVAL if unsupported or invalid args[0]. > + > +When enabled, EFAULTs "returned" by KVM_RUN in response to failed vCPU guest > +memory accesses may be annotated with additional information. When KVM_RUN > +returns an error with errno=EFAULT, userspace may check the exit reason: if it > +is KVM_EXIT_MEMORY_FAULT, userspace is then permitted to read the 'memory_fault' > +member of the run struct. 
> + > +The 'gpa' and 'len' (in bytes) fields describe the range of guest > +physical memory to which access failed, i.e. [gpa, gpa + len). 'flags' is > +currently always zero. > + > +NOTE: The implementation of this capability is incomplete. Even with it enabled, > +userspace may receive "bare" EFAULTs (i.e. exit reason != > +KVM_EXIT_MEMORY_FAULT) from KVM_RUN. These should be considered bugs and > +reported to the maintainers. > + > 8. Other capabilities. > ====================== > > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c > index a43e1cb3b7e97..a932346b59f61 100644 > --- a/arch/arm64/kvm/arm.c > +++ b/arch/arm64/kvm/arm.c > @@ -220,6 +220,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) > case KVM_CAP_VCPU_ATTRIBUTES: > case KVM_CAP_PTP_KVM: > case KVM_CAP_ARM_SYSTEM_SUSPEND: > + case KVM_CAP_MEMORY_FAULT_INFO: > r = 1; > break; > case KVM_CAP_SET_GUEST_DEBUG2: > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index ca73eb066af81..0925678e741de 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -4432,6 +4432,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) > case KVM_CAP_VAPIC: > case KVM_CAP_ENABLE_CAP: > case KVM_CAP_VM_DISABLE_NX_HUGE_PAGES: > + case KVM_CAP_MEMORY_FAULT_INFO: > r = 1; > break; > case KVM_CAP_EXIT_HYPERCALL: > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > index 90edc16d37e59..776f9713f3921 100644 > --- a/include/linux/kvm_host.h > +++ b/include/linux/kvm_host.h > @@ -805,6 +805,8 @@ struct kvm { > struct notifier_block pm_notifier; > #endif > char stats_id[KVM_STATS_NAME_SIZE]; > + > + bool fill_efault_info; > }; > > #define kvm_err(fmt, ...) \ > @@ -2277,4 +2279,14 @@ static inline void kvm_account_pgtable_pages(void *virt, int nr) > /* Max number of entries allowed for each kvm dirty ring */ > #define KVM_DIRTY_RING_MAX_ENTRIES 65536 > > +/* > + * Attempts to set the run struct's exit reason to KVM_EXIT_MEMORY_FAULT and > + * populate the memory_fault field with the given information. > + * > + * Does nothing if KVM_CAP_MEMORY_FAULT_INFO is not enabled. WARNs and does > + * nothing if the exit reason is not KVM_EXIT_UNKNOWN, or if 'vcpu' is not > + * the current running vcpu. > + */ > +inline void kvm_populate_efault_info(struct kvm_vcpu *vcpu, > + uint64_t gpa, uint64_t len); > #endif > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > index 4003a166328cc..bc73e8381a2bb 100644 > --- a/include/uapi/linux/kvm.h > +++ b/include/uapi/linux/kvm.h > @@ -264,6 +264,7 @@ struct kvm_xen_exit { > #define KVM_EXIT_RISCV_SBI 35 > #define KVM_EXIT_RISCV_CSR 36 > #define KVM_EXIT_NOTIFY 37 > +#define KVM_EXIT_MEMORY_FAULT 38 > > /* For KVM_EXIT_INTERNAL_ERROR */ > /* Emulate instruction failed. */ > @@ -505,6 +506,16 @@ struct kvm_run { > #define KVM_NOTIFY_CONTEXT_INVALID (1 << 0) > __u32 flags; > } notify; > + /* KVM_EXIT_MEMORY_FAULT */ > + struct { > + /* > + * Indicates a memory fault on the guest physical address range > + * [gpa, gpa + len). flags is always zero for now. > + */ > + __u64 flags; > + __u64 gpa; > + __u64 len; /* in bytes */ > + } memory_fault; > /* Fix the size of the union. 
*/ > char padding[256]; > }; > @@ -1184,6 +1195,7 @@ struct kvm_ppc_resize_hpt { > #define KVM_CAP_S390_PROTECTED_ASYNC_DISABLE 224 > #define KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP 225 > #define KVM_CAP_PMU_EVENT_MASKED_EVENTS 226 > +#define KVM_CAP_MEMORY_FAULT_INFO 227 > > #ifdef KVM_CAP_IRQ_ROUTING > > @@ -2237,4 +2249,8 @@ struct kvm_s390_zpci_op { > /* flags for kvm_s390_zpci_op->u.reg_aen.flags */ > #define KVM_S390_ZPCIOP_REGAEN_HOST (1 << 0) > > +/* flags for KVM_CAP_MEMORY_FAULT_INFO */ > +#define KVM_MEMORY_FAULT_INFO_DISABLE 0 > +#define KVM_MEMORY_FAULT_INFO_ENABLE 1 > + > #endif /* __LINUX_KVM_H */ > diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h > index 4003a166328cc..5c57796364d65 100644 > --- a/tools/include/uapi/linux/kvm.h > +++ b/tools/include/uapi/linux/kvm.h > @@ -264,6 +264,7 @@ struct kvm_xen_exit { > #define KVM_EXIT_RISCV_SBI 35 > #define KVM_EXIT_RISCV_CSR 36 > #define KVM_EXIT_NOTIFY 37 > +#define KVM_EXIT_MEMORY_FAULT 38 > > /* For KVM_EXIT_INTERNAL_ERROR */ > /* Emulate instruction failed. */ > @@ -505,6 +506,16 @@ struct kvm_run { > #define KVM_NOTIFY_CONTEXT_INVALID (1 << 0) > __u32 flags; > } notify; > + /* KVM_EXIT_MEMORY_FAULT */ > + struct { > + /* > + * Indicates a memory fault on the guest physical address range > + * [gpa, gpa + len). flags is always zero for now. > + */ > + __u64 flags; > + __u64 gpa; > + __u64 len; /* in bytes */ > + } memory_fault; > /* Fix the size of the union. */ > char padding[256]; > }; > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index cf7d3de6f3689..f3effc93cbef3 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -1142,6 +1142,7 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname) > spin_lock_init(&kvm->mn_invalidate_lock); > rcuwait_init(&kvm->mn_memslots_update_rcuwait); > xa_init(&kvm->vcpu_array); > + kvm->fill_efault_info = false; > > INIT_LIST_HEAD(&kvm->gpc_list); > spin_lock_init(&kvm->gpc_lock); > @@ -4096,6 +4097,8 @@ static long kvm_vcpu_ioctl(struct file *filp, > put_pid(oldpid); > } > r = kvm_arch_vcpu_ioctl_run(vcpu); > + WARN_ON_ONCE(r == -EFAULT && > + vcpu->run->exit_reason != KVM_EXIT_MEMORY_FAULT); > trace_kvm_userspace_exit(vcpu->run->exit_reason, r); > break; > } > @@ -4672,6 +4675,15 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm, > > return r; > } > + case KVM_CAP_MEMORY_FAULT_INFO: { > + if (!kvm_vm_ioctl_check_extension_generic(kvm, cap->cap) > + || (cap->args[0] != KVM_MEMORY_FAULT_INFO_ENABLE > + && cap->args[0] != KVM_MEMORY_FAULT_INFO_DISABLE)) { > + return -EINVAL; > + } > + kvm->fill_efault_info = cap->args[0] == KVM_MEMORY_FAULT_INFO_ENABLE; > + return 0; > + } > default: > return kvm_vm_ioctl_enable_cap(kvm, cap); > } > @@ -6173,3 +6185,35 @@ int kvm_vm_create_worker_thread(struct kvm *kvm, kvm_vm_thread_fn_t thread_fn, > > return init_context.err; > } > + > +inline void kvm_populate_efault_info(struct kvm_vcpu *vcpu, > + uint64_t gpa, uint64_t len) > +{ > + if (!vcpu->kvm->fill_efault_info) > + return; > + > + preempt_disable(); > + /* > + * Ensure the this vCPU isn't modifying another vCPU's run struct, which > + * would open the door for races between concurrent calls to this > + * function. > + */ > + if (WARN_ON_ONCE(vcpu != __this_cpu_read(kvm_running_vcpu))) > + goto out; Why use WARN_ON_ONCE when there is a clear possiblity of preemption kicking in (with the possibility of vcpu_load/vcpu_put being called in the new task) before preempt_disable() is called in this function ? 
I think you should use WARN_ON_ONCE only where there is some impossible
situation happening, not when there is a possibility of that situation
happening as per the kernel code. I think that this WARN_ON_ONCE could
make sense if kvm_populate_efault_info() is called from atomic context,
but not when you are disabling preemption from this function itself.

Basically I don't think there is any way we can guarantee that
preemption DOESN'T kick in before the preempt_disable() such that this
warning is actually something that deserves to have a kernel
WARN_ON_ONCE() warning.

Can we get rid of this WARN_ON_ONCE and straightaway jump to the out
label if "(vcpu != __this_cpu_read(kvm_running_vcpu))" is true, or
please do correct me if I am wrong about something ?

> +	/*
> +	 * Try not to overwrite an already-populated run struct.
> +	 * This isn't a perfect solution, as there's no guarantee that the exit
> +	 * reason is set before the run struct is populated, but it should prevent
> +	 * at least some bugs.
> +	 */
> +	else if (WARN_ON_ONCE(vcpu->run->exit_reason != KVM_EXIT_UNKNOWN))
> +		goto out;
> +
> +	vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT;
> +	vcpu->run->memory_fault.gpa = gpa;
> +	vcpu->run->memory_fault.len = len;
> +	vcpu->run->memory_fault.flags = 0;
> +
> +out:
> +	preempt_enable();
> +}
> --
> 2.40.0.577.gac1e443424-goog
>
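For concreteness, the variant being asked about would look roughly like this (a sketch only; it drops the WARN for the vcpu-mismatch case while leaving the exit_reason check as is):

inline void kvm_populate_efault_info(struct kvm_vcpu *vcpu,
				     uint64_t gpa, uint64_t len)
{
	if (!vcpu->kvm->fill_efault_info)
		return;

	preempt_disable();
	/*
	 * Per the review comment above: if this vCPU is not the one currently
	 * loaded on this CPU, bail out quietly instead of WARNing.
	 */
	if (vcpu != __this_cpu_read(kvm_running_vcpu))
		goto out;
	if (WARN_ON_ONCE(vcpu->run->exit_reason != KVM_EXIT_UNKNOWN))
		goto out;

	vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT;
	vcpu->run->memory_fault.gpa = gpa;
	vcpu->run->memory_fault.len = len;
	vcpu->run->memory_fault.flags = 0;
out:
	preempt_enable();
}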
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 48fad65568227..f174f43c38d45 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -6637,6 +6637,18 @@ array field represents return values. The userspace should update the return values of SBI call before resuming the VCPU. For more details on RISC-V SBI spec refer, https://github.com/riscv/riscv-sbi-doc. +:: + + /* KVM_EXIT_MEMORY_FAULT */ + struct { + __u64 flags; + __u64 gpa; + __u64 len; /* in bytes */ + } memory_fault; + +Indicates a vCPU memory fault on the guest physical address range +[gpa, gpa + len). See KVM_CAP_MEMORY_FAULT_INFO for more details. + :: /* KVM_EXIT_NOTIFY */ @@ -7670,6 +7682,29 @@ This capability is aimed to mitigate the threat that malicious VMs can cause CPU stuck (due to event windows don't open up) and make the CPU unavailable to host or other VMs. +7.34 KVM_CAP_MEMORY_FAULT_INFO +------------------------------ + +:Architectures: x86, arm64 +:Parameters: args[0] - KVM_MEMORY_FAULT_INFO_ENABLE|DISABLE to enable/disable + the capability. +:Returns: 0 on success, or -EINVAL if unsupported or invalid args[0]. + +When enabled, EFAULTs "returned" by KVM_RUN in response to failed vCPU guest +memory accesses may be annotated with additional information. When KVM_RUN +returns an error with errno=EFAULT, userspace may check the exit reason: if it +is KVM_EXIT_MEMORY_FAULT, userspace is then permitted to read the 'memory_fault' +member of the run struct. + +The 'gpa' and 'len' (in bytes) fields describe the range of guest +physical memory to which access failed, i.e. [gpa, gpa + len). 'flags' is +currently always zero. + +NOTE: The implementation of this capability is incomplete. Even with it enabled, +userspace may receive "bare" EFAULTs (i.e. exit reason != +KVM_EXIT_MEMORY_FAULT) from KVM_RUN. These should be considered bugs and +reported to the maintainers. + 8. Other capabilities. ====================== diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index a43e1cb3b7e97..a932346b59f61 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -220,6 +220,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_VCPU_ATTRIBUTES: case KVM_CAP_PTP_KVM: case KVM_CAP_ARM_SYSTEM_SUSPEND: + case KVM_CAP_MEMORY_FAULT_INFO: r = 1; break; case KVM_CAP_SET_GUEST_DEBUG2: diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index ca73eb066af81..0925678e741de 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4432,6 +4432,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_VAPIC: case KVM_CAP_ENABLE_CAP: case KVM_CAP_VM_DISABLE_NX_HUGE_PAGES: + case KVM_CAP_MEMORY_FAULT_INFO: r = 1; break; case KVM_CAP_EXIT_HYPERCALL: diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 90edc16d37e59..776f9713f3921 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -805,6 +805,8 @@ struct kvm { struct notifier_block pm_notifier; #endif char stats_id[KVM_STATS_NAME_SIZE]; + + bool fill_efault_info; }; #define kvm_err(fmt, ...) \ @@ -2277,4 +2279,14 @@ static inline void kvm_account_pgtable_pages(void *virt, int nr) /* Max number of entries allowed for each kvm dirty ring */ #define KVM_DIRTY_RING_MAX_ENTRIES 65536 +/* + * Attempts to set the run struct's exit reason to KVM_EXIT_MEMORY_FAULT and + * populate the memory_fault field with the given information. + * + * Does nothing if KVM_CAP_MEMORY_FAULT_INFO is not enabled. 
WARNs and does + * nothing if the exit reason is not KVM_EXIT_UNKNOWN, or if 'vcpu' is not + * the current running vcpu. + */ +inline void kvm_populate_efault_info(struct kvm_vcpu *vcpu, + uint64_t gpa, uint64_t len); #endif diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 4003a166328cc..bc73e8381a2bb 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -264,6 +264,7 @@ struct kvm_xen_exit { #define KVM_EXIT_RISCV_SBI 35 #define KVM_EXIT_RISCV_CSR 36 #define KVM_EXIT_NOTIFY 37 +#define KVM_EXIT_MEMORY_FAULT 38 /* For KVM_EXIT_INTERNAL_ERROR */ /* Emulate instruction failed. */ @@ -505,6 +506,16 @@ struct kvm_run { #define KVM_NOTIFY_CONTEXT_INVALID (1 << 0) __u32 flags; } notify; + /* KVM_EXIT_MEMORY_FAULT */ + struct { + /* + * Indicates a memory fault on the guest physical address range + * [gpa, gpa + len). flags is always zero for now. + */ + __u64 flags; + __u64 gpa; + __u64 len; /* in bytes */ + } memory_fault; /* Fix the size of the union. */ char padding[256]; }; @@ -1184,6 +1195,7 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_S390_PROTECTED_ASYNC_DISABLE 224 #define KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP 225 #define KVM_CAP_PMU_EVENT_MASKED_EVENTS 226 +#define KVM_CAP_MEMORY_FAULT_INFO 227 #ifdef KVM_CAP_IRQ_ROUTING @@ -2237,4 +2249,8 @@ struct kvm_s390_zpci_op { /* flags for kvm_s390_zpci_op->u.reg_aen.flags */ #define KVM_S390_ZPCIOP_REGAEN_HOST (1 << 0) +/* flags for KVM_CAP_MEMORY_FAULT_INFO */ +#define KVM_MEMORY_FAULT_INFO_DISABLE 0 +#define KVM_MEMORY_FAULT_INFO_ENABLE 1 + #endif /* __LINUX_KVM_H */ diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h index 4003a166328cc..5c57796364d65 100644 --- a/tools/include/uapi/linux/kvm.h +++ b/tools/include/uapi/linux/kvm.h @@ -264,6 +264,7 @@ struct kvm_xen_exit { #define KVM_EXIT_RISCV_SBI 35 #define KVM_EXIT_RISCV_CSR 36 #define KVM_EXIT_NOTIFY 37 +#define KVM_EXIT_MEMORY_FAULT 38 /* For KVM_EXIT_INTERNAL_ERROR */ /* Emulate instruction failed. */ @@ -505,6 +506,16 @@ struct kvm_run { #define KVM_NOTIFY_CONTEXT_INVALID (1 << 0) __u32 flags; } notify; + /* KVM_EXIT_MEMORY_FAULT */ + struct { + /* + * Indicates a memory fault on the guest physical address range + * [gpa, gpa + len). flags is always zero for now. + */ + __u64 flags; + __u64 gpa; + __u64 len; /* in bytes */ + } memory_fault; /* Fix the size of the union. 
*/ char padding[256]; }; diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index cf7d3de6f3689..f3effc93cbef3 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1142,6 +1142,7 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname) spin_lock_init(&kvm->mn_invalidate_lock); rcuwait_init(&kvm->mn_memslots_update_rcuwait); xa_init(&kvm->vcpu_array); + kvm->fill_efault_info = false; INIT_LIST_HEAD(&kvm->gpc_list); spin_lock_init(&kvm->gpc_lock); @@ -4096,6 +4097,8 @@ static long kvm_vcpu_ioctl(struct file *filp, put_pid(oldpid); } r = kvm_arch_vcpu_ioctl_run(vcpu); + WARN_ON_ONCE(r == -EFAULT && + vcpu->run->exit_reason != KVM_EXIT_MEMORY_FAULT); trace_kvm_userspace_exit(vcpu->run->exit_reason, r); break; } @@ -4672,6 +4675,15 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm, return r; } + case KVM_CAP_MEMORY_FAULT_INFO: { + if (!kvm_vm_ioctl_check_extension_generic(kvm, cap->cap) + || (cap->args[0] != KVM_MEMORY_FAULT_INFO_ENABLE + && cap->args[0] != KVM_MEMORY_FAULT_INFO_DISABLE)) { + return -EINVAL; + } + kvm->fill_efault_info = cap->args[0] == KVM_MEMORY_FAULT_INFO_ENABLE; + return 0; + } default: return kvm_vm_ioctl_enable_cap(kvm, cap); } @@ -6173,3 +6185,35 @@ int kvm_vm_create_worker_thread(struct kvm *kvm, kvm_vm_thread_fn_t thread_fn, return init_context.err; } + +inline void kvm_populate_efault_info(struct kvm_vcpu *vcpu, + uint64_t gpa, uint64_t len) +{ + if (!vcpu->kvm->fill_efault_info) + return; + + preempt_disable(); + /* + * Ensure the this vCPU isn't modifying another vCPU's run struct, which + * would open the door for races between concurrent calls to this + * function. + */ + if (WARN_ON_ONCE(vcpu != __this_cpu_read(kvm_running_vcpu))) + goto out; + /* + * Try not to overwrite an already-populated run struct. + * This isn't a perfect solution, as there's no guarantee that the exit + * reason is set before the run struct is populated, but it should prevent + * at least some bugs. + */ + else if (WARN_ON_ONCE(vcpu->run->exit_reason != KVM_EXIT_UNKNOWN)) + goto out; + + vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT; + vcpu->run->memory_fault.gpa = gpa; + vcpu->run->memory_fault.len = len; + vcpu->run->memory_fault.flags = 0; + +out: + preempt_enable(); +}
KVM_CAP_MEMORY_FAULT_INFO allows kvm_run to return useful information
besides a return value of -1 and errno of EFAULT when a vCPU fails an
access to guest memory.

Add documentation, updates to the KVM headers, and a helper function
(kvm_populate_efault_info()) for implementing the capability.

Besides simply filling the run struct, kvm_populate_efault_info() takes
two safety measures:

a. It tries to prevent concurrent fills on a single vCPU run struct
   by checking that the run struct being modified corresponds to the
   currently loaded vCPU.
b. It tries to avoid filling an already-populated run struct by
   checking whether the exit reason has been modified since entry
   into KVM_RUN.

Finally, mark KVM_CAP_MEMORY_FAULT_INFO as available on arm64 and x86,
even though EFAULT annotations are currently totally absent. Picking a
point to declare the implementation "done" is difficult because

1. Annotations will be performed incrementally in subsequent commits
   across both core and arch-specific KVM.
2. The initial series will very likely miss some cases which need
   annotation. Although these omissions are to be fixed in the future,
   userspace thus still needs to expect and be able to handle
   unannotated EFAULTs.

Given these qualifications, just marking it available here seems the
least arbitrary thing to do.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Anish Moorthy <amoorthy@google.com>
---
 Documentation/virt/kvm/api.rst | 35 +++++++++++++++++++++++++++
 arch/arm64/kvm/arm.c           |  1 +
 arch/x86/kvm/x86.c             |  1 +
 include/linux/kvm_host.h       | 12 ++++++++++
 include/uapi/linux/kvm.h       | 16 +++++++++++++
 tools/include/uapi/linux/kvm.h | 11 +++++++++
 virt/kvm/kvm_main.c            | 44 ++++++++++++++++++++++++++++++++++
 7 files changed, 120 insertions(+)