| Message ID | 20240215235405.368539-7-amoorthy@google.com |
|---|---|
| State | New, archived |
| Series | Improve KVM + userfaultfd performance via KVM_EXIT_MEMORY_FAULTs on stage-2 faults |
On Thu, Feb 15, 2024, Anish Moorthy wrote:
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 9f5d45c49e36..bf7bc21d56ac 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -1353,6 +1353,7 @@ yet and must be cleared on entry.
>    #define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
>    #define KVM_MEM_READONLY	(1UL << 1)
>    #define KVM_MEM_GUEST_MEMFD	(1UL << 2)
> +  #define KVM_MEM_EXIT_ON_MISSING	(1UL << 3)

David M.,

Before this gets queued anywhere, a few questions related to the generic KVM
userfault stuff you're working on:

1. Do you anticipate reusing KVM_MEM_EXIT_ON_MISSING to communicate that a vCPU
   should exit to userspace, even for guest_memfd? Or are you envisioning the
   "data invalid" gfn attribute as being a superset?

   We danced very close to this topic in the PUCK call, but I don't _think_ we
   ever explicitly talked about whether or not KVM_MEM_EXIT_ON_MISSING would
   effectively be obsoleted by a KVM_SET_MEMORY_ATTRIBUTES-based "invalid data"
   flag.

   I was originally thinking that KVM_MEM_EXIT_ON_MISSING would be re-used,
   but after re-watching parts of the PUCK recording, e.g. about decoupling
   KVM from userspace page tables, I suspect past me was wrong.

2. What is your best guess as to when KVM userfault patches will be available,
   even if only in RFC form?

   The reason I ask is because Oliver pointed out (off-list) that (a) Google is the
   primary user for KVM_MEM_EXIT_ON_MISSING, possibly the _only_ user for the
   foreseeable future, and (b) if Google moves on to KVM userfault before ever
   ingesting KVM_MEM_EXIT_ON_MISSING from upstream, then we'll have effectively
   added dead code to KVM's eternal ABI.
On 2024-03-08 02:07 PM, Sean Christopherson wrote:
> On Thu, Feb 15, 2024, Anish Moorthy wrote:
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index 9f5d45c49e36..bf7bc21d56ac 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -1353,6 +1353,7 @@ yet and must be cleared on entry.
> >    #define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
> >    #define KVM_MEM_READONLY	(1UL << 1)
> >    #define KVM_MEM_GUEST_MEMFD	(1UL << 2)
> > +  #define KVM_MEM_EXIT_ON_MISSING	(1UL << 3)
>
> David M.,
>
> Before this gets queued anywhere, a few questions related to the generic KVM
> userfault stuff you're working on:
>
> 1. Do you anticipate reusing KVM_MEM_EXIT_ON_MISSING to communicate that a vCPU
>    should exit to userspace, even for guest_memfd? Or are you envisioning the
>    "data invalid" gfn attribute as being a superset?
>
>    We danced very close to this topic in the PUCK call, but I don't _think_ we
>    ever explicitly talked about whether or not KVM_MEM_EXIT_ON_MISSING would
>    effectively be obsoleted by a KVM_SET_MEMORY_ATTRIBUTES-based "invalid data"
>    flag.
>
>    I was originally thinking that KVM_MEM_EXIT_ON_MISSING would be re-used,
>    but after re-watching parts of the PUCK recording, e.g. about decoupling
>    KVM from userspace page tables, I suspect past me was wrong.

No I don't anticipate reusing KVM_MEM_EXIT_ON_MISSING.

The plan is to introduce a new gfn attribute and exit to userspace based
on that. I do foresee having an on/off switch for the new attribute, but
it wouldn't make sense to reuse KVM_MEM_EXIT_ON_MISSING for that.

> 2. What is your best guess as to when KVM userfault patches will be available,
>    even if only in RFC form?

We're aiming for the end of April for RFC with KVM/ARM support.

>    The reason I ask is because Oliver pointed out (off-list) that (a) Google is the
>    primary user for KVM_MEM_EXIT_ON_MISSING, possibly the _only_ user for the
>    foreseeable future, and (b) if Google moves on to KVM userfault before ever
>    ingesting KVM_MEM_EXIT_ON_MISSING from upstream, then we'll have effectively
>    added dead code to KVM's eternal ABI.
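For concreteness, here is a minimal sketch of what the attribute-based flow described above might look like from userspace. The KVM_SET_MEMORY_ATTRIBUTES ioctl and struct kvm_memory_attributes already exist upstream, but KVM_MEMORY_ATTRIBUTE_USERFAULT (both the name and the bit value) is purely hypothetical here, standing in for the "new gfn attribute" that had not been posted at the time of this thread:

```c
#include <linux/kvm.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Hypothetical attribute bit; not defined in any released uapi header. */
#define KVM_MEMORY_ATTRIBUTE_USERFAULT	(1ULL << 4)

static int mark_range_userfault(int vm_fd, uint64_t gpa, uint64_t size)
{
	struct kvm_memory_attributes attrs = {
		.address    = gpa,
		.size       = size,
		.attributes = KVM_MEMORY_ATTRIBUTE_USERFAULT,
	};

	/* vCPUs touching [gpa, gpa + size) would then exit with KVM_EXIT_MEMORY_FAULT. */
	return ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attrs);
}
```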
Hey,

Thanks Sean for bringing this up on the list, didn't have time for a lot
of upstream stuffs :)

On Fri, Mar 08, 2024 at 04:46:32PM -0800, David Matlack wrote:
> On 2024-03-08 02:07 PM, Sean Christopherson wrote:
> > On Thu, Feb 15, 2024, Anish Moorthy wrote:
> > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > > index 9f5d45c49e36..bf7bc21d56ac 100644
> > > --- a/Documentation/virt/kvm/api.rst
> > > +++ b/Documentation/virt/kvm/api.rst
> > > @@ -1353,6 +1353,7 @@ yet and must be cleared on entry.
> > >    #define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
> > >    #define KVM_MEM_READONLY	(1UL << 1)
> > >    #define KVM_MEM_GUEST_MEMFD	(1UL << 2)
> > > +  #define KVM_MEM_EXIT_ON_MISSING	(1UL << 3)
> >
> > David M.,
> >
> > Before this gets queued anywhere, a few questions related to the generic KVM
> > userfault stuff you're working on:
> >
> > 1. Do you anticipate reusing KVM_MEM_EXIT_ON_MISSING to communicate that a vCPU
> >    should exit to userspace, even for guest_memfd? Or are you envisioning the
> >    "data invalid" gfn attribute as being a superset?
> >
> >    We danced very close to this topic in the PUCK call, but I don't _think_ we
> >    ever explicitly talked about whether or not KVM_MEM_EXIT_ON_MISSING would
> >    effectively be obsoleted by a KVM_SET_MEMORY_ATTRIBUTES-based "invalid data"
> >    flag.
> >
> >    I was originally thinking that KVM_MEM_EXIT_ON_MISSING would be re-used,
> >    but after re-watching parts of the PUCK recording, e.g. about decoupling
> >    KVM from userspace page tables, I suspect past me was wrong.
>
> No I don't anticipate reusing KVM_MEM_EXIT_ON_MISSING.
>
> The plan is to introduce a new gfn attribute and exit to userspace based
> on that. I do foresee having an on/off switch for the new attribute, but
> it wouldn't make sense to reuse KVM_MEM_EXIT_ON_MISSING for that.

With that in mind, unless someone else has a usecase for the
KVM_MEM_EXIT_ON_MISSING behavior my *strong* preference is that we not
take this bit of the series upstream. The "memory fault" UAPI should
still be useful when the KVM userfault stuff comes along.

Anish, apologies, you must have whiplash from all the bikeshedding,
nitpicking, and other fun you've been put through on this series. Thanks
for being patient.

> >
> > 2. What is your best guess as to when KVM userfault patches will be available,
> >    even if only in RFC form?
>
> We're aiming for the end of April for RFC with KVM/ARM support.

Just to make sure everyone is read in on what this entails -- is this
the implementation that only worries about vCPUs touching non-present
memory, leaving the question of other UAPIs that consume guest memory
(e.g. GIC/ITS table save/restore) up for further discussion?
On Sun, Mar 10, 2024 at 9:46 PM Oliver Upton <oliver.upton@linux.dev> wrote:
> > >
> > > 2. What is your best guess as to when KVM userfault patches will be available,
> > >    even if only in RFC form?
> >
> > We're aiming for the end of April for RFC with KVM/ARM support.
>
> Just to make sure everyone is read in on what this entails -- is this
> the implementation that only worries about vCPUs touching non-present
> memory, leaving the question of other UAPIs that consume guest memory
> (e.g. GIC/ITS table save/restore) up for further discussion?

Yes. The initial version will only support returning to userspace on
invalid vCPU accesses with KVM_EXIT_MEMORY_FAULT. Non-vCPU accesses to
invalid pages (e.g. GIC/ITS table save/restore) will trigger an error
return from __gfn_to_hva_many() (which will cause the corresponding
ioctl to fail). It will be userspace's responsibility to clear the
invalid attribute before invoking those ioctls.

For x86 we may need a blocking kernel-to-userspace notification
mechanism for code paths in the emulator, but we'd like to investigate
and discuss if there are any other cleaner alternatives before going
too far down that route.
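For illustration, a rough userspace-side sketch of the vCPU flow described above: KVM_RUN fails with -EFAULT and kvm_run.memory_fault filled in, and the VMM resolves the reported range before re-entering the guest. handle_missing_gpa() is a hypothetical placeholder for whatever the VMM actually does (e.g. UFFDIO_COPY, or clearing the "invalid" attribute):

```c
#include <errno.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>

/*
 * Hypothetical VMM helper: make [gpa, gpa + size) accessible again, e.g. via
 * UFFDIO_COPY or by clearing the "invalid"/userfault attribute on the range.
 */
void handle_missing_gpa(__u64 gpa, __u64 size);

int run_vcpu(int vcpu_fd, struct kvm_run *run)
{
	for (;;) {
		int ret = ioctl(vcpu_fd, KVM_RUN, 0);

		if (ret < 0 && errno == EFAULT &&
		    run->exit_reason == KVM_EXIT_MEMORY_FAULT) {
			/* Resolve the reported range, then retry KVM_RUN. */
			handle_missing_gpa(run->memory_fault.gpa,
					   run->memory_fault.size);
			continue;
		}
		if (ret < 0)
			return -errno;

		/* Handle the other exit reasons (MMIO, IO, shutdown, ...). */
		return 0;
	}
}
```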
On Sun, Mar 10, 2024, Oliver Upton wrote:
> On Fri, Mar 08, 2024 at 04:46:32PM -0800, David Matlack wrote:
> > On 2024-03-08 02:07 PM, Sean Christopherson wrote:
> > > On Thu, Feb 15, 2024, Anish Moorthy wrote:
> > > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > > > index 9f5d45c49e36..bf7bc21d56ac 100644
> > > > --- a/Documentation/virt/kvm/api.rst
> > > > +++ b/Documentation/virt/kvm/api.rst
> > > > @@ -1353,6 +1353,7 @@ yet and must be cleared on entry.
> > > >    #define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
> > > >    #define KVM_MEM_READONLY	(1UL << 1)
> > > >    #define KVM_MEM_GUEST_MEMFD	(1UL << 2)
> > > > +  #define KVM_MEM_EXIT_ON_MISSING	(1UL << 3)
> > >
> > > David M.,
> > >
> > > Before this gets queued anywhere, a few questions related to the generic KVM
> > > userfault stuff you're working on:
> > >
> > > 1. Do you anticipate reusing KVM_MEM_EXIT_ON_MISSING to communicate that a vCPU
> > >    should exit to userspace, even for guest_memfd? Or are you envisioning the
> > >    "data invalid" gfn attribute as being a superset?
> > >
> > >    We danced very close to this topic in the PUCK call, but I don't _think_ we
> > >    ever explicitly talked about whether or not KVM_MEM_EXIT_ON_MISSING would
> > >    effectively be obsoleted by a KVM_SET_MEMORY_ATTRIBUTES-based "invalid data"
> > >    flag.
> > >
> > >    I was originally thinking that KVM_MEM_EXIT_ON_MISSING would be re-used,
> > >    but after re-watching parts of the PUCK recording, e.g. about decoupling
> > >    KVM from userspace page tables, I suspect past me was wrong.
> >
> > No I don't anticipate reusing KVM_MEM_EXIT_ON_MISSING.
> >
> > The plan is to introduce a new gfn attribute and exit to userspace based
> > on that. I do foresee having an on/off switch for the new attribute, but
> > it wouldn't make sense to reuse KVM_MEM_EXIT_ON_MISSING for that.
>
> With that in mind, unless someone else has a usecase for the
> KVM_MEM_EXIT_ON_MISSING behavior my *strong* preference is that we not
> take this bit of the series upstream. The "memory fault" UAPI should
> still be useful when the KVM userfault stuff comes along.

+1

Though I'll go a step further and say that even if someone does have a use case,
we should still wait. The imminent collision with David Stevens' kvm_follow_pfn()
series[*] is going to be a painful rebase no matter what, and once that's out of
the way, rebasing this series onto future kernels shouldn't be crazy difficult.

In other words, _if_ it turns out there's value in KVM_MEM_EXIT_ON_MISSING even
with David M's work, the cost of waiting another cycle (or two) is relatively
small.

Oh, and I'll plan on grabbing patches 1-4 for 6.10.

[*] https://lore.kernel.org/all/20240229025759.1187910-1-stevensd@google.com
On Sun, Mar 10, 2024 at 9:46 PM Oliver Upton <oliver.upton@linux.dev> wrote:
>
> Hey,
>
> Thanks Sean for bringing this up on the list, didn't have time for a lot
> of upstream stuffs :)
>
> On Fri, Mar 08, 2024 at 04:46:32PM -0800, David Matlack wrote:
> > On 2024-03-08 02:07 PM, Sean Christopherson wrote:
> > > On Thu, Feb 15, 2024, Anish Moorthy wrote:
> > > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > > > index 9f5d45c49e36..bf7bc21d56ac 100644
> > > > --- a/Documentation/virt/kvm/api.rst
> > > > +++ b/Documentation/virt/kvm/api.rst
> > > > @@ -1353,6 +1353,7 @@ yet and must be cleared on entry.
> > > >    #define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
> > > >    #define KVM_MEM_READONLY	(1UL << 1)
> > > >    #define KVM_MEM_GUEST_MEMFD	(1UL << 2)
> > > > +  #define KVM_MEM_EXIT_ON_MISSING	(1UL << 3)
> > >
> > > David M.,
> > >
> > > Before this gets queued anywhere, a few questions related to the generic KVM
> > > userfault stuff you're working on:
> > >
> > > 1. Do you anticipate reusing KVM_MEM_EXIT_ON_MISSING to communicate that a vCPU
> > >    should exit to userspace, even for guest_memfd? Or are you envisioning the
> > >    "data invalid" gfn attribute as being a superset?
> > >
> > >    We danced very close to this topic in the PUCK call, but I don't _think_ we
> > >    ever explicitly talked about whether or not KVM_MEM_EXIT_ON_MISSING would
> > >    effectively be obsoleted by a KVM_SET_MEMORY_ATTRIBUTES-based "invalid data"
> > >    flag.
> > >
> > >    I was originally thinking that KVM_MEM_EXIT_ON_MISSING would be re-used,
> > >    but after re-watching parts of the PUCK recording, e.g. about decoupling
> > >    KVM from userspace page tables, I suspect past me was wrong.
> >
> > No I don't anticipate reusing KVM_MEM_EXIT_ON_MISSING.
> >
> > The plan is to introduce a new gfn attribute and exit to userspace based
> > on that. I do foresee having an on/off switch for the new attribute, but
> > it wouldn't make sense to reuse KVM_MEM_EXIT_ON_MISSING for that.
>
> With that in mind, unless someone else has a usecase for the
> KVM_MEM_EXIT_ON_MISSING behavior my *strong* preference is that we not
> take this bit of the series upstream. The "memory fault" UAPI should
> still be useful when the KVM userfault stuff comes along.
>
> Anish, apologies, you must have whiplash from all the bikeshedding,
> nitpicking, and other fun you've been put through on this series. Thanks
> for being patient.

No worries- I got a lot of patient (and much-needed) review as well :).
And I understand not wanting to add an eternal feature when something
better is coming down the line.

On Mon, Mar 11, 2024 at 9:36 AM Sean Christopherson <seanjc@google.com> wrote:
>
> Oh, and I'll plan on grabbing patches 1-4 for 6.10.

I think patches 10/11/12 are useful changes to the selftest that make
sense to merge even with KVM_MEM_EXIT_ON_MISSING being mothballed-
they should rebase without any issues. And the annotations on the
stage-2 fault handlers seem like they should still be added, but I
suppose David can do that with his series.
On Mon, Mar 11, 2024 at 10:08:56AM -0700, Anish Moorthy wrote:
> I think patches 10/11/12 are useful changes to the selftest that make
> sense to merge even with KVM_MEM_EXIT_ON_MISSING being mothballed-
> they should rebase without any issues. And the annotations on the
> stage-2 fault handlers seem like they should still be added, but I
> suppose David can do that with his series.

Yeah, let's fold the vCPU exit portions of the UAPI into the overall KVM
userfault series. In that case there is sufficient context at the time
of the "memory fault" to generate a 'precise' fault context (this GFN
failed since it isn't marked as present). Compare that to the current
implementation, which actually annotates _any_ __gfn_to_pfn_memslot()
failures on the way out to userspace.

I haven't seen anyone saying their userspace wants to use this, and I'd
rather not take a new feature without a user, even if it is comparably
trivial.
Hi David,

On 11/03/2024 16:20, David Matlack wrote:
> On Sun, Mar 10, 2024 at 9:46 PM Oliver Upton <oliver.upton@linux.dev> wrote:
>>>>
>>>> 2. What is your best guess as to when KVM userfault patches will be available,
>>>>    even if only in RFC form?
>>>
>>> We're aiming for the end of April for RFC with KVM/ARM support.
>>
>> Just to make sure everyone is read in on what this entails -- is this
>> the implementation that only worries about vCPUs touching non-present
>> memory, leaving the question of other UAPIs that consume guest memory
>> (e.g. GIC/ITS table save/restore) up for further discussion?
>
> Yes. The initial version will only support returning to userspace on
> invalid vCPU accesses with KVM_EXIT_MEMORY_FAULT. Non-vCPU accesses to
> invalid pages (e.g. GIC/ITS table save/restore) will trigger an error
> return from __gfn_to_hva_many() (which will cause the corresponding
> ioctl to fail). It will be userspace's responsibility to clear the
> invalid attribute before invoking those ioctls.
>
> For x86 we may need a blocking kernel-to-userspace notification
> mechanism for code paths in the emulator, but we'd like to investigate
> and discuss if there are any other cleaner alternatives before going
> too far down that route.

I wasn't able to locate any follow-ups on the LKML about this topic.
May I know if you are still working on or planning to work on this?

Thanks,
Nikita
On Wed, Jul 3, 2024 at 10:35 AM Nikita Kalyazin <kalyazin@amazon.com> wrote:
>
> Hi David,
>
> On 11/03/2024 16:20, David Matlack wrote:
> > On Sun, Mar 10, 2024 at 9:46 PM Oliver Upton <oliver.upton@linux.dev> wrote:
> >>>>
> >>>> 2. What is your best guess as to when KVM userfault patches will be available,
> >>>>    even if only in RFC form?
> >>>
> >>> We're aiming for the end of April for RFC with KVM/ARM support.
> >>
> >> Just to make sure everyone is read in on what this entails -- is this
> >> the implementation that only worries about vCPUs touching non-present
> >> memory, leaving the question of other UAPIs that consume guest memory
> >> (e.g. GIC/ITS table save/restore) up for further discussion?
> >
> > Yes. The initial version will only support returning to userspace on
> > invalid vCPU accesses with KVM_EXIT_MEMORY_FAULT. Non-vCPU accesses to
> > invalid pages (e.g. GIC/ITS table save/restore) will trigger an error
> > return from __gfn_to_hva_many() (which will cause the corresponding
> > ioctl to fail). It will be userspace's responsibility to clear the
> > invalid attribute before invoking those ioctls.
> >
> > For x86 we may need a blocking kernel-to-userspace notification
> > mechanism for code paths in the emulator, but we'd like to investigate
> > and discuss if there are any other cleaner alternatives before going
> > too far down that route.
>
> I wasn't able to locate any follow-ups on the LKML about this topic.
> May I know if you are still working on or planning to work on this?

Yes, James Houghton at Google has been working on this. We decided to
build a more complete RFC (with x86 and ARM support), so that
reviewers can get an idea of the full scope of the feature, so it has
taken a bit longer than originally planned. But the RFC is code
complete now. I think James is planning to send the patches next week.
On 03/07/2024 21:11, David Matlack wrote:
> Yes, James Houghton at Google has been working on this. We decided to
> build a more complete RFC (with x86 and ARM support), so that
> reviewers can get an idea of the full scope of the feature, so it has
> taken a bit longer than originally planned. But the RFC is code
> complete now. I think James is planning to send the patches next week.

Great to hear, looking forward to seeing it!
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 9f5d45c49e36..bf7bc21d56ac 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -1353,6 +1353,7 @@ yet and must be cleared on entry.
   #define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
   #define KVM_MEM_READONLY	(1UL << 1)
   #define KVM_MEM_GUEST_MEMFD	(1UL << 2)
+  #define KVM_MEM_EXIT_ON_MISSING	(1UL << 3)
 
 This ioctl allows the user to create, modify or delete a guest physical
 memory slot.  Bits 0-15 of "slot" specify the slot id and this value
@@ -1383,7 +1384,7 @@ It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr
 be identical.  This allows large pages in the guest to be backed by large
 pages in the host.
 
-The flags field supports three flags
+The flags field supports four flags
 
 1. KVM_MEM_LOG_DIRTY_PAGES: can be set to instruct KVM to keep track of
 writes to memory within the slot.  See KVM_GET_DIRTY_LOG ioctl to know how to
@@ -1393,6 +1394,7 @@ to make a new slot read-only.  In this case, writes to this memory will be
 posted to userspace as KVM_EXIT_MMIO exits.
 3. KVM_MEM_GUEST_MEMFD: see KVM_SET_USER_MEMORY_REGION2. This flag is
 incompatible with KVM_SET_USER_MEMORY_REGION.
+4. KVM_MEM_EXIT_ON_MISSING: see KVM_CAP_EXIT_ON_MISSING for details.
 
 When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of
 the memory region are automatically reflected into the guest.  For example, an
@@ -1408,6 +1410,9 @@ Instead, an abort (data abort if the cause of the page-table update was a load
 or a store, instruction abort if it was an instruction fetch) is injected in
 the guest.
 
+Note: KVM_MEM_READONLY and KVM_MEM_EXIT_ON_MISSING are currently mutually
+exclusive.
+
 4.36 KVM_SET_TSS_ADDR
 ---------------------
 
@@ -8044,6 +8049,22 @@ error/annotated fault.
 
 See KVM_EXIT_MEMORY_FAULT for more information.
 
+7.35 KVM_CAP_EXIT_ON_MISSING
+----------------------------
+
+:Architectures: None
+:Returns: Informational only, -EINVAL on direct KVM_ENABLE_CAP.
+
+The presence of this capability indicates that userspace may set the
+KVM_MEM_EXIT_ON_MISSING flag on memslots. Said flag will cause KVM_RUN to fail
+(-EFAULT) in response to guest-context memory accesses which would require KVM
+to page fault on the userspace mapping.
+
+The range of guest physical memory causing the fault is advertised to userspace
+through KVM_CAP_MEMORY_FAULT_INFO. Userspace should take appropriate action.
+This could mean, for instance, checking that the fault is resolvable, faulting
+in the relevant userspace mapping, then retrying KVM_RUN.
+
 8. Other capabilities.
 ======================
 
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index d14504821b79..dfe0cbb5937c 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1487,7 +1487,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	mmap_read_unlock(current->mm);
 
 	pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false, NULL,
-				   write_fault, &writable, NULL);
+				   write_fault, &writable, false, NULL);
 	if (pfn == KVM_PFN_ERR_HWPOISON) {
 		kvm_send_hwpoison_signal(hva, vma_shift);
 		return 0;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 2b1f0cdd8c18..31ebfe4fe8e1 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -614,7 +614,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_vcpu *vcpu,
 	} else {
 		/* Call KVM generic code to do the slow-path check */
 		pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false, NULL,
-					   writing, &write_ok, NULL);
+					   writing, &write_ok, false, NULL);
 		if (is_error_noslot_pfn(pfn))
 			return -EFAULT;
 		page = NULL;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 4a1abb9f7c05..03b0f1c4a0d8 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -853,7 +853,7 @@ int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu,
 
 		/* Call KVM generic code to do the slow-path check */
 		pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false, NULL,
-					   writing, upgrade_p, NULL);
+					   writing, upgrade_p, false, NULL);
 		if (is_error_noslot_pfn(pfn))
 			return -EFAULT;
 		page = NULL;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 2d6cdeab1f8a..b89a9518f6de 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4371,7 +4371,7 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	async = false;
 	fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, false, &async,
 					  fault->write, &fault->map_writable,
-					  &fault->hva);
+					  false, &fault->hva);
 	if (!async)
 		return RET_PF_CONTINUE; /* *pfn has correct page already */
 
@@ -4393,7 +4393,7 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	 */
 	fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, true, NULL,
 					  fault->write, &fault->map_writable,
-					  &fault->hva);
+					  false, &fault->hva);
 	return RET_PF_CONTINUE;
 }
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 32cbe5c3a9d1..210e07c4c2eb 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1216,7 +1216,8 @@ kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn);
 kvm_pfn_t gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gfn);
 kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
 			       bool atomic, bool interruptible, bool *async,
-			       bool write_fault, bool *writable, hva_t *hva);
+			       bool write_fault, bool *writable,
+			       bool can_exit_on_missing, hva_t *hva);
 
 void kvm_release_pfn_clean(kvm_pfn_t pfn);
 void kvm_release_pfn_dirty(kvm_pfn_t pfn);
@@ -2394,4 +2395,13 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
 }
 #endif /* CONFIG_KVM_PRIVATE_MEM */
 
+/*
+ * Whether vCPUs should exit upon trying to access memory for which the
+ * userspace mappings are missing.
+ */
+static inline bool kvm_is_slot_exit_on_missing(const struct kvm_memory_slot *slot)
+{
+	return slot && slot->flags & KVM_MEM_EXIT_ON_MISSING;
+}
+
 #endif
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 36a51b162a71..e9f33ae93dee 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -51,6 +51,7 @@ struct kvm_userspace_memory_region2 {
 #define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
 #define KVM_MEM_READONLY	(1UL << 1)
 #define KVM_MEM_GUEST_MEMFD	(1UL << 2)
+#define KVM_MEM_EXIT_ON_MISSING	(1UL << 3)
 
 /* for KVM_IRQ_LINE */
 struct kvm_irq_level {
@@ -920,6 +921,7 @@ struct kvm_enable_cap {
 #define KVM_CAP_MEMORY_ATTRIBUTES 233
 #define KVM_CAP_GUEST_MEMFD 234
 #define KVM_CAP_VM_TYPES 235
+#define KVM_CAP_EXIT_ON_MISSING 236
 
 struct kvm_irq_routing_irqchip {
 	__u32 irqchip;
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 29b73eedfe74..c7bdde127af4 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -109,3 +109,6 @@ config KVM_GENERIC_PRIVATE_MEM
 	select KVM_GENERIC_MEMORY_ATTRIBUTES
 	select KVM_PRIVATE_MEM
 	bool
+
+config HAVE_KVM_EXIT_ON_MISSING
+	bool
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 67ca580a18c5..469b99898be8 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1600,7 +1600,7 @@ static void kvm_replace_memslot(struct kvm *kvm,
  * only allows these.
  */
 #define KVM_SET_USER_MEMORY_REGION_V1_FLAGS \
-	(KVM_MEM_LOG_DIRTY_PAGES | KVM_MEM_READONLY)
+	(KVM_MEM_LOG_DIRTY_PAGES | KVM_MEM_READONLY | KVM_MEM_EXIT_ON_MISSING)
 
 static int check_memory_region_flags(struct kvm *kvm,
 				     const struct kvm_userspace_memory_region2 *mem)
@@ -1618,8 +1618,14 @@ static int check_memory_region_flags(struct kvm *kvm,
 	valid_flags |= KVM_MEM_READONLY;
 #endif
 
+	if (IS_ENABLED(CONFIG_HAVE_KVM_EXIT_ON_MISSING))
+		valid_flags |= KVM_MEM_EXIT_ON_MISSING;
+
 	if (mem->flags & ~valid_flags)
 		return -EINVAL;
+	else if ((mem->flags & KVM_MEM_READONLY) &&
+		 (mem->flags & KVM_MEM_EXIT_ON_MISSING))
+		return -EINVAL;
 
 	return 0;
 }
@@ -3024,7 +3030,8 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool interruptible,
 
 kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
 			       bool atomic, bool interruptible, bool *async,
-			       bool write_fault, bool *writable, hva_t *hva)
+			       bool write_fault, bool *writable,
+			       bool can_exit_on_missing, hva_t *hva)
 {
 	unsigned long addr = __gfn_to_hva_many(slot, gfn, NULL, write_fault);
 
@@ -3047,6 +3054,19 @@ kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
 		writable = NULL;
 	}
 
+	/* When the slot is exit-on-missing (and when we should respect that)
+	 * set atomic=true to prevent GUP from faulting in the userspace
+	 * mappings.
+	 */
+	if (!atomic && can_exit_on_missing &&
+	    kvm_is_slot_exit_on_missing(slot)) {
+		atomic = true;
+		if (async) {
+			*async = false;
+			async = NULL;
+		}
+	}
+
 	return hva_to_pfn(addr, atomic, interruptible, async, write_fault,
 			  writable);
 }
@@ -3056,21 +3076,21 @@ kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
 			  bool *writable)
 {
 	return __gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn, false, false,
-				    NULL, write_fault, writable, NULL);
+				    NULL, write_fault, writable, false, NULL);
 }
 EXPORT_SYMBOL_GPL(gfn_to_pfn_prot);
 
 kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn)
 {
 	return __gfn_to_pfn_memslot(slot, gfn, false, false, NULL, true,
-				    NULL, NULL);
+				    NULL, false, NULL);
 }
 EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot);
 
 kvm_pfn_t gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gfn)
 {
 	return __gfn_to_pfn_memslot(slot, gfn, true, false, NULL, true,
-				    NULL, NULL);
+				    NULL, false, NULL);
 }
 EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot_atomic);
 
@@ -4877,6 +4897,8 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 	case KVM_CAP_GUEST_MEMFD:
 		return !kvm || kvm_arch_has_private_mem(kvm);
 #endif
+	case KVM_CAP_EXIT_ON_MISSING:
+		return IS_ENABLED(CONFIG_HAVE_KVM_EXIT_ON_MISSING);
 	default:
 		break;
 	}
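For context, a sketch (not part of this patch or the series as posted) of how a later patch might opt a vCPU fault path into the new behavior: pass can_exit_on_missing=true and convert the resulting failure into a KVM_EXIT_MEMORY_FAULT via the existing kvm_prepare_memory_fault_exit() helper. The wrapper function and its name are illustrative only; __gfn_to_pfn_memslot() and kvm_prepare_memory_fault_exit() are real.

```c
static int faultin_pfn_for_vcpu(struct kvm_vcpu *vcpu,
				struct kvm_memory_slot *slot, gfn_t gfn,
				bool write_fault, bool *writable,
				kvm_pfn_t *pfn)
{
	*pfn = __gfn_to_pfn_memslot(slot, gfn, false, false, NULL, write_fault,
				    writable, /*can_exit_on_missing=*/true,
				    NULL);
	if (!is_error_noslot_pfn(*pfn))
		return 0;

	/* Report the unmapped range to userspace instead of failing silently. */
	kvm_prepare_memory_fault_exit(vcpu, gfn_to_gpa(gfn), PAGE_SIZE,
				      write_fault, /*exec=*/false,
				      /*private=*/false);
	return -EFAULT;
}
```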
Allowing KVM to fault in pages during vcpu-context guest memory accesses
can be undesirable: during userfaultfd-based postcopy, it can cause
significant performance issues due to vCPUs contending for
userfaultfd-internal locks.

Add a new memslot flag (KVM_MEM_EXIT_ON_MISSING) through which userspace
can indicate that KVM_RUN should exit instead of faulting in pages during
vcpu-context guest memory accesses. The faulting GFN ranges are reported
by the accompanying KVM_EXIT_MEMORY_FAULT exits, allowing userspace to
identify the missing pages and take appropriate action.

The basic implementation strategy is to check the memslot flag from
within __gfn_to_pfn_memslot() and override the caller-provided arguments
accordingly. Some callers (such as kvm_vcpu_map()) must be able to opt
out of this behavior, and do so by passing can_exit_on_missing=false.

No functional change intended: nothing sets KVM_MEM_EXIT_ON_MISSING or
passes can_exit_on_missing=true to __gfn_to_pfn_memslot().

Suggested-by: James Houghton <jthoughton@google.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Anish Moorthy <amoorthy@google.com>
---
 Documentation/virt/kvm/api.rst         | 23 +++++++++++++++++-
 arch/arm64/kvm/mmu.c                   |  2 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c    |  2 +-
 arch/powerpc/kvm/book3s_64_mmu_radix.c |  2 +-
 arch/x86/kvm/mmu/mmu.c                 |  4 ++--
 include/linux/kvm_host.h               | 12 +++++++++-
 include/uapi/linux/kvm.h               |  2 ++
 virt/kvm/Kconfig                       |  3 +++
 virt/kvm/kvm_main.c                    | 32 ++++++++++++++++++++----
 9 files changed, 70 insertions(+), 12 deletions(-)
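A minimal userspace usage sketch, assuming this (unmerged) patch plus KVM_CAP_MEMORY_FAULT_INFO are available; KVM_MEM_EXIT_ON_MISSING and KVM_CAP_EXIT_ON_MISSING are defined only by this series, while KVM_SET_USER_MEMORY_REGION2 and KVM_CHECK_EXTENSION are existing UAPI:

```c
#include <linux/kvm.h>
#include <stdint.h>
#include <sys/ioctl.h>

int set_exit_on_missing_slot(int vm_fd, uint32_t slot, uint64_t gpa,
			     uint64_t size, void *hva)
{
	struct kvm_userspace_memory_region2 region = {
		.slot            = slot,
		.flags           = KVM_MEM_EXIT_ON_MISSING,
		.guest_phys_addr = gpa,
		.memory_size     = size,
		.userspace_addr  = (uintptr_t)hva,
	};

	/* The capability is informational only; it is checked, not enabled. */
	if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_EXIT_ON_MISSING) <= 0)
		return -1;

	return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION2, &region);
}
```

Pairing this with the run-loop handling of KVM_EXIT_MEMORY_FAULT shown earlier in the thread gives the full exit-and-retry cycle during postcopy.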