Message ID | 20241204191349.1730936-7-jthoughton@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | KVM: Introduce KVM Userfault | expand |
Hi James, On Wed, Dec 04, 2024 at 07:13:41PM +0000, James Houghton wrote: > Adhering to the requirements of KVM Userfault: > > 1. When it is toggled (either on or off), zap the second stage with > kvm_arch_flush_shadow_memslot(). This is to (1) respect > userfault-ness and (2) to reconstruct block mappings. > 2. While KVM_MEM_USERFAULT is enabled, restrict new second-stage mappings > to be PAGE_SIZE, just like when dirty logging is enabled. > > Signed-off-by: James Houghton <jthoughton@google.com> > --- > I'm not 100% sure if kvm_arch_flush_shadow_memslot() is correct in > this case (like if the host does not have S2FWB). Invalidating the stage-2 entries is of course necessary for correctness on the !USERFAULT -> USERFAULT transition, and the MMU will do the right thing regardless of whether hardware implements FEAT_S2FWB. What I think you may be getting at is the *performance* implications are quite worrying without FEAT_S2FWB due to the storm of CMOs, and I'd definitely agree with that. > @@ -2062,6 +2069,20 @@ void kvm_arch_commit_memory_region(struct kvm *kvm, > enum kvm_mr_change change) > { > bool log_dirty_pages = new && new->flags & KVM_MEM_LOG_DIRTY_PAGES; > + u32 changed_flags = (new ? new->flags : 0) ^ (old ? old->flags : 0); > + > + /* > + * If KVM_MEM_USERFAULT changed, drop all the stage-2 mappings so that > + * we can (1) respect userfault-ness or (2) create block mappings. > + */ > + if ((changed_flags & KVM_MEM_USERFAULT) && change == KVM_MR_FLAGS_ONLY) > + kvm_arch_flush_shadow_memslot(kvm, old); I'd strongly prefer that we make (2) a userspace problem and don't eagerly invalidate stage-2 mappings on the USERFAULT -> !USERFAULT change. Having implied user-visible behaviors on ioctls is never good, and for systems without FEAT_S2FWB you might be better off avoiding the unmap in the first place. So, if userspace decides there's a benefit to invalidating the stage-2 MMU, it can just delete + recreate the memslot.
On Wed, Dec 4, 2024 at 3:07 PM Oliver Upton <oliver.upton@linux.dev> wrote: > > Hi James, Hi Oliver! > > On Wed, Dec 04, 2024 at 07:13:41PM +0000, James Houghton wrote: > > Adhering to the requirements of KVM Userfault: > > > > 1. When it is toggled (either on or off), zap the second stage with > > kvm_arch_flush_shadow_memslot(). This is to (1) respect > > userfault-ness and (2) to reconstruct block mappings. > > 2. While KVM_MEM_USERFAULT is enabled, restrict new second-stage mappings > > to be PAGE_SIZE, just like when dirty logging is enabled. > > > > Signed-off-by: James Houghton <jthoughton@google.com> > > --- > > I'm not 100% sure if kvm_arch_flush_shadow_memslot() is correct in > > this case (like if the host does not have S2FWB). > > Invalidating the stage-2 entries is of course necessary for correctness > on the !USERFAULT -> USERFAULT transition, and the MMU will do the right > thing regardless of whether hardware implements FEAT_S2FWB. > > What I think you may be getting at is the *performance* implications are > quite worrying without FEAT_S2FWB due to the storm of CMOs, and I'd > definitely agree with that. Thanks for clarifying that for me. > > @@ -2062,6 +2069,20 @@ void kvm_arch_commit_memory_region(struct kvm *kvm, > > enum kvm_mr_change change) > > { > > bool log_dirty_pages = new && new->flags & KVM_MEM_LOG_DIRTY_PAGES; > > + u32 changed_flags = (new ? new->flags : 0) ^ (old ? old->flags : 0); > > + > > + /* > > + * If KVM_MEM_USERFAULT changed, drop all the stage-2 mappings so that > > + * we can (1) respect userfault-ness or (2) create block mappings. > > + */ > > + if ((changed_flags & KVM_MEM_USERFAULT) && change == KVM_MR_FLAGS_ONLY) > > + kvm_arch_flush_shadow_memslot(kvm, old); > > I'd strongly prefer that we make (2) a userspace problem and don't > eagerly invalidate stage-2 mappings on the USERFAULT -> !USERFAULT > change. > > Having implied user-visible behaviors on ioctls is never good, and for > systems without FEAT_S2FWB you might be better off avoiding the unmap in > the first place. > > So, if userspace decides there's a benefit to invalidating the stage-2 > MMU, it can just delete + recreate the memslot. Ok I think that's reasonable. So for USERFAULT -> !USERFAULT, I'll just follow the precedent set by dirty logging. For x86 today, we collapse the mappings, and for arm64 we do not. Is arm64 ever going to support collapsing back to huge mappings after dirty logging is disabled?
On Thu, Dec 05, 2024 at 03:31:05PM -0800, James Houghton wrote: > > > @@ -2062,6 +2069,20 @@ void kvm_arch_commit_memory_region(struct kvm *kvm, > > > enum kvm_mr_change change) > > > { > > > bool log_dirty_pages = new && new->flags & KVM_MEM_LOG_DIRTY_PAGES; > > > + u32 changed_flags = (new ? new->flags : 0) ^ (old ? old->flags : 0); > > > + > > > + /* > > > + * If KVM_MEM_USERFAULT changed, drop all the stage-2 mappings so that > > > + * we can (1) respect userfault-ness or (2) create block mappings. > > > + */ > > > + if ((changed_flags & KVM_MEM_USERFAULT) && change == KVM_MR_FLAGS_ONLY) > > > + kvm_arch_flush_shadow_memslot(kvm, old); > > > > I'd strongly prefer that we make (2) a userspace problem and don't > > eagerly invalidate stage-2 mappings on the USERFAULT -> !USERFAULT > > change. > > > > Having implied user-visible behaviors on ioctls is never good, and for > > systems without FEAT_S2FWB you might be better off avoiding the unmap in > > the first place. > > > > So, if userspace decides there's a benefit to invalidating the stage-2 > > MMU, it can just delete + recreate the memslot. > > Ok I think that's reasonable. So for USERFAULT -> !USERFAULT, I'll > just follow the precedent set by dirty logging. For x86 today, we > collapse the mappings, and for arm64 we do not. > > Is arm64 ever going to support collapsing back to huge mappings after > dirty logging is disabled? Patches on list is always a good place to start :) What I'd expect on FEAT_S2FWB hardware is that invalidating the whole stage-2 and faulting back in block entries would give the best experience. Only in the case of !FWB would a literal table -> block collapse be beneficial, as the MMU could potentially elide CMOs when remapping. But that assumes you're starting with a fully-mapped table and there are no holes that are "out of sync" with the guest.
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig index ead632ad01b4..d89b4088b580 100644 --- a/arch/arm64/kvm/Kconfig +++ b/arch/arm64/kvm/Kconfig @@ -38,6 +38,7 @@ menuconfig KVM select HAVE_KVM_VCPU_RUN_PID_CHANGE select SCHED_INFO select GUEST_PERF_EVENTS if PERF_EVENTS + select HAVE_KVM_USERFAULT help Support hosting virtualized guest machines. diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c index a71fe6f6bd90..53cee0bacb75 100644 --- a/arch/arm64/kvm/mmu.c +++ b/arch/arm64/kvm/mmu.c @@ -1482,7 +1482,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, * logging_active is guaranteed to never be true for VM_PFNMAP * memslots. */ - if (logging_active) { + if (logging_active || kvm_memslot_userfault(memslot)) { force_pte = true; vma_shift = PAGE_SHIFT; } else { @@ -1571,6 +1571,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, mmu_seq = vcpu->kvm->mmu_invalidate_seq; mmap_read_unlock(current->mm); + if (kvm_gfn_userfault(kvm, memslot, gfn)) { + kvm_prepare_memory_fault_exit(vcpu, gfn << PAGE_SHIFT, + PAGE_SIZE, write_fault, + exec_fault, false, true); + return -EFAULT; + } + pfn = __kvm_faultin_pfn(memslot, gfn, write_fault ? FOLL_WRITE : 0, &writable, &page); if (pfn == KVM_PFN_ERR_HWPOISON) { @@ -2062,6 +2069,20 @@ void kvm_arch_commit_memory_region(struct kvm *kvm, enum kvm_mr_change change) { bool log_dirty_pages = new && new->flags & KVM_MEM_LOG_DIRTY_PAGES; + u32 changed_flags = (new ? new->flags : 0) ^ (old ? old->flags : 0); + + /* + * If KVM_MEM_USERFAULT changed, drop all the stage-2 mappings so that + * we can (1) respect userfault-ness or (2) create block mappings. + */ + if ((changed_flags & KVM_MEM_USERFAULT) && change == KVM_MR_FLAGS_ONLY) + kvm_arch_flush_shadow_memslot(kvm, old); + + /* + * Nothing left to do if not toggling dirty logging. + */ + if (!(changed_flags & KVM_MEM_LOG_DIRTY_PAGES)) + return; /* * At this point memslot has been committed and there is an
Adhering to the requirements of KVM Userfault: 1. When it is toggled (either on or off), zap the second stage with kvm_arch_flush_shadow_memslot(). This is to (1) respect userfault-ness and (2) to reconstruct block mappings. 2. While KVM_MEM_USERFAULT is enabled, restrict new second-stage mappings to be PAGE_SIZE, just like when dirty logging is enabled. Signed-off-by: James Houghton <jthoughton@google.com> --- I'm not 100% sure if kvm_arch_flush_shadow_memslot() is correct in this case (like if the host does not have S2FWB). --- arch/arm64/kvm/Kconfig | 1 + arch/arm64/kvm/mmu.c | 23 ++++++++++++++++++++++- 2 files changed, 23 insertions(+), 1 deletion(-)