Message ID | 20230602161921.208564-9-amoorthy@google.com (mailing list archive) |
---|---
State | New, archived |
Series | Improve scalability of KVM + userfaultfd live migration via annotated memory faults. |
On Fri, Jun 02, 2023, Anish Moorthy wrote:
> Implement KVM_CAP_MEMORY_FAULT_INFO for efaults generated by
> kvm_handle_error_pfn().
>
> Signed-off-by: Anish Moorthy <amoorthy@google.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index c8961f45e3b1..cb71aae9aaec 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3291,6 +3291,10 @@ static void kvm_send_hwpoison_signal(struct kvm_memory_slot *slot, gfn_t gfn)
>
>  static int kvm_handle_error_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>  {
> +	uint64_t rounded_gfn;

gfn_t, and probably no need to specify "rounded", let the code do the talking.

> +	uint64_t fault_size;
> +	uint64_t fault_flags;

With a few exceptions that we should kill off, KVM uses "u64", not "uint64_t".
Though arguably the "size" should be gfn_t too.  And these can all go on a
single line, e.g.

	u64 fault_size, fault_flags;

Though since the kvm_run.memory_fault field and the param to the helper are
"len", a simple "len" here is better IMO.

And since this is not remotely performance sensitive, I wouldn't bother
deferring the initialization, e.g.

	gfn_t gfn = gfn_round_for_level(fault->gfn, fault->goal_level);
	gfn_t len = KVM_HPAGE_SIZE(fault->goal_level);
	u64 fault_flags;

All that said, consuming fault->goal_level is unnecessary, and not by
coincidence.  The *only* time KVM should bail to userspace is if KVM failed
to faultin a 4KiB page.  Providing a hugepage is done opportunistically, it's
never a hard requirement.  So throw away all of the above and see below.
> +
>  	if (is_sigpending_pfn(fault->pfn)) {
>  		kvm_handle_signal_exit(vcpu);
>  		return -EINTR;
> @@ -3309,6 +3313,15 @@ static int kvm_handle_error_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fa
>  		return RET_PF_RETRY;
>  	}
>
> +	fault_size = KVM_HPAGE_SIZE(fault->goal_level);
> +	rounded_gfn = round_down(fault->gfn * PAGE_SIZE, fault_size);

We have a helper for this too, gfn_round_for_level().  Ooh, but this isn't
storing a gfn, it's storing a gpa.  Naughty, naughty.

> +
> +	fault_flags = 0;
> +	if (fault->write)
> +		fault_flags |= KVM_MEMORY_FAULT_FLAG_WRITE;
> +	if (fault->exec)

exec and write are mutually exclusive.  There's even documented precedent for
this in page_fault_can_be_fast().  So with a READ flag, this can be

	if (fault->write)
		fault_flags = KVM_MEMORY_FAULT_FLAG_WRITE;
	else if (fault->exec)
		fault_flags = KVM_MEMORY_FAULT_FLAG_EXEC;
	else
		fault_flags = KVM_MEMORY_FAULT_FLAG_READ;

Or as Paolo would probably write it ;-)

	fault_flags = (fault->write & 1) << KVM_MEMORY_FAULT_FLAG_WRITE_SHIFT |
		      (fault->exec & 1) << KVM_MEMORY_FAULT_FLAG_EXEC_SHIFT |
		      (!fault->write && !fault->exec) << KVM_MEMORY_FAULT_FLAG_READ_SHIFT;

(that was a joke, don't actually do that)

> +		fault_flags |= KVM_MEMORY_FAULT_FLAG_EXEC;
> +	kvm_populate_efault_info(vcpu, rounded_gfn, fault_size, fault_flags);

This is where passing a "gfn" variable as a "gpa" looks broken.

> 	return -EFAULT;

All in all, something like this?

	u64 fault_flags;

	<other error handling>

	/* comment goes here */
	WARN_ON_ONCE(fault->goal_level != PG_LEVEL_4K);

	if (fault->write)
		fault_flags = KVM_MEMORY_FAULT_FLAG_WRITE;
	else if (fault->exec)
		fault_flags = KVM_MEMORY_FAULT_FLAG_EXEC;
	else
		fault_flags = KVM_MEMORY_FAULT_FLAG_READ;

	kvm_handle_blahblahblah_fault(vcpu, gfn_to_gpa(fault->gfn), PAGE_SIZE,
				      fault_flags);
On 6/3/2023 12:19 AM, Anish Moorthy wrote:
> Implement KVM_CAP_MEMORY_FAULT_INFO for efaults generated by
> kvm_handle_error_pfn().
>
> Signed-off-by: Anish Moorthy <amoorthy@google.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index c8961f45e3b1..cb71aae9aaec 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3291,6 +3291,10 @@ static void kvm_send_hwpoison_signal(struct kvm_memory_slot *slot, gfn_t gfn)
>
>  static int kvm_handle_error_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>  {
> +	uint64_t rounded_gfn;
> +	uint64_t fault_size;
> +	uint64_t fault_flags;
> +
>  	if (is_sigpending_pfn(fault->pfn)) {
>  		kvm_handle_signal_exit(vcpu);
>  		return -EINTR;
> @@ -3309,6 +3313,15 @@ static int kvm_handle_error_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fa
>  		return RET_PF_RETRY;
>  	}
>
> +	fault_size = KVM_HPAGE_SIZE(fault->goal_level);

IIUC, here fault->goal_level is always PG_LEVEL_4K.  goal_level could be
adjusted in the later kvm_tdp_mmu_map() --> kvm_mmu_hugepage_adjust(), but
only if kvm_faultin_pfn() doesn't fail, i.e. when the code path doesn't go
through here.

I wonder if you would like to put (kind of) a kvm_mmu_hugepage_adjust() here
as well, reporting to user space the maximum map size it could do with, OR
just report 4K size and let user space itself detect/decide the max possible
size (though right now I've no idea how it would).

> +	rounded_gfn = round_down(fault->gfn * PAGE_SIZE, fault_size);
> +
> +	fault_flags = 0;
> +	if (fault->write)
> +		fault_flags |= KVM_MEMORY_FAULT_FLAG_WRITE;
> +	if (fault->exec)
> +		fault_flags |= KVM_MEMORY_FAULT_FLAG_EXEC;
> +	kvm_populate_efault_info(vcpu, rounded_gfn, fault_size, fault_flags);
> 	return -EFAULT;
> }
>
On Thu, Jun 15, 2023, Robert Hoo wrote:
> On 6/3/2023 12:19 AM, Anish Moorthy wrote:
> > Implement KVM_CAP_MEMORY_FAULT_INFO for efaults generated by
> > kvm_handle_error_pfn().
> >
> > Signed-off-by: Anish Moorthy <amoorthy@google.com>
> > ---
> >  arch/x86/kvm/mmu/mmu.c | 13 +++++++++++++
> >  1 file changed, 13 insertions(+)
[...]
> > +	fault_size = KVM_HPAGE_SIZE(fault->goal_level);
>
> IIUC, here fault->goal_level is always PG_LEVEL_4K.  goal_level could be
> adjusted in the later kvm_tdp_mmu_map() --> kvm_mmu_hugepage_adjust(), but
> only if kvm_faultin_pfn() doesn't fail, i.e. when the code path doesn't go
> through here.
>
> I wonder if you would like to put (kind of) a kvm_mmu_hugepage_adjust() here
> as well, reporting to user space the maximum map size it could do with, OR
> just report 4K size and let user space itself detect/decide the max possible
> size (though right now I've no idea how it would).

No, that's nonsensical because KVM uses the host mapping to compute the max
mapping level.  If there's no valid mapping, then there's no defined level.

And as I said in my reply, KVM should never kick out to userspace if KVM can
establish a 4KiB mapping, i.e. 4KiB is always the effective scope, and
reporting anything else would just be wild speculation.
On Wed, Jun 14, 2023 at 1:03 PM Sean Christopherson <seanjc@google.com> wrote:
>
> We have a helper for this too, gfn_round_for_level().  Ooh, but this isn't
> storing a gfn, it's storing a gpa.  Naughty, naughty.

Caught in mischief I didn't even realize I was committing :O  Anyways, I've
taken all the feedback you provided here.  Although,

> All that said, consuming fault->goal_level is unnecessary, and not by
> coincidence.  The *only* time KVM should bail to userspace is if KVM failed
> to faultin a 4KiB page.  Providing a hugepage is done opportunistically,
> it's never a hard requirement.  So throw away all of the above and see
> below.

I wonder if my arm64 patch commits the same error.  I'll send an email over
there asking Oliver about it.
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c8961f45e3b1..cb71aae9aaec 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3291,6 +3291,10 @@ static void kvm_send_hwpoison_signal(struct kvm_memory_slot *slot, gfn_t gfn)
 
 static int kvm_handle_error_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
+	uint64_t rounded_gfn;
+	uint64_t fault_size;
+	uint64_t fault_flags;
+
 	if (is_sigpending_pfn(fault->pfn)) {
 		kvm_handle_signal_exit(vcpu);
 		return -EINTR;
@@ -3309,6 +3313,15 @@ static int kvm_handle_error_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fa
 		return RET_PF_RETRY;
 	}
 
+	fault_size = KVM_HPAGE_SIZE(fault->goal_level);
+	rounded_gfn = round_down(fault->gfn * PAGE_SIZE, fault_size);
+
+	fault_flags = 0;
+	if (fault->write)
+		fault_flags |= KVM_MEMORY_FAULT_FLAG_WRITE;
+	if (fault->exec)
+		fault_flags |= KVM_MEMORY_FAULT_FLAG_EXEC;
+	kvm_populate_efault_info(vcpu, rounded_gfn, fault_size, fault_flags);
 	return -EFAULT;
 }
Implement KVM_CAP_MEMORY_FAULT_INFO for efaults generated by
kvm_handle_error_pfn().

Signed-off-by: Anish Moorthy <amoorthy@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)