Message ID: 20241118130403.23184-1-kalyazin@amazon.com
State: New
Series: KVM: x86: async_pf: check earlier if can deliver async pf
Nikita Kalyazin <kalyazin@amazon.com> writes:

> On x86, async pagefault events can only be delivered if the page fault
> was triggered by guest userspace, not kernel. This is because
> the guest may be in non-sleepable context and will not be able
> to reschedule.

We used to set KVM_ASYNC_PF_SEND_ALWAYS for Linux guests before

commit 3a7c8fafd1b42adea229fd204132f6a2fb3cd2d9
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Fri Apr 24 09:57:56 2020 +0200

    x86/kvm: Restrict ASYNC_PF to user space

but KVM side of the feature is kind of still there, namely
kvm_pv_enable_async_pf() sets

  vcpu->arch.apf.send_user_only = !(data & KVM_ASYNC_PF_SEND_ALWAYS);

and then we check it in kvm_can_deliver_async_pf():

  if (vcpu->arch.apf.send_user_only &&
      kvm_x86_call(get_cpl)(vcpu) == 0)
          return false;

and this can still be used by some legacy guests, I suppose. How about
we start with removing this completely? It does not matter if some
legacy guest wants to get an APF for CPL0, we are never obliged to
actually use the mechanism.

> However existing implementation pays the following overhead even for the
> kernel-originated faults, even though it is known in advance that they
> cannot be processed asynchronously:
> - allocate async PF token
> - create and schedule an async work
>
> This patch avoids the overhead above in case of kernel-originated faults
> by moving the `kvm_can_deliver_async_pf` check from
> `kvm_arch_async_page_not_present` to `__kvm_faultin_pfn`.
>
> Note that the existing check `kvm_can_do_async_pf` already calls
> `kvm_can_deliver_async_pf` internally, however it only does that if the
> `kvm_hlt_in_guest` check is true, ie userspace requested KVM not to exit
> on guest halts via `KVM_CAP_X86_DISABLE_EXITS`. In that case the code
> proceeds with the async fault processing with the following
> justification in 1dfdb45ec510ba27e366878f97484e9c9e728902 ("KVM: x86:
> clean up conditions for asynchronous page fault handling"):
>
> "Even when asynchronous page fault is disabled, KVM does not want to pause
> the host if a guest triggers a page fault; instead it will put it into
> an artificial HLT state that allows running other host processes while
> allowing interrupt delivery into the guest."
>
> Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 3 ++-
>  arch/x86/kvm/x86.c     | 5 ++---
>  arch/x86/kvm/x86.h     | 2 ++
>  3 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 22e7ad235123..11d29d15b6cd 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4369,7 +4369,8 @@ static int __kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
>  			trace_kvm_async_pf_repeated_fault(fault->addr, fault->gfn);
>  			kvm_make_request(KVM_REQ_APF_HALT, vcpu);
>  			return RET_PF_RETRY;
> -		} else if (kvm_arch_setup_async_pf(vcpu, fault)) {
> +		} else if (kvm_can_deliver_async_pf(vcpu) &&
> +			   kvm_arch_setup_async_pf(vcpu, fault)) {
>  			return RET_PF_RETRY;
>  		}
>  	}
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 2e713480933a..8edae75b39f7 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -13355,7 +13355,7 @@ static inline bool apf_pageready_slot_free(struct kvm_vcpu *vcpu)
>  	return !val;
>  }
>
> -static bool kvm_can_deliver_async_pf(struct kvm_vcpu *vcpu)
> +bool kvm_can_deliver_async_pf(struct kvm_vcpu *vcpu)
>  {
>
>  	if (!kvm_pv_async_pf_enabled(vcpu))
> @@ -13406,8 +13406,7 @@ bool kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
>  	trace_kvm_async_pf_not_present(work->arch.token, work->cr2_or_gpa);
>  	kvm_add_async_pf_gfn(vcpu, work->arch.gfn);
>
> -	if (kvm_can_deliver_async_pf(vcpu) &&
> -	    !apf_put_user_notpresent(vcpu)) {
> +	if (!apf_put_user_notpresent(vcpu)) {
>  		fault.vector = PF_VECTOR;
>  		fault.error_code_valid = true;
>  		fault.error_code = 0;
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index ec623d23d13d..9647f41e5c49 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -387,6 +387,8 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>  fastpath_t handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu);
>  fastpath_t handle_fastpath_hlt(struct kvm_vcpu *vcpu);
>
> +bool kvm_can_deliver_async_pf(struct kvm_vcpu *vcpu);
> +
>  extern struct kvm_caps kvm_caps;
>  extern struct kvm_host_values kvm_host;
>
> base-commit: d96c77bd4eeba469bddbbb14323d2191684da82a
On Mon, Nov 18, 2024, Nikita Kalyazin wrote:
> On x86, async pagefault events can only be delivered if the page fault
> was triggered by guest userspace, not kernel. This is because
> the guest may be in non-sleepable context and will not be able
> to reschedule.
>
> However existing implementation pays the following overhead even for the
> kernel-originated faults, even though it is known in advance that they
> cannot be processed asynchronously:
> - allocate async PF token
> - create and schedule an async work

Very deliberately, because as noted below, async page faults aren't limited
to the paravirt case.

> This patch avoids the overhead above in case of kernel-originated faults

Please avoid "This patch".

> by moving the `kvm_can_deliver_async_pf` check from
> `kvm_arch_async_page_not_present` to `__kvm_faultin_pfn`.
>
> Note that the existing check `kvm_can_do_async_pf` already calls
> `kvm_can_deliver_async_pf` internally, however it only does that if the
> `kvm_hlt_in_guest` check is true, ie userspace requested KVM not to exit
> on guest halts via `KVM_CAP_X86_DISABLE_EXITS`. In that case the code
> proceeds with the async fault processing with the following
> justification in 1dfdb45ec510ba27e366878f97484e9c9e728902 ("KVM: x86:
> clean up conditions for asynchronous page fault handling"):
>
> "Even when asynchronous page fault is disabled, KVM does not want to pause
> the host if a guest triggers a page fault; instead it will put it into
> an artificial HLT state that allows running other host processes while
> allowing interrupt delivery into the guest."

None of this justifies breaking host-side, non-paravirt async page faults.
If a vCPU hits a missing page, KVM can schedule out the vCPU and let
something else run on the pCPU, or enter idle and let the SMT sibling get
more cycles, or maybe even enter a low enough sleep state to let other
cores turbo a wee bit.

I have no objection to disabling host async page faults, e.g. it's probably
a net negative for 1:1 vCPU:pCPU pinned setups, but such disabling needs an
opt-in from userspace.
On 19/11/2024 13:24, Sean Christopherson wrote:
>> This patch avoids the overhead above in case of kernel-originated faults
>
> Please avoid "This patch".

Ack, thanks.

>> by moving the `kvm_can_deliver_async_pf` check from
>> `kvm_arch_async_page_not_present` to `__kvm_faultin_pfn`.
>>
>> Note that the existing check `kvm_can_do_async_pf` already calls
>> `kvm_can_deliver_async_pf` internally, however it only does that if the
>> `kvm_hlt_in_guest` check is true, ie userspace requested KVM not to exit
>> on guest halts via `KVM_CAP_X86_DISABLE_EXITS`. In that case the code
>> proceeds with the async fault processing with the following
>> justification in 1dfdb45ec510ba27e366878f97484e9c9e728902 ("KVM: x86:
>> clean up conditions for asynchronous page fault handling"):
>>
>> "Even when asynchronous page fault is disabled, KVM does not want to pause
>> the host if a guest triggers a page fault; instead it will put it into
>> an artificial HLT state that allows running other host processes while
>> allowing interrupt delivery into the guest."
>
> None of this justifies breaking host-side, non-paravirt async page faults.
> If a vCPU hits a missing page, KVM can schedule out the vCPU and let
> something else run on the pCPU, or enter idle and let the SMT sibling get
> more cycles, or maybe even enter a low enough sleep state to let other
> cores turbo a wee bit.
>
> I have no objection to disabling host async page faults, e.g. it's probably
> a net negative for 1:1 vCPU:pCPU pinned setups, but such disabling needs an
> opt-in from userspace.

That's a good point, I didn't think about it. The async work would still
need to execute somewhere in that case (or sleep in GUP until the page is
available). If processing the fault synchronously, the vCPU thread can
also sleep in the same way, freeing the pCPU for something else, so the
amount of work to be done looks equivalent (please correct me otherwise).
What's the net gain of moving that to an async work in the host async
fault case? "while allowing interrupt delivery into the guest." -- is this
the main advantage?
On 18/11/2024 17:58, Vitaly Kuznetsov wrote:
> Nikita Kalyazin <kalyazin@amazon.com> writes:
>
>> On x86, async pagefault events can only be delivered if the page fault
>> was triggered by guest userspace, not kernel. This is because
>> the guest may be in non-sleepable context and will not be able
>> to reschedule.
>
> We used to set KVM_ASYNC_PF_SEND_ALWAYS for Linux guests before
>
> commit 3a7c8fafd1b42adea229fd204132f6a2fb3cd2d9
> Author: Thomas Gleixner <tglx@linutronix.de>
> Date:   Fri Apr 24 09:57:56 2020 +0200
>
>     x86/kvm: Restrict ASYNC_PF to user space
>
> but KVM side of the feature is kind of still there, namely
> kvm_pv_enable_async_pf() sets
>
>   vcpu->arch.apf.send_user_only = !(data & KVM_ASYNC_PF_SEND_ALWAYS);
>
> and then we check it in kvm_can_deliver_async_pf():
>
>   if (vcpu->arch.apf.send_user_only &&
>       kvm_x86_call(get_cpl)(vcpu) == 0)
>           return false;
>
> and this can still be used by some legacy guests I suppose. How about
> we start with removing this completely? It does not matter if some
> legacy guest wants to get an APF for CPL0, we are never obliged to
> actually use the mechanism.

If I understand you correctly, the change you propose is rather
orthogonal to the original one, as the check is performed after the work
has been already allocated (in kvm_setup_async_pf). Would you expect
tangible savings from omitting the send_user_only check?

> --
> Vitaly
On Thu, Nov 21, 2024, Nikita Kalyazin wrote:
> On 19/11/2024 13:24, Sean Christopherson wrote:
> > None of this justifies breaking host-side, non-paravirt async page faults.
> > If a vCPU hits a missing page, KVM can schedule out the vCPU and let
> > something else run on the pCPU, or enter idle and let the SMT sibling get
> > more cycles, or maybe even enter a low enough sleep state to let other
> > cores turbo a wee bit.
> >
> > I have no objection to disabling host async page faults, e.g. it's probably
> > a net negative for 1:1 vCPU:pCPU pinned setups, but such disabling needs an
> > opt-in from userspace.
>
> That's a good point, I didn't think about it. The async work would still
> need to execute somewhere in that case (or sleep in GUP until the page is
> available).

The "async work" is often an I/O operation, e.g. to pull in the page from disk,
or over the network from the source. The *CPU* doesn't need to actively do
anything for those operations. The I/O is initiated, so the CPU can do
something else, or go idle if there's no other work to be done.

> If processing the fault synchronously, the vCPU thread can also sleep in the
> same way freeing the pCPU for something else,

If and only if the vCPU can handle a PV async #PF. E.g. if the guest kernel
flat out doesn't support PV async #PF, or the fault happened while the guest
was in an incompatible mode, etc. If KVM doesn't do async #PFs of any kind,
the vCPU will spin on the fault until the I/O completes and the page is ready.

> so the amount of work to be done looks equivalent (please correct me
> otherwise). What's the net gain of moving that to an async work in the host
> async fault case? "while allowing interrupt delivery into the guest." -- is
> this the main advantage?
Nikita Kalyazin <kalyazin@amazon.com> writes:

> On 18/11/2024 17:58, Vitaly Kuznetsov wrote:
>> Nikita Kalyazin <kalyazin@amazon.com> writes:
>>
>>> On x86, async pagefault events can only be delivered if the page fault
>>> was triggered by guest userspace, not kernel. This is because
>>> the guest may be in non-sleepable context and will not be able
>>> to reschedule.
>>
>> We used to set KVM_ASYNC_PF_SEND_ALWAYS for Linux guests before
>>
>> commit 3a7c8fafd1b42adea229fd204132f6a2fb3cd2d9
>> Author: Thomas Gleixner <tglx@linutronix.de>
>> Date:   Fri Apr 24 09:57:56 2020 +0200
>>
>>     x86/kvm: Restrict ASYNC_PF to user space
>>
>> but KVM side of the feature is kind of still there, namely
>> kvm_pv_enable_async_pf() sets
>>
>>   vcpu->arch.apf.send_user_only = !(data & KVM_ASYNC_PF_SEND_ALWAYS);
>>
>> and then we check it in kvm_can_deliver_async_pf():
>>
>>   if (vcpu->arch.apf.send_user_only &&
>>       kvm_x86_call(get_cpl)(vcpu) == 0)
>>           return false;
>>
>> and this can still be used by some legacy guests I suppose. How about
>> we start with removing this completely? It does not matter if some
>> legacy guest wants to get an APF for CPL0, we are never obliged to
>> actually use the mechanism.
>
> If I understand you correctly, the change you propose is rather
> orthogonal to the original one as the check is performed after the work
> has been already allocated (in kvm_setup_async_pf). Would you expect
> tangible savings from omitting the send_user_only check?

No, I don't expect any performance benefits. Basically, I was referring
to the description of your patch:

"On x86, async pagefault events can only be delivered if the page fault
was triggered by guest userspace, not kernel"

and strictly speaking this is not true today, as we still support
KVM_ASYNC_PF_SEND_ALWAYS in KVM. Yes, modern Linux guests don't use it,
but the flag is there.

Basically, my suggestion is to start with a cleanup (untested):

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6d9f763a7bb9..d0906830a9fb 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -974,7 +974,6 @@ struct kvm_vcpu_arch {
 		u64 msr_int_val; /* MSR_KVM_ASYNC_PF_INT */
 		u16 vec;
 		u32 id;
-		bool send_user_only;
 		u32 host_apf_flags;
 		bool delivery_as_pf_vmexit;
 		bool pageready_pending;
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index a1efa7907a0b..5558a1ec3dc9 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -87,7 +87,7 @@ struct kvm_clock_pairing {
 #define KVM_MAX_MMU_OP_BATCH           32
 
 #define KVM_ASYNC_PF_ENABLED			(1 << 0)
-#define KVM_ASYNC_PF_SEND_ALWAYS		(1 << 1)
+#define KVM_ASYNC_PF_SEND_ALWAYS		(1 << 1) /* deprecated */
 #define KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT	(1 << 2)
 #define KVM_ASYNC_PF_DELIVERY_AS_INT		(1 << 3)
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 83fe0a78146f..cd15e738ca9b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3585,7 +3585,6 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
 					sizeof(u64)))
 		return 1;
 
-	vcpu->arch.apf.send_user_only = !(data & KVM_ASYNC_PF_SEND_ALWAYS);
 	vcpu->arch.apf.delivery_as_pf_vmexit = data & KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT;
 
 	kvm_async_pf_wakeup_all(vcpu);
@@ -13374,8 +13373,7 @@ static bool kvm_can_deliver_async_pf(struct kvm_vcpu *vcpu)
 	if (!kvm_pv_async_pf_enabled(vcpu))
 		return false;
 
-	if (vcpu->arch.apf.send_user_only &&
-	    kvm_x86_call(get_cpl)(vcpu) == 0)
+	if (kvm_x86_call(get_cpl)(vcpu) == 0)
 		return false;
 
 	if (is_guest_mode(vcpu)) {
On Fri, Nov 22, 2024, Vitaly Kuznetsov wrote:
> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> index a1efa7907a0b..5558a1ec3dc9 100644
> --- a/arch/x86/include/uapi/asm/kvm_para.h
> +++ b/arch/x86/include/uapi/asm/kvm_para.h
> @@ -87,7 +87,7 @@ struct kvm_clock_pairing {
>  #define KVM_MAX_MMU_OP_BATCH           32
>
>  #define KVM_ASYNC_PF_ENABLED			(1 << 0)
> -#define KVM_ASYNC_PF_SEND_ALWAYS		(1 << 1)
> +#define KVM_ASYNC_PF_SEND_ALWAYS		(1 << 1) /* deprecated */
>  #define KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT	(1 << 2)
>  #define KVM_ASYNC_PF_DELIVERY_AS_INT	(1 << 3)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 83fe0a78146f..cd15e738ca9b 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3585,7 +3585,6 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
>  					sizeof(u64)))
>  		return 1;
>
> -	vcpu->arch.apf.send_user_only = !(data & KVM_ASYNC_PF_SEND_ALWAYS);
>  	vcpu->arch.apf.delivery_as_pf_vmexit = data & KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT;
>
>  	kvm_async_pf_wakeup_all(vcpu);
> @@ -13374,8 +13373,7 @@ static bool kvm_can_deliver_async_pf(struct kvm_vcpu *vcpu)
>  	if (!kvm_pv_async_pf_enabled(vcpu))
>  		return false;
>
> -	if (vcpu->arch.apf.send_user_only &&
> -	    kvm_x86_call(get_cpl)(vcpu) == 0)
> +	if (kvm_x86_call(get_cpl)(vcpu) == 0)

By x86's general definition of "user", this should be "!= 3" :-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 22e7ad235123..11d29d15b6cd 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4369,7 +4369,8 @@ static int __kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
 			trace_kvm_async_pf_repeated_fault(fault->addr, fault->gfn);
 			kvm_make_request(KVM_REQ_APF_HALT, vcpu);
 			return RET_PF_RETRY;
-		} else if (kvm_arch_setup_async_pf(vcpu, fault)) {
+		} else if (kvm_can_deliver_async_pf(vcpu) &&
+			   kvm_arch_setup_async_pf(vcpu, fault)) {
 			return RET_PF_RETRY;
 		}
 	}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2e713480933a..8edae75b39f7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13355,7 +13355,7 @@ static inline bool apf_pageready_slot_free(struct kvm_vcpu *vcpu)
 	return !val;
 }
 
-static bool kvm_can_deliver_async_pf(struct kvm_vcpu *vcpu)
+bool kvm_can_deliver_async_pf(struct kvm_vcpu *vcpu)
 {
 
 	if (!kvm_pv_async_pf_enabled(vcpu))
@@ -13406,8 +13406,7 @@ bool kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
 	trace_kvm_async_pf_not_present(work->arch.token, work->cr2_or_gpa);
 	kvm_add_async_pf_gfn(vcpu, work->arch.gfn);
 
-	if (kvm_can_deliver_async_pf(vcpu) &&
-	    !apf_put_user_notpresent(vcpu)) {
+	if (!apf_put_user_notpresent(vcpu)) {
 		fault.vector = PF_VECTOR;
 		fault.error_code_valid = true;
 		fault.error_code = 0;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index ec623d23d13d..9647f41e5c49 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -387,6 +387,8 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 fastpath_t handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu);
 fastpath_t handle_fastpath_hlt(struct kvm_vcpu *vcpu);
 
+bool kvm_can_deliver_async_pf(struct kvm_vcpu *vcpu);
+
 extern struct kvm_caps kvm_caps;
 extern struct kvm_host_values kvm_host;
On x86, async pagefault events can only be delivered if the page fault
was triggered by guest userspace, not kernel. This is because
the guest may be in non-sleepable context and will not be able
to reschedule.

However existing implementation pays the following overhead even for the
kernel-originated faults, even though it is known in advance that they
cannot be processed asynchronously:
- allocate async PF token
- create and schedule an async work

This patch avoids the overhead above in case of kernel-originated faults
by moving the `kvm_can_deliver_async_pf` check from
`kvm_arch_async_page_not_present` to `__kvm_faultin_pfn`.

Note that the existing check `kvm_can_do_async_pf` already calls
`kvm_can_deliver_async_pf` internally, however it only does that if the
`kvm_hlt_in_guest` check is true, ie userspace requested KVM not to exit
on guest halts via `KVM_CAP_X86_DISABLE_EXITS`. In that case the code
proceeds with the async fault processing with the following
justification in 1dfdb45ec510ba27e366878f97484e9c9e728902 ("KVM: x86:
clean up conditions for asynchronous page fault handling"):

"Even when asynchronous page fault is disabled, KVM does not want to pause
the host if a guest triggers a page fault; instead it will put it into
an artificial HLT state that allows running other host processes while
allowing interrupt delivery into the guest."

Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
---
 arch/x86/kvm/mmu/mmu.c | 3 ++-
 arch/x86/kvm/x86.c     | 5 ++---
 arch/x86/kvm/x86.h     | 2 ++
 3 files changed, 6 insertions(+), 4 deletions(-)

base-commit: d96c77bd4eeba469bddbbb14323d2191684da82a