[v2,02/10] KVM: x86: extend struct kvm_vcpu_pv_apf_data with token info

Message ID 20200525144125.143875-3-vkuznets@redhat.com (mailing list archive)
State New, archived
Series KVM: x86: Interrupt-based mechanism for async_pf 'page present' notifications

Commit Message

Vitaly Kuznetsov May 25, 2020, 2:41 p.m. UTC
Currently, the APF mechanism relies on #PF abuse: the token is passed
through CR2. If we switch to using interrupts to deliver 'page ready'
notifications, we need a different way to pass the data. Extend the
existing 'struct kvm_vcpu_pv_apf_data' with token information for
'page ready' notifications.

While at it, rename 'reason' to 'flags'. This doesn't change the semantics,
as we only have reasons '1' and '2' and these can be treated as bit flags,
but KVM_PV_REASON_PAGE_READY is going away with interrupt-based delivery,
making the name 'reason' misleading.

The newly introduced apf_put_user_ready() temporarily puts both flags and
token information; this will be changed to put the token only when we
switch to interrupt-based notifications.
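
[Editor's note: for reference, a minimal sketch of the new layout and the
packed 'page ready' value it enables, mirroring the uapi and x86.c hunks in
the patch below; it uses stdint types in place of kernel u32/u64 and is
illustrative, not part of the patch:]

  #include <stdint.h>

  #define KVM_PV_REASON_PAGE_NOT_PRESENT 1
  #define KVM_PV_REASON_PAGE_READY       2

  /* New layout: 'token' directly follows 'flags', so on little-endian
   * x86 a 'page ready' notification can set both with one 8-byte store. */
  struct kvm_vcpu_pv_apf_data {
          uint32_t flags;
          uint32_t token;   /* used for 'page ready' notifications only */
          uint8_t  pad[56];
          uint32_t enabled;
  };

  /* Pack token and flags the way apf_put_user_ready() does for now:
   * flags in the low 32 bits, token in the high 32 bits. */
  static inline uint64_t apf_ready_val(uint32_t token)
  {
          return (uint64_t)token << 32 | KVM_PV_REASON_PAGE_READY;
  }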

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/include/asm/kvm_host.h      |  2 +-
 arch/x86/include/asm/kvm_para.h      |  4 ++--
 arch/x86/include/uapi/asm/kvm_para.h |  5 +++--
 arch/x86/kernel/kvm.c                | 16 ++++++++--------
 arch/x86/kvm/mmu/mmu.c               |  6 +++---
 arch/x86/kvm/svm/nested.c            |  2 +-
 arch/x86/kvm/svm/svm.c               |  3 ++-
 arch/x86/kvm/vmx/nested.c            |  2 +-
 arch/x86/kvm/vmx/vmx.c               |  5 +++--
 arch/x86/kvm/x86.c                   | 17 +++++++++++++----
 10 files changed, 37 insertions(+), 25 deletions(-)

Comments

Vivek Goyal May 26, 2020, 6:27 p.m. UTC | #1
On Mon, May 25, 2020 at 04:41:17PM +0200, Vitaly Kuznetsov wrote:
> 

[..]
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 0a6b35353fc7..c195f63c1086 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -767,7 +767,7 @@ struct kvm_vcpu_arch {
>  		u64 msr_val;
>  		u32 id;
>  		bool send_user_only;
> -		u32 host_apf_reason;
> +		u32 host_apf_flags;

Hi Vitaly,

What is host_apf_reason used for? It looks like it is somehow used in
the context of nested guests. I hope by now you have been able to figure
it out.

Is it the case that the L2 guest takes a page-fault exit
and then L0 injects this event into L1 as an exception? I have been
trying to read this code but can't wrap my head around it.

I am still concerned about the nested KVM case. We have discussed
the APF mechanism but never touched the nested part of it. Given we are
touching code in the nested KVM part, I want to make sure it is not broken
in the new design.

Thanks
Vivek
Vitaly Kuznetsov May 28, 2020, 8:42 a.m. UTC | #2
Vivek Goyal <vgoyal@redhat.com> writes:

> On Mon, May 25, 2020 at 04:41:17PM +0200, Vitaly Kuznetsov wrote:
>> 
>
> [..]
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 0a6b35353fc7..c195f63c1086 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -767,7 +767,7 @@ struct kvm_vcpu_arch {
>>  		u64 msr_val;
>>  		u32 id;
>>  		bool send_user_only;
>> -		u32 host_apf_reason;
>> +		u32 host_apf_flags;
>
> Hi Vitaly,
>
> What is host_apf_reason used for? It looks like it is somehow used in
> the context of nested guests. I hope by now you have been able to figure
> it out.
>
> Is it the case that the L2 guest takes a page-fault exit
> and then L0 injects this event into L1 as an exception? I have been
> trying to read this code but can't wrap my head around it.
>
> I am still concerned about the nested KVM case. We have discussed
> the APF mechanism but never touched the nested part of it. Given we are
> touching code in the nested KVM part, I want to make sure it is not broken
> in the new design.
>

Sorry I missed this.

I think we've touched nested topic a bit already:
https://lore.kernel.org/kvm/87lfluwfi0.fsf@vitty.brq.redhat.com/

But let me try to explain the whole thing and maybe someone will point
out what I'm missing.

The problem being solved: an L2 guest is running and it hits a page
which is not present *in L0*; instead of pausing the *L1* vCPU completely,
we want to let L1 know about the problem so it can run something else
(e.g. another guest or just another application).

What's different between this and the 'normal' APF case? When an L2 guest
is running, the (physical) CPU is in 'guest' mode, so we can't inject #PF
there. Actually, we can, but L2 may get confused: we're not even sure
it's L2's fault, that L2 supports APF, and so on. We want to make L1
deal with the issue.

How does it work, then? We inject #PF and L1 sees it as a #PF VMEXIT. It
needs to know about APF (thus KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT), but
the handling is exactly the same as in do_page_fault(): L1's
kvm_handle_page_fault() checks the APF area (shared between L0 and L1) and
either pauses a task or resumes a previously paused one. This can be an
L2 guest or something else.

What is 'host_apf_reason'? It is a copy of the 'reason' field from 'struct
kvm_vcpu_pv_apf_data' which we read upon #PF VMEXIT; it indicates that
the #PF VMEXIT is synthetic.
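
[Editor's note: in code, the flow described above condenses to roughly the
following sketch, combining the svm.c/vmx.c and mmu.c hunks from the patch
below, with tracing, irq fiddling and the non-APF fallback trimmed:]

  /* On #PF VMEXIT, the vcpu_run path snapshots L1's shared APF area: */
  vcpu->arch.apf.host_apf_flags = kvm_read_and_reset_apf_flags();

  /* ... later, in L1's kvm_handle_page_fault(): */
  switch (vcpu->arch.apf.host_apf_flags) {
  case KVM_PV_REASON_PAGE_NOT_PRESENT:
          vcpu->arch.apf.host_apf_flags = 0;
          kvm_async_pf_task_wait(fault_address, 0);  /* pause the task */
          break;
  case KVM_PV_REASON_PAGE_READY:
          vcpu->arch.apf.host_apf_flags = 0;
          kvm_async_pf_task_wake(fault_address);     /* resume it */
          break;
  default:
          /* a genuine page fault, not a synthetic APF one */
          break;
  }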

How does it work with the patchset: 'page not present' case remains the
same. 'page ready' case now goes through interrupts so it may not get
handled immediately. External interrupts will be handled by L0 in host
mode (when L2 is not running). For the 'page ready' case L1 hypervisor
doesn't need any special handling, kvm_async_pf_intr() irq handler will
work correctly.
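
[Editor's note: as a rough illustration only — kvm_async_pf_intr() is
introduced later in this series, and the handler shape, the ACK MSR name
and the exact ordering here are assumptions — the guest-side 'page ready'
interrupt handler could look something like:]

  static irqreturn_t kvm_async_pf_intr(int irq, void *dev_id)
  {
          u32 token;

          if (__this_cpu_read(apf_reason.enabled)) {
                  token = __this_cpu_read(apf_reason.token);
                  kvm_async_pf_task_wake(token);    /* resume paused task */
                  __this_cpu_write(apf_reason.token, 0);
                  wrmsrl(MSR_KVM_ASYNC_PF_ACK, 1);  /* ack the notification */
          }

          return IRQ_HANDLED;
  }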

I've smoke tested this with VMX and nothing immediately blew up.
Paolo Bonzini May 28, 2020, 10:59 a.m. UTC | #3
On 28/05/20 10:42, Vitaly Kuznetsov wrote:
> How does it work with the patchset: 'page not present' case remains the
> same. 'page ready' case now goes through interrupts so it may not get
> handled immediately. External interrupts will be handled by L0 in host
> mode (when L2 is not running). For the 'page ready' case L1 hypervisor
> doesn't need any special handling, kvm_async_pf_intr() irq handler will
> work correctly.
> 
> I've smoke tested this with VMX and nothing immediately blew up.

Indeed.

It would be an issue in the remote (read: nonexistent) case of a
hypervisor that handles async page faults but does not VMEXIT on
interrupts. In that case it would not be able to enable 'page ready'
delivery as an interrupt, and it would have to get rid of async page
fault support.

Paolo
Vivek Goyal June 3, 2020, 7:35 p.m. UTC | #4
On Thu, May 28, 2020 at 10:42:38AM +0200, Vitaly Kuznetsov wrote:
> Vivek Goyal <vgoyal@redhat.com> writes:
> 
> > On Mon, May 25, 2020 at 04:41:17PM +0200, Vitaly Kuznetsov wrote:
> >> 
> >
> > [..]
> >> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> >> index 0a6b35353fc7..c195f63c1086 100644
> >> --- a/arch/x86/include/asm/kvm_host.h
> >> +++ b/arch/x86/include/asm/kvm_host.h
> >> @@ -767,7 +767,7 @@ struct kvm_vcpu_arch {
> >>  		u64 msr_val;
> >>  		u32 id;
> >>  		bool send_user_only;
> >> -		u32 host_apf_reason;
> >> +		u32 host_apf_flags;
> >
> > Hi Vitaly,
> >
> > What is host_apf_reason used for? It looks like it is somehow used in
> > the context of nested guests. I hope by now you have been able to figure
> > it out.
> >
> > Is it the case that the L2 guest takes a page-fault exit
> > and then L0 injects this event into L1 as an exception? I have been
> > trying to read this code but can't wrap my head around it.
> >
> > I am still concerned about the nested KVM case. We have discussed
> > the APF mechanism but never touched the nested part of it. Given we are
> > touching code in the nested KVM part, I want to make sure it is not broken
> > in the new design.
> >
> 
> Sorry I missed this.
> 
> I think we've touched nested topic a bit already:
> https://lore.kernel.org/kvm/87lfluwfi0.fsf@vitty.brq.redhat.com/
> 
> But let me try to explain the whole thing and maybe someone will point
> out what I'm missing.

Hi Vitaly,

Sorry, I got busy with some other things and am only getting back to it
now. Thanks for the explanation. I think I understand it to some extent now.

Vivek

> 
> The problem being solved: an L2 guest is running and it hits a page
> which is not present *in L0*; instead of pausing the *L1* vCPU completely,
> we want to let L1 know about the problem so it can run something else
> (e.g. another guest or just another application).
> 
> What's different between this and the 'normal' APF case? When an L2 guest
> is running, the (physical) CPU is in 'guest' mode, so we can't inject #PF
> there. Actually, we can, but L2 may get confused: we're not even sure
> it's L2's fault, that L2 supports APF, and so on. We want to make L1
> deal with the issue.
> 
> How does it work, then? We inject #PF and L1 sees it as a #PF VMEXIT. It
> needs to know about APF (thus KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT), but
> the handling is exactly the same as in do_page_fault(): L1's
> kvm_handle_page_fault() checks the APF area (shared between L0 and L1) and
> either pauses a task or resumes a previously paused one. This can be an
> L2 guest or something else.
> 
> What is 'host_apf_reason'? It is a copy of the 'reason' field from 'struct
> kvm_vcpu_pv_apf_data' which we read upon #PF VMEXIT; it indicates that
> the #PF VMEXIT is synthetic.
> 
> How does it work with the patchset: 'page not present' case remains the
> same. 'page ready' case now goes through interrupts so it may not get
> handled immediately. External interrupts will be handled by L0 in host
> mode (when L2 is not running). For the 'page ready' case L1 hypervisor
> doesn't need any special handling, kvm_async_pf_intr() irq handler will
> work correctly.
> 
> I've smoke tested this with VMX and nothing immediately blew up.
> 
> -- 
> Vitaly
>

Patch

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0a6b35353fc7..c195f63c1086 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -767,7 +767,7 @@  struct kvm_vcpu_arch {
 		u64 msr_val;
 		u32 id;
 		bool send_user_only;
-		u32 host_apf_reason;
+		u32 host_apf_flags;
 		unsigned long nested_apf_token;
 		bool delivery_as_pf_vmexit;
 	} apf;
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 9b4df6eaa11a..2a3102fee189 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -90,7 +90,7 @@  unsigned int kvm_arch_para_features(void);
 unsigned int kvm_arch_para_hints(void);
 void kvm_async_pf_task_wait(u32 token, int interrupt_kernel);
 void kvm_async_pf_task_wake(u32 token);
-u32 kvm_read_and_reset_pf_reason(void);
+u32 kvm_read_and_reset_apf_flags(void);
 extern void kvm_disable_steal_time(void);
 void do_async_page_fault(struct pt_regs *regs, unsigned long error_code, unsigned long address);
 
@@ -121,7 +121,7 @@  static inline unsigned int kvm_arch_para_hints(void)
 	return 0;
 }
 
-static inline u32 kvm_read_and_reset_pf_reason(void)
+static inline u32 kvm_read_and_reset_apf_flags(void)
 {
 	return 0;
 }
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 2a8e0b6b9805..d1cd5c0f431a 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -112,8 +112,9 @@  struct kvm_mmu_op_release_pt {
 #define KVM_PV_REASON_PAGE_READY 2
 
 struct kvm_vcpu_pv_apf_data {
-	__u32 reason;
-	__u8 pad[60];
+	__u32 flags;
+	__u32 token; /* Used for page ready notification only */
+	__u8 pad[56];
 	__u32 enabled;
 };
 
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 6efe0410fb72..340df5dab30d 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -228,24 +228,24 @@  void kvm_async_pf_task_wake(u32 token)
 }
 EXPORT_SYMBOL_GPL(kvm_async_pf_task_wake);
 
-u32 kvm_read_and_reset_pf_reason(void)
+u32 kvm_read_and_reset_apf_flags(void)
 {
-	u32 reason = 0;
+	u32 flags = 0;
 
 	if (__this_cpu_read(apf_reason.enabled)) {
-		reason = __this_cpu_read(apf_reason.reason);
-		__this_cpu_write(apf_reason.reason, 0);
+		flags = __this_cpu_read(apf_reason.flags);
+		__this_cpu_write(apf_reason.flags, 0);
 	}
 
-	return reason;
+	return flags;
 }
-EXPORT_SYMBOL_GPL(kvm_read_and_reset_pf_reason);
-NOKPROBE_SYMBOL(kvm_read_and_reset_pf_reason);
+EXPORT_SYMBOL_GPL(kvm_read_and_reset_apf_flags);
+NOKPROBE_SYMBOL(kvm_read_and_reset_apf_flags);
 
 dotraplinkage void
 do_async_page_fault(struct pt_regs *regs, unsigned long error_code, unsigned long address)
 {
-	switch (kvm_read_and_reset_pf_reason()) {
+	switch (kvm_read_and_reset_apf_flags()) {
 	default:
 		do_page_fault(regs, error_code, address);
 		break;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 8071952e9cf2..7fa5253237b2 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4186,7 +4186,7 @@  int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
 #endif
 
 	vcpu->arch.l1tf_flush_l1d = true;
-	switch (vcpu->arch.apf.host_apf_reason) {
+	switch (vcpu->arch.apf.host_apf_flags) {
 	default:
 		trace_kvm_page_fault(fault_address, error_code);
 
@@ -4196,13 +4196,13 @@  int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
 				insn_len);
 		break;
 	case KVM_PV_REASON_PAGE_NOT_PRESENT:
-		vcpu->arch.apf.host_apf_reason = 0;
+		vcpu->arch.apf.host_apf_flags = 0;
 		local_irq_disable();
 		kvm_async_pf_task_wait(fault_address, 0);
 		local_irq_enable();
 		break;
 	case KVM_PV_REASON_PAGE_READY:
-		vcpu->arch.apf.host_apf_reason = 0;
+		vcpu->arch.apf.host_apf_flags = 0;
 		local_irq_disable();
 		kvm_async_pf_task_wake(fault_address);
 		local_irq_enable();
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 9a2a62e5afeb..c04adbbdbd20 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -835,7 +835,7 @@  int nested_svm_exit_special(struct vcpu_svm *svm)
 		break;
 	case SVM_EXIT_EXCP_BASE + PF_VECTOR:
 		/* When we're shadowing, trap PFs, but not async PF */
-		if (!npt_enabled && svm->vcpu.arch.apf.host_apf_reason == 0)
+		if (!npt_enabled && svm->vcpu.arch.apf.host_apf_flags == 0)
 			return NESTED_EXIT_HOST;
 		break;
 	default:
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index a862c768fd54..70d136e9e075 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3399,7 +3399,8 @@  static void svm_vcpu_run(struct kvm_vcpu *vcpu)
 
 	/* if exit due to PF check for async PF */
 	if (svm->vmcb->control.exit_code == SVM_EXIT_EXCP_BASE + PF_VECTOR)
-		svm->vcpu.arch.apf.host_apf_reason = kvm_read_and_reset_pf_reason();
+		svm->vcpu.arch.apf.host_apf_flags =
+			kvm_read_and_reset_apf_flags();
 
 	if (npt_enabled) {
 		vcpu->arch.regs_avail &= ~(1 << VCPU_EXREG_PDPTR);
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index e44f33c82332..c82d59070a9c 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5582,7 +5582,7 @@  bool nested_vmx_exit_reflected(struct kvm_vcpu *vcpu, u32 exit_reason)
 		if (is_nmi(intr_info))
 			return false;
 		else if (is_page_fault(intr_info))
-			return !vmx->vcpu.arch.apf.host_apf_reason && enable_ept;
+			return !vmx->vcpu.arch.apf.host_apf_flags && enable_ept;
 		else if (is_debug(intr_info) &&
 			 vcpu->guest_debug &
 			 (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 89c766fad889..4a82319353f2 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4662,7 +4662,7 @@  static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 	if (is_page_fault(intr_info)) {
 		cr2 = vmcs_readl(EXIT_QUALIFICATION);
 		/* EPT won't cause page fault directly */
-		WARN_ON_ONCE(!vcpu->arch.apf.host_apf_reason && enable_ept);
+		WARN_ON_ONCE(!vcpu->arch.apf.host_apf_flags && enable_ept);
 		return kvm_handle_page_fault(vcpu, error_code, cr2, NULL, 0);
 	}
 
@@ -6234,7 +6234,8 @@  static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
 
 	/* if exit due to PF check for async PF */
 	if (is_page_fault(vmx->exit_intr_info)) {
-		vmx->vcpu.arch.apf.host_apf_reason = kvm_read_and_reset_pf_reason();
+		vmx->vcpu.arch.apf.host_apf_flags =
+			kvm_read_and_reset_apf_flags();
 	/* Handle machine checks before interrupts are enabled */
 	} else if (is_machine_check(vmx->exit_intr_info)) {
 		kvm_machine_check();
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8e0bbe4f0d5b..a3102406102c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2654,7 +2654,7 @@  static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
 	}
 
 	if (kvm_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.apf.data, gpa,
-					sizeof(u32)))
+					sizeof(u64)))
 		return 1;
 
 	vcpu->arch.apf.send_user_only = !(data & KVM_ASYNC_PF_SEND_ALWAYS);
@@ -10357,8 +10357,17 @@  static void kvm_del_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
 	}
 }
 
-static int apf_put_user(struct kvm_vcpu *vcpu, u32 val)
+static inline int apf_put_user_notpresent(struct kvm_vcpu *vcpu)
 {
+	u32 reason = KVM_PV_REASON_PAGE_NOT_PRESENT;
+
+	return kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.apf.data, &reason,
+				      sizeof(reason));
+}
+
+static inline int apf_put_user_ready(struct kvm_vcpu *vcpu, u32 token)
+{
+	u64 val = (u64)token << 32 | KVM_PV_REASON_PAGE_READY;
 
 	return kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.apf.data, &val,
 				      sizeof(val));
@@ -10403,7 +10412,7 @@  void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
 	kvm_add_async_pf_gfn(vcpu, work->arch.gfn);
 
 	if (kvm_can_deliver_async_pf(vcpu) &&
-	    !apf_put_user(vcpu, KVM_PV_REASON_PAGE_NOT_PRESENT)) {
+	    !apf_put_user_notpresent(vcpu)) {
 		fault.vector = PF_VECTOR;
 		fault.error_code_valid = true;
 		fault.error_code = 0;
@@ -10436,7 +10445,7 @@  void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 	trace_kvm_async_pf_ready(work->arch.token, work->cr2_or_gpa);
 
 	if (vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED &&
-	    !apf_put_user(vcpu, KVM_PV_REASON_PAGE_READY)) {
+	    !apf_put_user_ready(vcpu, work->arch.token)) {
 			fault.vector = PF_VECTOR;
 			fault.error_code_valid = true;
 			fault.error_code = 0;