
[RFC,v5,075/104] KVM: x86: Check for pending APICv interrupt in kvm_vcpu_has_events()

Message ID d28f3b27c281e0c6a8f93c01bb4da78980d654c8.1646422845.git.isaku.yamahata@intel.com (mailing list archive)
State New, archived
Series KVM TDX basic feature support

Commit Message

Isaku Yamahata March 4, 2022, 7:49 p.m. UTC
From: Sean Christopherson <sean.j.christopherson@intel.com>

Return true for kvm_vcpu_has_events() if the vCPU has a pending APICv
interrupt to support TDX's usage of APICv.  Unlike VMX, TDX doesn't have
access to vmcs.GUEST_INTR_STATUS and so can't emulate posted interrupts,
i.e. needs to generate a posted interrupt and more importantly can't
manually move requested interrupts into the vIRR (which it also doesn't
have access to).

Because pi_has_pending_interrupt() is a heavy operation that uses two atomic
test-bit operations and one atomic 256-bit bitmap check, introduce a new
callback for this check instead of reusing the dy_apicv_has_pending_interrupt()
callback, to avoid affecting the existing code.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/vmx/main.c         | 9 +++++++++
 arch/x86/kvm/x86.c              | 5 ++++-
 3 files changed, 14 insertions(+), 1 deletion(-)
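
For context, the helper the commit message describes as heavy is the existing
VMX posted-interrupt check; a rough sketch is below (based on
arch/x86/kvm/vmx/posted_intr.c around the time of this series; exact details
may differ):

bool pi_has_pending_interrupt(struct kvm_vcpu *vcpu)
{
	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);

	/* Two atomic test-bit ops (ON and SN) plus a 256-bit scan of the PIR. */
	return pi_test_on(pi_desc) || pi_test_sn(pi_desc) ||
	       !bitmap_empty((unsigned long *)pi_desc->pir, NR_VECTORS);
}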

Comments

Sean Christopherson April 8, 2022, 4:24 p.m. UTC | #1
On Fri, Mar 04, 2022, isaku.yamahata@intel.com wrote:
> From: Sean Christopherson <sean.j.christopherson@intel.com>
> 
> Return true for kvm_vcpu_has_events() if the vCPU has a pending APICv
> interrupt to support TDX's usage of APICv.  Unlike VMX, TDX doesn't have
> access to vmcs.GUEST_INTR_STATUS and so can't emulate posted interrupts,

Based on the discussion in the HLT patch, this is no longer true.

> i.e. needs to generate a posted interrupt and more importantly can't
> manually move requested interrupts into the vIRR (which it also doesn't
> have access to).
> 
> Because pi_has_pending_interrupt() is a heavy operation that uses two atomic
> test-bit operations and one atomic 256-bit bitmap check, introduce a new
> callback for this check instead of reusing the dy_apicv_has_pending_interrupt()
> callback, to avoid affecting the existing code.

...

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 89d04cd64cd0..314ae43e07bf 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12111,7 +12111,10 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
>  
>  	if (kvm_arch_interrupt_allowed(vcpu) &&
>  	    (kvm_cpu_has_interrupt(vcpu) ||
> -	    kvm_guest_apic_has_interrupt(vcpu)))
> +	     kvm_guest_apic_has_interrupt(vcpu) ||
> +	     (vcpu->arch.apicv_active &&
> +	      kvm_x86_ops.apicv_has_pending_interrupt &&
> +	      kvm_x86_ops.apicv_has_pending_interrupt(vcpu))))

This is pretty gross (fully realizing that I wrote this patch).  It's also arguably
wrong as it really should be called from apic_has_interrupt_for_ppr().

  1. The hook implies it is valid for APICv in general, which is misleading.

  2. It's wasted effort for VMX.
 
  3. It does a poor job of conveying _why_ TDX is different.

  4. KVM is unnecessarily processing its useless "copy" of the PPR/IRR for TDX
     vCPUs.  It's functionally not an issue unless userspace stuffs garbage into
     KVM's vAPIC, but it's unnecessary work.

Rather than hook this path, I would prefer that we tag kvm_apic as having some of
its state protected.  Then kvm_cpu_has_interrupt() can invoke the alternative,
protected-apic-only hook when appropriate, and kvm_apic_has_interrupt() can bail
immediately instead of doing useless processing of stale vAPIC state.

Note, the below moves the !apic check from tdx_vcpu_reset() to tdx_vcpu_create().
That part should be hoisted earlier in the series; there's no reason to wait until
RESET to perform the check, and I suspect the WARN_ON() can be triggered by userspace.

Compile tested only...

From: Sean Christopherson <seanjc@google.com>
Date: Fri, 8 Apr 2022 08:56:27 -0700
Subject: [PATCH] KVM: TDX: Add support for finding a pending IRQ in a
 protected local APIC

Add a flag and a hook to KVM's local APIC management to support determining
whether or not a TDX guest has a pending IRQ.  For TDX vCPUs, the virtual
APIC page is owned by the TDX module and cannot be accessed by KVM.  As a
result, registers that are virtualized by the CPU, e.g. PPR, cannot be
read or written by KVM.  To deliver interrupts for TDX guests, KVM must
send an IRQ to the CPU on the posted interrupt notification vector.  And
to determine if a TDX vCPU has a pending interrupt, KVM must check whether
there is an outstanding notification.

Return "no interrupt" in kvm_apic_has_interrupt() if the guest APIC is
protected to short-circuit the various other flows that try to pull an
IRQ out of the vAPIC, the only valid operation is querying _if_ an IRQ is
pending, KVM can't do anything based on _which_ IRQ is pending.

Intentionally omit sanity checks from other flows, e.g. PPR update, so as
not to degrade non-TDX guests with unnecessary checks.  A well-behaved KVM
and userspace will never reach those flows for TDX guests, but reaching
them is not fatal if something does go awry.

Note, this doesn't handle interrupts that have been delivered to the vCPU
but not yet recognized by the core, i.e. interrupts that are sitting in
vmcs.GUEST_INTR_STATUS.  Querying that state requires a SEAMCALL and will
be supported in a future patch.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  1 +
 arch/x86/kvm/irq.c                 |  3 +++
 arch/x86/kvm/lapic.c               |  3 +++
 arch/x86/kvm/lapic.h               |  2 ++
 arch/x86/kvm/vmx/main.c            | 11 +++++++++++
 arch/x86/kvm/vmx/tdx.c             |  9 ++++++---
 7 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 7e27b73d839f..ce705d0c6241 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -110,6 +110,7 @@ KVM_X86_OP_NULL(update_pi_irte)
 KVM_X86_OP_NULL(start_assignment)
 KVM_X86_OP_NULL(apicv_post_state_restore)
 KVM_X86_OP_NULL(dy_apicv_has_pending_interrupt)
+KVM_X86_OP_NULL(protected_apic_has_interrupt)
 KVM_X86_OP_NULL(set_hv_timer)
 KVM_X86_OP_NULL(cancel_hv_timer)
 KVM_X86_OP(setup_mce)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 489374a57b66..b3dcc0814461 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1491,6 +1491,7 @@ struct kvm_x86_ops {
 	void (*start_assignment)(struct kvm *kvm);
 	void (*apicv_post_state_restore)(struct kvm_vcpu *vcpu);
 	bool (*dy_apicv_has_pending_interrupt)(struct kvm_vcpu *vcpu);
+	bool (*protected_apic_has_interrupt)(struct kvm_vcpu *vcpu);

 	int (*set_hv_timer)(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc,
 			    bool *expired);
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index 172b05343cfd..24f180c538b0 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -96,6 +96,9 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
 	if (kvm_cpu_has_extint(v))
 		return 1;

+	if (lapic_in_kernel(v) && v->arch.apic->guest_apic_protected)
+		return static_call(kvm_x86_protected_apic_has_interrupt)(v);
+
 	return kvm_apic_has_interrupt(v) != -1;	/* LAPIC */
 }
 EXPORT_SYMBOL_GPL(kvm_cpu_has_interrupt);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 9322e6340a74..50a483abc0fe 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2503,6 +2503,9 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
 	if (!kvm_apic_present(vcpu))
 		return -1;

+	if (apic->guest_apic_protected)
+		return -1;
+
 	__apic_update_ppr(apic, &ppr);
 	return apic_has_interrupt_for_ppr(apic, ppr);
 }
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 2b44e533fc8d..7b62f1889a98 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -52,6 +52,8 @@ struct kvm_lapic {
 	bool sw_enabled;
 	bool irr_pending;
 	bool lvt0_in_nmi_mode;
+	/* Select registers in the vAPIC cannot be read/written. */
+	bool guest_apic_protected;
 	/* Number of bits set in ISR. */
 	s16 isr_count;
 	/* The highest vector set in ISR; if -1 - invalid, must scan ISR. */
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 882358ac270b..31aab8add010 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -42,6 +42,9 @@ static __init int vt_hardware_setup(void)

 	tdx_hardware_setup(&vt_x86_ops);

+	if (!enable_tdx)
+		vt_x86_ops.protected_apic_has_interrupt = NULL;
+
 	if (enable_ept) {
 		const u64 init_value = enable_tdx ? VMX_EPT_SUPPRESS_VE_BIT : 0ull;
 		kvm_mmu_set_ept_masks(enable_ept_ad_bits,
@@ -148,6 +151,13 @@ static void vt_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	return vmx_vcpu_load(vcpu, cpu);
 }

+static bool vt_protected_apic_has_interrupt(struct kvm_vcpu *vcpu)
+{
+	KVM_BUG_ON(!is_td_vcpu(vcpu), vcpu->kvm);
+
+	return pi_has_pending_interrupt(vcpu);
+}
+
 static void vt_flush_tlb_all(struct kvm_vcpu *vcpu)
 {
 	if (is_td_vcpu(vcpu))
@@ -297,6 +307,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.sync_pir_to_irr = vmx_sync_pir_to_irr,
 	.deliver_interrupt = vmx_deliver_interrupt,
 	.dy_apicv_has_pending_interrupt = pi_has_pending_interrupt,
+	.protected_apic_has_interrupt = vt_protected_apic_has_interrupt,

 	.set_tss_addr = vmx_set_tss_addr,
 	.set_identity_map_addr = vmx_set_identity_map_addr,
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 3a0e826fbe0c..7b9370384ce4 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -467,6 +467,12 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu)
 	struct vcpu_tdx *tdx = to_tdx(vcpu);
 	int ret, i;

+	/* TDX only supports x2APIC, which requires an in-kernel local APIC. */
+	if (!vcpu->arch.apic)
+		return -EINVAL;
+
+	vcpu->arch.apic->guest_apic_protected = true;
+
 	ret = tdx_alloc_td_page(&tdx->tdvpr);
 	if (ret)
 		return ret;
@@ -602,9 +608,6 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	/* TDX doesn't support INIT event. */
 	if (WARN_ON(init_event))
 		goto td_bugged;
-	/* TDX supports only X2APIC enabled. */
-	if (WARN_ON(!vcpu->arch.apic))
-		goto td_bugged;
 	if (WARN_ON(is_td_vcpu_created(tdx)))
 		goto td_bugged;


base-commit: f88e9fa63cbd87cda9352ee9a86a6f815744be33
--
Paolo Bonzini April 15, 2022, 2:20 p.m. UTC | #2
On 4/8/22 18:24, Sean Christopherson wrote:
>> Return true for kvm_vcpu_has_events() if the vCPU has a pending APICv
>> interrupt to support TDX's usage of APICv.  Unlike VMX, TDX doesn't have
>> access to vmcs.GUEST_INTR_STATUS and so can't emulate posted interrupts,
> Based on the discussion in the HLT patch, this is no longer true.
> 

It's still true; TDX only has access to RVI > PPR (which is enough to check
whether the vCPU is runnable).
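
(As an illustrative aside, and not any actual KVM or TDX-module interface:
architecturally the check is only whether RVI's priority class exceeds PPR's,
e.g. a conceptual helper like the hypothetical one below.)

/*
 * Conceptual sketch only: a pending virtual interrupt is recognizable when
 * the priority class of the highest requested vector exceeds the processor
 * priority, i.e. RVI[7:4] > PPR[7:4].
 */
static inline bool vcpu_has_recognizable_virq(u8 rvi, u8 ppr)
{
	return (rvi >> 4) > (ppr >> 4);
}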

> Rather than hook this path, I would prefer that we tag kvm_apic as having some of
> its state protected.  Then kvm_cpu_has_interrupt() can invoke the alternative,
> protected-apic-only hook when appropriate, and kvm_apic_has_interrupt() can bail
> immediately instead of doing useless processing of stale vAPIC state.

Agreed, this is similar to my suggestion on the HLT patch:

https://lkml.kernel.org/r/a7d28775-2dbe-7d97-7053-e182bd5be51c@redhat.com

Paolo

Patch

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 489374a57b66..8dab9f16f559 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1491,6 +1491,7 @@  struct kvm_x86_ops {
 	void (*start_assignment)(struct kvm *kvm);
 	void (*apicv_post_state_restore)(struct kvm_vcpu *vcpu);
 	bool (*dy_apicv_has_pending_interrupt)(struct kvm_vcpu *vcpu);
+	bool (*apicv_has_pending_interrupt)(struct kvm_vcpu *vcpu);
 
 	int (*set_hv_timer)(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc,
 			    bool *expired);
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 882358ac270b..d75caf0d6861 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -148,6 +148,14 @@  static void vt_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	return vmx_vcpu_load(vcpu, cpu);
 }
 
+static bool vt_apicv_has_pending_interrupt(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu))
+		return pi_has_pending_interrupt(vcpu);
+
+	return false;
+}
+
 static void vt_flush_tlb_all(struct kvm_vcpu *vcpu)
 {
 	if (is_td_vcpu(vcpu))
@@ -297,6 +305,7 @@  struct kvm_x86_ops vt_x86_ops __initdata = {
 	.sync_pir_to_irr = vmx_sync_pir_to_irr,
 	.deliver_interrupt = vmx_deliver_interrupt,
 	.dy_apicv_has_pending_interrupt = pi_has_pending_interrupt,
+	.apicv_has_pending_interrupt = vt_apicv_has_pending_interrupt,
 
 	.set_tss_addr = vmx_set_tss_addr,
 	.set_identity_map_addr = vmx_set_identity_map_addr,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 89d04cd64cd0..314ae43e07bf 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12111,7 +12111,10 @@  static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
 
 	if (kvm_arch_interrupt_allowed(vcpu) &&
 	    (kvm_cpu_has_interrupt(vcpu) ||
-	    kvm_guest_apic_has_interrupt(vcpu)))
+	     kvm_guest_apic_has_interrupt(vcpu) ||
+	     (vcpu->arch.apicv_active &&
+	      kvm_x86_ops.apicv_has_pending_interrupt &&
+	      kvm_x86_ops.apicv_has_pending_interrupt(vcpu))))
 		return true;
 
 	if (kvm_hv_has_stimer_pending(vcpu))