diff mbox series

[RFC,v3,23/59] KVM: x86: Allow host-initiated WRMSR to set X2APIC regardless of CPUID

Message ID 63556f13e9608cbccf97d356be46a345772d76d3.1637799475.git.isaku.yamahata@intel.com (mailing list archive)
State New, archived
Headers show
Series KVM: X86: TDX support | expand

Commit Message

Isaku Yamahata Nov. 25, 2021, 12:20 a.m. UTC
From: Sean Christopherson <sean.j.christopherson@intel.com>

Let userspace, or in the case of TDX, KVM itself, enable X2APIC even if
X2APIC is not reported as supported in the guest's CPU model.  KVM
generally does not force specific ordering between ioctls(), e.g. this
forces userspace to configure CPUID before MSRs.  And for TDX, vCPUs
will always run with X2APIC enabled, e.g. KVM will want/need to enable
X2APIC from time zero.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 arch/x86/kvm/x86.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Comments

Thomas Gleixner Nov. 25, 2021, 7:41 p.m. UTC | #1
On Wed, Nov 24 2021 at 16:20, isaku yamahata wrote:
> Let userspace, or in the case of TDX, KVM itself, enable X2APIC even if
> X2APIC is not reported as supported in the guest's CPU model.  KVM
> generally does not force specific ordering between ioctls(), e.g. this
> forces userspace to configure CPUID before MSRs.  And for TDX, vCPUs
> will always run with X2APIC enabled, e.g. KVM will want/need to enable
> X2APIC from time zero.

This is complete crap. Fix the broken user space and do not add
horrible hacks to the kernel.

Thanks,

        tglx
Paolo Bonzini Nov. 26, 2021, 8:18 a.m. UTC | #2
On 11/25/21 20:41, Thomas Gleixner wrote:
> On Wed, Nov 24 2021 at 16:20, isaku yamahata wrote:
>> Let userspace, or in the case of TDX, KVM itself, enable X2APIC even if
>> X2APIC is not reported as supported in the guest's CPU model.  KVM
>> generally does not force specific ordering between ioctls(), e.g. this
>> forces userspace to configure CPUID before MSRs.  And for TDX, vCPUs
>> will always run with X2APIC enabled, e.g. KVM will want/need to enable
>> X2APIC from time zero.
> 
> This is complete crap. Fix the broken user space and do not add
> horrible hacks to the kernel.

tl;dr: I agree that it's a userspace issue but "configure CPUID before 
MSR" is not the issue (in fact QEMU calls KVM_SET_CPUID2 before any call 
to KVM_SET_MSRS).

We have quite a few other cases in which KVM_GET/SET_MSR is allowed to 
get/set MSRs in ways that the guests are not allowed to do.

In general, there are several reasons for this:

- simplifying userspace so that it can use the same list of MSRs for all 
guests (likely, the list that KVM provides with KVM_GET_MSR_INDEX_LIST). 
  For example MSR_TSC_AUX is only exposed to the guest if RDTSCP or 
RDPID are available, but the host can always access it.  This is usually 
the reason why host accesses to MSRs override CPUID.

- simplifying userspace so that it does not have to go through the 
various steps of a state machine; for example, it's okay if userspace 
goes DISABLED->X2APIC instead of having to do DISABLED->XAPIC->X2APIC.

- allowing userspace to set a reset value, for example overriding the 
lock bit in MSR_IA32_FEAT_CTL.

- read-only MSRs that are really "CPUID-like", i.e. they give the guest 
information about processor features (for example the VMX feature MSRs)

- MSRs had some weird limitations that were lifted later by introducing 
additional MSRs; for example KVM always allows the host to write to the 
full-width MSR_IA32_PMC0 counters, because they are a saner version of 
MSR_IA32_PERFCTR0 and there's no reason for userspace to inflict 
MSR_IA32_PERFCTR0 on userspace.

So the host_initiated check doesn't _necessarily_ count as a horrible 
hack in the kernel.  However, in this case we have a trusted domain 
without X2APIC.  I'm not sure this configuration is clearly bogus.  One 
could imagine special-purpose VMs that don't need interrupts at all. 
For full guests such as the ones that QEMU runs, I agree with Thomas 
that userspace must be fixed to enforce x2apic for TDX guests.

Paolo
Sean Christopherson Nov. 29, 2021, 9:21 p.m. UTC | #3
On Fri, Nov 26, 2021, Paolo Bonzini wrote:
> On 11/25/21 20:41, Thomas Gleixner wrote:
> > On Wed, Nov 24 2021 at 16:20, isaku yamahata wrote:
> > > Let userspace, or in the case of TDX, KVM itself, enable X2APIC even if
                                            ^^^^^^^^^^

> > > X2APIC is not reported as supported in the guest's CPU model.  KVM
> > > generally does not force specific ordering between ioctls(), e.g. this
> > > forces userspace to configure CPUID before MSRs.  And for TDX, vCPUs
> > > will always run with X2APIC enabled, e.g. KVM will want/need to enable
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > X2APIC from time zero.
      ^^^^^^^^^^^^^^^^^^^^^
> > 
> > This is complete crap. Fix the broken user space and do not add
> > horrible hacks to the kernel.
> 
> tl;dr: I agree that it's a userspace issue but "configure CPUID before MSR"
> is not the issue (in fact QEMU calls KVM_SET_CPUID2 before any call to
> KVM_SET_MSRS).

Specifically for TDX, it's not a userspace issue.  To simplify other checks and
to report sane values for KVM_GET_MSRS, KVM forces X2APIC for TDX guests when the
vCPU is created, before its exposed to usersepace.  The bit about not forcing
specific ordering is justification for making the change independent of TDX,
i.e. to call out that APIC_BASE is different from every other MSR, and is even
inconsistent in its own behavior since illegal transitions are allowed when
userspace is stuffing the MSR.

IMO, this patch is valid irrespective of TDX.  It's included in the TDX series
because TDX support forces the issue.

That said, an alternative for TDX would be do handle this in kvm_lapic_reset()
now that the lAPIC RESET flows are consolidated.  Back when this patch was first
written, that wasn't really an option.
diff mbox series

Patch

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f2b6a3f89e9e..0221ef691a15 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -466,8 +466,11 @@  int kvm_set_apic_base(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
 	enum lapic_mode old_mode = kvm_get_apic_mode(vcpu);
 	enum lapic_mode new_mode = kvm_apic_mode(msr_info->data);
-	u64 reserved_bits = kvm_vcpu_reserved_gpa_bits_raw(vcpu) | 0x2ff |
-		(guest_cpuid_has(vcpu, X86_FEATURE_X2APIC) ? 0 : X2APIC_ENABLE);
+	u64 reserved_bits = kvm_vcpu_reserved_gpa_bits_raw(vcpu) | 0x2ff;
+
+	if (!msr_info->host_initiated &&
+	    !guest_cpuid_has(vcpu, X86_FEATURE_X2APIC))
+		reserved_bits |= X2APIC_ENABLE;
 
 	if ((msr_info->data & reserved_bits) != 0 || new_mode == LAPIC_MODE_INVALID)
 		return 1;