diff mbox series

[4/4] KVM: x86: lapic: don't allow to set non default apic id when not using x2apic api

Message ID 20220301135526.136554-5-mlevitsk@redhat.com (mailing list archive)
State New, archived
Headers show
Series SVM fixes + apic fix | expand

Commit Message

Maxim Levitsky March 1, 2022, 1:55 p.m. UTC
Fix a loop hole in setting the apic state that didn't check if
apic id == vcpu_id when x2apic is enabled but userspace is using
a older variant of the ioctl which didn't had 32 bit apic ids.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/lapic.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

Comments

Sean Christopherson March 1, 2022, 4:56 p.m. UTC | #1
Please, please post standalone patches/fixes as standalone patches/fixes.  And in
general, keep series to one topic.  There is very real value in following the
established and documented process, e.g. avoids creating artificial dependencies
where a changes works only because of an "unrelated" patch earlier in the series.
And for us reviewers, it helps tremendously as it means I can scan just the cover
letter for a series to prioritize review accordingly.  Bundling things together
means I have to scan through every patch to triage the series..

On Tue, Mar 01, 2022, Maxim Levitsky wrote:
> Fix a loop hole in setting the apic state that didn't check if

Heh, "loophole", took me a second to figure out there was no literal loop. :-)

> apic id == vcpu_id when x2apic is enabled but userspace is using
> a older variant of the ioctl which didn't had 32 bit apic ids.
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  arch/x86/kvm/lapic.c | 17 ++++++++---------
>  1 file changed, 8 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 80a2020c4db40..8d35f56c64020 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2618,15 +2618,14 @@ static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
>  		u32 *ldr = (u32 *)(s->regs + APIC_LDR);
>  		u64 icr;
>  
> -		if (vcpu->kvm->arch.x2apic_format) {
> -			if (*id != vcpu->vcpu_id)
> -				return -EINVAL;
> -		} else {
> -			if (set)
> -				*id >>= 24;
> -			else
> -				*id <<= 24;
> -		}
> +		if (!vcpu->kvm->arch.x2apic_format && set)
> +			*id >>= 24;
> +
> +		if (*id != vcpu->vcpu_id)
> +			return -EINVAL;

This breaks backwards compability, userspace will start failing where it previously
succeeded.  It doesn't even require a malicious userspace, e.g. if userspace creates
a vCPU with a vcpu_id > 255 vCPUs, the shift will truncate and cause failure.  It'll
obviously do weird things, but this code is 5.5 years old, I don't think it's worth
trying to close a loophole that doesn't harm KVM.

If we provide a way for userspace to opt into disallowiong APICID != vcpu_id, we
can address this further upstream, e.g. reject vcpu_id > 255 without x2apic_format.

> +
> +		if (!vcpu->kvm->arch.x2apic_format && !set)
> +			*id <<= 24;
>  
>  		/*
>  		 * In x2APIC mode, the LDR is fixed and based on the id.  And
> -- 
> 2.26.3
>
Maxim Levitsky March 1, 2022, 5:09 p.m. UTC | #2
On Tue, 2022-03-01 at 16:56 +0000, Sean Christopherson wrote:
> Please, please post standalone patches/fixes as standalone patches/fixes.  And in
> general, keep series to one topic.  There is very real value in following the
> established and documented process, e.g. avoids creating artificial dependencies
> where a changes works only because of an "unrelated" patch earlier in the series.
> And for us reviewers, it helps tremendously as it means I can scan just the cover
> letter for a series to prioritize review accordingly.  Bundling things together
> means I have to scan through every patch to triage the series..

I know, this series is just set of small fixes - this patch belongs to the series
of my nested avic, but as was requested, I posted it seperately as part of
'fixes only' series, since it is stanadlone. All patches in this series
are standalone.

> 
> On Tue, Mar 01, 2022, Maxim Levitsky wrote:
> > Fix a loop hole in setting the apic state that didn't check if
> 
> Heh, "loophole", took me a second to figure out there was no literal loop. :-)
> 
> > apic id == vcpu_id when x2apic is enabled but userspace is using
> > a older variant of the ioctl which didn't had 32 bit apic ids.
> > 
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >  arch/x86/kvm/lapic.c | 17 ++++++++---------
> >  1 file changed, 8 insertions(+), 9 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index 80a2020c4db40..8d35f56c64020 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -2618,15 +2618,14 @@ static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
> >  		u32 *ldr = (u32 *)(s->regs + APIC_LDR);
> >  		u64 icr;
> >  
> > -		if (vcpu->kvm->arch.x2apic_format) {
> > -			if (*id != vcpu->vcpu_id)
> > -				return -EINVAL;
> > -		} else {
> > -			if (set)
> > -				*id >>= 24;
> > -			else
> > -				*id <<= 24;
> > -		}
> > +		if (!vcpu->kvm->arch.x2apic_format && set)
> > +			*id >>= 24;
> > +
> > +		if (*id != vcpu->vcpu_id)
> > +			return -EINVAL;
> 
> This breaks backwards compability, userspace will start failing where it previously
> succeeded.  It doesn't even require a malicious userspace, e.g. if userspace creates
> a vCPU with a vcpu_id > 255 vCPUs, the shift will truncate and cause failure.  It'll
> obviously do weird things, but this code is 5.5 years old, I don't think it's worth
> trying to close a loophole that doesn't harm KVM.
> 
> If we provide a way for userspace to opt into disallowiong APICID != vcpu_id, we
> can address this further upstream, e.g. reject vcpu_id > 255 without x2apic_format.

This check is only when CPU is in x2apic mode. In this mode the X86 specs demands that
apic_id == vcpu_id.

When old API is used, APIC IDs are encoded in xapic format, even when vCPU is in x2apic
mode, meaning that apic id can't be more  that 255 even in x2apic mode.
That is why new API 'x2apic_format' was added in first place.

Thus I don't see how this is breaking userspace.

Best regards,
	Maxim Levitsky



> 
> > +
> > +		if (!vcpu->kvm->arch.x2apic_format && !set)
> > +			*id <<= 24;
> >  
> >  		/*
> >  		 * In x2APIC mode, the LDR is fixed and based on the id.  And
> > -- 
> > 2.26.3
> >
Sean Christopherson March 1, 2022, 5:46 p.m. UTC | #3
On Tue, Mar 01, 2022, Maxim Levitsky wrote:
> > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > index 80a2020c4db40..8d35f56c64020 100644
> > > --- a/arch/x86/kvm/lapic.c
> > > +++ b/arch/x86/kvm/lapic.c
> > > @@ -2618,15 +2618,14 @@ static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
> > >  		u32 *ldr = (u32 *)(s->regs + APIC_LDR);
> > >  		u64 icr;
> > >  
> > > -		if (vcpu->kvm->arch.x2apic_format) {
> > > -			if (*id != vcpu->vcpu_id)
> > > -				return -EINVAL;
> > > -		} else {
> > > -			if (set)
> > > -				*id >>= 24;
> > > -			else
> > > -				*id <<= 24;
> > > -		}
> > > +		if (!vcpu->kvm->arch.x2apic_format && set)
> > > +			*id >>= 24;
> > > +
> > > +		if (*id != vcpu->vcpu_id)
> > > +			return -EINVAL;
> > 
> > This breaks backwards compability, userspace will start failing where it previously
> > succeeded.  It doesn't even require a malicious userspace, e.g. if userspace creates
> > a vCPU with a vcpu_id > 255 vCPUs, the shift will truncate and cause failure.  It'll
> > obviously do weird things, but this code is 5.5 years old, I don't think it's worth
> > trying to close a loophole that doesn't harm KVM.
> > 
> > If we provide a way for userspace to opt into disallowiong APICID != vcpu_id, we
> > can address this further upstream, e.g. reject vcpu_id > 255 without x2apic_format.
> 
> This check is only when CPU is in x2apic mode. In this mode the X86 specs demands that
> apic_id == vcpu_id.

AMD's APM explicitly allows changing the x2APIC ID by writing the xAPIC ID and then
switching to x2APIC mode.

> When old API is used, APIC IDs are encoded in xapic format, even when vCPU is in x2apic
> mode, meaning that apic id can't be more  that 255 even in x2apic mode.

Even on Intel CPUs, if userspace creates a vCPU with vcpu_id = 256, then the xAPIC ID
will be (256 << 24) == 0.  If userspace does get+set, then KVM will see 0 != 256 and
reject the set with return -EINVAL.

And as above, userspace that hasn't opted into the x2apic_format could also legitimately
change the x2APIC ID.

I 100% agree this is a mess and probably can't work without major shenanigans, but on
the other hand this isn't harming anything, so why risk breaking userspace, even if the
risk is infinitesimally small?

I'm all for locking down the APIC ID with a userspace opt-in control, I just don't
think it's worth trying to retroactively plug the various holes in KVM.
Maxim Levitsky March 1, 2022, 5:56 p.m. UTC | #4
On Tue, 2022-03-01 at 17:46 +0000, Sean Christopherson wrote:
> On Tue, Mar 01, 2022, Maxim Levitsky wrote:
> > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > > index 80a2020c4db40..8d35f56c64020 100644
> > > > --- a/arch/x86/kvm/lapic.c
> > > > +++ b/arch/x86/kvm/lapic.c
> > > > @@ -2618,15 +2618,14 @@ static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
> > > >  		u32 *ldr = (u32 *)(s->regs + APIC_LDR);
> > > >  		u64 icr;
> > > >  
> > > > -		if (vcpu->kvm->arch.x2apic_format) {
> > > > -			if (*id != vcpu->vcpu_id)
> > > > -				return -EINVAL;
> > > > -		} else {
> > > > -			if (set)
> > > > -				*id >>= 24;
> > > > -			else
> > > > -				*id <<= 24;
> > > > -		}
> > > > +		if (!vcpu->kvm->arch.x2apic_format && set)
> > > > +			*id >>= 24;
> > > > +
> > > > +		if (*id != vcpu->vcpu_id)
> > > > +			return -EINVAL;
> > > 
> > > This breaks backwards compability, userspace will start failing where it previously
> > > succeeded.  It doesn't even require a malicious userspace, e.g. if userspace creates
> > > a vCPU with a vcpu_id > 255 vCPUs, the shift will truncate and cause failure.  It'll
> > > obviously do weird things, but this code is 5.5 years old, I don't think it's worth
> > > trying to close a loophole that doesn't harm KVM.
> > > 
> > > If we provide a way for userspace to opt into disallowiong APICID != vcpu_id, we
> > > can address this further upstream, e.g. reject vcpu_id > 255 without x2apic_format.
> > 
> > This check is only when CPU is in x2apic mode. In this mode the X86 specs demands that
> > apic_id == vcpu_id.
> 
> AMD's APM explicitly allows changing the x2APIC ID by writing the xAPIC ID and then
> switching to x2APIC mode.
This patch was supposed to go together with patch that fully forbids changing apic id.

> 
> > When old API is used, APIC IDs are encoded in xapic format, even when vCPU is in x2apic
> > mode, meaning that apic id can't be more  that 255 even in x2apic mode.
> 
> Even on Intel CPUs, if userspace creates a vCPU with vcpu_id = 256, then the xAPIC ID
> will be (256 << 24) == 0.  If userspace does get+set, then KVM will see 0 != 256 and
> reject the set with return -EINVAL.

First of all, I am saying again - with old API (!x2apic_format) - the APIC ID was limited
to 8 bits - literaly only bits 24-31 of the value was used.

With old API - literaly, the APIC_ID register was supposed to have x2apic format
regardless of if vCPU is in x2apic mode or not, and the above code cares to translate
it to the real apic id register from and to.

Thus there is really no point of letting old API change apic id for x2apic mode,
it is literaly a loophole that should be plugged.

Best regards,
	Maxim Levitsky

> 
> And as above, userspace that hasn't opted into the x2apic_format could also legitimately
> change the x2APIC ID.
> 
> I 100% agree this is a mess and probably can't work without major shenanigans, but on
> the other hand this isn't harming anything, so why risk breaking userspace, even if the
> risk is infinitesimally small?
> 
> I'm all for locking down the APIC ID with a userspace opt-in control, I just don't
> think it's worth trying to retroactively plug the various holes in KVM.
>
Maxim Levitsky March 2, 2022, 11:50 a.m. UTC | #5
On Tue, 2022-03-01 at 19:56 +0200, Maxim Levitsky wrote:
> On Tue, 2022-03-01 at 17:46 +0000, Sean Christopherson wrote:
> > On Tue, Mar 01, 2022, Maxim Levitsky wrote:
> > > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > > > index 80a2020c4db40..8d35f56c64020 100644
> > > > > --- a/arch/x86/kvm/lapic.c
> > > > > +++ b/arch/x86/kvm/lapic.c
> > > > > @@ -2618,15 +2618,14 @@ static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
> > > > >  		u32 *ldr = (u32 *)(s->regs + APIC_LDR);
> > > > >  		u64 icr;
> > > > >  
> > > > > -		if (vcpu->kvm->arch.x2apic_format) {
> > > > > -			if (*id != vcpu->vcpu_id)
> > > > > -				return -EINVAL;
> > > > > -		} else {
> > > > > -			if (set)
> > > > > -				*id >>= 24;
> > > > > -			else
> > > > > -				*id <<= 24;
> > > > > -		}
> > > > > +		if (!vcpu->kvm->arch.x2apic_format && set)
> > > > > +			*id >>= 24;
> > > > > +
> > > > > +		if (*id != vcpu->vcpu_id)
> > > > > +			return -EINVAL;
> > > > 
> > > > This breaks backwards compability, userspace will start failing where it previously
> > > > succeeded.  It doesn't even require a malicious userspace, e.g. if userspace creates
> > > > a vCPU with a vcpu_id > 255 vCPUs, the shift will truncate and cause failure.  It'll
> > > > obviously do weird things, but this code is 5.5 years old, I don't think it's worth
> > > > trying to close a loophole that doesn't harm KVM.
> > > > 
> > > > If we provide a way for userspace to opt into disallowiong APICID != vcpu_id, we
> > > > can address this further upstream, e.g. reject vcpu_id > 255 without x2apic_format.
> > > 
> > > This check is only when CPU is in x2apic mode. In this mode the X86 specs demands that
> > > apic_id == vcpu_id.
> > 
> > AMD's APM explicitly allows changing the x2APIC ID by writing the xAPIC ID and then
> > switching to x2APIC mode.
> This patch was supposed to go together with patch that fully forbids changing apic id.
> 
> > > When old API is used, APIC IDs are encoded in xapic format, even when vCPU is in x2apic
> > > mode, meaning that apic id can't be more  that 255 even in x2apic mode.
> > 
> > Even on Intel CPUs, if userspace creates a vCPU with vcpu_id = 256, then the xAPIC ID
> > will be (256 << 24) == 0.  If userspace does get+set, then KVM will see 0 != 256 and
> > reject the set with return -EINVAL.
> 
> First of all, I am saying again - with old API (!x2apic_format) - the APIC ID was limited
> to 8 bits - literaly only bits 24-31 of the value was used.
> 
> With old API - literaly, the APIC_ID register was supposed to have x2apic format
> regardless of if vCPU is in x2apic mode or not, and the above code cares to translate
> it to the real apic id register from and to.
> 
> Thus there is really no point of letting old API change apic id for x2apic mode,
> it is literaly a loophole that should be plugged.

Since I probably didn't explain myself correctly let me try to explain myself again:
 
 
First of all let me explain the x86 spec:
 
APIC id is number, usually relatively small number - its initial value is taken from
vcpu_id, when vCPU is created.
Initial APIC id is exposed in CPUID, as well.
 
When APIC is in xapic mode, then mmio apic_base + 0x20, contains apic id in bits 24-31.
plus apic id 0xff is reserved for broadcast, thus max of 255 cpus.
 
When APIC is in x2apic mode, mmio access is disabled, and instead MSR 0x802 contains
the local apic id, and it is the whole 32 bit number.
 
 
We have 2 ioctls, KVM_GET_LAPIC/KVM_SET_LAPIC which set/get local apic state.
They use array of apic registers which is basically snapshot of xapic mmio, 
and it is not defined fully if these are x2apic or xapic registers.
 
Because of this, initially the APIC_ID register in this snapshot would
contain apic id in bits 24-31.
 
To make it possible to support apic id > 256, new KVM cap was added KVM_CAP_X2APIC_API
It has nothing to do with x2apic strictly speaking and can be used regardless of
x2apic mode.
 
All it does was to make the value at 0x20 offset in that mmio state dump be full 32 bit
value of the apic id.
 
 
When APIC state is loading while APIC is in *x2apic* mode it does enforce that
value in this 0x20 offset is initial apic id if KVM_CAP_X2APIC_API.
 
I think that it is fair to also enforce this when KVM_CAP_X2APIC_API is not used,
especially if we make apic id read-only.
 
I hope that this explains my point better, sorry if I was not able
to explain it well before.
 
Best regards,
	Maxim Levitsky


> 
> Best regards,
> 	Maxim Levitsky
> 
> > And as above, userspace that hasn't opted into the x2apic_format could also legitimately
> > change the x2APIC ID.
> > 
> > I 100% agree this is a mess and probably can't work without major shenanigans, but on
> > the other hand this isn't harming anything, so why risk breaking userspace, even if the
> > risk is infinitesimally small?
> > 
> > I'm all for locking down the APIC ID with a userspace opt-in control, I just don't
> > think it's worth trying to retroactively plug the various holes in KVM.
> >
Sean Christopherson March 3, 2022, 4:51 p.m. UTC | #6
On Wed, Mar 02, 2022, Maxim Levitsky wrote:
> When APIC state is loading while APIC is in *x2apic* mode it does enforce that
> value in this 0x20 offset is initial apic id if KVM_CAP_X2APIC_API.
>  
> I think that it is fair to also enforce this when KVM_CAP_X2APIC_API is not used,
> especially if we make apic id read-only.

I don't disagree in principle.  But, (a) this loophole as existing for nearly 6
years, (b) closing the loophole could break userspace, (c) false positive are
possible due to truncation, and (d) KVM gains nothing meaningful by closing the
loophole.

(d) changes when we add a knob to make xAPIC ID read-only, but we can simply
require userspace to enable KVM_CAP_X2APIC_API (or force it).  That approach
avoids (c) by eliminating truncation, and avoids (b) by virtue of being opt-in.
Maxim Levitsky March 3, 2022, 6:15 p.m. UTC | #7
On Thu, 2022-03-03 at 16:51 +0000, Sean Christopherson wrote:
> On Wed, Mar 02, 2022, Maxim Levitsky wrote:
> > When APIC state is loading while APIC is in *x2apic* mode it does enforce that
> > value in this 0x20 offset is initial apic id if KVM_CAP_X2APIC_API.
> >  
> > I think that it is fair to also enforce this when KVM_CAP_X2APIC_API is not used,
> > especially if we make apic id read-only.
> 
> I don't disagree in principle.  But, (a) this loophole as existing for nearly 6
> years, (b) closing the loophole could break userspace, (c) false positive are
> possible due to truncation, and (d) KVM gains nothing meaningful by closing the
> loophole.
> 
> (d) changes when we add a knob to make xAPIC ID read-only, but we can simply
> require userspace to enable KVM_CAP_X2APIC_API (or force it).  That approach
> avoids (c) by eliminating truncation, and avoids (b) by virtue of being opt-in.
> 

(a) - doesn't matter.

(b) - if userspace wants to have non default apic id with x2apic mode,
      which (*)can't even really be set from the guest - this is ridiculous.
 
      (*) Yes I know that in *theory* user can change apic id in xapic mode
      and then switch to x2apic mode - but I really doubt that KVM
      would even honor this - there are already places which assume
      that this is not the case. In fact it would be nice to audit KVM
      on what happens when userspace does this, there might be a nice
      CVE somewhere....
 
(c) - without KVM_CAP_X2APIC_API, literally just call to KVM_GET_LAPIC/KVM_SET_LAPIC
will truncate x2apic id if > 255 regardless of my patch - literally this cap
was added to avoid this.
What we should do is to avoid creating cpu with vcpu_id > 256 when this cap is not set…
 
(d) - doesn't matter - again we are talking about x2apic mode in which apic id is read only.
 

Don’t get me wrong - I understand your concerns about this, but I hope that you
also understand mine - I still think that you just don't understand me.

Best regards,
	Maxim Levitsky
Sean Christopherson March 3, 2022, 7:38 p.m. UTC | #8
On Thu, Mar 03, 2022, Maxim Levitsky wrote:
> On Thu, 2022-03-03 at 16:51 +0000, Sean Christopherson wrote:
> > On Wed, Mar 02, 2022, Maxim Levitsky wrote:
> > > When APIC state is loading while APIC is in *x2apic* mode it does enforce that
> > > value in this 0x20 offset is initial apic id if KVM_CAP_X2APIC_API.
> > >  
> > > I think that it is fair to also enforce this when KVM_CAP_X2APIC_API is not used,
> > > especially if we make apic id read-only.
> > 
> > I don't disagree in principle.  But, (a) this loophole as existing for nearly 6
> > years, (b) closing the loophole could break userspace, (c) false positive are
> > possible due to truncation, and (d) KVM gains nothing meaningful by closing the
> > loophole.
> > 
> > (d) changes when we add a knob to make xAPIC ID read-only, but we can simply
> > require userspace to enable KVM_CAP_X2APIC_API (or force it).  That approach
> > avoids (c) by eliminating truncation, and avoids (b) by virtue of being opt-in.
> > 
> 
> (a) - doesn't matter.

Yes, it absolutely matters.  If KVM_CAP_X2APIC_API was added in the 5.17 cycle
then I would not be objecting because there is zero chance of breaking userspace.

> (b) - if userspace wants to have non default apic id with x2apic mode,
>       which (*)can't even really be set from the guest - this is ridiculous.
>  
>       (*) Yes I know that in *theory* user can change apic id in xapic mode
>       and then switch to x2apic mode - but I really doubt that KVM
>       would even honor this - there are already places which assume
>       that this is not the case. In fact it would be nice to audit KVM
>       on what happens when userspace does this, there might be a nice
>       CVE somewhere....
>  
> (c) - without KVM_CAP_X2APIC_API, literally just call to KVM_GET_LAPIC/KVM_SET_LAPIC
> will truncate x2apic id if > 255 regardless of my patch - literally this cap
> was added to avoid this.
> What we should do is to avoid creating cpu with vcpu_id > 256 when this cap is not set…

Yes, but _rejecting_ that behavior is a change in KVM's ABI.  That's why (a) matters.

> (d) - doesn't matter - again we are talking about x2apic mode in which apic id is read only.

It does matter that changes that potentially break userspace provide value to KVM.
Not theoretical, make us feel warm and fuzzy value, but actual value to KVM or
userspace.  KVM is no worse off for this loophole.  For userspace, at best this
will break a flawed but functional setup.

Concretely, the below example of using x2APIC with an ID > 255 works because KVM
calculates its APIC maps using what KVM thinks is the x2APIC ID. 

With your proposed change, KVM_SET_LAPIC will fail and we've broken a functional,
if sketchy, setup.  Is there likely to be such a real-world setup that doesn't
barf on the inconsistent x2APIC ID?  Probably not, but I don't see any reason to
find out.

==== Test Assertion Failure ====
  lib/kvm_util.c:1953: ret == 0
  pid=837 tid=837 errno=22 - Invalid argument
     1	0x0000000000403c0e: vcpu_ioctl at kvm_util.c:1952
     2	0x0000000000401548: main at smm_test.c:151
     3	0x00007ff76518ebf6: ?? ??:0
     4	0x0000000000401829: _start at ??:?
  vcpu ioctl 1140895375 failed, rc: -1 errno: 22 (Invalid argument)


diff --git a/tools/testing/selftests/kvm/x86_64/smm_test.c b/tools/testing/selftests/kvm/x86_64/smm_test.c
index a626d40fdb48..cce44d99c919 100644
--- a/tools/testing/selftests/kvm/x86_64/smm_test.c
+++ b/tools/testing/selftests/kvm/x86_64/smm_test.c
@@ -19,7 +19,7 @@
 #include "vmx.h"
 #include "svm_util.h"

-#define VCPU_ID              1
+#define VCPU_ID              256

 #define PAGE_SIZE  4096

@@ -56,7 +56,7 @@ static inline void sync_with_host(uint64_t phase)
 static void self_smi(void)
 {
        x2apic_write_reg(APIC_ICR,
-                        APIC_DEST_SELF | APIC_INT_ASSERT | APIC_DM_SMI);
+                        ((uint64_t)VCPU_ID << 32) | APIC_INT_ASSERT | APIC_DM_SMI);
 }

 static void l2_guest_code(void)
@@ -72,14 +72,10 @@ static void guest_code(void *arg)
 {
        #define L2_GUEST_STACK_SIZE 64
        unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
-       uint64_t apicbase = rdmsr(MSR_IA32_APICBASE);
        struct svm_test_data *svm = arg;
        struct vmx_pages *vmx_pages = arg;

        sync_with_host(1);
-
-       wrmsr(MSR_IA32_APICBASE, apicbase | X2APIC_ENABLE);
-
        sync_with_host(2);

        self_smi();
@@ -139,12 +135,21 @@ int main(int argc, char *argv[])
        struct kvm_run *run;
        struct kvm_x86_state *state;
        int stage, stage_reported;
+       struct kvm_lapic_state lapic;

        /* Create VM */
        vm = vm_create_default(VCPU_ID, 0, guest_code);

        run = vcpu_state(vm, VCPU_ID);

+       vcpu_set_msr(vm, VCPU_ID, MSR_IA32_APICBASE,
+                    vcpu_get_msr(vm, VCPU_ID, MSR_IA32_APICBASE) | X2APIC_ENABLE);
+
+       vcpu_ioctl(vm, VCPU_ID, KVM_GET_LAPIC, &lapic);
+       TEST_ASSERT(*((u32 *)&lapic.regs[0x20]) == 0,
+                   "x2APIC ID == %u", *((u32 *)&lapic.regs[0x20]));
+       vcpu_ioctl(vm, VCPU_ID, KVM_SET_LAPIC, &lapic);
+
        vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, SMRAM_GPA,
                                    SMRAM_MEMSLOT, SMRAM_PAGES, 0);
        TEST_ASSERT(vm_phy_pages_alloc(vm, SMRAM_PAGES, SMRAM_GPA, SMRAM_MEMSLOT)
Sean Christopherson March 3, 2022, 7:49 p.m. UTC | #9
On Thu, Mar 03, 2022, Sean Christopherson wrote:
> With your proposed change, KVM_SET_LAPIC will fail and we've broken a functional,
> if sketchy, setup.  Is there likely to be such a real-world setup that doesn't
> barf on the inconsistent x2APIC ID?  Probably not, but I don't see any reason to
> find out.

I take back the "probably not", this isn't even all that contrived.  Prior to the
"migration", the guest will see a consistent x2APIC ID.  It's not hard to imagine
a guest that snapshots the ID and never re-reads the value from "hardware".
Maxim Levitsky March 4, 2022, 10:54 a.m. UTC | #10
On Thu, 2022-03-03 at 19:49 +0000, Sean Christopherson wrote:
> On Thu, Mar 03, 2022, Sean Christopherson wrote:
> > With your proposed change, KVM_SET_LAPIC will fail and we've broken a functional,
> > if sketchy, setup.  Is there likely to be such a real-world setup that doesn't
> > barf on the inconsistent x2APIC ID?  Probably not, but I don't see any reason to
> > find out.
> 
> I take back the "probably not", this isn't even all that contrived.  Prior to the
> "migration", the guest will see a consistent x2APIC ID.  It's not hard to imagine
> a guest that snapshots the ID and never re-reads the value from "hardware".
> 
Ok. you win. I will not argue about this.

Best regards,
	Maxim Levitsky
diff mbox series

Patch

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 80a2020c4db40..8d35f56c64020 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2618,15 +2618,14 @@  static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
 		u32 *ldr = (u32 *)(s->regs + APIC_LDR);
 		u64 icr;
 
-		if (vcpu->kvm->arch.x2apic_format) {
-			if (*id != vcpu->vcpu_id)
-				return -EINVAL;
-		} else {
-			if (set)
-				*id >>= 24;
-			else
-				*id <<= 24;
-		}
+		if (!vcpu->kvm->arch.x2apic_format && set)
+			*id >>= 24;
+
+		if (*id != vcpu->vcpu_id)
+			return -EINVAL;
+
+		if (!vcpu->kvm->arch.x2apic_format && !set)
+			*id <<= 24;
 
 		/*
 		 * In x2APIC mode, the LDR is fixed and based on the id.  And