diff mbox series

[1/3] KVM: x86: Fix out-of-bounds access in kvm_recalculate_phys_map()

Message ID 20230525183347.2562472-2-mhal@rbox.co (mailing list archive)
State New, archived
Headers show
Series Out-of-bounds access in kvm_recalculate_phys_map() | expand

Commit Message

Michal Luczaj May 25, 2023, 6:33 p.m. UTC
Handle the case of vCPU addition and/or APIC enabling during the APIC map
recalculations. Check the sanity of x2APIC ID in !x2apic_format &&
apic_x2apic_mode() case.

kvm_recalculate_apic_map() creates the APIC map iterating over the list of
vCPUs twice. First to find the max APIC ID and allocate a max-sized buffer,
then again, calling kvm_recalculate_phys_map() for each vCPU. This opens a
race window: value of max APIC ID can increase _after_ the buffer was
allocated.

Signed-off-by: Michal Luczaj <mhal@rbox.co>
---
 arch/x86/kvm/lapic.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Comments

Sean Christopherson May 26, 2023, 12:07 a.m. UTC | #1
On Thu, May 25, 2023, Michal Luczaj wrote:
> Handle the case of vCPU addition and/or APIC enabling during the APIC map
> recalculations. Check the sanity of x2APIC ID in !x2apic_format &&
> apic_x2apic_mode() case.
> 
> kvm_recalculate_apic_map() creates the APIC map iterating over the list of
> vCPUs twice. First to find the max APIC ID and allocate a max-sized buffer,
> then again, calling kvm_recalculate_phys_map() for each vCPU. This opens a
> race window: value of max APIC ID can increase _after_ the buffer was
> allocated.
> 
> Signed-off-by: Michal Luczaj <mhal@rbox.co>
> ---
>  arch/x86/kvm/lapic.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index e542cf285b51..39b9a318d04c 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -265,10 +265,14 @@ static int kvm_recalculate_phys_map(struct kvm_apic_map *new,
>  		 * mapped, i.e. is aliased to multiple vCPUs.  The optimized
>  		 * map requires a strict 1:1 mapping between IDs and vCPUs.
>  		 */
> -		if (apic_x2apic_mode(apic))
> +		if (apic_x2apic_mode(apic)) {
> +			if (x2apic_id > new->max_apic_id)
> +				return -EINVAL;

Hmm, disabling the optimized map just because userspace created a new vCPU is
unfortunate and unnecessary.  Rather than return -EINVAL and only perform the
check when x2APIC is enabled, what if we instead do the check immediately and
return -E2BIG?  Then the caller can retry with a bigger array size.  Preemption
is enabled and retries are bounded by the number of possible vCPUs, so I don't
see any obvious issues with retrying.

And I vote to also add a sanity check on xapic_id, if only to provide documentation
as to why it can't overflow.

I think hoisting the checks up would also obviate the need for cleanup (patch 2),
which I agree isn't obviously better.

E.g. this?  Compile tested only.  I'll test more tomorrow unless you beat me to
it.  Thanks for the fun bugs, as always :-)

---
 arch/x86/kvm/lapic.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index e542cf285b51..cd34b88c937a 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -228,6 +228,12 @@ static int kvm_recalculate_phys_map(struct kvm_apic_map *new,
 	u32 xapic_id = kvm_xapic_id(apic);
 	u32 physical_id;
 
+	if (WARN_ON_ONCE(xapic_id >= new->max_apic_id))
+		return -EINVAL;
+
+	if (x2apic_id >= new->max_apic_id)
+		return -E2BIG;
+
 	/*
 	 * Deliberately truncate the vCPU ID when detecting a mismatched APIC
 	 * ID to avoid false positives if the vCPU ID, i.e. x2APIC ID, is a
@@ -253,8 +259,7 @@ static int kvm_recalculate_phys_map(struct kvm_apic_map *new,
 	 */
 	if (vcpu->kvm->arch.x2apic_format) {
 		/* See also kvm_apic_match_physical_addr(). */
-		if ((apic_x2apic_mode(apic) || x2apic_id > 0xff) &&
-			x2apic_id <= new->max_apic_id)
+		if (apic_x2apic_mode(apic) || x2apic_id > 0xff)
 			new->phys_map[x2apic_id] = apic;
 
 		if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id])
@@ -366,6 +371,7 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
 	unsigned long i;
 	u32 max_id = 255; /* enough space for any xAPIC ID */
 	bool xapic_id_mismatch = false;
+	int r;
 
 	/* Read kvm->arch.apic_map_dirty before kvm->arch.apic_map.  */
 	if (atomic_read_acquire(&kvm->arch.apic_map_dirty) == CLEAN)
@@ -386,6 +392,7 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
 		return;
 	}
 
+retry:
 	kvm_for_each_vcpu(i, vcpu, kvm)
 		if (kvm_apic_present(vcpu))
 			max_id = max(max_id, kvm_x2apic_id(vcpu->arch.apic));
@@ -404,9 +411,13 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
 		if (!kvm_apic_present(vcpu))
 			continue;
 
-		if (kvm_recalculate_phys_map(new, vcpu, &xapic_id_mismatch)) {
+		r = kvm_recalculate_phys_map(new, vcpu, &xapic_id_mismatch);
+		if (r) {
 			kvfree(new);
 			new = NULL;
+			if (r == -E2BIG)
+				goto retry;
+
 			goto out;
 		}
 

base-commit: 39428f6ea9eace95011681628717062ff7f5eb5f
--
Michal Luczaj May 26, 2023, 10:52 a.m. UTC | #2
On 5/26/23 02:07, Sean Christopherson wrote:
> On Thu, May 25, 2023, Michal Luczaj wrote:
>> @@ -265,10 +265,14 @@ static int kvm_recalculate_phys_map(struct kvm_apic_map *new,
>>  		 * mapped, i.e. is aliased to multiple vCPUs.  The optimized
>>  		 * map requires a strict 1:1 mapping between IDs and vCPUs.
>>  		 */
>> -		if (apic_x2apic_mode(apic))
>> +		if (apic_x2apic_mode(apic)) {
>> +			if (x2apic_id > new->max_apic_id)
>> +				return -EINVAL;
> 
> Hmm, disabling the optimized map just because userspace created a new vCPU is
> unfortunate and unnecessary.  Rather than return -EINVAL and only perform the
> check when x2APIC is enabled, what if we instead do the check immediately and
> return -E2BIG?  Then the caller can retry with a bigger array size.  Preemption
> is enabled and retries are bounded by the number of possible vCPUs, so I don't
> see any obvious issues with retrying.

Right, that makes perfect sense.

Just a note, it changes the logic a bit:

- x2apic_format: an overflowing x2apic_id won't be silently ignored.

- !x2apic_format: -E2BIG even for !apic_x2apic_mode() leads to an realloc
instead of "new->phys_map[xapic_id] = apic" right away.

> @@ -228,6 +228,12 @@ static int kvm_recalculate_phys_map(struct kvm_apic_map *new,
>  	u32 xapic_id = kvm_xapic_id(apic);
>  	u32 physical_id;
>  
> +	if (WARN_ON_ONCE(xapic_id >= new->max_apic_id))
> +		return -EINVAL;

Shouldn't it be ">" instead of ">="?

That said, xapic_id > new->max_apic_id means something terrible has happened as
kvm_xapic_id() returns u8 and max_apic_id should never be less than 255. Does
this qualify for KVM_BUG_ON?

> +	if (x2apic_id >= new->max_apic_id)
> +		return -E2BIG;

Probably ">"?

> @@ -366,6 +371,7 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
>  	unsigned long i;
>  	u32 max_id = 255; /* enough space for any xAPIC ID */
>  	bool xapic_id_mismatch = false;
> +	int r;
>  
>  	/* Read kvm->arch.apic_map_dirty before kvm->arch.apic_map.  */
>  	if (atomic_read_acquire(&kvm->arch.apic_map_dirty) == CLEAN)
> @@ -386,6 +392,7 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
>  		return;
>  	}
>  
> +retry:
>  	kvm_for_each_vcpu(i, vcpu, kvm)
>  		if (kvm_apic_present(vcpu))
>  			max_id = max(max_id, kvm_x2apic_id(vcpu->arch.apic));
> @@ -404,9 +411,13 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
>  		if (!kvm_apic_present(vcpu))
>  			continue;
>  
> -		if (kvm_recalculate_phys_map(new, vcpu, &xapic_id_mismatch)) {
> +		r = kvm_recalculate_phys_map(new, vcpu, &xapic_id_mismatch);
> +		if (r) {
>  			kvfree(new);
>  			new = NULL;
> +			if (r == -E2BIG)
> +				goto retry;
> +
>  			goto out;
>  		}

Maybe it's not important, but what about moving xapic_id_mismatch
(re)initialization after "retry:"?
Sean Christopherson May 26, 2023, 4:17 p.m. UTC | #3
On Fri, May 26, 2023, Michal Luczaj wrote:
> On 5/26/23 02:07, Sean Christopherson wrote:
> > On Thu, May 25, 2023, Michal Luczaj wrote:
> >> @@ -265,10 +265,14 @@ static int kvm_recalculate_phys_map(struct kvm_apic_map *new,
> >>  		 * mapped, i.e. is aliased to multiple vCPUs.  The optimized
> >>  		 * map requires a strict 1:1 mapping between IDs and vCPUs.
> >>  		 */
> >> -		if (apic_x2apic_mode(apic))
> >> +		if (apic_x2apic_mode(apic)) {
> >> +			if (x2apic_id > new->max_apic_id)
> >> +				return -EINVAL;
> > 
> > Hmm, disabling the optimized map just because userspace created a new vCPU is
> > unfortunate and unnecessary.  Rather than return -EINVAL and only perform the
> > check when x2APIC is enabled, what if we instead do the check immediately and
> > return -E2BIG?  Then the caller can retry with a bigger array size.  Preemption
> > is enabled and retries are bounded by the number of possible vCPUs, so I don't
> > see any obvious issues with retrying.
> 
> Right, that makes perfect sense.
> 
> Just a note, it changes the logic a bit:
> 
> - x2apic_format: an overflowing x2apic_id won't be silently ignored.

Nit, I wouldn't describe the current behavior as silently ignored.  KVM doesn't
ignore the case, KVM instead disables the optimized map.

> - !x2apic_format: -E2BIG even for !apic_x2apic_mode() leads to an realloc
> instead of "new->phys_map[xapic_id] = apic" right away.
> 
> > @@ -228,6 +228,12 @@ static int kvm_recalculate_phys_map(struct kvm_apic_map *new,
> >  	u32 xapic_id = kvm_xapic_id(apic);
> >  	u32 physical_id;
> >  
> > +	if (WARN_ON_ONCE(xapic_id >= new->max_apic_id))
> > +		return -EINVAL;
>
> Shouldn't it be ">" instead of ">="?

/facepalm

Yes, I was reading it as the number of IDs, not the max.

> That said, xapic_id > new->max_apic_id means something terrible has happened as
> kvm_xapic_id() returns u8 and max_apic_id should never be less than 255. Does
> this qualify for KVM_BUG_ON?

I don't think so?  The intent of the WARN is mostly to document that KVM always
allocates enough space for xAPIC IDs, and to guard against that changing in the
future.  In the latter case, there's no need to kill the VM despite there being
a KVM bug since running with the optimized map disabled is functionally ok.

If the WARN fires because of host data corruption, then so be it.

> > +	if (x2apic_id >= new->max_apic_id)
> > +		return -E2BIG;
> 
> Probably ">"?

Ya.

> > @@ -366,6 +371,7 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
> >  	unsigned long i;
> >  	u32 max_id = 255; /* enough space for any xAPIC ID */
> >  	bool xapic_id_mismatch = false;
> > +	int r;
> >  
> >  	/* Read kvm->arch.apic_map_dirty before kvm->arch.apic_map.  */
> >  	if (atomic_read_acquire(&kvm->arch.apic_map_dirty) == CLEAN)
> > @@ -386,6 +392,7 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
> >  		return;
> >  	}
> >  
> > +retry:
> >  	kvm_for_each_vcpu(i, vcpu, kvm)
> >  		if (kvm_apic_present(vcpu))
> >  			max_id = max(max_id, kvm_x2apic_id(vcpu->arch.apic));
> > @@ -404,9 +411,13 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
> >  		if (!kvm_apic_present(vcpu))
> >  			continue;
> >  
> > -		if (kvm_recalculate_phys_map(new, vcpu, &xapic_id_mismatch)) {
> > +		r = kvm_recalculate_phys_map(new, vcpu, &xapic_id_mismatch);
> > +		if (r) {
> >  			kvfree(new);
> >  			new = NULL;
> > +			if (r == -E2BIG)
> > +				goto retry;
> > +
> >  			goto out;
> >  		}
> 
> Maybe it's not important, but what about moving xapic_id_mismatch
> (re)initialization after "retry:"?

Oof, good catch.  I think it makes sense to move max_id (re)initialization too,
even though I can't imagine it would matter in practice.
Michal Luczaj May 26, 2023, 5:17 p.m. UTC | #4
On 5/26/23 18:17, Sean Christopherson wrote:
> On Fri, May 26, 2023, Michal Luczaj wrote:
>> On 5/26/23 02:07, Sean Christopherson wrote:
>>> On Thu, May 25, 2023, Michal Luczaj wrote:
>>>> @@ -265,10 +265,14 @@ static int kvm_recalculate_phys_map(struct kvm_apic_map *new,
>>>>  		 * mapped, i.e. is aliased to multiple vCPUs.  The optimized
>>>>  		 * map requires a strict 1:1 mapping between IDs and vCPUs.
>>>>  		 */
>>>> -		if (apic_x2apic_mode(apic))
>>>> +		if (apic_x2apic_mode(apic)) {
>>>> +			if (x2apic_id > new->max_apic_id)
>>>> +				return -EINVAL;
>>>
>>> Hmm, disabling the optimized map just because userspace created a new vCPU is
>>> unfortunate and unnecessary.  Rather than return -EINVAL and only perform the
>>> check when x2APIC is enabled, what if we instead do the check immediately and
>>> return -E2BIG?  Then the caller can retry with a bigger array size.  Preemption
>>> is enabled and retries are bounded by the number of possible vCPUs, so I don't
>>> see any obvious issues with retrying.
>>
>> Right, that makes perfect sense.
>>
>> Just a note, it changes the logic a bit:
>>
>> - x2apic_format: an overflowing x2apic_id won't be silently ignored.
> 
> Nit, I wouldn't describe the current behavior as silently ignored.  KVM doesn't
> ignore the case, KVM instead disables the optimized map.

I may be misusing "silently ignored", but currently if (x2apic_format &&
apic_x2apic_mode && x2apic_id > new->max_apic_id) new->phys_map[x2apic_id]
remains unchanged, then kvm_recalculate_phys_map() returns 0 (not -EINVAL).
I.e. this does not result in rcu_assign_pointer(kvm->arch.apic_map, NULL).

>> That said, xapic_id > new->max_apic_id means something terrible has happened as
>> kvm_xapic_id() returns u8 and max_apic_id should never be less than 255. Does
>> this qualify for KVM_BUG_ON?
> 
> I don't think so?  The intent of the WARN is mostly to document that KVM always
> allocates enough space for xAPIC IDs, and to guard against that changing in the
> future.  In the latter case, there's no need to kill the VM despite there being
> a KVM bug since running with the optimized map disabled is functionally ok.
> 
> If the WARN fires because of host data corruption, then so be it.

Ahh, OK, I get it.

>> Maybe it's not important, but what about moving xapic_id_mismatch
>> (re)initialization after "retry:"?
> 
> Oof, good catch.  I think it makes sense to move max_id (re)initialization too,
> even though I can't imagine it would matter in practice.

Right, I forgot that max APIC ID can decrease along the way.
Sean Christopherson May 26, 2023, 6:11 p.m. UTC | #5
On Fri, May 26, 2023, Michal Luczaj wrote:
> On 5/26/23 18:17, Sean Christopherson wrote:
> > On Fri, May 26, 2023, Michal Luczaj wrote:
> >> On 5/26/23 02:07, Sean Christopherson wrote:
> >>> On Thu, May 25, 2023, Michal Luczaj wrote:
> >>>> @@ -265,10 +265,14 @@ static int kvm_recalculate_phys_map(struct kvm_apic_map *new,
> >>>>  		 * mapped, i.e. is aliased to multiple vCPUs.  The optimized
> >>>>  		 * map requires a strict 1:1 mapping between IDs and vCPUs.
> >>>>  		 */
> >>>> -		if (apic_x2apic_mode(apic))
> >>>> +		if (apic_x2apic_mode(apic)) {
> >>>> +			if (x2apic_id > new->max_apic_id)
> >>>> +				return -EINVAL;
> >>>
> >>> Hmm, disabling the optimized map just because userspace created a new vCPU is
> >>> unfortunate and unnecessary.  Rather than return -EINVAL and only perform the
> >>> check when x2APIC is enabled, what if we instead do the check immediately and
> >>> return -E2BIG?  Then the caller can retry with a bigger array size.  Preemption
> >>> is enabled and retries are bounded by the number of possible vCPUs, so I don't
> >>> see any obvious issues with retrying.
> >>
> >> Right, that makes perfect sense.
> >>
> >> Just a note, it changes the logic a bit:
> >>
> >> - x2apic_format: an overflowing x2apic_id won't be silently ignored.
> > 
> > Nit, I wouldn't describe the current behavior as silently ignored.  KVM doesn't
> > ignore the case, KVM instead disables the optimized map.
> 
> I may be misusing "silently ignored",

You're not.

> but currently if (x2apic_format && apic_x2apic_mode && x2apic_id >
> new->max_apic_id) new->phys_map[x2apic_id] remains unchanged, then
> kvm_recalculate_phys_map() returns 0 (not -EINVAL).  I.e. this does not
> result in rcu_assign_pointer(kvm->arch.apic_map, NULL).

My bad.  I was thinking of the -EINVAL from your patch.

Hmm, thinking more about the "silently ignored" case, it's only temporarily
ignored.  If a x2APIC ID is out-of-bounds, then there had to have been a write
to a vCPU's x2APIC ID between the two instances of kvm_for_each_vcpu(), and that
means that whatever changed the ID must also mark apic_map_dirty DIRTY and call
kvm_recalculate_apic_map().  I.e. another recalc will soon occur and cleanup the
mess.

There's still value in retrying, but it's not as much as an optimization as it
first appears.  That means the bug fix can be a separate patch from the retry.
Oh, and the retry can also redo the DIRTY=>IN_PROGRESS so that whatever modified
a vCPU's x2APIC ID doesn't perform an unnecessary recalculation.

E.g. this (plus comments) for stable@

---
 arch/x86/kvm/lapic.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index e542cf285b51..7a83df7f45c5 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -228,6 +228,12 @@ static int kvm_recalculate_phys_map(struct kvm_apic_map *new,
 	u32 xapic_id = kvm_xapic_id(apic);
 	u32 physical_id;
 
+	if (WARN_ON_ONCE(xapic_id >= new->max_apic_id))
+		return -EINVAL;
+
+	if (x2apic_id >= new->max_apic_id)
+		return -E2BIG;
+
 	/*
 	 * Deliberately truncate the vCPU ID when detecting a mismatched APIC
 	 * ID to avoid false positives if the vCPU ID, i.e. x2APIC ID, is a
@@ -253,8 +259,7 @@ static int kvm_recalculate_phys_map(struct kvm_apic_map *new,
 	 */
 	if (vcpu->kvm->arch.x2apic_format) {
 		/* See also kvm_apic_match_physical_addr(). */
-		if ((apic_x2apic_mode(apic) || x2apic_id > 0xff) &&
-			x2apic_id <= new->max_apic_id)
+		if (apic_x2apic_mode(apic) || x2apic_id > 0xff)
 			new->phys_map[x2apic_id] = apic;
 
 		if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id])

base-commit: 39428f6ea9eace95011681628717062ff7f5eb5f
Sean Christopherson May 26, 2023, 10:27 p.m. UTC | #6
On Fri, May 26, 2023, Michal Luczaj wrote:
> On 5/26/23 18:17, Sean Christopherson wrote:
> > On Fri, May 26, 2023, Michal Luczaj wrote:
> >> Maybe it's not important, but what about moving xapic_id_mismatch
> >> (re)initialization after "retry:"?
> > 
> > Oof, good catch.  I think it makes sense to move max_id (re)initialization too,
> > even though I can't imagine it would matter in practice.
> 
> Right, I forgot that max APIC ID can decrease along the way.

Actually, we don't want to reset max_id.  That would allow userspace or the guest
to put KVM into an infinite loop, e.g. by toggling the APIC of the vCPU with the
highest x2APIC ID between enabled and disabled.  The downside of not shrinking the
size is quite negligible.
diff mbox series

Patch

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index e542cf285b51..39b9a318d04c 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -265,10 +265,14 @@  static int kvm_recalculate_phys_map(struct kvm_apic_map *new,
 		 * mapped, i.e. is aliased to multiple vCPUs.  The optimized
 		 * map requires a strict 1:1 mapping between IDs and vCPUs.
 		 */
-		if (apic_x2apic_mode(apic))
+		if (apic_x2apic_mode(apic)) {
+			if (x2apic_id > new->max_apic_id)
+				return -EINVAL;
+
 			physical_id = x2apic_id;
-		else
+		} else {
 			physical_id = xapic_id;
+		}
 
 		if (new->phys_map[physical_id])
 			return -EINVAL;