
KVM: nVMX: Don't return error on nested bitmap memory allocation failure

Message ID jpgmw1srps3.fsf@redhat.com (mailing list archive)
State New, archived

Commit Message

Bandan Das April 28, 2015, 7:55 p.m. UTC
If __get_free_page() fails for the nested bitmap area, it's evident that
we are going to get screwed anyway, but returning failure because we failed
to allocate memory for a nested structure seems like an unnecessarily big
hammer. Also, save the call for later, after we are done with the other
non-nested allocations.

Signed-off-by: Bandan Das <bsd@redhat.com>
---
 arch/x86/kvm/vmx.c | 30 ++++++++++++++++--------------
 1 file changed, 16 insertions(+), 14 deletions(-)

Comments

Jan Kiszka April 29, 2015, 7:10 a.m. UTC | #1
Am 2015-04-28 um 21:55 schrieb Bandan Das:
> 
> If __get_free_page() fails for the nested bitmap area, it's evident that
> we are going to get screwed anyway, but returning failure because we failed
> to allocate memory for a nested structure seems like an unnecessarily big
> hammer. Also, save the call for later, after we are done with the other
> non-nested allocations.

Frankly, I prefer failures over automatic degradations. And, as you
noted, the whole system will probably explode anyway if allocation of a
single page already fails. So what does this buy us?

What could make sense is making the allocation of the vmread/write
bitmap depend on enable_shadow_vmcs, and that again depend on nested.
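Something of this shape, say (an untested sketch only; note that
enable_shadow_vmcs is finalized only after setup_vmcs_config(), so the
ordering would need some care):

	if (!nested)
		enable_shadow_vmcs = 0;

	if (enable_shadow_vmcs) {
		vmx_vmread_bitmap =
			(unsigned long *)__get_free_page(GFP_KERNEL);
		if (!vmx_vmread_bitmap)
			goto out5;

		vmx_vmwrite_bitmap =
			(unsigned long *)__get_free_page(GFP_KERNEL);
		if (!vmx_vmwrite_bitmap)
			goto out6;

		/* intercept all VMREAD/VMWRITE by default */
		memset(vmx_vmread_bitmap, 0xff, PAGE_SIZE);
		memset(vmx_vmwrite_bitmap, 0xff, PAGE_SIZE);
	}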

Jan
Nadav Amit April 29, 2015, 7:27 a.m. UTC | #2
Bandan Das <bsd@redhat.com> wrote:

> 
> If __get_free_page() fails for the nested bitmap area, it's evident that
> we are going to get screwed anyway, but returning failure because we failed
> to allocate memory for a nested structure seems like an unnecessarily big
> hammer. Also, save the call for later, after we are done with the other
> non-nested allocations.
> 
> Signed-off-by: Bandan Das <bsd@redhat.com>
> ---
> arch/x86/kvm/vmx.c | 30 ++++++++++++++++--------------
> 1 file changed, 16 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index f7b6168..200bc5c 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -6039,20 +6039,23 @@ static __init int hardware_setup(void)
> 	if (!vmx_msr_bitmap_longmode_x2apic)
> 		goto out4;
> 
> -	if (nested) {
> -		vmx_msr_bitmap_nested =
> -			(unsigned long *)__get_free_page(GFP_KERNEL);
> -		if (!vmx_msr_bitmap_nested)
> -			goto out5;
> -	}
> -
> 	vmx_vmread_bitmap = (unsigned long *)__get_free_page(GFP_KERNEL);
> 	if (!vmx_vmread_bitmap)
> -		goto out6;
> +		goto out5;
> 
> 	vmx_vmwrite_bitmap = (unsigned long *)__get_free_page(GFP_KERNEL);
> 	if (!vmx_vmwrite_bitmap)
> -		goto out7;
> +		goto out6;
> +
> +	if (nested) {
> +		vmx_msr_bitmap_nested =
> +			(unsigned long *)__get_free_page(GFP_KERNEL);
> +		if (!vmx_msr_bitmap_nested) {
> +			printk(KERN_WARNING
> +			       "vmx: Failed getting memory for nested bitmap\n");
> +			nested = 0;
> +		}
> +	}
> 
> 	memset(vmx_vmread_bitmap, 0xff, PAGE_SIZE);
> 	memset(vmx_vmwrite_bitmap, 0xff, PAGE_SIZE);
> @@ -6073,7 +6075,7 @@ static __init int hardware_setup(void)
> 
> 	if (setup_vmcs_config(&vmcs_config) < 0) {
> 		r = -EIO;
> -		goto out8;
> +		goto out7;
> 	}
> 
> 	if (boot_cpu_has(X86_FEATURE_NX))
> @@ -6190,13 +6192,12 @@ static __init int hardware_setup(void)
> 
> 	return alloc_kvm_area();
> 
> -out8:
> -	free_page((unsigned long)vmx_vmwrite_bitmap);
> out7:
> -	free_page((unsigned long)vmx_vmread_bitmap);
> -out6:
> 	if (nested)
> 		free_page((unsigned long)vmx_msr_bitmap_nested);
> +	free_page((unsigned long)vmx_vmwrite_bitmap);
> +out6:
> +	free_page((unsigned long)vmx_vmread_bitmap);
> out5:
> 	free_page((unsigned long)vmx_msr_bitmap_longmode_x2apic);
> out4:
> 
free_page() appears to check whether the address is zero before it actually
frees the page. Perhaps it is better to leverage this behaviour to remove
all the outX labels and simplify the code.
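For illustration, the unwind could collapse to a single label (a sketch
only; free_page() is a no-op when passed a 0 address, so every page can
be freed unconditionally, allocated or not):

out:
	free_page((unsigned long)vmx_msr_bitmap_nested);
	free_page((unsigned long)vmx_vmwrite_bitmap);
	free_page((unsigned long)vmx_vmread_bitmap);
	free_page((unsigned long)vmx_msr_bitmap_longmode_x2apic);
	/* ... and likewise for the remaining bitmap pages ... */
	return r;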

Nadav

Paolo Bonzini April 29, 2015, 8:17 a.m. UTC | #3
On 29/04/2015 09:27, Nadav Amit wrote:
> free_page() appears to check whether the address is zero before it actually
> frees the page. Perhaps it is better to leverage this behaviour to remove
> all the outX labels and simplify the code.

Agreed.  Regarding this patch, I agree with Jan.

Paolo
Bandan Das April 29, 2015, 12:55 p.m. UTC | #4
Jan Kiszka <jan.kiszka@siemens.com> writes:

> Am 2015-04-28 um 21:55 schrieb Bandan Das:
>> 
>> If __get_free_page() fails for the nested bitmap area, it's evident that
>> we are going to get screwed anyway, but returning failure because we failed
>> to allocate memory for a nested structure seems like an unnecessarily big
>> hammer. Also, save the call for later, after we are done with the other
>> non-nested allocations.
>
> Frankly, I prefer failures over automatic degradations. And, as you
> noted, the whole system will probably explode anyway if allocation of a
> single page already fails. So what does this buy us?

Yeah... I hear you. Ok, let me put it this way: assume that we can
defer this allocation up until the point that the nested subsystem is
actually used, i.e. L1 tries running a guest and we try to allocate this
area. If __get_free_page() failed in that case, would we still want to
kill L1 too? I guess no.
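Something like this, purely as a sketch (hypothetically in handle_vmon();
failing the instruction is just one option for the error handling):

	if (!vmx_msr_bitmap_nested) {
		vmx_msr_bitmap_nested =
			(unsigned long *)__get_free_page(GFP_KERNEL);
		if (!vmx_msr_bitmap_nested) {
			/* fail VMXON for L1 instead of killing it */
			nested_vmx_failInvalid(vcpu);
			skip_emulated_instruction(vcpu);
			return 1;
		}
	}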

Also, assume we had a printk in there - "Failed allocating memory for
nested bitmap". A novice user is going to be confused about why he's
getting an error about nested virtualization (in the not-so-distant
future when nested is enabled by default :))

>> What could make sense is making the allocation of the vmread/write
> bitmap depend on enable_shadow_vmcs, and that again depend on nested.

Thanks for the suggestion. I will take a look at this one.

> Jan
Jan Kiszka April 29, 2015, 1:05 p.m. UTC | #5
Am 2015-04-29 um 14:55 schrieb Bandan Das:
> Jan Kiszka <jan.kiszka@siemens.com> writes:
> 
>> Am 2015-04-28 um 21:55 schrieb Bandan Das:
>>>
>>> If __get_free_page() fails for the nested bitmap area, it's evident that
>>> we are going to get screwed anyway, but returning failure because we failed
>>> to allocate memory for a nested structure seems like an unnecessarily big
>>> hammer. Also, save the call for later, after we are done with the other
>>> non-nested allocations.
>>
>> Frankly, I prefer failures over automatic degradations. And, as you
>> noted, the whole system will probably explode anyway if allocation of a
>> single page already fails. So what does this buy us?
> 
> Yeah... I hear you. Ok, let me put it this way: assume that we can
> defer this allocation up until the point that the nested subsystem is
> actually used, i.e. L1 tries running a guest and we try to allocate this
> area. If __get_free_page() failed in that case, would we still want to
> kill L1 too? I guess no.

We could block the hypervisor thread on the allocation, just like it
would block on faults for swapped out pages or new ones that have to be
reclaimed from the page cache first.

Jan
Paolo Bonzini April 29, 2015, 1:23 p.m. UTC | #6
On 29/04/2015 15:05, Jan Kiszka wrote:
> > Yeah... I hear you. Ok, let me put it this way: assume that we can
> > defer this allocation up until the point that the nested subsystem is
> > actually used, i.e. L1 tries running a guest and we try to allocate this
> > area. If __get_free_page() failed in that case, would we still want to
> > kill L1 too? I guess no.
>
> We could block the hypervisor thread on the allocation, just like it
> would block on faults for swapped out pages or new ones that have to be
> reclaimed from the page cache first.

In that case we should avoid making the allocation GFP_ATOMIC to begin with.

If a GFP_KERNEL allocation failed, returning -ENOMEM from KVM_RUN (which
practically means killing the guest) would actually be a very real
possibility.

Paolo
Bandan Das April 29, 2015, 4:08 p.m. UTC | #7
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 29/04/2015 15:05, Jan Kiszka wrote:
>> > Yeah... I hear you. Ok, let me put it this way: assume that we can
>> > defer this allocation up until the point that the nested subsystem is
>> > actually used, i.e. L1 tries running a guest and we try to allocate this
>> > area. If __get_free_page() failed in that case, would we still want to
>> > kill L1 too? I guess no.
>>
>> We could block the hypervisor thread on the allocation, just like it
>> would block on faults for swapped out pages or new ones that have to be
>> reclaimed from the page cache first.

So, block on a failure, hoping that it will eventually succeed?

> In that case we should avoid making the allocation GFP_ATOMIC to begin with.
>
> If a GFP_KERNEL allocation failed, returning -ENOMEM from KVM_RUN (which
> practically means killing the guest) would actually be a very real
> possibility.

Sorry Paolo, I missed your point. Isn't the allocation already GFP_KERNEL?

> Paolo
Paolo Bonzini April 29, 2015, 4:39 p.m. UTC | #8
On 29/04/2015 18:08, Bandan Das wrote:
>>>> Yeah... I hear you. Ok, let me put it this way: assume that we can
>>>> defer this allocation up until the point that the nested subsystem is
>>>> actually used, i.e. L1 tries running a guest and we try to allocate this
>>>> area. If __get_free_page() failed in that case, would we still want to
>>>> kill L1 too? I guess no.
>>>
>>> We could block the hypervisor thread on the allocation, just like it
>>> would block on faults for swapped out pages or new ones that have to be
>>> reclaimed from the page cache first.
> So, block on a failure, hoping that it will eventually succeed?
> 
>> In that case we should avoid making the allocation GFP_ATOMIC to begin with.
>>
>> If a GFP_KERNEL allocation failed, returning -ENOMEM from KVM_RUN (which
>> practically means killing the guest) would actually be a very real
>> possibility.
> Sorry Paolo, I missed your point. Isn't the allocation already GFP_KERNEL?

I mean if it were done lazily as in your thought-experiment.  Then:

- a GFP_ATOMIC allocation would be bad

- a GFP_KERNEL allocation would block like Jan said; if it failed, I
would be okay with returning -ENOMEM to userspace, even if that in
practice means killing the guest.
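
Roughly, for illustration:

	/* vcpu thread, lazily on first nested use: GFP_KERNEL may sleep
	 * and reclaim, and a failure can surface as -ENOMEM from KVM_RUN
	 */
	vmx_msr_bitmap_nested =
		(unsigned long *)__get_free_page(GFP_KERNEL);
	if (!vmx_msr_bitmap_nested)
		return -ENOMEM;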

Paolo

Patch

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f7b6168..200bc5c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6039,20 +6039,23 @@  static __init int hardware_setup(void)
 	if (!vmx_msr_bitmap_longmode_x2apic)
 		goto out4;
 
-	if (nested) {
-		vmx_msr_bitmap_nested =
-			(unsigned long *)__get_free_page(GFP_KERNEL);
-		if (!vmx_msr_bitmap_nested)
-			goto out5;
-	}
-
 	vmx_vmread_bitmap = (unsigned long *)__get_free_page(GFP_KERNEL);
 	if (!vmx_vmread_bitmap)
-		goto out6;
+		goto out5;
 
 	vmx_vmwrite_bitmap = (unsigned long *)__get_free_page(GFP_KERNEL);
 	if (!vmx_vmwrite_bitmap)
-		goto out7;
+		goto out6;
+
+	if (nested) {
+		vmx_msr_bitmap_nested =
+			(unsigned long *)__get_free_page(GFP_KERNEL);
+		if (!vmx_msr_bitmap_nested) {
+			printk(KERN_WARNING
+			       "vmx: Failed getting memory for nested bitmap\n");
+			nested = 0;
+		}
+	}
 
 	memset(vmx_vmread_bitmap, 0xff, PAGE_SIZE);
 	memset(vmx_vmwrite_bitmap, 0xff, PAGE_SIZE);
@@ -6073,7 +6075,7 @@  static __init int hardware_setup(void)
 
 	if (setup_vmcs_config(&vmcs_config) < 0) {
 		r = -EIO;
-		goto out8;
+		goto out7;
 	}
 
 	if (boot_cpu_has(X86_FEATURE_NX))
@@ -6190,13 +6192,12 @@  static __init int hardware_setup(void)
 
 	return alloc_kvm_area();
 
-out8:
-	free_page((unsigned long)vmx_vmwrite_bitmap);
 out7:
-	free_page((unsigned long)vmx_vmread_bitmap);
-out6:
 	if (nested)
 		free_page((unsigned long)vmx_msr_bitmap_nested);
+	free_page((unsigned long)vmx_vmwrite_bitmap);
+out6:
+	free_page((unsigned long)vmx_vmread_bitmap);
 out5:
 	free_page((unsigned long)vmx_msr_bitmap_longmode_x2apic);
 out4: