
nVMX: Shadowing of CPU_BASED_VM_EXEC_CONTROL broken

Message ID 54355344.5050301@siemens.com (mailing list archive)
State New, archived

Commit Message

Jan Kiszka Oct. 8, 2014, 3:07 p.m. UTC
On 2014-10-08 12:34, Paolo Bonzini wrote:
> On 08/10/2014 12:29, Jan Kiszka wrote:
>>>> But it would write to the vmcs02, not to the shadow VMCS; the shadow
>>>> VMCS is active during copy_shadow_to_vmcs12/copy_vmcs12_to_shadow, and
>>>> at no other time.  It is not clear to me how the VIRTUAL_INTR_PENDING
>>>> bit ended up from the vmcs02 (where it is perfectly fine) to the vmcs12.
>> Well, but somehow that bit ends up in vmcs12, that's a fact. Also, the
>> problem disappears when shadowing is disabled. Need to think about
>> the path again. Maybe there is just a bug, not a conceptual issue.
> 
> Yeah, and at this point we cannot actually exclude a processor bug.  Can
> you check that the bit is not in the shadow VMCS just before vmrun, or
> just after enable_irq_window?
> 
> Having a kvm-unit-tests testcase could also be of some help.

As usual, this was a nasty race that involved several concurrent VCPUs and
real host load, so it's hard to write unit tests for it...


No proper patch yet because there might be a smarter approach without
using the preempt_disable() hammer. But the point is that we temporarily
load a vmcs without updating loaded_vmcs->vmcs. Now, if some other VCPU
is scheduled in right in the middle of this, the wrong vmcs will be
flushed and then reloaded - e.g. a non-shadow vmcs with that interrupt
window flag set...
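
To illustrate, a rough sketch of the interleaving; the vmx_vcpu_load()
behavior below is paraphrased and simplified, not the literal code:

	/* VCPU A, inside copy_shadow_to_vmcs12() */
	vmcs_load(shadow_vmcs);	/* hardware-current VMCS is now the shadow
				 * VMCS, but vmx->loaded_vmcs->vmcs still
				 * points at the non-shadow vmcs01/vmcs02 */

	/*
	 * A is preempted here; VCPU B runs on this CPU and loads its own
	 * VMCS. When A is scheduled back in, vmx_vcpu_load() does roughly:
	 */
	if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) {
		per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
		vmcs_load(vmx->loaded_vmcs->vmcs); /* restores the non-shadow
						    * VMCS, not shadow_vmcs */
	}

	/*
	 * A resumes its copy loop: the remaining vmcs reads now hit the
	 * wrong VMCS, so e.g. CPU_BASED_VM_EXEC_CONTROL is read from
	 * vmcs02 - with CPU_BASED_VIRTUAL_INTR_PENDING set by
	 * enable_irq_window() - and written into vmcs12.
	 */

With preempt_disable() around the whole sequence, no sched-out can happen
between vmcs_load(shadow_vmcs) and the final vmcs_load of
loaded_vmcs->vmcs, so the window in which the hardware-current VMCS
disagrees with loaded_vmcs->vmcs is never visible to the scheduler hooks.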

Patch is currently under heavy load testing here, but it looks very good
as the bug was quickly reproducible before I applied it.

Jan

Comments

Paolo Bonzini Oct. 8, 2014, 3:44 p.m. UTC | #1
On 08/10/2014 17:07, Jan Kiszka wrote:
> As usual, this was a nasty race that involved several concurrent VCPUs and
> real host load, so it's hard to write unit tests for it...
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 04fa1b8..d6bcaca 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -6417,6 +6417,8 @@ static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx)
>  	const unsigned long *fields = shadow_read_write_fields;
>  	const int num_fields = max_shadow_read_write_fields;
>  
> +	preempt_disable();
> +
>  	vmcs_load(shadow_vmcs);
>  
>  	for (i = 0; i < num_fields; i++) {
> @@ -6440,6 +6442,8 @@ static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx)
>  
>  	vmcs_clear(shadow_vmcs);
>  	vmcs_load(vmx->loaded_vmcs->vmcs);
> +
> +	preempt_enable();
>  }
>  
>  static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx)
> @@ -6457,6 +6461,8 @@ static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx)
>  	u64 field_value = 0;
>  	struct vmcs *shadow_vmcs = vmx->nested.current_shadow_vmcs;
>  
> +	preempt_disable();
> +
>  	vmcs_load(shadow_vmcs);
>  
>  	for (q = 0; q < ARRAY_SIZE(fields); q++) {
> @@ -6483,6 +6489,8 @@ static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx)
>  
>  	vmcs_clear(shadow_vmcs);
>  	vmcs_load(vmx->loaded_vmcs->vmcs);
> +
> +	preempt_enable();
>  }
>  
> No proper patch yet because there might be a smarter approach without
> using the preempt_disable() hammer.

copy_vmcs12_to_shadow already runs with preemption disabled; for stable@
it's not that bad to do the same in copy_shadow_to_vmcs12.
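
For reference, the call site in vmx_vcpu_run() (quoted from memory, so the
exact context may differ slightly); vmx_vcpu_run() is entered with
preemption already disabled by vcpu_enter_guest():

	if (vmx->nested.sync_shadow_vmcs) {
		copy_vmcs12_to_shadow(vmx);
		vmx->nested.sync_shadow_vmcs = false;
	}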

For 3.18 it would of course be nice to use loaded_vmcs properly, but that
would also incur some overhead.
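
One conceivable shape for that - purely hypothetical, with an invented
nested.shadow_loaded_vmcs field and the launched/per-cpu bookkeeping
glossed over - would be to route the temporary switch through loaded_vmcs,
so that vmx_vcpu_load() restores the VMCS that is actually current:

	/* hypothetical sketch, not actual code */
	struct loaded_vmcs *prev = vmx->loaded_vmcs;

	vmx->loaded_vmcs = &vmx->nested.shadow_loaded_vmcs; /* invented field */
	vmcs_load(vmx->loaded_vmcs->vmcs);
	/* ... copy the shadowed fields ... */
	vmcs_clear(vmx->loaded_vmcs->vmcs);
	vmx->loaded_vmcs = prev;
	vmcs_load(vmx->loaded_vmcs->vmcs);

The overhead would presumably come from tracking the shadow VMCS on the
per-cpu loaded_vmcs lists across migrations, which the preempt_disable()
variant avoids entirely.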

Paolo

> But the point is that we temporarily
> load a vmcs without updating loaded_vmcs->vmcs. Now, if some other VCPU
> is scheduled in right in the middle of this, the wrong vmcs will be
> flushed and then reloaded - e.g. a non-shadow vmcs with that interrupt
> window flag set...
> 
> Patch is currently under heavy load testing here, but it looks very good
> as the bug was quickly reproducible before I applied it.

Jan Kiszka Oct. 8, 2014, 4:07 p.m. UTC | #2
On 2014-10-08 17:44, Paolo Bonzini wrote:
> On 08/10/2014 17:07, Jan Kiszka wrote:
>> As usual, this was a nasty race that involved several concurrent VCPUs and
>> real host load, so it's hard to write unit tests for it...
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 04fa1b8..d6bcaca 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -6417,6 +6417,8 @@ static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx)
>>  	const unsigned long *fields = shadow_read_write_fields;
>>  	const int num_fields = max_shadow_read_write_fields;
>>  
>> +	preempt_disable();
>> +
>>  	vmcs_load(shadow_vmcs);
>>  
>>  	for (i = 0; i < num_fields; i++) {
>> @@ -6440,6 +6442,8 @@ static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx)
>>  
>>  	vmcs_clear(shadow_vmcs);
>>  	vmcs_load(vmx->loaded_vmcs->vmcs);
>> +
>> +	preempt_enable();
>>  }
>>  
>>  static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx)
>> @@ -6457,6 +6461,8 @@ static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx)
>>  	u64 field_value = 0;
>>  	struct vmcs *shadow_vmcs = vmx->nested.current_shadow_vmcs;
>>  
>> +	preempt_disable();
>> +
>>  	vmcs_load(shadow_vmcs);
>>  
>>  	for (q = 0; q < ARRAY_SIZE(fields); q++) {
>> @@ -6483,6 +6489,8 @@ static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx)
>>  
>>  	vmcs_clear(shadow_vmcs);
>>  	vmcs_load(vmx->loaded_vmcs->vmcs);
>> +
>> +	preempt_enable();
>>  }
>>  
>> No proper patch yet because there might be a smarter approach without
>> using the preempt_disable() hammer.
> 
> copy_vmcs12_to_shadow already runs with preemption disabled; for stable@
> it's not that bad to do the same in copy_shadow_to_vmcs12.
> 
> For 3.18 it would of course be nice to use loaded_vmcs properly, but that
> would also incur some overhead.

If the other direction already runs under preempt_disable, I'm not sure
there is much to gain for this direction either.

Anyway, fix sent.

Jan
Wanpeng Li Oct. 8, 2014, 11:34 p.m. UTC | #3
On Wed, Oct 08, 2014 at 05:07:48PM +0200, Jan Kiszka wrote:
>On 2014-10-08 12:34, Paolo Bonzini wrote:
>> On 08/10/2014 12:29, Jan Kiszka wrote:
>>>>> But it would write to the vmcs02, not to the shadow VMCS; the shadow
>>>>> VMCS is active during copy_shadow_to_vmcs12/copy_vmcs12_to_shadow, and
>>>>> at no other time.  It is not clear to me how the VIRTUAL_INTR_PENDING
>>>>> bit ended up from the vmcs02 (where it is perfectly fine) to the vmcs12.
>>> Well, but somehow that bit ends up in vmcs12, that's a fact. Also, the
>>> problem disappears when shadowing is disabled. Need to think about
>>> the path again. Maybe there is just a bug, not a conceptual issue.
>> 
>> Yeah, and at this point we cannot actually exclude a processor bug.  Can
>> you check that the bit is not in the shadow VMCS just before vmrun, or
>> just after enable_irq_window?
>> 
>> Having a kvm-unit-tests testcase could also be of some help.
>
>As usual, this was a nasty race that involved several concurrent VCPUs and
>real host load, so it's hard to write unit tests for it...
>
>diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>index 04fa1b8..d6bcaca 100644
>--- a/arch/x86/kvm/vmx.c
>+++ b/arch/x86/kvm/vmx.c
>@@ -6417,6 +6417,8 @@ static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx)
> 	const unsigned long *fields = shadow_read_write_fields;
> 	const int num_fields = max_shadow_read_write_fields;
> 
>+	preempt_disable();
>+
> 	vmcs_load(shadow_vmcs);
> 
> 	for (i = 0; i < num_fields; i++) {
>@@ -6440,6 +6442,8 @@ static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx)
> 
> 	vmcs_clear(shadow_vmcs);
> 	vmcs_load(vmx->loaded_vmcs->vmcs);
>+
>+	preempt_enable();
> }
> 
> static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx)
>@@ -6457,6 +6461,8 @@ static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx)
> 	u64 field_value = 0;
> 	struct vmcs *shadow_vmcs = vmx->nested.current_shadow_vmcs;
> 
>+	preempt_disable();
>+
> 	vmcs_load(shadow_vmcs);
> 
> 	for (q = 0; q < ARRAY_SIZE(fields); q++) {
>@@ -6483,6 +6489,8 @@ static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx)
> 
> 	vmcs_clear(shadow_vmcs);
> 	vmcs_load(vmx->loaded_vmcs->vmcs);
>+
>+	preempt_enable();
> }
> 
> /*
>
>No proper patch yet because there might be a smarter approach without
>using the preempt_disable() hammer. But the point is that we temporarily
>load a vmcs without updating loaded_vmcs->vmcs. Now, if some other VCPU
>is scheduled in right in the middle of this, the wrong vmcs will be
>flushed and then reloaded - e.g. a non-shadow vmcs with that interrupt
>window flag set...

Can a non-shadow vmcs and a shadow vmcs be present in one system simultaneously?

Regards,
Wanpeng Li 

>
>Patch is currently under heavy load testing here, but it looks very good
>as the bug was quickly reproducible before I applied it.
>
>Jan
>
>-- 
>Siemens AG, Corporate Technology, CT RTC ITP SES-DE
>Corporate Competence Center Embedded Linux
Wanpeng Li Oct. 8, 2014, 11:58 p.m. UTC | #4
On Thu, Oct 09, 2014 at 07:34:47AM +0800, Wanpeng Li wrote:
>On Wed, Oct 08, 2014 at 05:07:48PM +0200, Jan Kiszka wrote:
>>On 2014-10-08 12:34, Paolo Bonzini wrote:
>>> On 08/10/2014 12:29, Jan Kiszka wrote:
>>>>>> But it would write to the vmcs02, not to the shadow VMCS; the shadow
>>>>>> VMCS is active during copy_shadow_to_vmcs12/copy_vmcs12_to_shadow, and
>>>>>> at no other time.  It is not clear to me how the VIRTUAL_INTR_PENDING
>>>>>> bit ended up from the vmcs02 (where it is perfectly fine) to the vmcs12.
>>>> Well, but somehow that bit ends up in vmcs12, that's a fact. Also, the
>>>> problem disappears when shadowing is disabled. Need to think about
>>>> the path again. Maybe there is just a bug, not a conceptual issue.
>>> 
>>> Yeah, and at this point we cannot actually exclude a processor bug.  Can
>>> you check that the bit is not in the shadow VMCS just before vmrun, or
>>> just after enable_irq_window?
>>> 
>>> Having a kvm-unit-tests testcase could also be of some help.
>>
>>As usual, this was a nasty race that involved several concurrent VCPUs and
>>real host load, so it's hard to write unit tests for it...
>>
>>diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>index 04fa1b8..d6bcaca 100644
>>--- a/arch/x86/kvm/vmx.c
>>+++ b/arch/x86/kvm/vmx.c
>>@@ -6417,6 +6417,8 @@ static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx)
>> 	const unsigned long *fields = shadow_read_write_fields;
>> 	const int num_fields = max_shadow_read_write_fields;
>> 
>>+	preempt_disable();
>>+
>> 	vmcs_load(shadow_vmcs);
>> 
>> 	for (i = 0; i < num_fields; i++) {
>>@@ -6440,6 +6442,8 @@ static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx)
>> 
>> 	vmcs_clear(shadow_vmcs);
>> 	vmcs_load(vmx->loaded_vmcs->vmcs);
>>+
>>+	preempt_enable();
>> }
>> 
>> static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx)
>>@@ -6457,6 +6461,8 @@ static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx)
>> 	u64 field_value = 0;
>> 	struct vmcs *shadow_vmcs = vmx->nested.current_shadow_vmcs;
>> 
>>+	preempt_disable();
>>+
>> 	vmcs_load(shadow_vmcs);
>> 
>> 	for (q = 0; q < ARRAY_SIZE(fields); q++) {
>>@@ -6483,6 +6489,8 @@ static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx)
>> 
>> 	vmcs_clear(shadow_vmcs);
>> 	vmcs_load(vmx->loaded_vmcs->vmcs);
>>+
>>+	preempt_enable();
>> }
>> 
>> /*
>>
>>No proper patch yet because there might be a smarter approach without
>>using the preempt_disable() hammer. But the point is that we temporarily
>>load a vmcs without updating loaded_vmcs->vmcs. Now, if some other VCPU
>>is scheduled in right in the middle of this, the wrong vmcs will be
>>flushed and then reloaded - e.g. a non-shadow vmcs with that interrupt
>>window flag set...
>
>Can a non-shadow vmcs and a shadow vmcs be present in one system simultaneously?

Ah, got it, you mean a non-current-shadow vmcs.

Regards,
Wanpeng Li 

>
>Regards,
>Wanpeng Li 
>
>>
>>Patch is currently under heavy load testing here, but it looks very good
>>as the bug was quickly reproducible before I applied it.
>>
>>Jan
>>
>>-- 
>>Siemens AG, Corporate Technology, CT RTC ITP SES-DE
>>Corporate Competence Center Embedded Linux

Patch

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 04fa1b8..d6bcaca 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6417,6 +6417,8 @@ static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx)
 	const unsigned long *fields = shadow_read_write_fields;
 	const int num_fields = max_shadow_read_write_fields;
 
+	preempt_disable();
+
 	vmcs_load(shadow_vmcs);
 
 	for (i = 0; i < num_fields; i++) {
@@ -6440,6 +6442,8 @@ static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx)
 
 	vmcs_clear(shadow_vmcs);
 	vmcs_load(vmx->loaded_vmcs->vmcs);
+
+	preempt_enable();
 }
 
 static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx)
@@ -6457,6 +6461,8 @@ static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx)
 	u64 field_value = 0;
 	struct vmcs *shadow_vmcs = vmx->nested.current_shadow_vmcs;
 
+	preempt_disable();
+
 	vmcs_load(shadow_vmcs);
 
 	for (q = 0; q < ARRAY_SIZE(fields); q++) {
@@ -6483,6 +6489,8 @@ static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx)
 
 	vmcs_clear(shadow_vmcs);
 	vmcs_load(vmx->loaded_vmcs->vmcs);
+
+	preempt_enable();
 }
 
 /*