diff mbox

[v3,1/2] KVM: nVMX: Fix incorrect preemption timer vmexit in nested guest

Message ID 1467863216-5521-1-git-send-email-wanpeng.li@hotmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Wanpeng Li July 7, 2016, 3:46 a.m. UTC
From: Wanpeng Li <wanpeng.li@hotmail.com>

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<          (null)>]           (null)
PGD 0
Oops: 0010 [#1] SMP
Call Trace:
 ? kvm_lapic_expired_hv_timer+0x47/0x90 [kvm]
 handle_preemption_timer+0xe/0x20 [kvm_intel]
 vmx_handle_exit+0x169/0x15a0 [kvm_intel]
 ? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
 kvm_arch_vcpu_ioctl_run+0xdee/0x19d0 [kvm]
 ? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
 ? vcpu_load+0x1c/0x60 [kvm]
 ? kvm_arch_vcpu_load+0x57/0x260 [kvm]
 kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
 do_vfs_ioctl+0x96/0x6a0
 ? __fget_light+0x2a/0x90
 SyS_ioctl+0x79/0x90
 do_syscall_64+0x68/0x180
 entry_SYSCALL64_slow_path+0x25/0x25
Code:  Bad RIP value.
RIP  [<          (null)>]           (null)
 RSP <ffff8800b5263c48>
CR2: 0000000000000000
---[ end trace 9c70c48b1a2bc66e ]---

This can be reproduced readily by preemption timer enabled on L0 and disabled
on L1.

Preemption timer for nested VMX is emulated by hrtimer which is started on L2
entry, stopped on L2 exit and evaluated via the check_nested_events hook. However,
nested_vmx_exit_handled is always return true for preemption timer vmexit, then
the L1 preemption timer vmexit is captured and be treated as a L2 preemption
timer vmexit, incurr a nested vmexit dereference NULL pointer.

This patch fix it by depending on check_nested_events to capture L2 preemption
timer(emulated hrtimer) expire and nested vmexit.

Tested-by: Haozhong Zhang <haozhong.zhang@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Yunhong Jiang <yunhong.jiang@intel.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
v2 -> v3:
 * update patch subject
v1 -> v2:
 * fix typo in patch description

 arch/x86/kvm/vmx.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Haozhong Zhang July 7, 2016, 6:48 a.m. UTC | #1
On 07/07/16 11:46, Wanpeng Li wrote:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
> 
> BUG: unable to handle kernel NULL pointer dereference at           (null)
> IP: [<          (null)>]           (null)
> PGD 0
> Oops: 0010 [#1] SMP
> Call Trace:
>  ? kvm_lapic_expired_hv_timer+0x47/0x90 [kvm]
>  handle_preemption_timer+0xe/0x20 [kvm_intel]
>  vmx_handle_exit+0x169/0x15a0 [kvm_intel]
>  ? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
>  kvm_arch_vcpu_ioctl_run+0xdee/0x19d0 [kvm]
>  ? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
>  ? vcpu_load+0x1c/0x60 [kvm]
>  ? kvm_arch_vcpu_load+0x57/0x260 [kvm]
>  kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
>  do_vfs_ioctl+0x96/0x6a0
>  ? __fget_light+0x2a/0x90
>  SyS_ioctl+0x79/0x90
>  do_syscall_64+0x68/0x180
>  entry_SYSCALL64_slow_path+0x25/0x25
> Code:  Bad RIP value.
> RIP  [<          (null)>]           (null)
>  RSP <ffff8800b5263c48>
> CR2: 0000000000000000
> ---[ end trace 9c70c48b1a2bc66e ]---
> 
> This can be reproduced readily by preemption timer enabled on L0 and disabled
> on L1.
> 
> Preemption timer for nested VMX is emulated by hrtimer which is started on L2
> entry, stopped on L2 exit and evaluated via the check_nested_events hook. However,
> nested_vmx_exit_handled is always return true for preemption timer vmexit, then
> the L1 preemption timer vmexit is captured and be treated as a L2 preemption
> timer vmexit, incurr a nested vmexit dereference NULL pointer.
> 
> This patch fix it by depending on check_nested_events to capture L2 preemption
> timer(emulated hrtimer) expire and nested vmexit.
> 
> Tested-by: Haozhong Zhang <haozhong.zhang@intel.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Yunhong Jiang <yunhong.jiang@intel.com>
> Cc: Jan Kiszka <jan.kiszka@siemens.com>
> Cc: Haozhong Zhang <haozhong.zhang@intel.com>
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> ---
> v2 -> v3:
>  * update patch subject
> v1 -> v2:
>  * fix typo in patch description
> 
>  arch/x86/kvm/vmx.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 85e2f0a..29c16a8 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -8041,6 +8041,8 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
>  		return nested_cpu_has2(vmcs12, SECONDARY_EXEC_XSAVES);
>  	case EXIT_REASON_PCOMMIT:
>  		return nested_cpu_has2(vmcs12, SECONDARY_EXEC_PCOMMIT);
> +	case EXIT_REASON_PREEMPTION_TIMER:
> +		return false;

If patch 2 can avoid accidentally enabling preemption timer in vmcs02,
will this one still be needed?

Haozhong


>  	default:
>  		return true;
>  	}
> -- 
> 1.9.1
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Wanpeng Li July 7, 2016, 6:56 a.m. UTC | #2
2016-07-07 14:48 GMT+08:00 Haozhong Zhang <haozhong.zhang@intel.com>:
> On 07/07/16 11:46, Wanpeng Li wrote:
>> From: Wanpeng Li <wanpeng.li@hotmail.com>
>>
>> BUG: unable to handle kernel NULL pointer dereference at           (null)
>> IP: [<          (null)>]           (null)
>> PGD 0
>> Oops: 0010 [#1] SMP
>> Call Trace:
>>  ? kvm_lapic_expired_hv_timer+0x47/0x90 [kvm]
>>  handle_preemption_timer+0xe/0x20 [kvm_intel]
>>  vmx_handle_exit+0x169/0x15a0 [kvm_intel]
>>  ? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
>>  kvm_arch_vcpu_ioctl_run+0xdee/0x19d0 [kvm]
>>  ? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
>>  ? vcpu_load+0x1c/0x60 [kvm]
>>  ? kvm_arch_vcpu_load+0x57/0x260 [kvm]
>>  kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
>>  do_vfs_ioctl+0x96/0x6a0
>>  ? __fget_light+0x2a/0x90
>>  SyS_ioctl+0x79/0x90
>>  do_syscall_64+0x68/0x180
>>  entry_SYSCALL64_slow_path+0x25/0x25
>> Code:  Bad RIP value.
>> RIP  [<          (null)>]           (null)
>>  RSP <ffff8800b5263c48>
>> CR2: 0000000000000000
>> ---[ end trace 9c70c48b1a2bc66e ]---
>>
>> This can be reproduced readily by preemption timer enabled on L0 and disabled
>> on L1.
>>
>> Preemption timer for nested VMX is emulated by hrtimer which is started on L2
>> entry, stopped on L2 exit and evaluated via the check_nested_events hook. However,
>> nested_vmx_exit_handled is always return true for preemption timer vmexit, then
>> the L1 preemption timer vmexit is captured and be treated as a L2 preemption
>> timer vmexit, incurr a nested vmexit dereference NULL pointer.
>>
>> This patch fix it by depending on check_nested_events to capture L2 preemption
>> timer(emulated hrtimer) expire and nested vmexit.
>>
>> Tested-by: Haozhong Zhang <haozhong.zhang@intel.com>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>> Cc: Yunhong Jiang <yunhong.jiang@intel.com>
>> Cc: Jan Kiszka <jan.kiszka@siemens.com>
>> Cc: Haozhong Zhang <haozhong.zhang@intel.com>
>> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
>> ---
>> v2 -> v3:
>>  * update patch subject
>> v1 -> v2:
>>  * fix typo in patch description
>>
>>  arch/x86/kvm/vmx.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 85e2f0a..29c16a8 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -8041,6 +8041,8 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
>>               return nested_cpu_has2(vmcs12, SECONDARY_EXEC_XSAVES);
>>       case EXIT_REASON_PCOMMIT:
>>               return nested_cpu_has2(vmcs12, SECONDARY_EXEC_PCOMMIT);
>> +     case EXIT_REASON_PREEMPTION_TIMER:
>> +             return false;
>
> If patch 2 can avoid accidentally enabling preemption timer in vmcs02,
> will this one still be needed?

After complete "L1 TSC deadline timer to trigger while L2 is running",
L0's preemption timer fire when L2 is running can result in
(is_guest_mode(vcpu) && nested_vmx_exit_handled(vcpu)) be true, right?

Regards,
Wanpeng Li
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Haozhong Zhang July 7, 2016, 7:02 a.m. UTC | #3
On 07/07/16 14:56, Wanpeng Li wrote:
> 2016-07-07 14:48 GMT+08:00 Haozhong Zhang <haozhong.zhang@intel.com>:
> > On 07/07/16 11:46, Wanpeng Li wrote:
> >> From: Wanpeng Li <wanpeng.li@hotmail.com>
> >>
> >> BUG: unable to handle kernel NULL pointer dereference at           (null)
> >> IP: [<          (null)>]           (null)
> >> PGD 0
> >> Oops: 0010 [#1] SMP
> >> Call Trace:
> >>  ? kvm_lapic_expired_hv_timer+0x47/0x90 [kvm]
> >>  handle_preemption_timer+0xe/0x20 [kvm_intel]
> >>  vmx_handle_exit+0x169/0x15a0 [kvm_intel]
> >>  ? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
> >>  kvm_arch_vcpu_ioctl_run+0xdee/0x19d0 [kvm]
> >>  ? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
> >>  ? vcpu_load+0x1c/0x60 [kvm]
> >>  ? kvm_arch_vcpu_load+0x57/0x260 [kvm]
> >>  kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
> >>  do_vfs_ioctl+0x96/0x6a0
> >>  ? __fget_light+0x2a/0x90
> >>  SyS_ioctl+0x79/0x90
> >>  do_syscall_64+0x68/0x180
> >>  entry_SYSCALL64_slow_path+0x25/0x25
> >> Code:  Bad RIP value.
> >> RIP  [<          (null)>]           (null)
> >>  RSP <ffff8800b5263c48>
> >> CR2: 0000000000000000
> >> ---[ end trace 9c70c48b1a2bc66e ]---
> >>
> >> This can be reproduced readily by preemption timer enabled on L0 and disabled
> >> on L1.
> >>
> >> Preemption timer for nested VMX is emulated by hrtimer which is started on L2
> >> entry, stopped on L2 exit and evaluated via the check_nested_events hook. However,
> >> nested_vmx_exit_handled is always return true for preemption timer vmexit, then
> >> the L1 preemption timer vmexit is captured and be treated as a L2 preemption
> >> timer vmexit, incurr a nested vmexit dereference NULL pointer.
> >>
> >> This patch fix it by depending on check_nested_events to capture L2 preemption
> >> timer(emulated hrtimer) expire and nested vmexit.
> >>
> >> Tested-by: Haozhong Zhang <haozhong.zhang@intel.com>
> >> Cc: Paolo Bonzini <pbonzini@redhat.com>
> >> Cc: Radim Krčmář <rkrcmar@redhat.com>
> >> Cc: Yunhong Jiang <yunhong.jiang@intel.com>
> >> Cc: Jan Kiszka <jan.kiszka@siemens.com>
> >> Cc: Haozhong Zhang <haozhong.zhang@intel.com>
> >> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> >> ---
> >> v2 -> v3:
> >>  * update patch subject
> >> v1 -> v2:
> >>  * fix typo in patch description
> >>
> >>  arch/x86/kvm/vmx.c | 2 ++
> >>  1 file changed, 2 insertions(+)
> >>
> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >> index 85e2f0a..29c16a8 100644
> >> --- a/arch/x86/kvm/vmx.c
> >> +++ b/arch/x86/kvm/vmx.c
> >> @@ -8041,6 +8041,8 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
> >>               return nested_cpu_has2(vmcs12, SECONDARY_EXEC_XSAVES);
> >>       case EXIT_REASON_PCOMMIT:
> >>               return nested_cpu_has2(vmcs12, SECONDARY_EXEC_PCOMMIT);
> >> +     case EXIT_REASON_PREEMPTION_TIMER:
> >> +             return false;
> >
> > If patch 2 can avoid accidentally enabling preemption timer in vmcs02,
> > will this one still be needed?
> 
> After complete "L1 TSC deadline timer to trigger while L2 is running",
> L0's preemption timer fire when L2 is running can result in
> (is_guest_mode(vcpu) && nested_vmx_exit_handled(vcpu)) be true, right?
>

In prepare_vmcs02():

    exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
    ...
    vmcs_write32(PIN_BASED_VM_EXEC_CONTROL, exec_control);

so preemption timer will never be enabled while L2 guest is running.

Haozhong
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Wanpeng Li July 7, 2016, 7:07 a.m. UTC | #4
2016-07-07 15:02 GMT+08:00 Haozhong Zhang <haozhong.zhang@intel.com>:
> On 07/07/16 14:56, Wanpeng Li wrote:
>> 2016-07-07 14:48 GMT+08:00 Haozhong Zhang <haozhong.zhang@intel.com>:
>> > On 07/07/16 11:46, Wanpeng Li wrote:
>> >> From: Wanpeng Li <wanpeng.li@hotmail.com>
>> >>
>> >> BUG: unable to handle kernel NULL pointer dereference at           (null)
>> >> IP: [<          (null)>]           (null)
>> >> PGD 0
>> >> Oops: 0010 [#1] SMP
>> >> Call Trace:
>> >>  ? kvm_lapic_expired_hv_timer+0x47/0x90 [kvm]
>> >>  handle_preemption_timer+0xe/0x20 [kvm_intel]
>> >>  vmx_handle_exit+0x169/0x15a0 [kvm_intel]
>> >>  ? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
>> >>  kvm_arch_vcpu_ioctl_run+0xdee/0x19d0 [kvm]
>> >>  ? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
>> >>  ? vcpu_load+0x1c/0x60 [kvm]
>> >>  ? kvm_arch_vcpu_load+0x57/0x260 [kvm]
>> >>  kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
>> >>  do_vfs_ioctl+0x96/0x6a0
>> >>  ? __fget_light+0x2a/0x90
>> >>  SyS_ioctl+0x79/0x90
>> >>  do_syscall_64+0x68/0x180
>> >>  entry_SYSCALL64_slow_path+0x25/0x25
>> >> Code:  Bad RIP value.
>> >> RIP  [<          (null)>]           (null)
>> >>  RSP <ffff8800b5263c48>
>> >> CR2: 0000000000000000
>> >> ---[ end trace 9c70c48b1a2bc66e ]---
>> >>
>> >> This can be reproduced readily by preemption timer enabled on L0 and disabled
>> >> on L1.
>> >>
>> >> Preemption timer for nested VMX is emulated by hrtimer which is started on L2
>> >> entry, stopped on L2 exit and evaluated via the check_nested_events hook. However,
>> >> nested_vmx_exit_handled is always return true for preemption timer vmexit, then
>> >> the L1 preemption timer vmexit is captured and be treated as a L2 preemption
>> >> timer vmexit, incurr a nested vmexit dereference NULL pointer.
>> >>
>> >> This patch fix it by depending on check_nested_events to capture L2 preemption
>> >> timer(emulated hrtimer) expire and nested vmexit.
>> >>
>> >> Tested-by: Haozhong Zhang <haozhong.zhang@intel.com>
>> >> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> >> Cc: Radim Krčmář <rkrcmar@redhat.com>
>> >> Cc: Yunhong Jiang <yunhong.jiang@intel.com>
>> >> Cc: Jan Kiszka <jan.kiszka@siemens.com>
>> >> Cc: Haozhong Zhang <haozhong.zhang@intel.com>
>> >> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
>> >> ---
>> >> v2 -> v3:
>> >>  * update patch subject
>> >> v1 -> v2:
>> >>  * fix typo in patch description
>> >>
>> >>  arch/x86/kvm/vmx.c | 2 ++
>> >>  1 file changed, 2 insertions(+)
>> >>
>> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> >> index 85e2f0a..29c16a8 100644
>> >> --- a/arch/x86/kvm/vmx.c
>> >> +++ b/arch/x86/kvm/vmx.c
>> >> @@ -8041,6 +8041,8 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
>> >>               return nested_cpu_has2(vmcs12, SECONDARY_EXEC_XSAVES);
>> >>       case EXIT_REASON_PCOMMIT:
>> >>               return nested_cpu_has2(vmcs12, SECONDARY_EXEC_PCOMMIT);
>> >> +     case EXIT_REASON_PREEMPTION_TIMER:
>> >> +             return false;
>> >
>> > If patch 2 can avoid accidentally enabling preemption timer in vmcs02,
>> > will this one still be needed?
>>
>> After complete "L1 TSC deadline timer to trigger while L2 is running",
>> L0's preemption timer fire when L2 is running can result in
>> (is_guest_mode(vcpu) && nested_vmx_exit_handled(vcpu)) be true, right?
>>
>
> In prepare_vmcs02():
>
>     exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
>     ...
>     vmcs_write32(PIN_BASED_VM_EXEC_CONTROL, exec_control);
>
> so preemption timer will never be enabled while L2 guest is running.

It is not true after Paolo's patch.

Regards,
Wanpeng Li
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Haozhong Zhang July 7, 2016, 7:25 a.m. UTC | #5
On 07/07/16 15:07, Wanpeng Li wrote:
> 2016-07-07 15:02 GMT+08:00 Haozhong Zhang <haozhong.zhang@intel.com>:
> > On 07/07/16 14:56, Wanpeng Li wrote:
> >> 2016-07-07 14:48 GMT+08:00 Haozhong Zhang <haozhong.zhang@intel.com>:
> >> > On 07/07/16 11:46, Wanpeng Li wrote:
> >> >> From: Wanpeng Li <wanpeng.li@hotmail.com>
> >> >>
> >> >> BUG: unable to handle kernel NULL pointer dereference at           (null)
> >> >> IP: [<          (null)>]           (null)
> >> >> PGD 0
> >> >> Oops: 0010 [#1] SMP
> >> >> Call Trace:
> >> >>  ? kvm_lapic_expired_hv_timer+0x47/0x90 [kvm]
> >> >>  handle_preemption_timer+0xe/0x20 [kvm_intel]
> >> >>  vmx_handle_exit+0x169/0x15a0 [kvm_intel]
> >> >>  ? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
> >> >>  kvm_arch_vcpu_ioctl_run+0xdee/0x19d0 [kvm]
> >> >>  ? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
> >> >>  ? vcpu_load+0x1c/0x60 [kvm]
> >> >>  ? kvm_arch_vcpu_load+0x57/0x260 [kvm]
> >> >>  kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
> >> >>  do_vfs_ioctl+0x96/0x6a0
> >> >>  ? __fget_light+0x2a/0x90
> >> >>  SyS_ioctl+0x79/0x90
> >> >>  do_syscall_64+0x68/0x180
> >> >>  entry_SYSCALL64_slow_path+0x25/0x25
> >> >> Code:  Bad RIP value.
> >> >> RIP  [<          (null)>]           (null)
> >> >>  RSP <ffff8800b5263c48>
> >> >> CR2: 0000000000000000
> >> >> ---[ end trace 9c70c48b1a2bc66e ]---
> >> >>
> >> >> This can be reproduced readily by preemption timer enabled on L0 and disabled
> >> >> on L1.
> >> >>
> >> >> Preemption timer for nested VMX is emulated by hrtimer which is started on L2
> >> >> entry, stopped on L2 exit and evaluated via the check_nested_events hook. However,
> >> >> nested_vmx_exit_handled is always return true for preemption timer vmexit, then
> >> >> the L1 preemption timer vmexit is captured and be treated as a L2 preemption
> >> >> timer vmexit, incurr a nested vmexit dereference NULL pointer.
> >> >>
> >> >> This patch fix it by depending on check_nested_events to capture L2 preemption
> >> >> timer(emulated hrtimer) expire and nested vmexit.
> >> >>
> >> >> Tested-by: Haozhong Zhang <haozhong.zhang@intel.com>
> >> >> Cc: Paolo Bonzini <pbonzini@redhat.com>
> >> >> Cc: Radim Krčmář <rkrcmar@redhat.com>
> >> >> Cc: Yunhong Jiang <yunhong.jiang@intel.com>
> >> >> Cc: Jan Kiszka <jan.kiszka@siemens.com>
> >> >> Cc: Haozhong Zhang <haozhong.zhang@intel.com>
> >> >> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> >> >> ---
> >> >> v2 -> v3:
> >> >>  * update patch subject
> >> >> v1 -> v2:
> >> >>  * fix typo in patch description
> >> >>
> >> >>  arch/x86/kvm/vmx.c | 2 ++
> >> >>  1 file changed, 2 insertions(+)
> >> >>
> >> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >> >> index 85e2f0a..29c16a8 100644
> >> >> --- a/arch/x86/kvm/vmx.c
> >> >> +++ b/arch/x86/kvm/vmx.c
> >> >> @@ -8041,6 +8041,8 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
> >> >>               return nested_cpu_has2(vmcs12, SECONDARY_EXEC_XSAVES);
> >> >>       case EXIT_REASON_PCOMMIT:
> >> >>               return nested_cpu_has2(vmcs12, SECONDARY_EXEC_PCOMMIT);
> >> >> +     case EXIT_REASON_PREEMPTION_TIMER:
> >> >> +             return false;
> >> >
> >> > If patch 2 can avoid accidentally enabling preemption timer in vmcs02,
> >> > will this one still be needed?
> >>
> >> After complete "L1 TSC deadline timer to trigger while L2 is running",
> >> L0's preemption timer fire when L2 is running can result in
> >> (is_guest_mode(vcpu) && nested_vmx_exit_handled(vcpu)) be true, right?
> >>
> >
> > In prepare_vmcs02():
> >
> >     exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
> >     ...
> >     vmcs_write32(PIN_BASED_VM_EXEC_CONTROL, exec_control);
> >
> > so preemption timer will never be enabled while L2 guest is running.
> 
> It is not true after Paolo's patch.
>

OK, I didn't quite follow discussions before v3 and just noticed
Paolo's patch in v2 comments is to enable L1 preemption timer while L2
is running. Then this one looks reasonable.

Thanks,
Haozhong
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 85e2f0a..29c16a8 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8041,6 +8041,8 @@  static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
 		return nested_cpu_has2(vmcs12, SECONDARY_EXEC_XSAVES);
 	case EXIT_REASON_PCOMMIT:
 		return nested_cpu_has2(vmcs12, SECONDARY_EXEC_PCOMMIT);
+	case EXIT_REASON_PREEMPTION_TIMER:
+		return false;
 	default:
 		return true;
 	}