diff mbox

[3/4] KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf

Message ID 1497334094-6982-4-git-send-email-wanpeng.li@hotmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Wanpeng Li June 13, 2017, 6:08 a.m. UTC
From: Wanpeng Li <wanpeng.li@hotmail.com>

Add an async_page_fault field to vcpu->arch.exception to identify an async 
page fault, and constructs the expected vm-exit information fields. Force 
a nested VM exit from nested_vmx_check_exception() if the injected #PF 
is async page fault.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
 arch/x86/include/asm/kvm_emulate.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  1 +
 arch/x86/kvm/vmx.c                 | 21 ++++++++++++++++++---
 arch/x86/kvm/x86.c                 |  3 +++
 4 files changed, 23 insertions(+), 3 deletions(-)

Comments

Radim Krčmář June 13, 2017, 6:55 p.m. UTC | #1
2017-06-12 23:08-0700, Wanpeng Li:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
> 
> Add an async_page_fault field to vcpu->arch.exception to identify an async 
> page fault, and constructs the expected vm-exit information fields. Force 
> a nested VM exit from nested_vmx_check_exception() if the injected #PF 
> is async page fault.
> 
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> ---
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> @@ -2422,13 +2422,28 @@ static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
>  static int nested_vmx_check_exception(struct kvm_vcpu *vcpu, unsigned nr)

This function could use the same treatment as vmx_queue_exception(), so
we are not mixing 'nr' with 'vcpu->arch.exception.*'.

>  {
>  	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
> +	u32 intr_info = 0;
> +	unsigned long exit_qualification = 0;
>  
> -	if (!(vmcs12->exception_bitmap & (1u << nr)))
> +	if (!((vmcs12->exception_bitmap & (1u << nr)) ||
> +		(nr == PF_VECTOR && vcpu->arch.exception.async_page_fault)))
>  		return 0;
>  
> +	intr_info = nr | INTR_INFO_VALID_MASK;
> +	exit_qualification = vmcs_readl(EXIT_QUALIFICATION);

This part still uses EXIT_QUALIFICATION(), which means it is not general
and I think it would be nicer to just do simple special case on top:

	if (vcpu->arch.exception.async_page_fault) {
		vmcs_write32(VM_EXIT_INTR_ERROR_CODE, vcpu->arch.exception.error_code);
		nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
		                  PF_VECTOR | INTR_TYPE_HARD_EXCEPTION |
		                  INTR_INFO_DELIVER_CODE_MASK | INTR_INFO_VALID_MASK,
		                  vcpu->arch.cr2);
		return 1;
	}

Using vcpu->arch.cr2 is suspicious as VMX doesn't update CR2 on VM
exits;  isn't this going to change the CR2 visible in L2 guest after a
nested VM entry?

Btw. nested_vmx_check_exception() didn't support emulated exceptions at
all (it only passed through ones we got from hardware), or have I missed
something?

Thanks.
Wanpeng Li June 14, 2017, 1:07 a.m. UTC | #2
2017-06-14 2:55 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> 2017-06-12 23:08-0700, Wanpeng Li:
>> From: Wanpeng Li <wanpeng.li@hotmail.com>
>>
>> Add an async_page_fault field to vcpu->arch.exception to identify an async
>> page fault, and constructs the expected vm-exit information fields. Force
>> a nested VM exit from nested_vmx_check_exception() if the injected #PF
>> is async page fault.
>>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
>> ---
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> @@ -2422,13 +2422,28 @@ static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
>>  static int nested_vmx_check_exception(struct kvm_vcpu *vcpu, unsigned nr)
>
> This function could use the same treatment as vmx_queue_exception(), so
> we are not mixing 'nr' with 'vcpu->arch.exception.*'.
>
>>  {
>>       struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
>> +     u32 intr_info = 0;
>> +     unsigned long exit_qualification = 0;
>>
>> -     if (!(vmcs12->exception_bitmap & (1u << nr)))
>> +     if (!((vmcs12->exception_bitmap & (1u << nr)) ||
>> +             (nr == PF_VECTOR && vcpu->arch.exception.async_page_fault)))
>>               return 0;
>>
>> +     intr_info = nr | INTR_INFO_VALID_MASK;
>> +     exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
>
> This part still uses EXIT_QUALIFICATION(), which means it is not general
> and I think it would be nicer to just do simple special case on top:
>
>         if (vcpu->arch.exception.async_page_fault) {
>                 vmcs_write32(VM_EXIT_INTR_ERROR_CODE, vcpu->arch.exception.error_code);
>                 nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
>                                   PF_VECTOR | INTR_TYPE_HARD_EXCEPTION |
>                                   INTR_INFO_DELIVER_CODE_MASK | INTR_INFO_VALID_MASK,
>                                   vcpu->arch.cr2);
>                 return 1;
>         }

Good point.

>
> Using vcpu->arch.cr2 is suspicious as VMX doesn't update CR2 on VM
> exits;  isn't this going to change the CR2 visible in L2 guest after a
> nested VM entry?

Sorry, I don't fully understand the question. As you know this
vcpu->arch.cr2 which includes token is set before async pf injection,
and L1 will intercept it from EXIT_QUALIFICATION during nested vmexit,
why it can change the CR2 visible in L2 guest after a nested VM entry?

>
> Btw. nested_vmx_check_exception() didn't support emulated exceptions at
> all (it only passed through ones we got from hardware),

I think so.

Regards,
Wanpeng Li
Radim Krčmář June 14, 2017, 12:52 p.m. UTC | #3
2017-06-14 09:07+0800, Wanpeng Li:
> 2017-06-14 2:55 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> > Using vcpu->arch.cr2 is suspicious as VMX doesn't update CR2 on VM
> > exits;  isn't this going to change the CR2 visible in L2 guest after a
> > nested VM entry?
> 
> Sorry, I don't fully understand the question. As you know this
> vcpu->arch.cr2 which includes token is set before async pf injection,

Yes, I'm thinking that setting vcpu->arch.cr2 is a mistake in this case.

> and L1 will intercept it from EXIT_QUALIFICATION during nested vmexit,

Right, so we do not need to have the token in CR2, because L1 is not
going to look at it.

> why it can change the CR2 visible in L2 guest after a nested VM entry?

Sorry, the situation is too convoluted to be expressed in one sentence:

1) L2 is running with CR2 = L2CR2
3) VMX exits (say, unrelated EXTERNAL_INTERRUPT) and L0 stores L2CR2 in
   vcpu->arch.cr2
2) APF for L1 has completed
4) L0 KVM wants to inject APF and sets vcpu->arch.cr2 = APFT
5) L0 KVM does a nested VM exit to L1, EXIT_QUALIFICATION = APFT
6) L0 KVM enters L1 with CR2 = vcpu->arch.cr2 = APFT
7) L1 stores APFT as L2's CR2
8) L1 handles APF, maybe reschedules, but eventually comes back to this
   L2's thread
9) after some time, L1 enters L2 with CR2 = APFT
10) L2 is running with CR2 = APTF

The original L2CR2 is lost and we'd introduce a bug if L2 wanted to look
at it, e.g. it was in a process of handling its #PF.
Wanpeng Li June 14, 2017, 1:02 p.m. UTC | #4
2017-06-14 20:52 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> 2017-06-14 09:07+0800, Wanpeng Li:
>> 2017-06-14 2:55 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
>> > Using vcpu->arch.cr2 is suspicious as VMX doesn't update CR2 on VM
>> > exits;  isn't this going to change the CR2 visible in L2 guest after a
>> > nested VM entry?
>>
>> Sorry, I don't fully understand the question. As you know this
>> vcpu->arch.cr2 which includes token is set before async pf injection,
>
> Yes, I'm thinking that setting vcpu->arch.cr2 is a mistake in this case.
>
>> and L1 will intercept it from EXIT_QUALIFICATION during nested vmexit,
>
> Right, so we do not need to have the token in CR2, because L1 is not
> going to look at it.
>
>> why it can change the CR2 visible in L2 guest after a nested VM entry?
>
> Sorry, the situation is too convoluted to be expressed in one sentence:
>
> 1) L2 is running with CR2 = L2CR2
> 3) VMX exits (say, unrelated EXTERNAL_INTERRUPT) and L0 stores L2CR2 in
>    vcpu->arch.cr2
> 2) APF for L1 has completed
> 4) L0 KVM wants to inject APF and sets vcpu->arch.cr2 = APFT
> 5) L0 KVM does a nested VM exit to L1, EXIT_QUALIFICATION = APFT
> 6) L0 KVM enters L1 with CR2 = vcpu->arch.cr2 = APFT
> 7) L1 stores APFT as L2's CR2
> 8) L1 handles APF, maybe reschedules, but eventually comes back to this
>    L2's thread
> 9) after some time, L1 enters L2 with CR2 = APFT
> 10) L2 is running with CR2 = APTF
>
> The original L2CR2 is lost and we'd introduce a bug if L2 wanted to look
> at it, e.g. it was in a process of handling its #PF.

Good point. What's your proposal? :)

Regards,
Wanpeng Li
Radim Krčmář June 14, 2017, 1:20 p.m. UTC | #5
2017-06-14 21:02+0800, Wanpeng Li:
> 2017-06-14 20:52 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> > 2017-06-14 09:07+0800, Wanpeng Li:
> >> 2017-06-14 2:55 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> >> > Using vcpu->arch.cr2 is suspicious as VMX doesn't update CR2 on VM
> >> > exits;  isn't this going to change the CR2 visible in L2 guest after a
> >> > nested VM entry?
> >>
> >> Sorry, I don't fully understand the question. As you know this
> >> vcpu->arch.cr2 which includes token is set before async pf injection,
> >
> > Yes, I'm thinking that setting vcpu->arch.cr2 is a mistake in this case.
> >
> >> and L1 will intercept it from EXIT_QUALIFICATION during nested vmexit,
> >
> > Right, so we do not need to have the token in CR2, because L1 is not
> > going to look at it.
> >
> >> why it can change the CR2 visible in L2 guest after a nested VM entry?
> >
> > Sorry, the situation is too convoluted to be expressed in one sentence:
> >
> > 1) L2 is running with CR2 = L2CR2
> > 3) VMX exits (say, unrelated EXTERNAL_INTERRUPT) and L0 stores L2CR2 in
> >    vcpu->arch.cr2
> > 2) APF for L1 has completed
> > 4) L0 KVM wants to inject APF and sets vcpu->arch.cr2 = APFT
> > 5) L0 KVM does a nested VM exit to L1, EXIT_QUALIFICATION = APFT
> > 6) L0 KVM enters L1 with CR2 = vcpu->arch.cr2 = APFT
> > 7) L1 stores APFT as L2's CR2
> > 8) L1 handles APF, maybe reschedules, but eventually comes back to this
> >    L2's thread
> > 9) after some time, L1 enters L2 with CR2 = APFT
> > 10) L2 is running with CR2 = APTF
> >
> > The original L2CR2 is lost and we'd introduce a bug if L2 wanted to look
> > at it, e.g. it was in a process of handling its #PF.
> 
> Good point. What's your proposal? :)

Get rid of async_pf. :) Optimal solutions aside, I think it would be
best to add a new injection function for APF.  One that injects a normal
#PF for non-nested guests and directly triggers a #PF VM exit otherwise,
and call it from kvm_arch_async_page_*present().

Do you think that just moving the nested VM exit from
nested_vmx_check_exception() would work?

Thanks.
Wanpeng Li June 14, 2017, 1:27 p.m. UTC | #6
2017-06-14 21:20 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> 2017-06-14 21:02+0800, Wanpeng Li:
>> 2017-06-14 20:52 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
>> > 2017-06-14 09:07+0800, Wanpeng Li:
>> >> 2017-06-14 2:55 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
>> >> > Using vcpu->arch.cr2 is suspicious as VMX doesn't update CR2 on VM
>> >> > exits;  isn't this going to change the CR2 visible in L2 guest after a
>> >> > nested VM entry?
>> >>
>> >> Sorry, I don't fully understand the question. As you know this
>> >> vcpu->arch.cr2 which includes token is set before async pf injection,
>> >
>> > Yes, I'm thinking that setting vcpu->arch.cr2 is a mistake in this case.
>> >
>> >> and L1 will intercept it from EXIT_QUALIFICATION during nested vmexit,
>> >
>> > Right, so we do not need to have the token in CR2, because L1 is not
>> > going to look at it.
>> >
>> >> why it can change the CR2 visible in L2 guest after a nested VM entry?
>> >
>> > Sorry, the situation is too convoluted to be expressed in one sentence:
>> >
>> > 1) L2 is running with CR2 = L2CR2
>> > 3) VMX exits (say, unrelated EXTERNAL_INTERRUPT) and L0 stores L2CR2 in
>> >    vcpu->arch.cr2
>> > 2) APF for L1 has completed
>> > 4) L0 KVM wants to inject APF and sets vcpu->arch.cr2 = APFT
>> > 5) L0 KVM does a nested VM exit to L1, EXIT_QUALIFICATION = APFT
>> > 6) L0 KVM enters L1 with CR2 = vcpu->arch.cr2 = APFT
>> > 7) L1 stores APFT as L2's CR2
>> > 8) L1 handles APF, maybe reschedules, but eventually comes back to this
>> >    L2's thread
>> > 9) after some time, L1 enters L2 with CR2 = APFT
>> > 10) L2 is running with CR2 = APTF
>> >
>> > The original L2CR2 is lost and we'd introduce a bug if L2 wanted to look
>> > at it, e.g. it was in a process of handling its #PF.
>>
>> Good point. What's your proposal? :)
>
> Get rid of async_pf. :) Optimal solutions aside, I think it would be
> best to add a new injection function for APF.  One that injects a normal
> #PF for non-nested guests and directly triggers a #PF VM exit otherwise,
> and call it from kvm_arch_async_page_*present().
>
> Do you think that just moving the nested VM exit from
> nested_vmx_check_exception() would work?

That's the same as what commit  9bc1f09f6fa76
Wanpeng Li June 14, 2017, 1:32 p.m. UTC | #7
2017-06-14 21:20 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> 2017-06-14 21:02+0800, Wanpeng Li:
>> 2017-06-14 20:52 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
>> > 2017-06-14 09:07+0800, Wanpeng Li:
>> >> 2017-06-14 2:55 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
>> >> > Using vcpu->arch.cr2 is suspicious as VMX doesn't update CR2 on VM
>> >> > exits;  isn't this going to change the CR2 visible in L2 guest after a
>> >> > nested VM entry?
>> >>
>> >> Sorry, I don't fully understand the question. As you know this
>> >> vcpu->arch.cr2 which includes token is set before async pf injection,
>> >
>> > Yes, I'm thinking that setting vcpu->arch.cr2 is a mistake in this case.
>> >
>> >> and L1 will intercept it from EXIT_QUALIFICATION during nested vmexit,
>> >
>> > Right, so we do not need to have the token in CR2, because L1 is not
>> > going to look at it.
>> >
>> >> why it can change the CR2 visible in L2 guest after a nested VM entry?
>> >
>> > Sorry, the situation is too convoluted to be expressed in one sentence:
>> >
>> > 1) L2 is running with CR2 = L2CR2
>> > 3) VMX exits (say, unrelated EXTERNAL_INTERRUPT) and L0 stores L2CR2 in
>> >    vcpu->arch.cr2
>> > 2) APF for L1 has completed
>> > 4) L0 KVM wants to inject APF and sets vcpu->arch.cr2 = APFT
>> > 5) L0 KVM does a nested VM exit to L1, EXIT_QUALIFICATION = APFT
>> > 6) L0 KVM enters L1 with CR2 = vcpu->arch.cr2 = APFT
>> > 7) L1 stores APFT as L2's CR2
>> > 8) L1 handles APF, maybe reschedules, but eventually comes back to this
>> >    L2's thread
>> > 9) after some time, L1 enters L2 with CR2 = APFT
>> > 10) L2 is running with CR2 = APTF
>> >
>> > The original L2CR2 is lost and we'd introduce a bug if L2 wanted to look
>> > at it, e.g. it was in a process of handling its #PF.
>>
>> Good point. What's your proposal? :)
>
> Get rid of async_pf. :) Optimal solutions aside, I think it would be
> best to add a new injection function for APF.  One that injects a normal
> #PF for non-nested guests and directly triggers a #PF VM exit otherwise,
> and call it from kvm_arch_async_page_*present().
>
> Do you think that just moving the nested VM exit from
> nested_vmx_check_exception() would work?

That's the same as what commit  9bc1f09f6fa76 (KVM: async_pf: avoid
async pf injection when in guest mode) do I think, how about
introducing a field like nested_apf_token to vcpu->arch to keep the
token in kvm_inject_page_fault if
(vcpu->arch.exception.async_page_fault && is_guest_mode(vcpu)) is
true.

Regards,
Wanpeng Li
Wanpeng Li June 14, 2017, 2:32 p.m. UTC | #8
2017-06-14 21:20 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> 2017-06-14 21:02+0800, Wanpeng Li:
>> 2017-06-14 20:52 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
>> > 2017-06-14 09:07+0800, Wanpeng Li:
>> >> 2017-06-14 2:55 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
>> >> > Using vcpu->arch.cr2 is suspicious as VMX doesn't update CR2 on VM
>> >> > exits;  isn't this going to change the CR2 visible in L2 guest after a
>> >> > nested VM entry?
>> >>
>> >> Sorry, I don't fully understand the question. As you know this
>> >> vcpu->arch.cr2 which includes token is set before async pf injection,
>> >
>> > Yes, I'm thinking that setting vcpu->arch.cr2 is a mistake in this case.
>> >
>> >> and L1 will intercept it from EXIT_QUALIFICATION during nested vmexit,
>> >
>> > Right, so we do not need to have the token in CR2, because L1 is not
>> > going to look at it.
>> >
>> >> why it can change the CR2 visible in L2 guest after a nested VM entry?
>> >
>> > Sorry, the situation is too convoluted to be expressed in one sentence:
>> >
>> > 1) L2 is running with CR2 = L2CR2
>> > 3) VMX exits (say, unrelated EXTERNAL_INTERRUPT) and L0 stores L2CR2 in
>> >    vcpu->arch.cr2
>> > 2) APF for L1 has completed
>> > 4) L0 KVM wants to inject APF and sets vcpu->arch.cr2 = APFT
>> > 5) L0 KVM does a nested VM exit to L1, EXIT_QUALIFICATION = APFT
>> > 6) L0 KVM enters L1 with CR2 = vcpu->arch.cr2 = APFT
>> > 7) L1 stores APFT as L2's CR2
>> > 8) L1 handles APF, maybe reschedules, but eventually comes back to this
>> >    L2's thread
>> > 9) after some time, L1 enters L2 with CR2 = APFT
>> > 10) L2 is running with CR2 = APTF
>> >
>> > The original L2CR2 is lost and we'd introduce a bug if L2 wanted to look
>> > at it, e.g. it was in a process of handling its #PF.
>>
>> Good point. What's your proposal? :)
>
> Get rid of async_pf. :) Optimal solutions aside, I think it would be
> best to add a new injection function for APF.  One that injects a normal
> #PF for non-nested guests and directly triggers a #PF VM exit otherwise,
> and call it from kvm_arch_async_page_*present().

In addition, nested vmexit in kvm_arch_async_page_*present() directly
instead of through inject_pending_event() before vmentry, or nested
vmexit after vmexit on L0 looks strange. So how about the proposal of
the nested_apf_token stuff? Radim, Paolo?

Regards,
Wanpeng Li
Radim Krčmář June 14, 2017, 4:18 p.m. UTC | #9
2017-06-14 22:32+0800, Wanpeng Li:
> 2017-06-14 21:20 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> > 2017-06-14 21:02+0800, Wanpeng Li:
> >> 2017-06-14 20:52 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> >> > 2017-06-14 09:07+0800, Wanpeng Li:
> >> >> 2017-06-14 2:55 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> >> >> > Using vcpu->arch.cr2 is suspicious as VMX doesn't update CR2 on VM
> >> >> > exits;  isn't this going to change the CR2 visible in L2 guest after a
> >> >> > nested VM entry?
> >> >>
> >> >> Sorry, I don't fully understand the question. As you know this
> >> >> vcpu->arch.cr2 which includes token is set before async pf injection,
> >> >
> >> > Yes, I'm thinking that setting vcpu->arch.cr2 is a mistake in this case.
> >> >
> >> >> and L1 will intercept it from EXIT_QUALIFICATION during nested vmexit,
> >> >
> >> > Right, so we do not need to have the token in CR2, because L1 is not
> >> > going to look at it.
> >> >
> >> >> why it can change the CR2 visible in L2 guest after a nested VM entry?
> >> >
> >> > Sorry, the situation is too convoluted to be expressed in one sentence:
> >> >
> >> > 1) L2 is running with CR2 = L2CR2
> >> > 3) VMX exits (say, unrelated EXTERNAL_INTERRUPT) and L0 stores L2CR2 in
> >> >    vcpu->arch.cr2
> >> > 2) APF for L1 has completed
> >> > 4) L0 KVM wants to inject APF and sets vcpu->arch.cr2 = APFT
> >> > 5) L0 KVM does a nested VM exit to L1, EXIT_QUALIFICATION = APFT
> >> > 6) L0 KVM enters L1 with CR2 = vcpu->arch.cr2 = APFT
> >> > 7) L1 stores APFT as L2's CR2
> >> > 8) L1 handles APF, maybe reschedules, but eventually comes back to this
> >> >    L2's thread
> >> > 9) after some time, L1 enters L2 with CR2 = APFT
> >> > 10) L2 is running with CR2 = APTF
> >> >
> >> > The original L2CR2 is lost and we'd introduce a bug if L2 wanted to look
> >> > at it, e.g. it was in a process of handling its #PF.
> >>
> >> Good point. What's your proposal? :)
> >
> > Get rid of async_pf. :) Optimal solutions aside, I think it would be
> > best to add a new injection function for APF.  One that injects a normal
> > #PF for non-nested guests and directly triggers a #PF VM exit otherwise,
> > and call it from kvm_arch_async_page_*present().
> 
> In addition, nested vmexit in kvm_arch_async_page_*present() directly
> instead of through inject_pending_event() before vmentry, or nested
> vmexit after vmexit on L0 looks strange.

Right, it might be tricky if another exception can get queued in
between.  (Which shouldn't happen, though, because async_pf exceptions
must not cause double faults for no good reason.)

>                                          So how about the proposal of
> the nested_apf_token stuff? Radim, Paolo?

I think it is worth exploring.  We need to make sure that interfacing
with userspace through kvm_vcpu_ioctl_x86_{set,get}_vcpu_events() is
right, but it should be possible without any extension as migration is
already covered by unconditional async_pf wakeup on the destination.

At this point, using a structure other than arch.exception would make
sense too -- async_pf uses the exception injection path mostly for
convenience, but the paravirt exception does not want to mix with real
exceptions.
Wanpeng Li June 15, 2017, 2:43 a.m. UTC | #10
2017-06-15 0:18 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> 2017-06-14 22:32+0800, Wanpeng Li:
>> 2017-06-14 21:20 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
>> > 2017-06-14 21:02+0800, Wanpeng Li:
>> >> 2017-06-14 20:52 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
>> >> > 2017-06-14 09:07+0800, Wanpeng Li:
>> >> >> 2017-06-14 2:55 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
>> >> >> > Using vcpu->arch.cr2 is suspicious as VMX doesn't update CR2 on VM
>> >> >> > exits;  isn't this going to change the CR2 visible in L2 guest after a
>> >> >> > nested VM entry?
>> >> >>
>> >> >> Sorry, I don't fully understand the question. As you know this
>> >> >> vcpu->arch.cr2 which includes token is set before async pf injection,
>> >> >
>> >> > Yes, I'm thinking that setting vcpu->arch.cr2 is a mistake in this case.
>> >> >
>> >> >> and L1 will intercept it from EXIT_QUALIFICATION during nested vmexit,
>> >> >
>> >> > Right, so we do not need to have the token in CR2, because L1 is not
>> >> > going to look at it.
>> >> >
>> >> >> why it can change the CR2 visible in L2 guest after a nested VM entry?
>> >> >
>> >> > Sorry, the situation is too convoluted to be expressed in one sentence:
>> >> >
>> >> > 1) L2 is running with CR2 = L2CR2
>> >> > 3) VMX exits (say, unrelated EXTERNAL_INTERRUPT) and L0 stores L2CR2 in
>> >> >    vcpu->arch.cr2
>> >> > 2) APF for L1 has completed
>> >> > 4) L0 KVM wants to inject APF and sets vcpu->arch.cr2 = APFT
>> >> > 5) L0 KVM does a nested VM exit to L1, EXIT_QUALIFICATION = APFT
>> >> > 6) L0 KVM enters L1 with CR2 = vcpu->arch.cr2 = APFT
>> >> > 7) L1 stores APFT as L2's CR2
>> >> > 8) L1 handles APF, maybe reschedules, but eventually comes back to this
>> >> >    L2's thread
>> >> > 9) after some time, L1 enters L2 with CR2 = APFT
>> >> > 10) L2 is running with CR2 = APTF
>> >> >
>> >> > The original L2CR2 is lost and we'd introduce a bug if L2 wanted to look
>> >> > at it, e.g. it was in a process of handling its #PF.
>> >>
>> >> Good point. What's your proposal? :)
>> >
>> > Get rid of async_pf. :) Optimal solutions aside, I think it would be
>> > best to add a new injection function for APF.  One that injects a normal
>> > #PF for non-nested guests and directly triggers a #PF VM exit otherwise,
>> > and call it from kvm_arch_async_page_*present().
>>
>> In addition, nested vmexit in kvm_arch_async_page_*present() directly
>> instead of through inject_pending_event() before vmentry, or nested
>> vmexit after vmexit on L0 looks strange.
>
> Right, it might be tricky if another exception can get queued in
> between.  (Which shouldn't happen, though, because async_pf exceptions
> must not cause double faults for no good reason.)
>
>>                                          So how about the proposal of
>> the nested_apf_token stuff? Radim, Paolo?
>
> I think it is worth exploring.  We need to make sure that interfacing
> with userspace through kvm_vcpu_ioctl_x86_{set,get}_vcpu_events() is
> right, but it should be possible without any extension as migration is
> already covered by unconditional async_pf wakeup on the destination.
>
> At this point, using a structure other than arch.exception would make
> sense too -- async_pf uses the exception injection path mostly for
> convenience, but the paravirt exception does not want to mix with real
> exceptions.

Yeah, but maybe need more reconstruct, I just send out v2 to fix it
simply and avoid too aggressive. :)

Regards,
Wanpeng Li
diff mbox

Patch

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 0559626..b5bcad9 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -23,6 +23,7 @@  struct x86_exception {
 	u16 error_code;
 	bool nested_page_fault;
 	u64 address; /* cr2 or nested page fault gpa */
+	bool async_page_fault;
 };
 
 /*
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1f01bfb..d30576a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -545,6 +545,7 @@  struct kvm_vcpu_arch {
 		bool reinject;
 		u8 nr;
 		u32 error_code;
+		bool async_page_fault;
 	} exception;
 
 	struct kvm_queued_interrupt {
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f533cc1..da81e48 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2422,13 +2422,28 @@  static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
 static int nested_vmx_check_exception(struct kvm_vcpu *vcpu, unsigned nr)
 {
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+	u32 intr_info = 0;
+	unsigned long exit_qualification = 0;
 
-	if (!(vmcs12->exception_bitmap & (1u << nr)))
+	if (!((vmcs12->exception_bitmap & (1u << nr)) ||
+		(nr == PF_VECTOR && vcpu->arch.exception.async_page_fault)))
 		return 0;
 
+	intr_info = nr | INTR_INFO_VALID_MASK;
+	exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
+	if (vcpu->arch.exception.has_error_code) {
+		vmcs_write32(VM_EXIT_INTR_ERROR_CODE, vcpu->arch.exception.error_code);
+		intr_info |= INTR_INFO_DELIVER_CODE_MASK;
+	}
+	if (kvm_exception_is_soft(nr))
+		intr_info |= INTR_TYPE_SOFT_EXCEPTION;
+	else
+		intr_info |= INTR_TYPE_HARD_EXCEPTION;
+	if (nr == PF_VECTOR && vcpu->arch.exception.async_page_fault)
+		exit_qualification = vcpu->arch.cr2;
+
 	nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
-			  vmcs_read32(VM_EXIT_INTR_INFO),
-			  vmcs_readl(EXIT_QUALIFICATION));
+			intr_info, exit_qualification);
 	return 1;
 }
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1b28a31..d7f1a49 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -453,6 +453,7 @@  void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
 {
 	++vcpu->stat.pf_guest;
 	vcpu->arch.cr2 = fault->address;
+	vcpu->arch.exception.async_page_fault = fault->async_page_fault;
 	kvm_queue_exception_e(vcpu, PF_VECTOR, fault->error_code);
 }
 EXPORT_SYMBOL_GPL(kvm_inject_page_fault);
@@ -8571,6 +8572,7 @@  void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
 		fault.error_code = 0;
 		fault.nested_page_fault = false;
 		fault.address = work->arch.token;
+		fault.async_page_fault = true;
 		kvm_inject_page_fault(vcpu, &fault);
 	}
 }
@@ -8593,6 +8595,7 @@  void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 		fault.error_code = 0;
 		fault.nested_page_fault = false;
 		fault.address = work->arch.token;
+		fault.async_page_fault = true;
 		kvm_inject_page_fault(vcpu, &fault);
 	}
 	vcpu->arch.apf.halted = false;