diff mbox series

[RFC,1/6] Revert "KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously"

Message ID 20200429093634.1514902-2-vkuznets@redhat.com (mailing list archive)
State New, archived
Headers show
Series KVM: x86: Interrupt-based mechanism for async_pf 'page present' notifications | expand

Commit Message

Vitaly Kuznetsov April 29, 2020, 9:36 a.m. UTC
Commit 9a6e7c39810e (""KVM: async_pf: Fix #DF due to inject "Page not
Present" and "Page Ready" exceptions simultaneously") added a protection
against 'page ready' notification coming before 'page not ready' is
delivered. This situation seems to be impossible since commit 2a266f23550b
("KVM MMU: check pending exception before injecting APF) which added
'vcpu->arch.exception.pending' check to kvm_can_do_async_pf.

On x86, kvm_arch_async_page_present() has only one call site:
kvm_check_async_pf_completion() loop and we only enter the loop when
kvm_arch_can_inject_async_page_present(vcpu) which when async pf msr
is enabled, translates into kvm_can_do_async_pf().

There is also one problem with the cancellation mechanism. We don't seem
to check that the 'page not ready' notification we're cancelling matches
the 'page ready' notification so in theory, we may erroneously drop two
valid events.

Revert the commit. apf_get_user() stays as we will need it for the new
'page ready notifications via interrupt' mechanism.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/x86.c | 16 +---------------
 1 file changed, 1 insertion(+), 15 deletions(-)

Comments

Gavin Shan May 5, 2020, 1:22 a.m. UTC | #1
On 4/29/20 7:36 PM, Vitaly Kuznetsov wrote:
> Commit 9a6e7c39810e (""KVM: async_pf: Fix #DF due to inject "Page not
> Present" and "Page Ready" exceptions simultaneously") added a protection
> against 'page ready' notification coming before 'page not ready' is
> delivered. This situation seems to be impossible since commit 2a266f23550b
> ("KVM MMU: check pending exception before injecting APF) which added
> 'vcpu->arch.exception.pending' check to kvm_can_do_async_pf.
> 
> On x86, kvm_arch_async_page_present() has only one call site:
> kvm_check_async_pf_completion() loop and we only enter the loop when
> kvm_arch_can_inject_async_page_present(vcpu) which when async pf msr
> is enabled, translates into kvm_can_do_async_pf().
> 
> There is also one problem with the cancellation mechanism. We don't seem
> to check that the 'page not ready' notification we're cancelling matches
> the 'page ready' notification so in theory, we may erroneously drop two
> valid events.
> 
> Revert the commit. apf_get_user() stays as we will need it for the new
> 'page ready notifications via interrupt' mechanism.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>   arch/x86/kvm/x86.c | 16 +---------------
>   1 file changed, 1 insertion(+), 15 deletions(-)
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>
Vivek Goyal May 5, 2020, 2:16 p.m. UTC | #2
On Wed, Apr 29, 2020 at 11:36:29AM +0200, Vitaly Kuznetsov wrote:
> Commit 9a6e7c39810e (""KVM: async_pf: Fix #DF due to inject "Page not
> Present" and "Page Ready" exceptions simultaneously") added a protection
> against 'page ready' notification coming before 'page not ready' is
> delivered.

Hi Vitaly,

Description of the commit seems to suggest that it is solving double
fault issue. That is both "page not present" and "page ready" exceptions
got queued and on next vcpu entry, it will result in double fault.

It does not seem to solve the issue of "page not ready" being delivered
before "page ready". That guest can handle already and its not an issue.

> This situation seems to be impossible since commit 2a266f23550b
> ("KVM MMU: check pending exception before injecting APF) which added
> 'vcpu->arch.exception.pending' check to kvm_can_do_async_pf.

This original commit description is confusing too. It says.

"For example, when two APF's for page ready happen after one exit and
 the first one becomes pending, the second one will result in #DF.
 Instead, just handle the second page fault synchronously."

So it seems to be trying to protect against that two "page ready"
exceptions don't get queued simultaneously. But you can't fall back
to synchronous mechanism once you have started the async pf prototocol.
Once you have started async page fault protocol by sending "page not
reay", you have to send "page ready". So I am not sure how above commit
solved the issue of two "page ready" not being queued at the same time.

I am wondering what problem did this commit solve. It looks like it
can avoid queueing two "page not ready" events. But can that even happen?

> 
> On x86, kvm_arch_async_page_present() has only one call site:
> kvm_check_async_pf_completion() loop and we only enter the loop when
> kvm_arch_can_inject_async_page_present(vcpu) which when async pf msr
> is enabled, translates into kvm_can_do_async_pf().

kvm_check_async_pf_completion() skips injecting "page ready" if fault
can't be injected now. Does that mean we leave it queued and it will
be injected after next exit.

If yes, then previous commit kind of makes sense. When it will not
queue up two exceptions at the same time but will wait for queuing
up the exception after next exit. But commit description still seems
to be wrong in the sense it is not falling back to synchronous page
fault for "page ready" events.

try_async_pf() also calls kvm_can_do_async_pf(). And IIUC, it will
fall back to synchrounous fault if injecting async_pf is not possible
at this point of time. So that means despite the fact that async pf
is enabled, all the page faults might not take that route and some
will fall back to synchrounous faults. I am concerned that how will
this work for reporting errors back to guest (for virtiofs use case).
If we are relying on async pf mechanism to also be able to report
errors back, then we can't afford to do synchrounous page faults becase
we don't have a way to report errors back to guest and it will hang (if
page can't be faulted in).

So either we need a way to report errors back while doing synchrounous
page faults or we can't fall back to synchorounous page faults while
async page faults are enabled. 

While we are reworking async page mechanism, want to make sure that
error reporting part has been taken care of as part of design. Don't
want to be dealing with it after the fact.

Thanks
Vivek


> 
> There is also one problem with the cancellation mechanism. We don't seem
> to check that the 'page not ready' notification we're cancelling matches
> the 'page ready' notification so in theory, we may erroneously drop two
> valid events.
> 
> Revert the commit. apf_get_user() stays as we will need it for the new
> 'page ready notifications via interrupt' mechanism.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/kvm/x86.c | 16 +---------------
>  1 file changed, 1 insertion(+), 15 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c5835f9cb9ad..b93133ee07ba 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10430,7 +10430,6 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
>  				 struct kvm_async_pf *work)
>  {
>  	struct x86_exception fault;
> -	u32 val;
>  
>  	if (work->wakeup_all)
>  		work->arch.token = ~0; /* broadcast wakeup */
> @@ -10439,19 +10438,7 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
>  	trace_kvm_async_pf_ready(work->arch.token, work->cr2_or_gpa);
>  
>  	if (vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED &&
> -	    !apf_get_user(vcpu, &val)) {
> -		if (val == KVM_PV_REASON_PAGE_NOT_PRESENT &&
> -		    vcpu->arch.exception.pending &&
> -		    vcpu->arch.exception.nr == PF_VECTOR &&
> -		    !apf_put_user(vcpu, 0)) {
> -			vcpu->arch.exception.injected = false;
> -			vcpu->arch.exception.pending = false;
> -			vcpu->arch.exception.nr = 0;
> -			vcpu->arch.exception.has_error_code = false;
> -			vcpu->arch.exception.error_code = 0;
> -			vcpu->arch.exception.has_payload = false;
> -			vcpu->arch.exception.payload = 0;
> -		} else if (!apf_put_user(vcpu, KVM_PV_REASON_PAGE_READY)) {
> +	    !apf_put_user(vcpu, KVM_PV_REASON_PAGE_READY)) {
>  			fault.vector = PF_VECTOR;
>  			fault.error_code_valid = true;
>  			fault.error_code = 0;
> @@ -10459,7 +10446,6 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
>  			fault.address = work->arch.token;
>  			fault.async_page_fault = true;
>  			kvm_inject_page_fault(vcpu, &fault);
> -		}
>  	}
>  	vcpu->arch.apf.halted = false;
>  	vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
> -- 
> 2.25.3
>
Vitaly Kuznetsov May 6, 2020, 3:17 p.m. UTC | #3
Vivek Goyal <vgoyal@redhat.com> writes:

> On Wed, Apr 29, 2020 at 11:36:29AM +0200, Vitaly Kuznetsov wrote:
>> Commit 9a6e7c39810e (""KVM: async_pf: Fix #DF due to inject "Page not
>> Present" and "Page Ready" exceptions simultaneously") added a protection
>> against 'page ready' notification coming before 'page not ready' is
>> delivered.
>
> Hi Vitaly,
>
> Description of the commit seems to suggest that it is solving double
> fault issue. That is both "page not present" and "page ready" exceptions
> got queued and on next vcpu entry, it will result in double fault.
>
> It does not seem to solve the issue of "page not ready" being delivered
> before "page ready". That guest can handle already and its not an issue.
>
>> This situation seems to be impossible since commit 2a266f23550b
>> ("KVM MMU: check pending exception before injecting APF) which added
>> 'vcpu->arch.exception.pending' check to kvm_can_do_async_pf.
>
> This original commit description is confusing too. It says.
>
> "For example, when two APF's for page ready happen after one exit and
>  the first one becomes pending, the second one will result in #DF.
>  Instead, just handle the second page fault synchronously."
>
> So it seems to be trying to protect against that two "page ready"
> exceptions don't get queued simultaneously. But you can't fall back
> to synchronous mechanism once you have started the async pf prototocol.
> Once you have started async page fault protocol by sending "page not
> reay", you have to send "page ready". So I am not sure how above commit
> solved the issue of two "page ready" not being queued at the same time.
>
> I am wondering what problem did this commit solve. It looks like it
> can avoid queueing two "page not ready" events. But can that even happen?
>
>> 
>> On x86, kvm_arch_async_page_present() has only one call site:
>> kvm_check_async_pf_completion() loop and we only enter the loop when
>> kvm_arch_can_inject_async_page_present(vcpu) which when async pf msr
>> is enabled, translates into kvm_can_do_async_pf().
>
> kvm_check_async_pf_completion() skips injecting "page ready" if fault
> can't be injected now. Does that mean we leave it queued and it will
> be injected after next exit.
>
> If yes, then previous commit kind of makes sense. When it will not
> queue up two exceptions at the same time but will wait for queuing
> up the exception after next exit. But commit description still seems
> to be wrong in the sense it is not falling back to synchronous page
> fault for "page ready" events.

As I'll be dropping 'page ready' event delivery via #PF this doesn't
really matter much but after staring at the code for a while I managed
to convince myself that this particular code patch is just unreachable.
I don't think we can get #DF now with two 'page not ready' exceptions,
kvm_can_do_async_pf() should now allow that.

>
> try_async_pf() also calls kvm_can_do_async_pf(). And IIUC, it will
> fall back to synchrounous fault if injecting async_pf is not possible
> at this point of time. So that means despite the fact that async pf
> is enabled, all the page faults might not take that route and some
> will fall back to synchrounous faults. 

This sounds correct.

> I am concerned that how will
> this work for reporting errors back to guest (for virtiofs use case).
> If we are relying on async pf mechanism to also be able to report
> errors back, then we can't afford to do synchrounous page faults becase
> we don't have a way to report errors back to guest and it will hang (if
> page can't be faulted in).

Currently, if 'page not ready' event was delivered then we can't skip
'page ready' event, we'll have to deliver it. 'Page not ready' is still
going to be delivered as exception and unless we really need to solve
complex problems like delivering nested exception this should work.

>
> So either we need a way to report errors back while doing synchrounous
> page faults or we can't fall back to synchorounous page faults while
> async page faults are enabled.
>
> While we are reworking async page mechanism, want to make sure that
> error reporting part has been taken care of as part of design. Don't
> want to be dealing with it after the fact.

The main issue I'm seeing here is that we'll need to deliver these
errors 'right now' and not some time later. Generally, exceptions
(e.g. #VE) should work but there are some corner cases, I remember Paolo
and Andy discussing these (just hoping they'll jump in with their
conclusions :-). If we somehow manage to exclude interrupts-disabled
context from our scope we should be good, I don't see reasons to skip
delivering #VE there.

For the part this series touches, "page ready" notifications, we don't
skip them but at the same time there is no timely delivery guarantee, we
just queue an interrupt. I'm not sure you'll need these for virtio-fs
though.

Thanks for the feedback!
Vivek Goyal May 11, 2020, 7:17 p.m. UTC | #4
On Wed, May 06, 2020 at 05:17:57PM +0200, Vitaly Kuznetsov wrote:

[..]
> >
> > So either we need a way to report errors back while doing synchrounous
> > page faults or we can't fall back to synchorounous page faults while
> > async page faults are enabled.
> >
> > While we are reworking async page mechanism, want to make sure that
> > error reporting part has been taken care of as part of design. Don't
> > want to be dealing with it after the fact.
> 
> The main issue I'm seeing here is that we'll need to deliver these
> errors 'right now' and not some time later. Generally, exceptions
> (e.g. #VE) should work but there are some corner cases, I remember Paolo
> and Andy discussing these (just hoping they'll jump in with their
> conclusions :-). If we somehow manage to exclude interrupts-disabled
> context from our scope we should be good, I don't see reasons to skip
> delivering #VE there.

Hi Vitaly,

If we can't find a good solution for interrupt disabled regions, then
I guess I will live with error reporting with interrupts enabled only
for now. It should solve a class of problems. Once users show up which
need error handling with interrupts disabled, then we will need to
solve it.

> 
> For the part this series touches, "page ready" notifications, we don't
> skip them but at the same time there is no timely delivery guarantee, we
> just queue an interrupt. I'm not sure you'll need these for virtio-fs
> though.


I think virtiofs will need both (synchronous as well as asynchrous
error reporting).

- If we can deliver async pf to guest, then we will send "page not present"
  to guest and try to fault in the page. If we figure out that page can't be
  faulted in, then we can send "page fault error" notification using interrupt
  (as you are doing for "page ready").

- If async page fault can't be injected in guest and we fall back to
  synchronous fault, and we figure out that fault can't be completed,
  we need to inject error using #VE (or some exception).

Thanks
Vivek

> 
> Thanks for the feedback!
> 
> -- 
> Vitaly
>
diff mbox series

Patch

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c5835f9cb9ad..b93133ee07ba 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10430,7 +10430,6 @@  void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 				 struct kvm_async_pf *work)
 {
 	struct x86_exception fault;
-	u32 val;
 
 	if (work->wakeup_all)
 		work->arch.token = ~0; /* broadcast wakeup */
@@ -10439,19 +10438,7 @@  void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 	trace_kvm_async_pf_ready(work->arch.token, work->cr2_or_gpa);
 
 	if (vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED &&
-	    !apf_get_user(vcpu, &val)) {
-		if (val == KVM_PV_REASON_PAGE_NOT_PRESENT &&
-		    vcpu->arch.exception.pending &&
-		    vcpu->arch.exception.nr == PF_VECTOR &&
-		    !apf_put_user(vcpu, 0)) {
-			vcpu->arch.exception.injected = false;
-			vcpu->arch.exception.pending = false;
-			vcpu->arch.exception.nr = 0;
-			vcpu->arch.exception.has_error_code = false;
-			vcpu->arch.exception.error_code = 0;
-			vcpu->arch.exception.has_payload = false;
-			vcpu->arch.exception.payload = 0;
-		} else if (!apf_put_user(vcpu, KVM_PV_REASON_PAGE_READY)) {
+	    !apf_put_user(vcpu, KVM_PV_REASON_PAGE_READY)) {
 			fault.vector = PF_VECTOR;
 			fault.error_code_valid = true;
 			fault.error_code = 0;
@@ -10459,7 +10446,6 @@  void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 			fault.address = work->arch.token;
 			fault.async_page_fault = true;
 			kvm_inject_page_fault(vcpu, &fault);
-		}
 	}
 	vcpu->arch.apf.halted = false;
 	vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;