
KVM: x86/xen: Fix kvm_xen_has_interrupt() sleeping in kvm_vcpu_block()

Message ID: 168bf8c689561da904e48e2ff5ae4713eaef9e2d.camel@infradead.org
State: New, archived

Commit Message

David Woodhouse Oct. 23, 2021, 7:47 p.m. UTC
From: David Woodhouse <dwmw@amazon.co.uk>

In kvm_vcpu_block, the current task is set to TASK_INTERRUPTIBLE before
making a final check whether the vCPU should be woken from HLT by any
incoming interrupt.

This is a problem for the get_user() in __kvm_xen_has_interrupt(), which
really shouldn't be sleeping when the task state has already been set.
I think it's actually harmless as it would just manifest itself as a
spurious wakeup, but it's causing a debug warning:

[  230.963649] do not call blocking ops when !TASK_RUNNING; state=1 set at [<00000000b6bcdbc9>] prepare_to_swait_exclusive+0x30/0x80

Fix the warning by turning it into an *explicit* spurious wakeup. When
invoked with !task_is_running(current) (and we might as well add
in_atomic() there while we're at it), just return 1 to indicate that
an IRQ is pending, which will cause a wakeup and then something will
call it again in a context that *can* sleep so it can fault the page
back in.

Cc: stable@vger.kernel.org
Fixes: 40da8ccd724f ("KVM: x86/xen: Add event channel interrupt vector upcall")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

---
 arch/x86/kvm/xen.c | 27 ++++++++++++++++++++++-----
 1 file changed, 22 insertions(+), 5 deletions(-)
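
The hazard described in the commit message can be sketched as a small userspace model. This is only an illustration under stated assumptions: every name prefixed with model_ is hypothetical and is not kernel API, and the scheduler is reduced to a single state variable. It shows why a blocking operation between setting the task state and calling schedule() both triggers the debug check and degrades into a spurious wakeup.

```c
#include <stdbool.h>
#include <stdio.h>

/* Userspace model only: the names mirror kernel concepts, not kernel API. */
enum model_state { MODEL_TASK_RUNNING = 0, MODEL_TASK_INTERRUPTIBLE = 1 };

struct model_task {
	enum model_state state;
	int warnings;          /* times the might-sleep style check fired */
	int spurious_wakeups;  /* schedule calls that returned without sleeping */
};

/* Stand-in for the debug check that produced the warning in the log. */
static void model_might_sleep(struct model_task *t)
{
	if (t->state != MODEL_TASK_RUNNING) {
		t->warnings++;
		fprintf(stderr,
			"do not call blocking ops when !TASK_RUNNING; state=%d\n",
			(int)t->state);
	}
}

/*
 * A blocking operation (for example, faulting a page in) sleeps and
 * returns with the task runnable again, losing the state set earlier.
 */
static void model_blocking_op(struct model_task *t)
{
	model_might_sleep(t);
	t->state = MODEL_TASK_RUNNING;
}

/* Stand-in for schedule(): the task only sleeps if it is not runnable. */
static bool model_schedule(struct model_task *t)
{
	if (t->state == MODEL_TASK_RUNNING) {
		t->spurious_wakeups++;
		return false;               /* returned without sleeping */
	}
	t->state = MODEL_TASK_RUNNING;      /* pretend a real wakeup arrived */
	return true;
}

/* One pass of a block loop: set state, run the final check, then schedule. */
static bool model_block_once(struct model_task *t, bool check_needs_fault)
{
	t->state = MODEL_TASK_INTERRUPTIBLE;
	if (check_needs_fault)
		model_blocking_op(t);       /* models get_user() taking a fault */
	return model_schedule(t);
}
```

Calling model_block_once() with check_needs_fault set to false lets the modelled task sleep normally; with true, the warning counter increments and the schedule step returns immediately, which matches the "harmless but noisy" behaviour the commit message describes.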

Comments

Vitaly Kuznetsov Oct. 25, 2021, 11:28 a.m. UTC | #1
David Woodhouse <dwmw2@infradead.org> writes:

> From: David Woodhouse <dwmw@amazon.co.uk>
>
> In kvm_vcpu_block, the current task is set to TASK_INTERRUPTIBLE before
> making a final check whether the vCPU should be woken from HLT by any
> incoming interrupt.
>
> This is a problem for the get_user() in __kvm_xen_has_interrupt(), which
> really shouldn't be sleeping when the task state has already been set.
> I think it's actually harmless as it would just manifest itself as a
> spurious wakeup, but it's causing a debug warning:
>
> [  230.963649] do not call blocking ops when !TASK_RUNNING; state=1 set at [<00000000b6bcdbc9>] prepare_to_swait_exclusive+0x30/0x80
>
> Fix the warning by turning it into an *explicit* spurious wakeup. When
> invoked with !task_is_running(current) (and we might as well add
> in_atomic() there while we're at it), just return 1 to indicate that
> an IRQ is pending, which will cause a wakeup and then something will
> call it again in a context that *can* sleep so it can fault the page
> back in.
>
> Cc: stable@vger.kernel.org
> Fixes: 40da8ccd724f ("KVM: x86/xen: Add event channel interrupt vector upcall")
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
>
> ---
>  arch/x86/kvm/xen.c | 27 ++++++++++++++++++++++-----
>  1 file changed, 22 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
> index 9ea9c3dabe37..8f62baebd028 100644
> --- a/arch/x86/kvm/xen.c
> +++ b/arch/x86/kvm/xen.c
> @@ -190,6 +190,7 @@ void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, int state)
>  
>  int __kvm_xen_has_interrupt(struct kvm_vcpu *v)
>  {
> +	int err;
>  	u8 rc = 0;
>  
>  	/*
> @@ -216,13 +217,29 @@ int __kvm_xen_has_interrupt(struct kvm_vcpu *v)
>  	if (likely(slots->generation == ghc->generation &&
>  		   !kvm_is_error_hva(ghc->hva) && ghc->memslot)) {
>  		/* Fast path */
> -		__get_user(rc, (u8 __user *)ghc->hva + offset);
> -	} else {
> -		/* Slow path */
> -		kvm_read_guest_offset_cached(v->kvm, ghc, &rc, offset,
> -					     sizeof(rc));
> +		pagefault_disable();
> +		err = __get_user(rc, (u8 __user *)ghc->hva + offset);
> +		pagefault_enable();

This reminds me of copy_from_user_nofault() -- can we use it instead maybe?

> +		if (!err)
> +			return rc;
>  	}
>  
> +	/* Slow path */
> +
> +	/*
> +	 * This function gets called from kvm_vcpu_block() after setting the
> +	 * task to TASK_INTERRUPTIBLE, to see if it needs to wake immediately
> +	 * from a HLT. So we really mustn't sleep. If the page ended up absent
> +	 * at that point, just return 1 in order to trigger an immediate wake,
> +	 * and we'll end up getting called again from a context where we *can*
> +	 * fault in the page and wait for it.
> +	 */
> +	if (in_atomic() || !task_is_running(current))
> +		return 1;
> +
> +	kvm_read_guest_offset_cached(v->kvm, ghc, &rc, offset,
> +				     sizeof(rc));
> +
>  	return rc;
>  }
>  
>
>
Paolo Bonzini Oct. 25, 2021, 1:10 p.m. UTC | #2
On 23/10/21 21:47, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> In kvm_vcpu_block, the current task is set to TASK_INTERRUPTIBLE before
> making a final check whether the vCPU should be woken from HLT by any
> incoming interrupt.
> 
> This is a problem for the get_user() in __kvm_xen_has_interrupt(), which
> really shouldn't be sleeping when the task state has already been set.
> I think it's actually harmless as it would just manifest itself as a
> spurious wakeup, but it's causing a debug warning:
> 
> [  230.963649] do not call blocking ops when !TASK_RUNNING; state=1 set at [<00000000b6bcdbc9>] prepare_to_swait_exclusive+0x30/0x80
> 
> Fix the warning by turning it into an *explicit* spurious wakeup. When
> invoked with !task_is_running(current) (and we might as well add
> in_atomic() there while we're at it), just return 1 to indicate that
> an IRQ is pending, which will cause a wakeup and then something will
> call it again in a context that *can* sleep so it can fault the page
> back in.
> 
> Cc: stable@vger.kernel.org
> Fixes: 40da8ccd724f ("KVM: x86/xen: Add event channel interrupt vector upcall")
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> 
> ---
>   arch/x86/kvm/xen.c | 27 ++++++++++++++++++++++-----
>   1 file changed, 22 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
> index 9ea9c3dabe37..8f62baebd028 100644
> --- a/arch/x86/kvm/xen.c
> +++ b/arch/x86/kvm/xen.c
> @@ -190,6 +190,7 @@ void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, int state)
>   
>   int __kvm_xen_has_interrupt(struct kvm_vcpu *v)
>   {
> +	int err;
>   	u8 rc = 0;
>   
>   	/*
> @@ -216,13 +217,29 @@ int __kvm_xen_has_interrupt(struct kvm_vcpu *v)
>   	if (likely(slots->generation == ghc->generation &&
>   		   !kvm_is_error_hva(ghc->hva) && ghc->memslot)) {
>   		/* Fast path */
> -		__get_user(rc, (u8 __user *)ghc->hva + offset);
> -	} else {
> -		/* Slow path */
> -		kvm_read_guest_offset_cached(v->kvm, ghc, &rc, offset,
> -					     sizeof(rc));
> +		pagefault_disable();
> +		err = __get_user(rc, (u8 __user *)ghc->hva + offset);
> +		pagefault_enable();
> +		if (!err)
> +			return rc;
>   	}
>   
> +	/* Slow path */
> +
> +	/*
> +	 * This function gets called from kvm_vcpu_block() after setting the
> +	 * task to TASK_INTERRUPTIBLE, to see if it needs to wake immediately
> +	 * from a HLT. So we really mustn't sleep. If the page ended up absent
> +	 * at that point, just return 1 in order to trigger an immediate wake,
> +	 * and we'll end up getting called again from a context where we *can*
> +	 * fault in the page and wait for it.
> +	 */
> +	if (in_atomic() || !task_is_running(current))
> +		return 1;
> +
> +	kvm_read_guest_offset_cached(v->kvm, ghc, &rc, offset,
> +				     sizeof(rc));
> +
>   	return rc;
>   }
>   
> 
> 

Queued, thanks.

Paolo

David Woodhouse Oct. 25, 2021, 1:18 p.m. UTC | #3
On Mon, 2021-10-25 at 13:28 +0200, Vitaly Kuznetsov wrote:
> > +             pagefault_disable();
> > +             err = __get_user(rc, (u8 __user *)ghc->hva + offset);
> > +             pagefault_enable();
> 
> This reminds me of copy_from_user_nofault() -- can we use it instead maybe?

That's a lot of extra out-of-line function calls and (I believe)
redundant setup/checks, and it would make the fast path fairly
pointless for the purpose it was *originally* introduced for, which
was to optimise the case where we're entering the vCPU and just want
to check vcpu_info->evtchn_upcall_pending with a simple
(fault-handled) dereference.

Patch

diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 9ea9c3dabe37..8f62baebd028 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -190,6 +190,7 @@  void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, int state)
 
 int __kvm_xen_has_interrupt(struct kvm_vcpu *v)
 {
+	int err;
 	u8 rc = 0;
 
 	/*
@@ -216,13 +217,29 @@  int __kvm_xen_has_interrupt(struct kvm_vcpu *v)
 	if (likely(slots->generation == ghc->generation &&
 		   !kvm_is_error_hva(ghc->hva) && ghc->memslot)) {
 		/* Fast path */
-		__get_user(rc, (u8 __user *)ghc->hva + offset);
-	} else {
-		/* Slow path */
-		kvm_read_guest_offset_cached(v->kvm, ghc, &rc, offset,
-					     sizeof(rc));
+		pagefault_disable();
+		err = __get_user(rc, (u8 __user *)ghc->hva + offset);
+		pagefault_enable();
+		if (!err)
+			return rc;
 	}
 
+	/* Slow path */
+
+	/*
+	 * This function gets called from kvm_vcpu_block() after setting the
+	 * task to TASK_INTERRUPTIBLE, to see if it needs to wake immediately
+	 * from a HLT. So we really mustn't sleep. If the page ended up absent
+	 * at that point, just return 1 in order to trigger an immediate wake,
+	 * and we'll end up getting called again from a context where we *can*
+	 * fault in the page and wait for it.
+	 */
+	if (in_atomic() || !task_is_running(current))
+		return 1;
+
+	kvm_read_guest_offset_cached(v->kvm, ghc, &rc, offset,
+				     sizeof(rc));
+
 	return rc;
 }
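
The control flow that results from the patch can be modelled in plain userspace C. This is a hedged sketch, not the kernel code: struct model_ctx and the model_ functions are hypothetical stand-ins, with cache_valid standing for the generation/hva/memslot check, page_resident for whether the fault-disabled __get_user() would succeed, and can_sleep for !in_atomic() && task_is_running(current).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for the kernel context; not kernel API. */
struct model_ctx {
	bool cache_valid;        /* models the generation/hva/memslot check */
	bool page_resident;      /* would a fault-disabled read succeed? */
	bool can_sleep;          /* models !in_atomic() && task_is_running() */
	uint8_t upcall_pending;  /* models vcpu_info->evtchn_upcall_pending */
	int slow_reads;          /* counts slow-path reads that may sleep */
};

/* Models the fault-disabled __get_user(): 0 on success, nonzero on fault. */
static int model_read_nofault(const struct model_ctx *c, uint8_t *out)
{
	if (!c->page_resident)
		return -1;
	*out = c->upcall_pending;
	return 0;
}

/* Models the slow-path read, which is allowed to fault the page back in. */
static void model_read_may_sleep(struct model_ctx *c, uint8_t *out)
{
	c->slow_reads++;
	c->page_resident = true;
	*out = c->upcall_pending;
}

static int model_xen_has_interrupt(struct model_ctx *c)
{
	uint8_t rc = 0;

	/* Fast path: return the value only if the read did not fault. */
	if (c->cache_valid && model_read_nofault(c, &rc) == 0)
		return rc;

	/* Cannot sleep here: report "pending" to force an explicit wakeup. */
	if (!c->can_sleep)
		return 1;

	/* Slow path: sleeping is fine in this context. */
	model_read_may_sleep(c, &rc);
	return rc;
}
```

In this model, a non-resident page combined with can_sleep == false yields 1 without touching the slow path; a later call with can_sleep == true performs the sleeping read and returns the real value, which is the "called again from a context where we *can* fault in the page" behaviour described in the patch comment.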