diff mbox

[v2] kvm: fix waitqueue_active without memory barrier in virt/kvm/async_pf.c

Message ID 17EC94B0A072C34B8DCF0D30AD16044A02874DB5@BPXM09GP.gisp.nec.co.jp (mailing list archive)
State New, archived
Headers show

Commit Message

Kosuke Tatsukawa Oct. 9, 2015, 12:21 p.m. UTC
async_pf_execute() seems to be missing a memory barrier which might
cause the waker to not notice the waiter and miss sending a wake_up as
in the following figure.

        async_pf_execute                    kvm_vcpu_block
------------------------------------------------------------------------
spin_lock(&vcpu->async_pf.lock);
if (waitqueue_active(&vcpu->wq))
/* The CPU might reorder the test for
   the waitqueue up here, before
   prior writes complete */
                                    prepare_to_wait(&vcpu->wq, &wait,
                                      TASK_INTERRUPTIBLE);
                                    /*if (kvm_vcpu_check_block(vcpu) < 0) */
                                     /*if (kvm_arch_vcpu_runnable(vcpu)) { */
                                      ...
                                      return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
                                        !vcpu->arch.apf.halted)
                                        || !list_empty_careful(&vcpu->async_pf.done)
                                     ...
                                     return 0;
list_add_tail(&apf->link,
  &vcpu->async_pf.done);
spin_unlock(&vcpu->async_pf.lock);
                                    waited = true;
                                    schedule();
------------------------------------------------------------------------

The attached patch adds the missing memory barrier.

I found this issue when I was looking through the linux source code
for places calling waitqueue_active() before wake_up*(), but without
preceding memory barriers, after sending a patch to fix a similar
issue in drivers/tty/n_tty.c  (Details about the original issue can be
found here: https://lkml.org/lkml/2015/9/28/849).

Signed-off-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com>
---
v2:
  - Fixed comment based on feedback from Paolo
v1:
  - https://lkml.org/lkml/2015/10/8/994
---
 virt/kvm/async_pf.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

Comments

Peter Zijlstra Oct. 9, 2015, 12:55 p.m. UTC | #1
On Fri, Oct 09, 2015 at 12:21:55PM +0000, Kosuke Tatsukawa wrote:

> +	 * Memory barrier is required here to make sure change to
> +	 * vcpu->async_pf.done is visible from other CPUs.  This memory
> +	 * barrier pairs with prepare_to_wait's set_current_state()

That is not how memory barriers work; they don't 'make visible'. They
simply ensure order between operations.

  X = Y = 0

	CPU0			CPU1

	[S] X=1			[S] Y=1
	 MB			 MB
	[L] y=Y			[L] x=X

  assert(x || y)

The issue of the memory barrier does not mean the store is visible, it
merely means that the load _must_ happen after the store (in the above
scenario).

This gives a guarantee that not both x and y can be 0. Because either
being 0, means the other has not yet executed and must therefore observe
your store.

Nothing more, nothing less.

So your comment is misleading at best.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paolo Bonzini Oct. 9, 2015, 1:56 p.m. UTC | #2
On 09/10/2015 14:55, Peter Zijlstra wrote:
> On Fri, Oct 09, 2015 at 12:21:55PM +0000, Kosuke Tatsukawa wrote:
> 
>> +	 * Memory barrier is required here to make sure change to
>> +	 * vcpu->async_pf.done is visible from other CPUs.  This memory
>> +	 * barrier pairs with prepare_to_wait's set_current_state()
> 
> That is not how memory barriers work; they don't 'make visible'. They
> simply ensure order between operations.
> 
>   X = Y = 0
> 
> 	CPU0			CPU1
> 
> 	[S] X=1			[S] Y=1
> 	 MB			 MB
> 	[L] y=Y			[L] x=X
> 
>   assert(x || y)
> 
> The issue of the memory barrier does not mean the store is visible, it
> merely means that the load _must_ happen after the store (in the above
> scenario).
> 
> This gives a guarantee that not both x and y can be 0. Because either
> being 0, means the other has not yet executed and must therefore observe
> your store.
> 
> Nothing more, nothing less.
> 
> So your comment is misleading at best.

Yeah, I will just reword the comment when applying.  Thanks Kosuke-san
for your contribution!

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
William Dauchy Nov. 6, 2015, 1:02 p.m. UTC | #3
Hi Paolo,

Looking at the history of this function, is it reasonable to say it
fixes the following commit?
af585b9 KVM: Halt vcpu if page it tries to access is swapped out

Does it make it a good candidate for -stable?

Thanks,

On Oct09 12:21, Kosuke Tatsukawa wrote:
> async_pf_execute() seems to be missing a memory barrier which might
> cause the waker to not notice the waiter and miss sending a wake_up as
> in the following figure.
> 
>         async_pf_execute                    kvm_vcpu_block
> ------------------------------------------------------------------------
> spin_lock(&vcpu->async_pf.lock);
> if (waitqueue_active(&vcpu->wq))
> /* The CPU might reorder the test for
>    the waitqueue up here, before
>    prior writes complete */
>                                     prepare_to_wait(&vcpu->wq, &wait,
>                                       TASK_INTERRUPTIBLE);
>                                     /*if (kvm_vcpu_check_block(vcpu) < 0) */
>                                      /*if (kvm_arch_vcpu_runnable(vcpu)) { */
>                                       ...
>                                       return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
>                                         !vcpu->arch.apf.halted)
>                                         || !list_empty_careful(&vcpu->async_pf.done)
>                                      ...
>                                      return 0;
> list_add_tail(&apf->link,
>   &vcpu->async_pf.done);
> spin_unlock(&vcpu->async_pf.lock);
>                                     waited = true;
>                                     schedule();
> ------------------------------------------------------------------------
> 
> The attached patch adds the missing memory barrier.
> 
> I found this issue when I was looking through the linux source code
> for places calling waitqueue_active() before wake_up*(), but without
> preceding memory barriers, after sending a patch to fix a similar
> issue in drivers/tty/n_tty.c  (Details about the original issue can be
> found here: https://lkml.org/lkml/2015/9/28/849).
> 
> Signed-off-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com>
> ---
> v2:
>   - Fixed comment based on feedback from Paolo
> v1:
>   - https://lkml.org/lkml/2015/10/8/994
> ---
>  virt/kvm/async_pf.c |    6 ++++++
>  1 files changed, 6 insertions(+), 0 deletions(-)
> 
> diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
> index 44660ae..a0999d7 100644
> --- a/virt/kvm/async_pf.c
> +++ b/virt/kvm/async_pf.c
> @@ -94,6 +94,12 @@ static void async_pf_execute(struct work_struct *work)
>  
>  	trace_kvm_async_pf_completed(addr, gva);
>  
> +	/*
> +	 * Memory barrier is required here to make sure change to
> +	 * vcpu->async_pf.done is visible from other CPUs.  This memory
> +	 * barrier pairs with prepare_to_wait's set_current_state()
> +	 */
> +	smp_mb();
>  	if (waitqueue_active(&vcpu->wq))
>  		wake_up_interruptible(&vcpu->wq);
>  
> -- 
> 1.7.1
Paolo Bonzini Nov. 6, 2015, 4:30 p.m. UTC | #4
On 06/11/2015 14:02, William Dauchy wrote:
> Hi Paolo,
> 
> Looking at the history of this function, is it reasonable to say
> it fixes the following commit? af585b9 KVM: Halt vcpu if page it
> tries to access is swapped out
> 
> Does it make it a good candidate for -stable?

It's just a theoretical race, with no known reproducer, so it's
explicitly "forbidden" by the -stable rules.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index 44660ae..a0999d7 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -94,6 +94,12 @@  static void async_pf_execute(struct work_struct *work)
 
 	trace_kvm_async_pf_completed(addr, gva);
 
+	/*
+	 * Memory barrier is required here to make sure change to
+	 * vcpu->async_pf.done is visible from other CPUs.  This memory
+	 * barrier pairs with prepare_to_wait's set_current_state()
+	 */
+	smp_mb();
 	if (waitqueue_active(&vcpu->wq))
 		wake_up_interruptible(&vcpu->wq);