Message ID | 17EC94B0A072C34B8DCF0D30AD16044A028747C1@BPXM09GP.gisp.nec.co.jp (mailing list archive)
---|---
State | New, archived
On 09/10/2015 02:35, Kosuke Tatsukawa wrote:
>         async_pf_execute                      kvm_vcpu_block
> ------------------------------------------------------------------------
> spin_lock(&vcpu->async_pf.lock);
> if (waitqueue_active(&vcpu->wq))
> /* The CPU might reorder the test for
>    the waitqueue up here, before
>    prior writes complete */
>                                         prepare_to_wait(&vcpu->wq, &wait,
>                                           TASK_INTERRUPTIBLE);
>                                         /*if (kvm_vcpu_check_block(vcpu) < 0) */
>                                         /*if (kvm_arch_vcpu_runnable(vcpu)) { */
>                                         ...
>                                         return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
>                                                 !vcpu->arch.apf.halted)
>                                             || !list_empty_careful(&vcpu->async_pf.done)
>                                         ...

The new memory barrier isn't "paired" with any other, and in fact I
think that the same issue exists on the other side:
list_empty_careful(&vcpu->async_pf.done) may be reordered up, before
the prepare_to_wait:

    spin_lock(&vcpu->async_pf.lock);
                                        (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
                                             !vcpu->arch.apf.halted)
                                            || !list_empty_careful(&vcpu->async_pf.done)
                                        ...
                                        prepare_to_wait(&vcpu->wq, &wait,
                                          TASK_INTERRUPTIBLE);
                                        /*if (kvm_vcpu_check_block(vcpu) < 0) */
                                        /*if (kvm_arch_vcpu_runnable(vcpu)) { */
                                        ...
                                        return 0;
    list_add_tail(&apf->link,
                  &vcpu->async_pf.done);
    spin_unlock(&vcpu->async_pf.lock);
                                        waited = true;
                                        schedule();
    if (waitqueue_active(&vcpu->wq))

So you need another smp_mb() after prepare_to_wait().  I'm not sure
if it's needed also for your original tty report, but I think it is
for https://lkml.org/lkml/2015/10/8/989 ("mei: fix waitqueue_active
without memory barrier in mei drivers").

I wonder if it makes sense to introduce smp_mb__before_spin_lock()
and smp_mb__after_spin_unlock().  On x86 the former could be a
simple compiler barrier, and on s390 both of them could.  But that
should be a separate patch.

Thanks,

Paolo
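To make the waker side of the pairing under discussion concrete, here is a
minimal sketch of async_pf_execute() with the proposed barrier (simplified
and abbreviated from the code quoted above, not a verbatim kernel excerpt):

    /* waker: async_pf_execute(), simplified sketch */
    spin_lock(&vcpu->async_pf.lock);
    list_add_tail(&apf->link, &vcpu->async_pf.done);  /* publish completion */
    spin_unlock(&vcpu->async_pf.lock);

    smp_mb();                          /* order the list store above before
                                          the waitqueue_active() load below */
    if (waitqueue_active(&vcpu->wq))   /* lockless peek at the wait queue */
        wake_up_interruptible(&vcpu->wq);

Without the smp_mb(), the CPU may hoist the waitqueue_active() load above the
list_add_tail() store, see an empty queue, and skip the wakeup even though the
waiter is just about to block.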
On Fri, Oct 09, 2015 at 10:45:32AM +0200, Paolo Bonzini wrote:
> So you need another smp_mb() after prepare_to_wait().  I'm not sure
> if it's needed also for your original tty report, but I think it is
> for https://lkml.org/lkml/2015/10/8/989 ("mei: fix waitqueue_active
> without memory barrier in mei drivers").
>
> I wonder if it makes sense to introduce smp_mb__before_spin_lock()
> and smp_mb__after_spin_unlock().  On x86 the former could be a
> simple compiler barrier, and on s390 both of them could.  But that
> should be a separate patch.

Not having actually read or thought about the issue at hand, it is
perfectly valid to pair an smp_mb() with either spin_lock() or
spin_unlock().

IOW. MB <-> {ACQUIRE, RELEASE} is a valid pairing.
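As an illustration of such a pairing (a generic message-passing sketch, not
taken from the thread's code): a RELEASE store on one CPU, such as the one
implied by spin_unlock(), can pair with a full barrier between two loads on
another CPU:

    /* CPU 0 (producer)                 CPU 1 (consumer)
     * data = 42;                       r1 = READ_ONCE(flag);
     * smp_store_release(&flag, 1);     smp_mb();      // pairs with the release
     *                                  r2 = data;
     *
     * If r1 == 1, then r2 == 42: the release orders the data store
     * before the flag store, and the smp_mb() orders the flag load
     * before the data load.
     */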
Paolo Bonzini wrote:
> On 09/10/2015 02:35, Kosuke Tatsukawa wrote:
>>         async_pf_execute                      kvm_vcpu_block
>> ------------------------------------------------------------------------
>> spin_lock(&vcpu->async_pf.lock);
>> if (waitqueue_active(&vcpu->wq))
>> /* The CPU might reorder the test for
>>    the waitqueue up here, before
>>    prior writes complete */
>>                                         prepare_to_wait(&vcpu->wq, &wait,
>>                                           TASK_INTERRUPTIBLE);
>>                                         /*if (kvm_vcpu_check_block(vcpu) < 0) */
>>                                         /*if (kvm_arch_vcpu_runnable(vcpu)) { */
>>                                         ...
>>                                         return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
>>                                                 !vcpu->arch.apf.halted)
>>                                             || !list_empty_careful(&vcpu->async_pf.done)
>>                                         ...
>
> The new memory barrier isn't "paired" with any other, and in
> fact I think that the same issue exists on the other side:
> list_empty_careful(&vcpu->async_pf.done) may be reordered up,
> before the prepare_to_wait:

smp_store_mb() called from set_current_state(), which is called from
prepare_to_wait(), should prevent reordering such as below from
happening.  wait_event*() also calls set_current_state() inside.

>     spin_lock(&vcpu->async_pf.lock);
>                                         (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
>                                              !vcpu->arch.apf.halted)
>                                             || !list_empty_careful(&vcpu->async_pf.done)
>                                         ...
>                                         prepare_to_wait(&vcpu->wq, &wait,
>                                           TASK_INTERRUPTIBLE);
>                                         /*if (kvm_vcpu_check_block(vcpu) < 0) */
>                                         /*if (kvm_arch_vcpu_runnable(vcpu)) { */
>                                         ...
>                                         return 0;
>     list_add_tail(&apf->link,
>                   &vcpu->async_pf.done);
>     spin_unlock(&vcpu->async_pf.lock);
>                                         waited = true;
>                                         schedule();
>     if (waitqueue_active(&vcpu->wq))
>
> So you need another smp_mb() after prepare_to_wait().  I'm not sure
> if it's needed also for your original tty report, but I think it is
> for https://lkml.org/lkml/2015/10/8/989 ("mei: fix waitqueue_active
> without memory barrier in mei drivers").
>
> I wonder if it makes sense to introduce smp_mb__before_spin_lock()
> and smp_mb__after_spin_unlock().  On x86 the former could be a
> simple compiler barrier, and on s390 both of them could.  But that
> should be a separate patch.

The problem on the waiter side occurs only if add_wait_queue() or
__add_wait_queue() is called directly, without memory barriers or
locks protecting it.

> Thanks,
>
> Paolo

---
Kosuke TATSUKAWA  | 3rd IT Platform Department
                  | IT Platform Division, NEC Corporation
                  | tatsu@ab.jp.nec.com
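For reference, the waiter-side ordering Tatsukawa describes comes from
prepare_to_wait(); roughly (a simplified sketch of the kvm_vcpu_block() loop
as quoted in this thread, not a verbatim excerpt):

    /* waiter: kvm_vcpu_block(), simplified sketch */
    for (;;) {
        prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
        /*
         * prepare_to_wait() links the waiter onto vcpu->wq and then
         * calls set_current_state(), which uses smp_store_mb() -- a
         * store plus full barrier -- so the condition check below
         * cannot be reordered before the queueing and state store.
         */
        if (kvm_vcpu_check_block(vcpu) < 0)  /* sees async_pf.done non-empty */
            break;
        waited = true;
        schedule();
    }
    finish_wait(&vcpu->wq, &wait);

This is why the waker's smp_mb() has a counterpart to pair with, even though
no explicit barrier appears in the waiter's source.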
On 09/10/2015 10:50, Peter Zijlstra wrote:
> Not having actually read or thought about the issue at hand, it is
> perfectly valid to pair an smp_mb() with either spin_lock() or
> spin_unlock().
>
> IOW. MB <-> {ACQUIRE, RELEASE} is a valid pairing.

In this case it's an smp_mb() (store-load barrier) being paired with
another smp_mb(), so spin_unlock()'s release is not enough.

Paolo
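The shape Paolo is pointing at is the classic store-buffering pattern; the
following sketch (generic, not quoted from the thread) shows why both sides
need a full barrier between their store and their subsequent load:

    /* CPU 0 (waker)                    CPU 1 (waiter)
     * --------------------------       --------------------------------------
     * list_add_tail(..., &done);       set_current_state(TASK_INTERRUPTIBLE);
     * smp_mb();                        smp_mb();   // inside smp_store_mb()
     * r0 = waitqueue_active(&wq);      r1 = !list_empty_careful(&done);
     *
     * Forbidden outcome: r0 == 0 && r1 == 0 (each side misses the
     * other's store, and the wakeup is lost).  A RELEASE such as
     * spin_unlock() only orders earlier accesses before it; it cannot
     * keep CPU 0's *later* waitqueue_active() load from being
     * satisfied before the store becomes visible.
     */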
On 09/10/2015 11:04, Kosuke Tatsukawa wrote:
> smp_store_mb() called from set_current_state(), which is called from
> prepare_to_wait(), should prevent reordering such as below from
> happening.  wait_event*() also calls set_current_state() inside.

Ah, I missed that set_current_state() has a memory barrier in it.

The patch is okay, but please expand the comment to say that this
memory barrier pairs with prepare_to_wait()'s set_current_state().

Paolo
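The expanded comment might read something like this (a sketch of what a v2
could say, not the actual follow-up patch):

    /*
     * The list_add_tail() to vcpu->async_pf.done above must be
     * visible before the waitqueue_active() check below.  This
     * smp_mb() pairs with the full barrier implied by
     * set_current_state() (smp_store_mb()) inside prepare_to_wait()
     * in kvm_vcpu_block(), so either the waiter sees the completed
     * async_pf, or this waker sees the waiter on vcpu->wq.
     */
    smp_mb();
    if (waitqueue_active(&vcpu->wq))
        wake_up_interruptible(&vcpu->wq);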
Paolo Bonzini wrote:
> On 09/10/2015 11:04, Kosuke Tatsukawa wrote:
>> smp_store_mb() called from set_current_state(), which is called from
>> prepare_to_wait(), should prevent reordering such as below from
>> happening.  wait_event*() also calls set_current_state() inside.
>
> Ah, I missed that set_current_state has a memory barrier in it.  The
> patch is okay, but please expand the comment to say that this memory
> barrier pairs with prepare_to_wait's set_current_state().

Thank you for the comment.  I'll send a new version of the patch with
that modification.

---
Kosuke TATSUKAWA  | 3rd IT Platform Department
                  | IT Platform Division, NEC Corporation
                  | tatsu@ab.jp.nec.com
diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index 44660ae..303fc7f 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -94,6 +94,11 @@ static void async_pf_execute(struct work_struct *work)
 
 	trace_kvm_async_pf_completed(addr, gva);
 
+	/*
+	 * Memory barrier is required here to make sure change to
+	 * vcpu->async_pf.done is visible from other CPUs
+	 */
+	smp_mb();
 	if (waitqueue_active(&vcpu->wq))
 		wake_up_interruptible(&vcpu->wq);
 
async_pf_execute() seems to be missing a memory barrier, which might
cause the waker to miss noticing the waiter and skip sending a
wake_up, as in the following figure.

        async_pf_execute                      kvm_vcpu_block
------------------------------------------------------------------------
spin_lock(&vcpu->async_pf.lock);
if (waitqueue_active(&vcpu->wq))
/* The CPU might reorder the test for
   the waitqueue up here, before
   prior writes complete */
                                        prepare_to_wait(&vcpu->wq, &wait,
                                          TASK_INTERRUPTIBLE);
                                        /*if (kvm_vcpu_check_block(vcpu) < 0) */
                                        /*if (kvm_arch_vcpu_runnable(vcpu)) { */
                                        ...
                                        return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
                                                !vcpu->arch.apf.halted)
                                            || !list_empty_careful(&vcpu->async_pf.done)
                                        ...
                                        return 0;
list_add_tail(&apf->link,
              &vcpu->async_pf.done);
spin_unlock(&vcpu->async_pf.lock);
                                        waited = true;
                                        schedule();
------------------------------------------------------------------------

The attached patch adds the missing memory barrier.

I found this issue when I was looking through the Linux source code
for places calling waitqueue_active() before wake_up*() without
preceding memory barriers, after sending a patch to fix a similar
issue in drivers/tty/n_tty.c.  (Details about the original issue can
be found here: https://lkml.org/lkml/2015/9/28/849)

Signed-off-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com>
---
 virt/kvm/async_pf.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)
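The general rule this patch follows (a sketch of the common
waitqueue_active() idiom, not text from the patch itself): a lockless
waitqueue_active() check on the waker side needs a full barrier between
setting the wake-up condition and the check, pairing with the barrier implied
by prepare_to_wait() on the waiter side:

    /* waker                             waiter
     * ---------------------------       ----------------------------------
     * CONDITION = true;                 prepare_to_wait(&wq, &wait, state);
     * smp_mb();   // A                  // implies smp_mb()          -- B
     * if (waitqueue_active(&wq))        if (!CONDITION)
     *         wake_up(&wq);                     schedule();
     *                                   finish_wait(&wq, &wait);
     *
     * A pairs with B: either the waiter sees CONDITION set, or the
     * waker sees the waiter queued on wq -- the wakeup cannot be lost.
     */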