Message ID | 17EC94B0A072C34B8DCF0D30AD16044A02874794@BPXM09GP.gisp.nec.co.jp (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 10/08/2015 05:35 PM, Kosuke Tatsukawa wrote: > btrfs_bio_counter_sub() seems to be missing a memory barrier which might > cause the waker to not notice the waiter and miss sending a wake_up as > in the following figure. > > btrfs_bio_counter_sub btrfs_rm_dev_replace_blocked > ------------------------------------------------------------------------ > if (waitqueue_active(&fs_info->replace_wait)) > /* The CPU might reorder the test for > the waitqueue up here, before > prior writes complete */ > /* wait_event */ > /* __wait_event */ > /* ___wait_event */ > long __int = prepare_to_wait_event(&wq, > &__wait, state); > if (!percpu_counter_sum(&fs_info->bio_counter)) > percpu_counter_sub(&fs_info->bio_counter, > amount); > schedule() percpu_counter_sub can't be reordered, in its most basic form it does preempt_disable/enable which in its most basic form does barrier(). Thanks, Josef -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Josef Bacik wrote: > On 10/08/2015 05:35 PM, Kosuke Tatsukawa wrote: >> btrfs_bio_counter_sub() seems to be missing a memory barrier which might >> cause the waker to not notice the waiter and miss sending a wake_up as >> in the following figure. >> >> btrfs_bio_counter_sub btrfs_rm_dev_replace_blocked >> ------------------------------------------------------------------------ >> if (waitqueue_active(&fs_info->replace_wait)) >> /* The CPU might reorder the test for >> the waitqueue up here, before >> prior writes complete */ >> /* wait_event */ >> /* __wait_event */ >> /* ___wait_event */ >> long __int = prepare_to_wait_event(&wq, >> &__wait, state); >> if (!percpu_counter_sum(&fs_info->bio_counter)) >> percpu_counter_sub(&fs_info->bio_counter, >> amount); >> schedule() > > percpu_counter_sub can't be reordered, in its most basic form it does > preempt_disable/enable which in its most basic form does barrier(). Thanks, It's not the compiler, but the CPU that is doing the reordering. The CPU can delay the write of the counter, so that the following read of &fs_info->replace_wait is completed first. Hence a memory barrier is required, and not just a barrier. --- Kosuke TATSUKAWA | 3rd IT Platform Department | IT Platform Division, NEC Corporation | tatsu@ab.jp.nec.com -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Oct 09, 2015 at 12:35:48AM +0000, Kosuke Tatsukawa wrote: > This patch removes the call to waitqueue_active() leaving just wake_up() > behind. This fixes the problem because the call to spin_lock_irqsave() > in wake_up() will be an ACQUIRE operation. Either we can switch it to wake_up or put the barrier before the check. Not all instances of waitqueue_active need the barrier though. > I found this issue when I was looking through the linux source code > for places calling waitqueue_active() before wake_up*(), but without > preceding memory barriers, after sending a patch to fix a similar > issue in drivers/tty/n_tty.c (Details about the original issue can be > found here: https://lkml.org/lkml/2015/9/28/849). There are more in btrfs: https://www.mail-archive.com/linux-btrfs%40vger.kernel.org/msg41914.html > @@ -918,9 +918,7 @@ void btrfs_bio_counter_inc_noblocked(struct btrfs_fs_info *fs_info) > void btrfs_bio_counter_sub(struct btrfs_fs_info *fs_info, s64 amount) > { > percpu_counter_sub(&fs_info->bio_counter, amount); > - > - if (waitqueue_active(&fs_info->replace_wait)) > - wake_up(&fs_info->replace_wait); > + wake_up(&fs_info->replace_wait); Chris had a comment on that one in https://www.mail-archive.com/linux-btrfs%40vger.kernel.org/msg42551.html it's in performance critial context and the explicit wake_up is even worse than the barrier. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
David Sterba wrote: > On Fri, Oct 09, 2015 at 12:35:48AM +0000, Kosuke Tatsukawa wrote: >> This patch removes the call to waitqueue_active() leaving just wake_up() >> behind. This fixes the problem because the call to spin_lock_irqsave() >> in wake_up() will be an ACQUIRE operation. > > Either we can switch it to wake_up or put the barrier before the check. > Not all instances of waitqueue_active need the barrier though. > >> I found this issue when I was looking through the linux source code >> for places calling waitqueue_active() before wake_up*(), but without >> preceding memory barriers, after sending a patch to fix a similar >> issue in drivers/tty/n_tty.c (Details about the original issue can be >> found here: https://lkml.org/lkml/2015/9/28/849). > > There are more in btrfs: > > https://www.mail-archive.com/linux-btrfs%40vger.kernel.org/msg41914.html Thank you for the pointer. Your patch seems better than mine. I think the other places in btrfs that use waitqueue_active() before wake_up are preceded by either a smp_mb or some kind of atomic operation. The latter still needs smp_mb__after_atomic() but it's light-weight compared to smp_mb(). >> @@ -918,9 +918,7 @@ void btrfs_bio_counter_inc_noblocked(struct btrfs_fs_info *fs_info) >> void btrfs_bio_counter_sub(struct btrfs_fs_info *fs_info, s64 amount) >> { >> percpu_counter_sub(&fs_info->bio_counter, amount); >> - >> - if (waitqueue_active(&fs_info->replace_wait)) >> - wake_up(&fs_info->replace_wait); >> + wake_up(&fs_info->replace_wait); > > Chris had a comment on that one in > https://www.mail-archive.com/linux-btrfs%40vger.kernel.org/msg42551.html > it's in performance critial context and the explicit wake_up is even > worse than the barrier. --- Kosuke TATSUKAWA | 3rd IT Platform Department | IT Platform Division, NEC Corporation | tatsu@ab.jp.nec.com -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c index e54dd59..ecb3e71 100644 --- a/fs/btrfs/dev-replace.c +++ b/fs/btrfs/dev-replace.c @@ -918,9 +918,7 @@ void btrfs_bio_counter_inc_noblocked(struct btrfs_fs_info *fs_info) void btrfs_bio_counter_sub(struct btrfs_fs_info *fs_info, s64 amount) { percpu_counter_sub(&fs_info->bio_counter, amount); - - if (waitqueue_active(&fs_info->replace_wait)) - wake_up(&fs_info->replace_wait); + wake_up(&fs_info->replace_wait); } void btrfs_bio_counter_inc_blocked(struct btrfs_fs_info *fs_info)
btrfs_bio_counter_sub() seems to be missing a memory barrier which might cause the waker to not notice the waiter and miss sending a wake_up as in the following figure. btrfs_bio_counter_sub btrfs_rm_dev_replace_blocked ------------------------------------------------------------------------ if (waitqueue_active(&fs_info->replace_wait)) /* The CPU might reorder the test for the waitqueue up here, before prior writes complete */ /* wait_event */ /* __wait_event */ /* ___wait_event */ long __int = prepare_to_wait_event(&wq, &__wait, state); if (!percpu_counter_sum(&fs_info->bio_counter)) percpu_counter_sub(&fs_info->bio_counter, amount); schedule() ------------------------------------------------------------------------ This patch removes the call to waitqueue_active() leaving just wake_up() behind. This fixes the problem because the call to spin_lock_irqsave() in wake_up() will be an ACQUIRE operation. I found this issue when I was looking through the linux source code for places calling waitqueue_active() before wake_up*(), but without preceding memory barriers, after sending a patch to fix a similar issue in drivers/tty/n_tty.c (Details about the original issue can be found here: https://lkml.org/lkml/2015/9/28/849). Signed-off-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com> --- fs/btrfs/dev-replace.c | 4 +--- 1 files changed, 1 insertions(+), 3 deletions(-)