
btrfs: Relax memory barrier in btrfs_tree_unlock

Message ID 1518611846-26918-1-git-send-email-nborisov@suse.com (mailing list archive)
State New, archived

Commit Message

Nikolay Borisov Feb. 14, 2018, 12:37 p.m. UTC
When performing an unlock on an extent buffer we'd like to order the
decrement of extent_buffer::blocking_writers with waking up any
waiters. In such situations it's sufficient to use smp_mb__after_atomic
rather than the heavy smp_mb. On architectures where atomic operations
are fully ordered (such as x86 or s390) unconditionally executing
a heavyweight smp_mb instruction causes a severe hit to performance
while bringing no improvement in terms of correctness.

The better approach is to use the appropriate smp_mb__after_atomic
routine, which will do the correct thing (invoke a full smp_mb or, in
the case of fully ordered atomics, insert only a compiler barrier).
Put another way, an RMW atomic op + smp_mb__after_atomic is, in terms
of semantics, equivalent to a full smp_mb. This ensures that none of
the problems described in the accompanying comment of waitqueue_active
occur. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/locking.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
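
For context, the ordering requirement comes from the kernel-doc comment on
waitqueue_active() in include/linux/wait.h: the waker must make its condition
update visible before checking the wait queue, or a waiter that re-checks the
condition just before sleeping can miss the wakeup. The following is a minimal
sketch of that waker/waiter pattern, using illustrative names (cond, wq)
rather than the actual btrfs fields:

#include <linux/atomic.h>
#include <linux/sched.h>
#include <linux/wait.h>

static DECLARE_WAIT_QUEUE_HEAD(wq);
static atomic_t cond = ATOMIC_INIT(1);

/* Waker: the counter update must be visible before the queue check. */
static void waker(void)
{
	atomic_dec(&cond);
	/*
	 * An RMW atomic op followed by smp_mb__after_atomic() is
	 * semantically a full smp_mb(), but degrades to a plain compiler
	 * barrier on architectures whose atomics are fully ordered.
	 */
	smp_mb__after_atomic();
	if (waitqueue_active(&wq))
		wake_up(&wq);
}

/*
 * Waiter: prepare_to_wait() supplies the pairing barrier via
 * set_current_state() before the condition is re-checked, so either
 * the waiter sees the new counter value or the waker sees the waiter
 * on the queue.
 */
static void waiter(void)
{
	DEFINE_WAIT(wait);

	for (;;) {
		prepare_to_wait(&wq, &wait, TASK_UNINTERRUPTIBLE);
		if (atomic_read(&cond) == 0)
			break;
		schedule();
	}
	finish_wait(&wq, &wait);
}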

Comments

David Sterba Feb. 24, 2018, 12:14 a.m. UTC | #1
On Wed, Feb 14, 2018 at 02:37:26PM +0200, Nikolay Borisov wrote:
> When performing an unlock on an extent buffer we'd like to order the
> decrement of extent_buffer::blocking_writers with waking up any
> waiters. In such situations it's sufficient to use smp_mb__after_atomic
> rather than the heavy smp_mb. On architectures where atomic operations
> are fully ordered (such as x86 or s390) unconditionally executing
> a heavyweight smp_mb instruction causes a severe hit to performance
> while bringing no improvement in terms of correctness.

Have you measured this severe performance hit? There is an impact, but I
doubt you'll ever notice it in the profiles given where the
btrfs_tree_unlock appears.

> The better approach is to use the appropriate smp_mb__after_atomic
> routine, which will do the correct thing (invoke a full smp_mb or, in
> the case of fully ordered atomics, insert only a compiler barrier).
> Put another way, an RMW atomic op + smp_mb__after_atomic is, in terms
> of semantics, equivalent to a full smp_mb. This ensures that none of
> the problems described in the accompanying comment of waitqueue_active
> occur. No functional changes.

I tend to agree.
Nikolay Borisov Feb. 24, 2018, 10:59 a.m. UTC | #2
On 24.02.2018 02:14, David Sterba wrote:
> On Wed, Feb 14, 2018 at 02:37:26PM +0200, Nikolay Borisov wrote:
>> When performing an unlock on an extent buffer we'd like to order the
>> decrement of extent_buffer::blocking_writers with waking up any
>> waiters. In such situations it's sufficient to use smp_mb__after_atomic
>> rather than the heavy smp_mb. On architectures where atomic operations
>> are fully ordered (such as x86 or s390) unconditionally executing
>> a heavyweight smp_mb instruction causes a severe hit to performance
>> while bringing no improvement in terms of correctness.
> 
> Have you measured this severe performance hit? There is an impact, but I
> doubt you'll ever notice it in the profiles given where the
> btrfs_tree_unlock appears.

Admittedly I haven't :) But I'd say "every little bit helps"

> 
>> The better approach is to use the appropriate smp_mb__after_atomic
>> routine, which will do the correct thing (invoke a full smp_mb or, in
>> the case of fully ordered atomics, insert only a compiler barrier).
>> Put another way, an RMW atomic op + smp_mb__after_atomic is, in terms
>> of semantics, equivalent to a full smp_mb. This ensures that none of
>> the problems described in the accompanying comment of waitqueue_active
>> occur. No functional changes.
> 
> I tend to agree.
> 
David Sterba March 7, 2018, 4:05 p.m. UTC | #3
On Wed, Feb 14, 2018 at 02:37:26PM +0200, Nikolay Borisov wrote:
> When performing an unlock on an extent buffer we'd like to order the
> decrement of extent_buffer::blocking_writers with waking up any
> waiters. In such situations it's sufficient to use smp_mb__after_atomic
> rather than the heavy smp_mb. On architectures where atomic operations
> are fully ordered (such as x86 or s390) unconditionally executing
> a heavyweight smp_mb instruction causes a severe hit to performance
> while bringing no improvement in terms of correctness.
> 
> The better approach is to use the appropriate smp_mb__after_atomic
> routine, which will do the correct thing (invoke a full smp_mb or, in
> the case of fully ordered atomics, insert only a compiler barrier).
> Put another way, an RMW atomic op + smp_mb__after_atomic is, in terms
> of semantics, equivalent to a full smp_mb. This ensures that none of
> the problems described in the accompanying comment of waitqueue_active
> occur. No functional changes.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>

Reviewed-by: David Sterba <dsterba@suse.com>

Patch

diff --git a/fs/btrfs/locking.c b/fs/btrfs/locking.c
index d13128c70ddd..621083f8932c 100644
--- a/fs/btrfs/locking.c
+++ b/fs/btrfs/locking.c
@@ -290,7 +290,7 @@ void btrfs_tree_unlock(struct extent_buffer *eb)
 		/*
 		 * Make sure counter is updated before we wake up waiters.
 		 */
-		smp_mb();
+		smp_mb__after_atomic();
 		if (waitqueue_active(&eb->write_lock_wq))
 			wake_up(&eb->write_lock_wq);
 	} else {
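
For completeness, here is an abridged sketch of the waiter side in
btrfs_tree_lock() that this barrier pairs with (illustrative, not a verbatim
excerpt from the tree at this commit). wait_event() re-checks the condition
after prepare_to_wait(), whose set_current_state() call provides the barrier
that pairs with smp_mb__after_atomic() on the unlock side, so the wakeup
cannot be lost:

/* Waiter side, abridged sketch: sleep until no blocking writer remains. */
void btrfs_tree_lock(struct extent_buffer *eb)
{
	wait_event(eb->write_lock_wq,
		   atomic_read(&eb->blocking_writers) == 0);
	/* ... acquire eb->lock and re-check, retrying on contention ... */
}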