diff mbox series

btrfs: stop incrementing log batch when joining log transaction

Message ID 5c0655d43b2809932ec8aa40d99b94295469e3f1.1605266377.git.fdmanana@suse.com (mailing list archive)
State New, archived
Headers show
Series btrfs: stop incrementing log batch when joining log transaction | expand

Commit Message

Filipe Manana Nov. 13, 2020, 11:23 a.m. UTC
From: Filipe Manana <fdmanana@suse.com>

When joining a log transaction we acquire the root's log mutex, then
increment the root's log batch and log writers counters while holding
the mutex. However we don't need to increment the log batch there,
because we are holding the mutex and incremented the log writers counter
as well, so any other task trying to sync log will wait for the current
task to finish its logging and still achieve the desired log batching.

Since the log batch counter is an atomic counter and is incremented twice
at the very beginning of the fsync callback (btrfs_sync_file()), once
before flushing delalloc and once again after waiting for writeback to
complete, eliminating its increment when joining the log transaction
may provide some performance gains in case we have multiple concurrent
tasks doing fsyncs against different files in the same subvolume, as it
reduces contention on the atomic (locking the cacheline and bouncing it).

When testing fio with 32 jobs, on a 8 cores vm, doing fsyncs against
different files of the same subvolume, on top of a zram device, I could
consistently see gains (higher throughput) between 1% to 2%, which is a
very low value and possibly hard to be observed with a real device (I
couldn't observe consistent gains with my low/mid end NVMe device).
So this change is mostly motivated to just simplify the logic, as updating
the log batch counter is only relevant when an fsync starts and while not
holding the root's log mutex.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/tree-log.c | 1 -
 1 file changed, 1 deletion(-)

Comments

Josef Bacik Nov. 13, 2020, 6:11 p.m. UTC | #1
On 11/13/20 6:23 AM, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> When joining a log transaction we acquire the root's log mutex, then
> increment the root's log batch and log writers counters while holding
> the mutex. However we don't need to increment the log batch there,
> because we are holding the mutex and incremented the log writers counter
> as well, so any other task trying to sync log will wait for the current
> task to finish its logging and still achieve the desired log batching.
> 
> Since the log batch counter is an atomic counter and is incremented twice
> at the very beginning of the fsync callback (btrfs_sync_file()), once
> before flushing delalloc and once again after waiting for writeback to
> complete, eliminating its increment when joining the log transaction
> may provide some performance gains in case we have multiple concurrent
> tasks doing fsyncs against different files in the same subvolume, as it
> reduces contention on the atomic (locking the cacheline and bouncing it).
> 
> When testing fio with 32 jobs, on a 8 cores vm, doing fsyncs against
> different files of the same subvolume, on top of a zram device, I could
> consistently see gains (higher throughput) between 1% to 2%, which is a
> very low value and possibly hard to be observed with a real device (I
> couldn't observe consistent gains with my low/mid end NVMe device).
> So this change is mostly motivated to just simplify the logic, as updating
> the log batch counter is only relevant when an fsync starts and while not
> holding the root's log mutex.
> 
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

You had me at "simplify the logic"

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef
David Sterba Nov. 16, 2020, 3:29 p.m. UTC | #2
On Fri, Nov 13, 2020 at 11:23:30AM +0000, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> When joining a log transaction we acquire the root's log mutex, then
> increment the root's log batch and log writers counters while holding
> the mutex. However we don't need to increment the log batch there,
> because we are holding the mutex and incremented the log writers counter
> as well, so any other task trying to sync log will wait for the current
> task to finish its logging and still achieve the desired log batching.
> 
> Since the log batch counter is an atomic counter and is incremented twice
> at the very beginning of the fsync callback (btrfs_sync_file()), once
> before flushing delalloc and once again after waiting for writeback to
> complete, eliminating its increment when joining the log transaction
> may provide some performance gains in case we have multiple concurrent
> tasks doing fsyncs against different files in the same subvolume, as it
> reduces contention on the atomic (locking the cacheline and bouncing it).
> 
> When testing fio with 32 jobs, on a 8 cores vm, doing fsyncs against
> different files of the same subvolume, on top of a zram device, I could
> consistently see gains (higher throughput) between 1% to 2%, which is a
> very low value and possibly hard to be observed with a real device (I
> couldn't observe consistent gains with my low/mid end NVMe device).
> So this change is mostly motivated to just simplify the logic, as updating
> the log batch counter is only relevant when an fsync starts and while not
> holding the root's log mutex.
> 
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Added to misc-next, thanks.
diff mbox series

Patch

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 17a08a7bf6cc..bc3396dfec10 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -172,7 +172,6 @@  static int start_log_trans(struct btrfs_trans_handle *trans,
 		root->log_start_pid = current->pid;
 	}
 
-	atomic_inc(&root->log_batch);
 	atomic_inc(&root->log_writers);
 	if (ctx && !ctx->logging_new_name) {
 		int index = root->log_transid % 2;