Message ID | 20200702113240.171572-1-fdmanana@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/4] btrfs: only commit the delayed inode when doing a full fsync | expand |
On 7/2/20 7:32 AM, fdmanana@kernel.org wrote: > From: Filipe Manana <fdmanana@suse.com> > > When syncing the log, we used to update the log root tree without holding > neither the log_mutex of the subvolume root nor the log_mutex of log root > tree. > > We used to have two critical sections delimited by the log_mutex of the > log root tree, so in the first one we incremented the log_writers of the > log root tree and on the second one we decremented it and waited for the > log_writers counter to go down to zero. This was because the update of > the log root tree happened between the two critical sections. > > The use of two critical sections allowed a little bit more of parallelism > and required the use of the log_writers counter, necessary to make sure > we didn't miss any log root tree update when we have multiple tasks trying > to sync the log in parallel. > > However after commit 06989c799f0481 ("Btrfs: fix race updating log root > item during fsync") the log root tree update was moved into a critical > section delimited by the subvolume's log_mutex. Later another commit > moved the log tree update from that critical section into the second > critical section delimited by the log_mutex of the log root tree. Both > commits addressed different bugs. > > The end result is that the first critical section delimited by the > log_mutex of the log root tree became pointless, since there's nothing > done between it and the second critical section, we just have an unlock > of the log_mutex followed by a lock operation. This means we can merge > both critical sections, as the first one does almost nothing now, and we > can stop using the log_writers counter of the log root tree, which was > incremented in the first critical section and decremented in the second > criticial section, used to make sure no one in the second critical section > started writeback of the log root tree before some other task updated it. > > So just remove the mutex_unlock() followed by mutex_lock() of the log root > tree, as well as the use of the log_writers counter for the log root tree. > > This patch is part of a series that has the following patches: > > 1/4 btrfs: only commit the delayed inode when doing a full fsync > 2/4 btrfs: only commit delayed items at fsync if we are logging a directory > 3/4 btrfs: stop incremening log_batch for the log root tree when syncing log > 4/4 btrfs: remove no longer needed use of log_writers for the log root tree > > After the entire patchset applied I saw about 12% decrease on max latency > reported by dbench. The test was done on a qemu vm, with 8 cores, 16Gb of > ram, using kvm and using a raw NVMe device directly (no intermediary fs on > the host). The test was invoked like the following: > > mkfs.btrfs -f /dev/sdk > mount -o ssd -o nospace_cache /dev/sdk /mnt/sdk > dbench -D /mnt/sdk -t 300 8 > umount /mnt/dsk > > Signed-off-by: Filipe Manana <fdmanana@suse.com> Lol oops, did I leave it like that? Reviewed-by: Josef Bacik <josef@toxicpanda.com> Thanks, Josef
On Thu, Jul 2, 2020 at 2:04 PM Josef Bacik <josef@toxicpanda.com> wrote: > > On 7/2/20 7:32 AM, fdmanana@kernel.org wrote: > > From: Filipe Manana <fdmanana@suse.com> > > > > When syncing the log, we used to update the log root tree without holding > > neither the log_mutex of the subvolume root nor the log_mutex of log root > > tree. > > > > We used to have two critical sections delimited by the log_mutex of the > > log root tree, so in the first one we incremented the log_writers of the > > log root tree and on the second one we decremented it and waited for the > > log_writers counter to go down to zero. This was because the update of > > the log root tree happened between the two critical sections. > > > > The use of two critical sections allowed a little bit more of parallelism > > and required the use of the log_writers counter, necessary to make sure > > we didn't miss any log root tree update when we have multiple tasks trying > > to sync the log in parallel. > > > > However after commit 06989c799f0481 ("Btrfs: fix race updating log root > > item during fsync") the log root tree update was moved into a critical > > section delimited by the subvolume's log_mutex. Later another commit > > moved the log tree update from that critical section into the second > > critical section delimited by the log_mutex of the log root tree. Both > > commits addressed different bugs. > > > > The end result is that the first critical section delimited by the > > log_mutex of the log root tree became pointless, since there's nothing > > done between it and the second critical section, we just have an unlock > > of the log_mutex followed by a lock operation. This means we can merge > > both critical sections, as the first one does almost nothing now, and we > > can stop using the log_writers counter of the log root tree, which was > > incremented in the first critical section and decremented in the second > > criticial section, used to make sure no one in the second critical section > > started writeback of the log root tree before some other task updated it. > > > > So just remove the mutex_unlock() followed by mutex_lock() of the log root > > tree, as well as the use of the log_writers counter for the log root tree. > > > > This patch is part of a series that has the following patches: > > > > 1/4 btrfs: only commit the delayed inode when doing a full fsync > > 2/4 btrfs: only commit delayed items at fsync if we are logging a directory > > 3/4 btrfs: stop incremening log_batch for the log root tree when syncing log > > 4/4 btrfs: remove no longer needed use of log_writers for the log root tree > > > > After the entire patchset applied I saw about 12% decrease on max latency > > reported by dbench. The test was done on a qemu vm, with 8 cores, 16Gb of > > ram, using kvm and using a raw NVMe device directly (no intermediary fs on > > the host). The test was invoked like the following: > > > > mkfs.btrfs -f /dev/sdk > > mount -o ssd -o nospace_cache /dev/sdk /mnt/sdk > > dbench -D /mnt/sdk -t 300 8 > > umount /mnt/dsk > > > > Signed-off-by: Filipe Manana <fdmanana@suse.com> > > Lol oops, did I leave it like that? Well, before you moved the log root tree update, I moved it first. So back then I didn't notice it or left it for later and then forgot. And later you missed it too, so we're even ;) > > Reviewed-by: Josef Bacik <josef@toxicpanda.com> > > Thanks, > > Josef
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index a79e8164b26b..8544d5a25fa2 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1060,6 +1060,7 @@ struct btrfs_root { wait_queue_head_t log_writer_wait; wait_queue_head_t log_commit_wait[2]; struct list_head log_ctxs[2]; + /* Used only for log trees of subvolumes, not for the log root tree. */ atomic_t log_writers; atomic_t log_commit[2]; /* Used only for log trees of subvolumes, not for the log root tree. */ diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index ccc6b59f6729..aaa449153d9c 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -3116,28 +3116,17 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans, btrfs_init_log_ctx(&root_log_ctx, NULL); mutex_lock(&log_root_tree->log_mutex); - atomic_inc(&log_root_tree->log_writers); index2 = log_root_tree->log_transid % 2; list_add_tail(&root_log_ctx.list, &log_root_tree->log_ctxs[index2]); root_log_ctx.log_transid = log_root_tree->log_transid; - mutex_unlock(&log_root_tree->log_mutex); - - mutex_lock(&log_root_tree->log_mutex); - /* * Now we are safe to update the log_root_tree because we're under the * log_mutex, and we're a current writer so we're holding the commit * open until we drop the log_mutex. */ ret = update_log_root(trans, log, &new_root_item); - - if (atomic_dec_and_test(&log_root_tree->log_writers)) { - /* atomic_dec_and_test implies a barrier */ - cond_wake_up_nomb(&log_root_tree->log_writer_wait); - } - if (ret) { if (!list_empty(&root_log_ctx.list)) list_del_init(&root_log_ctx.list); @@ -3183,8 +3172,6 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans, root_log_ctx.log_transid - 1); } - wait_for_writer(log_root_tree); - /* * now that we've moved on to the tree of log tree roots, * check the full commit flag again