Message ID | cf8cd6170bd2283524a89a8192eeaba769a98fd6.1611627788.git.naohiro.aota@wdc.com (mailing list archive) |
---|---|
State | New, archived |
Series | btrfs: zoned block device support |
On 1/25/21 9:25 PM, Naohiro Aota wrote:
> This is the 2/3 patch to enable tree-log on ZONED mode.
>
> Since we can start more than one log transactions per subvolume
> simultaneously, nodes from multiple transactions can be allocated
> interleaved. Such mixed allocation results in non-sequential writes at the
> time of log transaction commit. The nodes of the global log root tree
> (fs_info->log_root_tree), also have the same mixed allocation problem.
>
> This patch serializes log transactions by waiting for a committing
> transaction when someone tries to start a new transaction, to avoid the
> mixed allocation problem. We must also wait for running log transactions
> from another subvolume, but there is no easy way to detect which subvolume
> root is running a log transaction. So, this patch forbids starting a new
> log transaction when other subvolumes already allocated the global log root
> tree.
>
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef
On Tue, Jan 26, 2021 at 5:53 AM Naohiro Aota <naohiro.aota@wdc.com> wrote:
>
> This is the 2/3 patch to enable tree-log on ZONED mode.
>
> Since we can start more than one log transactions per subvolume
> simultaneously, nodes from multiple transactions can be allocated
> interleaved. Such mixed allocation results in non-sequential writes at the
> time of log transaction commit. The nodes of the global log root tree
> (fs_info->log_root_tree), also have the same mixed allocation problem.
>
> This patch serializes log transactions by waiting for a committing
> transaction when someone tries to start a new transaction, to avoid the
> mixed allocation problem. We must also wait for running log transactions
> from another subvolume, but there is no easy way to detect which subvolume
> root is running a log transaction. So, this patch forbids starting a new
> log transaction when other subvolumes already allocated the global log root
> tree.
>
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
>  fs/btrfs/tree-log.c | 29 +++++++++++++++++++++++++++++
>  1 file changed, 29 insertions(+)
>
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index 930e752686b4..71a1c0b5bc26 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -105,6 +105,7 @@ static noinline int replay_dir_deletes(struct btrfs_trans_handle *trans,
>  				       struct btrfs_root *log,
>  				       struct btrfs_path *path,
>  				       u64 dirid, int del_all);
> +static void wait_log_commit(struct btrfs_root *root, int transid);
>
>  /*
>   * tree logging is a special write ahead log used to make sure that
> @@ -140,6 +141,7 @@ static int start_log_trans(struct btrfs_trans_handle *trans,
>  {
>  	struct btrfs_fs_info *fs_info = root->fs_info;
>  	struct btrfs_root *tree_root = fs_info->tree_root;
> +	const bool zoned = btrfs_is_zoned(fs_info);
>  	int ret = 0;
>
>  	/*
> @@ -160,12 +162,20 @@ static int start_log_trans(struct btrfs_trans_handle *trans,
>
>  	mutex_lock(&root->log_mutex);
>
> +again:
>  	if (root->log_root) {
> +		int index = (root->log_transid + 1) % 2;
> +
>  		if (btrfs_need_log_full_commit(trans)) {
>  			ret = -EAGAIN;
>  			goto out;
>  		}
>
> +		if (zoned && atomic_read(&root->log_commit[index])) {
> +			wait_log_commit(root, root->log_transid - 1);
> +			goto again;
> +		}
> +
>  		if (!root->log_start_pid) {
>  			clear_bit(BTRFS_ROOT_MULTI_LOG_TASKS, &root->state);
>  			root->log_start_pid = current->pid;
> @@ -173,6 +183,17 @@ static int start_log_trans(struct btrfs_trans_handle *trans,
>  			set_bit(BTRFS_ROOT_MULTI_LOG_TASKS, &root->state);
>  		}
>  	} else {
> +		if (zoned) {
> +			mutex_lock(&fs_info->tree_log_mutex);
> +			if (fs_info->log_root_tree)
> +				ret = -EAGAIN;
> +			else
> +				ret = btrfs_init_log_root_tree(trans, fs_info);
> +			mutex_unlock(&fs_info->tree_log_mutex);
> +		}

Hum, so looking at this in the for-next branch, this does not seem to
make much sense now, probably because these patches started to be
developed before the following commit that landed in 5.10:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=47876f7ceffa0e6af7476e052b3c061f1f2c1d9f

So if we are the first task doing an fsync after a transaction commit,
and there are no other concurrent tasks doing an fsync:

1) We create fs_info->log_root_tree at the top of start_log_trans()
   because test_bit(BTRFS_ROOT_HAS_LOG_TREE, &tree_root->state) returns
   false;

2) Then, we enter this code for zoned mode only, and
   fs_info->log_root_tree is not NULL, because we just created it
   before, so we always return -EAGAIN and every fsync is converted to
   a full transaction commit.

For this case, of no concurrency, and being the first task doing an
fsync, it was not supposed to fallback to a transaction commit - that
defeats the goal of this patch unless I missed something.

Also, fs_info->log_root_tree is protected by tree_root->log_mutex and
not anymore by fs_info->tree_log_mutex (since that specific commit).

> +		if (ret)
> +			goto out;

Also this "if (ret)" check could be moved inside the previous "if
(zoned)" block after unlocking the mutex.

Thanks, sorry for the very late review.

> +
>  		ret = btrfs_add_log_tree(trans, root);
>  		if (ret)
>  			goto out;
> @@ -201,14 +222,22 @@ static int start_log_trans(struct btrfs_trans_handle *trans,
>   */
>  static int join_running_log_trans(struct btrfs_root *root)
>  {
> +	const bool zoned = btrfs_is_zoned(root->fs_info);
>  	int ret = -ENOENT;
>
>  	if (!test_bit(BTRFS_ROOT_HAS_LOG_TREE, &root->state))
>  		return ret;
>
>  	mutex_lock(&root->log_mutex);
> +again:
>  	if (root->log_root) {
> +		int index = (root->log_transid + 1) % 2;
> +
>  		ret = 0;
> +		if (zoned && atomic_read(&root->log_commit[index])) {
> +			wait_log_commit(root, root->log_transid - 1);
> +			goto again;
> +		}
>  		atomic_inc(&root->log_writers);
>  	}
>  	mutex_unlock(&root->log_mutex);
> --
> 2.27.0
>
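The no-concurrency case described in the review can be modelled with a small userspace sketch. This is not kernel code: `struct model_fs_info` and the `model_*` functions are hypothetical stand-ins invented for illustration, and the sketch assumes, as the review states, that the global log root tree is created at the top of start_log_trans() whenever BTRFS_ROOT_HAS_LOG_TREE is not yet set on the tree root.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few fs_info fields the review discusses. */
struct model_fs_info {
	bool zoned;          /* models btrfs_is_zoned(fs_info) */
	bool has_log_tree;   /* models BTRFS_ROOT_HAS_LOG_TREE on tree_root */
	void *log_root_tree; /* models fs_info->log_root_tree */
};

static int model_dummy_tree; /* placeholder object for a "created" tree */

/*
 * Models the step the review calls "the top of start_log_trans()":
 * the first fsync after a transaction commit creates the global
 * log root tree.
 */
void model_ensure_log_root_tree(struct model_fs_info *fs)
{
	assert(fs != NULL);
	if (!fs->has_log_tree) {
		fs->log_root_tree = &model_dummy_tree;
		fs->has_log_tree = true;
	}
}

/*
 * Models the patched else-branch, taken when root->log_root is NULL.
 * Returns 0 if logging may proceed, or -EAGAIN if the caller would
 * fall back to a full transaction commit.
 */
int model_start_log_trans_no_log_root(struct model_fs_info *fs)
{
	int ret = 0;

	model_ensure_log_root_tree(fs);

	if (fs->zoned) {
		if (fs->log_root_tree)
			ret = -EAGAIN; /* always taken: the tree was just created */
		/* else: the patch would call btrfs_init_log_root_tree() here */
	}
	return ret;
}
```

Under these assumptions, a zoned model with no prior log tree gets -EAGAIN from `model_start_log_trans_no_log_root()` on its very first call, while a non-zoned model gets 0, which matches the concern that every first fsync on a zoned filesystem would become a full transaction commit.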
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 930e752686b4..71a1c0b5bc26 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -105,6 +105,7 @@ static noinline int replay_dir_deletes(struct btrfs_trans_handle *trans,
 				       struct btrfs_root *log,
 				       struct btrfs_path *path,
 				       u64 dirid, int del_all);
+static void wait_log_commit(struct btrfs_root *root, int transid);
 
 /*
  * tree logging is a special write ahead log used to make sure that
@@ -140,6 +141,7 @@ static int start_log_trans(struct btrfs_trans_handle *trans,
 {
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	struct btrfs_root *tree_root = fs_info->tree_root;
+	const bool zoned = btrfs_is_zoned(fs_info);
 	int ret = 0;
 
 	/*
@@ -160,12 +162,20 @@ static int start_log_trans(struct btrfs_trans_handle *trans,
 
 	mutex_lock(&root->log_mutex);
 
+again:
 	if (root->log_root) {
+		int index = (root->log_transid + 1) % 2;
+
 		if (btrfs_need_log_full_commit(trans)) {
 			ret = -EAGAIN;
 			goto out;
 		}
 
+		if (zoned && atomic_read(&root->log_commit[index])) {
+			wait_log_commit(root, root->log_transid - 1);
+			goto again;
+		}
+
 		if (!root->log_start_pid) {
 			clear_bit(BTRFS_ROOT_MULTI_LOG_TASKS, &root->state);
 			root->log_start_pid = current->pid;
@@ -173,6 +183,17 @@ static int start_log_trans(struct btrfs_trans_handle *trans,
 			set_bit(BTRFS_ROOT_MULTI_LOG_TASKS, &root->state);
 		}
 	} else {
+		if (zoned) {
+			mutex_lock(&fs_info->tree_log_mutex);
+			if (fs_info->log_root_tree)
+				ret = -EAGAIN;
+			else
+				ret = btrfs_init_log_root_tree(trans, fs_info);
+			mutex_unlock(&fs_info->tree_log_mutex);
+		}
+		if (ret)
+			goto out;
+
 		ret = btrfs_add_log_tree(trans, root);
 		if (ret)
 			goto out;
@@ -201,14 +222,22 @@ static int start_log_trans(struct btrfs_trans_handle *trans,
  */
 static int join_running_log_trans(struct btrfs_root *root)
 {
+	const bool zoned = btrfs_is_zoned(root->fs_info);
 	int ret = -ENOENT;
 
 	if (!test_bit(BTRFS_ROOT_HAS_LOG_TREE, &root->state))
 		return ret;
 
 	mutex_lock(&root->log_mutex);
+again:
 	if (root->log_root) {
+		int index = (root->log_transid + 1) % 2;
+
 		ret = 0;
+		if (zoned && atomic_read(&root->log_commit[index])) {
+			wait_log_commit(root, root->log_transid - 1);
+			goto again;
+		}
 		atomic_inc(&root->log_writers);
 	}
 	mutex_unlock(&root->log_mutex);
This is the 2/3 patch to enable tree-log on ZONED mode.

Since we can start more than one log transactions per subvolume
simultaneously, nodes from multiple transactions can be allocated
interleaved. Such mixed allocation results in non-sequential writes at the
time of log transaction commit. The nodes of the global log root tree
(fs_info->log_root_tree), also have the same mixed allocation problem.

This patch serializes log transactions by waiting for a committing
transaction when someone tries to start a new transaction, to avoid the
mixed allocation problem. We must also wait for running log transactions
from another subvolume, but there is no easy way to detect which subvolume
root is running a log transaction. So, this patch forbids starting a new
log transaction when other subvolumes already allocated the global log root
tree.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/tree-log.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)
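The "wait for a committing transaction" check from the patch can be illustrated with a minimal single-threaded sketch in C. It is a userspace model only: `struct model_root`, `model_prev_commit_index()` and `model_must_wait()` are hypothetical names, plain ints replace the kernel's atomic counters, and no actual waiting or locking is performed. The sketch only shows which of the two commit slots the patch inspects, namely the one it pairs with `log_transid - 1`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-root log state the patch reads. */
struct model_root {
	int log_transid;   /* models root->log_transid */
	int log_commit[2]; /* models root->log_commit[] (atomic_t in the kernel) */
};

/*
 * Slot index the patch checks before starting or joining a log
 * transaction: (log_transid + 1) % 2, which for a positive transid
 * is the same slot as (log_transid - 1) % 2.
 */
int model_prev_commit_index(const struct model_root *root)
{
	assert(root->log_transid >= 0);
	return (root->log_transid + 1) % 2;
}

/*
 * True when the patched code would call wait_log_commit() and retry:
 * only in zoned mode, and only while that slot is marked as committing.
 */
bool model_must_wait(const struct model_root *root, bool zoned)
{
	return zoned && root->log_commit[model_prev_commit_index(root)] != 0;
}
```

With `log_transid` equal to 4 and slot 1 marked as committing, `model_must_wait()` reports true in zoned mode and false otherwise. In the real patch the caller then waits and jumps back to the `again:` label, so the check is repeated until the slot is clear.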