diff mbox series

[v7,01/12] btrfs: lift rw mount setup from mount and remount

Message ID ac259f3ceafae5a8bb9b6c554375588705aa55b2.1605736355.git.boris@bur.io (mailing list archive)
State New, archived
Headers show
Series btrfs: free space tree mounting fixes | expand

Commit Message

Boris Burkov Nov. 18, 2020, 11:06 p.m. UTC
Mounting rw and remounting from ro to rw naturally share invariants and
functionality which result in a correctly setup rw filesystem. Luckily,
there is even a strong unity in the code which implements them. In
mount's open_ctree, these operations mostly happen after an early return
for ro file systems, and in remount, they happen in a section devoted to
remounting ro->rw, after some remount specific validation passes.

However, there are unfortunately a few differences. There are small
deviations in the order of some of the operations, remount does not
cleanup orphan inodes in root_tree or fs_tree, remount does not create
the free space tree, and remount does not handle "one-shot" mount
options like clear_cache and uuid tree rescan.

Since we want to add building the free space tree to remount, and since
it is possible to leak orphans on a filesystem mounted as ro then
remounted rw (common for the root filesystem when booting), we would
benefit from unifying the logic between the two codepaths.

This patch only lifts the existing common functionality, and leaves a
natural path for fixing the discrepancies.

Signed-off-by: Boris Burkov <boris@bur.io>
---
 fs/btrfs/disk-io.c | 91 ++++++++++++++++++++++++++--------------------
 fs/btrfs/disk-io.h |  1 +
 fs/btrfs/super.c   | 37 +++----------------
 3 files changed, 59 insertions(+), 70 deletions(-)

Comments

David Sterba Nov. 23, 2020, 4:50 p.m. UTC | #1
On Wed, Nov 18, 2020 at 03:06:16PM -0800, Boris Burkov wrote:
> Mounting rw and remounting from ro to rw naturally share invariants and
> functionality which result in a correctly setup rw filesystem. Luckily,
> there is even a strong unity in the code which implements them. In
> mount's open_ctree, these operations mostly happen after an early return
> for ro file systems, and in remount, they happen in a section devoted to
> remounting ro->rw, after some remount specific validation passes.
> 
> However, there are unfortunately a few differences. There are small
> deviations in the order of some of the operations, remount does not
> cleanup orphan inodes in root_tree or fs_tree, remount does not create
> the free space tree, and remount does not handle "one-shot" mount
> options like clear_cache and uuid tree rescan.
> 
> Since we want to add building the free space tree to remount, and since
> it is possible to leak orphans on a filesystem mounted as ro then
> remounted rw

The statement is not specific if the orphans are files or roots. But I
don't agree that a leak is possible, or need a proof of the claim above.

The mount-time orphan cleanup will start early, but otherwise orphan
cleanup is checked and started on dentry lookups (btrfs_lookup_dentry).
Deleted but not clened tree roorts are all found and removed, regardless
of rw or ro->rw mount.

So I wonder if you claim there's a leak just by lack of an explicit call
on the remount path.
David Sterba Nov. 23, 2020, 4:57 p.m. UTC | #2
On Wed, Nov 18, 2020 at 03:06:16PM -0800, Boris Burkov wrote:
> +/*
> + * Mounting logic specific to read-write file systems. Shared by open_ctree
> + * and btrfs_remount when remounting from read-only to read-write.
> + */
> +int btrfs_mount_rw(struct btrfs_fs_info *fs_info)

The function could be named better, to reflect that it's starting some
operations prior to the rw mount. As its now it would read as "mount the
filesystem as read-write", we already have 'btrfs_mount' as callback for
mount.

As this is a cosmetic fix it's not blocking the patchset but I'd like to
have that fixed for the final version. I don't have a suggestion for
now.
Boris Burkov Nov. 30, 2020, 11:31 p.m. UTC | #3
On Mon, Nov 23, 2020 at 05:50:40PM +0100, David Sterba wrote:
> On Wed, Nov 18, 2020 at 03:06:16PM -0800, Boris Burkov wrote:
> > Mounting rw and remounting from ro to rw naturally share invariants and
> > functionality which result in a correctly setup rw filesystem. Luckily,
> > there is even a strong unity in the code which implements them. In
> > mount's open_ctree, these operations mostly happen after an early return
> > for ro file systems, and in remount, they happen in a section devoted to
> > remounting ro->rw, after some remount specific validation passes.
> > 
> > However, there are unfortunately a few differences. There are small
> > deviations in the order of some of the operations, remount does not
> > cleanup orphan inodes in root_tree or fs_tree, remount does not create
> > the free space tree, and remount does not handle "one-shot" mount
> > options like clear_cache and uuid tree rescan.
> > 
> > Since we want to add building the free space tree to remount, and since
> > it is possible to leak orphans on a filesystem mounted as ro then
> > remounted rw
> 
> The statement is not specific if the orphans are files or roots. But I
> don't agree that a leak is possible, or need a proof of the claim above.
> 
> The mount-time orphan cleanup will start early, but otherwise orphan
> cleanup is checked and started on dentry lookups (btrfs_lookup_dentry).
> Deleted but not clened tree roorts are all found and removed, regardless
> of rw or ro->rw mount.
> 
> So I wonder if you claim there's a leak just by lack of an explicit call
> on the remount path.

For what it's worth, the example I had in mind is the free space inode
orphans after a block_group delete or the new "clear v1 space cache"
code in this stack.

I hadn't considered btrfs_lookup_dentry because I was focused on those
specific inodes, but it's possible that gets called in a way that would
clean them too.

However, another thing I think I overlooked is that it doesn't look
like remount would affect the set of delayed_iputs, so that mechanism for
removing the orphans should still work. Further, the new function only
runs when going from ro->rw, but any ro mount would run delayed iputs
before completing as part of btrfs_commit_super.

So with all that, I agree with you that there isn't a leak. Going
forward with this, I can certainly fix the commit messages, or even get
rid of the patch that does the orphan cleanup in remount. I can't think
of a reason that the cleanup would be bad, but on the other hand, just
"unity" is a flimsy justification for adding it. Let me know what you
prefer.

Thanks for the review,
Boris
Boris Burkov Dec. 1, 2020, 12:01 a.m. UTC | #4
On Mon, Nov 23, 2020 at 05:57:04PM +0100, David Sterba wrote:
> On Wed, Nov 18, 2020 at 03:06:16PM -0800, Boris Burkov wrote:
> > +/*
> > + * Mounting logic specific to read-write file systems. Shared by open_ctree
> > + * and btrfs_remount when remounting from read-only to read-write.
> > + */
> > +int btrfs_mount_rw(struct btrfs_fs_info *fs_info)
> 
> The function could be named better, to reflect that it's starting some
> operations prior to the rw mount. As its now it would read as "mount the
> filesystem as read-write", we already have 'btrfs_mount' as callback for
> mount.
> 
> As this is a cosmetic fix it's not blocking the patchset but I'd like to
> have that fixed for the final version. I don't have a suggestion for
> now.

I agree, the name implies a function which does a rw mount end to end.

Some alternative ideas:
btrfs_prepare_rw_mount
btrfs_setup_rw_mount
btrfs_handle_rw_mount
btrfs_process_rw_mount

I think I personally like 'prepare' the most out of those.
David Sterba Dec. 15, 2020, 5:50 p.m. UTC | #5
On Mon, Nov 30, 2020 at 04:01:38PM -0800, Boris Burkov wrote:
> On Mon, Nov 23, 2020 at 05:57:04PM +0100, David Sterba wrote:
> > On Wed, Nov 18, 2020 at 03:06:16PM -0800, Boris Burkov wrote:
> > > +/*
> > > + * Mounting logic specific to read-write file systems. Shared by open_ctree
> > > + * and btrfs_remount when remounting from read-only to read-write.
> > > + */
> > > +int btrfs_mount_rw(struct btrfs_fs_info *fs_info)
> > 
> > The function could be named better, to reflect that it's starting some
> > operations prior to the rw mount. As its now it would read as "mount the
> > filesystem as read-write", we already have 'btrfs_mount' as callback for
> > mount.
> > 
> > As this is a cosmetic fix it's not blocking the patchset but I'd like to
> > have that fixed for the final version. I don't have a suggestion for
> > now.
> 
> I agree, the name implies a function which does a rw mount end to end.
> 
> Some alternative ideas:
> btrfs_prepare_rw_mount
> btrfs_setup_rw_mount
> btrfs_handle_rw_mount
> btrfs_process_rw_mount
> 
> I think I personally like 'prepare' the most out of those.

I had not noticed your reply when I committed the patch, the function is
btrfs_start_pre_rw_mount. btrfs_prepare_rw_mount would be also good, we
can rename it eventually if it needed.
David Sterba Dec. 15, 2020, 6:01 p.m. UTC | #6
On Mon, Nov 30, 2020 at 03:31:42PM -0800, Boris Burkov wrote:
> On Mon, Nov 23, 2020 at 05:50:40PM +0100, David Sterba wrote:
> > On Wed, Nov 18, 2020 at 03:06:16PM -0800, Boris Burkov wrote:
> > > Mounting rw and remounting from ro to rw naturally share invariants and
> > > functionality which result in a correctly setup rw filesystem. Luckily,
> > > there is even a strong unity in the code which implements them. In
> > > mount's open_ctree, these operations mostly happen after an early return
> > > for ro file systems, and in remount, they happen in a section devoted to
> > > remounting ro->rw, after some remount specific validation passes.
> > > 
> > > However, there are unfortunately a few differences. There are small
> > > deviations in the order of some of the operations, remount does not
> > > cleanup orphan inodes in root_tree or fs_tree, remount does not create
> > > the free space tree, and remount does not handle "one-shot" mount
> > > options like clear_cache and uuid tree rescan.
> > > 
> > > Since we want to add building the free space tree to remount, and since
> > > it is possible to leak orphans on a filesystem mounted as ro then
> > > remounted rw
> > 
> > The statement is not specific if the orphans are files or roots. But I
> > don't agree that a leak is possible, or need a proof of the claim above.
> > 
> > The mount-time orphan cleanup will start early, but otherwise orphan
> > cleanup is checked and started on dentry lookups (btrfs_lookup_dentry).
> > Deleted but not clened tree roorts are all found and removed, regardless
> > of rw or ro->rw mount.
> > 
> > So I wonder if you claim there's a leak just by lack of an explicit call
> > on the remount path.
> 
> For what it's worth, the example I had in mind is the free space inode
> orphans after a block_group delete or the new "clear v1 space cache"
> code in this stack.
> 
> I hadn't considered btrfs_lookup_dentry because I was focused on those
> specific inodes, but it's possible that gets called in a way that would
> clean them too.
> 
> However, another thing I think I overlooked is that it doesn't look
> like remount would affect the set of delayed_iputs, so that mechanism for
> removing the orphans should still work. Further, the new function only
> runs when going from ro->rw, but any ro mount would run delayed iputs
> before completing as part of btrfs_commit_super.
> 
> So with all that, I agree with you that there isn't a leak. Going
> forward with this, I can certainly fix the commit messages, or even get
> rid of the patch that does the orphan cleanup in remount. I can't think
> of a reason that the cleanup would be bad, but on the other hand, just
> "unity" is a flimsy justification for adding it. Let me know what you
> prefer.

Thanks for checking, the committed changelog does not contain 'leak' and
I slighthly rephrased only that sentence.
diff mbox series

Patch

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 8c87a1caefff..3e0de4986dbc 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2884,6 +2884,53 @@  static int btrfs_check_uuid_tree(struct btrfs_fs_info *fs_info)
 	return 0;
 }
 
+/*
+ * Mounting logic specific to read-write file systems. Shared by open_ctree
+ * and btrfs_remount when remounting from read-only to read-write.
+ */
+int btrfs_mount_rw(struct btrfs_fs_info *fs_info)
+{
+	int ret;
+
+	ret = btrfs_cleanup_fs_roots(fs_info);
+	if (ret)
+		goto out;
+
+	mutex_lock(&fs_info->cleaner_mutex);
+	ret = btrfs_recover_relocation(fs_info->tree_root);
+	mutex_unlock(&fs_info->cleaner_mutex);
+	if (ret < 0) {
+		btrfs_warn(fs_info, "failed to recover relocation: %d", ret);
+		goto out;
+	}
+
+	ret = btrfs_resume_balance_async(fs_info);
+	if (ret)
+		goto out;
+
+	ret = btrfs_resume_dev_replace_async(fs_info);
+	if (ret) {
+		btrfs_warn(fs_info, "failed to resume dev_replace");
+		goto out;
+	}
+
+	btrfs_qgroup_rescan_resume(fs_info);
+
+	if (!fs_info->uuid_root) {
+		btrfs_info(fs_info, "creating UUID tree");
+		ret = btrfs_create_uuid_tree(fs_info);
+		if (ret) {
+			btrfs_warn(fs_info,
+				   "failed to create the UUID tree %d",
+				   ret);
+			goto out;
+		}
+	}
+
+out:
+	return ret;
+}
+
 int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_devices,
 		      char *options)
 {
@@ -3290,22 +3337,6 @@  int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 	if (ret)
 		goto fail_qgroup;
 
-	if (!sb_rdonly(sb)) {
-		ret = btrfs_cleanup_fs_roots(fs_info);
-		if (ret)
-			goto fail_qgroup;
-
-		mutex_lock(&fs_info->cleaner_mutex);
-		ret = btrfs_recover_relocation(tree_root);
-		mutex_unlock(&fs_info->cleaner_mutex);
-		if (ret < 0) {
-			btrfs_warn(fs_info, "failed to recover relocation: %d",
-					ret);
-			err = -EINVAL;
-			goto fail_qgroup;
-		}
-	}
-
 	fs_info->fs_root = btrfs_get_fs_root(fs_info, BTRFS_FS_TREE_OBJECTID, true);
 	if (IS_ERR(fs_info->fs_root)) {
 		err = PTR_ERR(fs_info->fs_root);
@@ -3358,35 +3389,17 @@  int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 	}
 	up_read(&fs_info->cleanup_work_sem);
 
-	ret = btrfs_resume_balance_async(fs_info);
-	if (ret) {
-		btrfs_warn(fs_info, "failed to resume balance: %d", ret);
-		close_ctree(fs_info);
-		return ret;
-	}
-
-	ret = btrfs_resume_dev_replace_async(fs_info);
+	ret = btrfs_mount_rw(fs_info);
 	if (ret) {
-		btrfs_warn(fs_info, "failed to resume device replace: %d", ret);
 		close_ctree(fs_info);
 		return ret;
 	}
-
-	btrfs_qgroup_rescan_resume(fs_info);
 	btrfs_discard_resume(fs_info);
 
-	if (!fs_info->uuid_root) {
-		btrfs_info(fs_info, "creating UUID tree");
-		ret = btrfs_create_uuid_tree(fs_info);
-		if (ret) {
-			btrfs_warn(fs_info,
-				"failed to create the UUID tree: %d", ret);
-			close_ctree(fs_info);
-			return ret;
-		}
-	} else if (btrfs_test_opt(fs_info, RESCAN_UUID_TREE) ||
-		   fs_info->generation !=
-				btrfs_super_uuid_tree_generation(disk_super)) {
+	if (fs_info->uuid_root &&
+	    (btrfs_test_opt(fs_info, RESCAN_UUID_TREE) ||
+	     fs_info->generation !=
+			btrfs_super_uuid_tree_generation(disk_super))) {
 		btrfs_info(fs_info, "checking UUID tree");
 		ret = btrfs_check_uuid_tree(fs_info);
 		if (ret) {
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index e75ea6092942..945914d1747f 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -50,6 +50,7 @@  struct extent_buffer *btrfs_find_create_tree_block(
 						u64 bytenr, u64 owner_root,
 						int level);
 void btrfs_clean_tree_block(struct extent_buffer *buf);
+int btrfs_mount_rw(struct btrfs_fs_info *fs_info);
 int __cold open_ctree(struct super_block *sb,
 	       struct btrfs_fs_devices *fs_devices,
 	       char *options);
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 6693cfc14dfd..ad5a78970389 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1878,7 +1878,6 @@  static inline void btrfs_remount_cleanup(struct btrfs_fs_info *fs_info,
 static int btrfs_remount(struct super_block *sb, int *flags, char *data)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(sb);
-	struct btrfs_root *root = fs_info->tree_root;
 	unsigned old_flags = sb->s_flags;
 	unsigned long old_opts = fs_info->mount_opt;
 	unsigned long old_compress_type = fs_info->compress_type;
@@ -1971,39 +1970,15 @@  static int btrfs_remount(struct super_block *sb, int *flags, char *data)
 			goto restore;
 		}
 
-		ret = btrfs_cleanup_fs_roots(fs_info);
-		if (ret)
-			goto restore;
-
-		/* recover relocation */
-		mutex_lock(&fs_info->cleaner_mutex);
-		ret = btrfs_recover_relocation(root);
-		mutex_unlock(&fs_info->cleaner_mutex);
-		if (ret)
-			goto restore;
-
-		ret = btrfs_resume_balance_async(fs_info);
+		/*
+		 * NOTE: when remounting with a change that does writes, don't
+		 * put it anywhere above this point, as we are not sure to be
+		 * safe to write until we pass the above checks.
+		 */
+		ret = btrfs_mount_rw(fs_info);
 		if (ret)
 			goto restore;
 
-		ret = btrfs_resume_dev_replace_async(fs_info);
-		if (ret) {
-			btrfs_warn(fs_info, "failed to resume dev_replace");
-			goto restore;
-		}
-
-		btrfs_qgroup_rescan_resume(fs_info);
-
-		if (!fs_info->uuid_root) {
-			btrfs_info(fs_info, "creating UUID tree");
-			ret = btrfs_create_uuid_tree(fs_info);
-			if (ret) {
-				btrfs_warn(fs_info,
-					   "failed to create the UUID tree %d",
-					   ret);
-				goto restore;
-			}
-		}
 		sb->s_flags &= ~SB_RDONLY;
 
 		set_bit(BTRFS_FS_OPEN, &fs_info->flags);