diff mbox series

[13/18] btrfs: handle the ro->rw transition for mounting different subovls

Message ID 0a73915edbb8d05e30d167351ea8c709a9bfe447.1699308010.git.josef@toxicpanda.com (mailing list archive)
State New
Headers show
Series btrfs: convert to the new mount API | expand

Commit Message

Josef Bacik Nov. 6, 2023, 10:08 p.m. UTC
This is an oddity that we've carried around since 0723a0473fb4 ("btrfs:
allow mounting btrfs subvolumes with different ro/rw options") where
we'll under the covers flip the file system to RW if you're mixing and
matching ro/rw options with different subvol mounts.  The first mount is
what the super gets setup as, so we'd handle this by remount the super
as rw under the covers to facilitate this behavior.

With the new mount API we can't really allow this, because user space
has the ability to specify the super block settings, and the mount
settings.  So if the user explicitly set the super block as read only,
and then tried to mount a rw mount with the super block we'll reject
this.  However the old API was less descriptive and thus we allowed this
kind of behavior.

This patch preserves this behavior for the old api calls.  This is
inspired by Christians work, and includes one of his comments, and thus
is included in the link below.

Link: https://lore.kernel.org/all/20230626-fs-btrfs-mount-api-v1-2-045e9735a00b@kernel.org/
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/super.c | 133 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 132 insertions(+), 1 deletion(-)

Comments

Christian Brauner Nov. 8, 2023, 8:41 a.m. UTC | #1
On Mon, Nov 06, 2023 at 05:08:21PM -0500, Josef Bacik wrote:
> This is an oddity that we've carried around since 0723a0473fb4 ("btrfs:
> allow mounting btrfs subvolumes with different ro/rw options") where
> we'll under the covers flip the file system to RW if you're mixing and
> matching ro/rw options with different subvol mounts.  The first mount is
> what the super gets setup as, so we'd handle this by remount the super
> as rw under the covers to facilitate this behavior.
> 
> With the new mount API we can't really allow this, because user space
> has the ability to specify the super block settings, and the mount
> settings.  So if the user explicitly set the super block as read only,
> and then tried to mount a rw mount with the super block we'll reject
> this.  However the old API was less descriptive and thus we allowed this
> kind of behavior.
> 
> This patch preserves this behavior for the old api calls.  This is
> inspired by Christians work, and includes one of his comments, and thus
> is included in the link below.
> 
> Link: https://lore.kernel.org/all/20230626-fs-btrfs-mount-api-v1-2-045e9735a00b@kernel.org/
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---

Looks good to me,
Reviewed-by: Christian Brauner <brauner@kernel.org>

Just note that all capitalization was removed from the comment
preceeding btrfs_reconfigure_for_mount() by accident. You might want to
fix that up/recopy that comment.

>  fs/btrfs/super.c | 133 ++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 132 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 4ace42e08bff..e2ac0801211d 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -2513,13 +2513,15 @@ static int btrfs_reconfigure(struct fs_context *fc)
>  	struct btrfs_fs_context *ctx = fc->fs_private;
>  	struct btrfs_fs_context old_ctx;
>  	int ret = 0;
> +	bool mount_reconfigure = (fc->s_fs_info != NULL);
>  
>  	btrfs_info_to_ctx(fs_info, &old_ctx);
>  
>  	sync_filesystem(sb);
>  	set_bit(BTRFS_FS_STATE_REMOUNTING, &fs_info->fs_state);
>  
> -	if (!check_options(fs_info, &ctx->mount_opt, fc->sb_flags))
> +	if (!mount_reconfigure &&
> +	    !check_options(fs_info, &ctx->mount_opt, fc->sb_flags))
>  		return -EINVAL;
>  
>  	ret = btrfs_check_features(fs_info, !(fc->sb_flags & SB_RDONLY));
> @@ -2922,6 +2924,133 @@ static int btrfs_get_tree_super(struct fs_context *fc)
>  	return ret;
>  }
>  
> +/*
> + * Christian wrote this long comment about what we're doing here, preserving it
> + * so the history of this change is preserved.
> + *
> + * ever since commit 0723a0473fb4 ("btrfs: allow * mounting btrfs subvolumes
> + * with different ro/rw * options") the following works:
> + *
> + *        (i) mount /dev/sda3 -o subvol=foo,ro /mnt/foo
> + *       (ii) mount /dev/sda3 -o subvol=bar,rw /mnt/bar
> + *
> + * which looks nice and innocent but is actually pretty * intricate and
> + * deserves a long comment.
> + *
> + * on another filesystem a subvolume mount is close to * something like:
> + *
> + *	(iii) # create rw superblock + initial mount
> + *	      mount -t xfs /dev/sdb /opt/
> + *
> + *	      # create ro bind mount
> + *	      mount --bind -o ro /opt/foo /mnt/foo
> + *
> + *	      # unmount initial mount
> + *	      umount /opt
> + *
> + * of course, there's some special subvolume sauce and there's the fact that the
> + * sb->s_root dentry is really swapped after mount_subtree(). but conceptually
> + * it's very close and will help us understand the issue.
> + *
> + * the old mount api didn't cleanly distinguish between a mount being made ro
> + * and a superblock being made ro.  the only way to change the ro state of
> + * either object was by passing ms_rdonly. if a new mount was created via
> + * mount(2) such as:
> + *
> + *      mount("/dev/sdb", "/mnt", "xfs", ms_rdonly, null);
> + *
> + * the ms_rdonly flag being specified had two effects:
> + *
> + * (1) mnt_readonly was raised -> the resulting mount got
> + *     @mnt->mnt_flags |= mnt_readonly raised.
> + *
> + * (2) ms_rdonly was passed to the filesystem's mount method and the filesystems
> + *     made the superblock ro. note, how sb_rdonly has the same value as
> + *     ms_rdonly and is raised whenever ms_rdonly is passed through mount(2).
> + *
> + * creating a subtree mount via (iii) ends up leaving a rw superblock with a
> + * subtree mounted ro.
> + *
> + * but consider the effect on the old mount api on btrfs subvolume mounting
> + * which combines the distinct step in (iii) into a a single step.
> + *
> + * by issuing (i) both the mount and the superblock are turned ro. now when (ii)
> + * is issued the superblock is ro and thus even if the mount created for (ii) is
> + * rw it wouldn't help. hence, btrfs needed to transition the superblock from ro
> + * to rw for (ii) which it did using an internal remount call (a bold
> + * choice...).
> + *
> + * iow, subvolume mounting was inherently messy due to the ambiguity of
> + * ms_rdonly in mount(2). note, this ambiguity has mount(8) always translate
> + * "ro" to ms_rdonly. iow, in both (i) and (ii) "ro" becomes ms_rdonly when
> + * passed by mount(8) to mount(2).
> + *
> + * enter the new mount api. the new mount api disambiguates making a mount ro
> + * and making a superblock ro.
> + *
> + * (3) to turn a mount ro the mount_attr_rdonly flag can be used with either
> + *     fsmount() or mount_setattr() this is a pure vfs level change for a
> + *     specific mount or mount tree that is never seen by the filesystem itself.
> + *
> + * (4) to turn a superblock ro the "ro" flag must be used with
> + *     fsconfig(fsconfig_set_flag, "ro"). this option is seen by the filesytem
> + *     in fc->sb_flags.
> + *
> + * this disambiguation has rather positive consequences.  mounting a subvolume
> + * ro will not also turn the superblock ro. only the mount for the subvolume
> + * will become ro.
> + *
> + * so, if the superblock creation request comes from the new mount api the
> + * caller must've explicitly done:
> + *
> + *      fsconfig(fsconfig_set_flag, "ro")
> + *      fsmount/mount_setattr(mount_attr_rdonly)
> + *
> + * iow, at some point the caller must have explicitly turned the whole
> + * superblock ro and we shouldn't just undo it like we did for the old mount
> + * api. in any case, it lets us avoid this nasty hack in the new mount api.
> + *
> + * Consequently, the remounting hack must only be used for requests originating
> + * from the old mount api and should be marked for full deprecation so it can be
> + * turned off in a couple of years.
> + *
> + * The new mount api has no reason to support this hack.
> + */
> +static struct vfsmount *btrfs_reconfigure_for_mount(struct fs_context *fc)
> +{
> +	struct vfsmount *mnt;
> +	int ret;
> +	bool ro2rw = !(fc->sb_flags & SB_RDONLY);
> +
> +	/*
> +	 * We got an EBUSY because our SB_RDONLY flag didn't match the existing
> +	 * super block, so invert our setting here and re-try the mount so we
> +	 * can get our vfsmount.
> +	 */
> +	if (ro2rw)
> +		fc->sb_flags |= SB_RDONLY;
> +	else
> +		fc->sb_flags &= ~SB_RDONLY;
> +
> +	mnt = fc_mount(fc);
> +	if (IS_ERR(mnt))
> +		return mnt;
> +
> +	if (!fc->oldapi || !ro2rw)
> +		return mnt;
> +
> +	/* We need to convert to rw, call reconfigure */
> +	fc->sb_flags &= ~SB_RDONLY;
> +	down_write(&mnt->mnt_sb->s_umount);
> +	ret = btrfs_reconfigure(fc);
> +	up_write(&mnt->mnt_sb->s_umount);
> +	if (ret) {
> +		mntput(mnt);
> +		return ERR_PTR(ret);
> +	}
> +	return mnt;
> +}
> +
>  static int btrfs_get_tree_subvol(struct fs_context *fc)
>  {
>  	struct btrfs_fs_info *fs_info = NULL;
> @@ -2971,6 +3100,8 @@ static int btrfs_get_tree_subvol(struct fs_context *fc)
>  	fc->security = NULL;
>  
>  	mnt = fc_mount(dup_fc);
> +	if (PTR_ERR_OR_ZERO(mnt) == -EBUSY)
> +		mnt = btrfs_reconfigure_for_mount(dup_fc);
>  	put_fs_context(dup_fc);
>  	if (IS_ERR(mnt))
>  		return PTR_ERR(mnt);
> -- 
> 2.41.0
>
Josef Bacik Nov. 8, 2023, 3:53 p.m. UTC | #2
On Wed, Nov 08, 2023 at 09:41:33AM +0100, Christian Brauner wrote:
> On Mon, Nov 06, 2023 at 05:08:21PM -0500, Josef Bacik wrote:
> > This is an oddity that we've carried around since 0723a0473fb4 ("btrfs:
> > allow mounting btrfs subvolumes with different ro/rw options") where
> > we'll under the covers flip the file system to RW if you're mixing and
> > matching ro/rw options with different subvol mounts.  The first mount is
> > what the super gets setup as, so we'd handle this by remount the super
> > as rw under the covers to facilitate this behavior.
> > 
> > With the new mount API we can't really allow this, because user space
> > has the ability to specify the super block settings, and the mount
> > settings.  So if the user explicitly set the super block as read only,
> > and then tried to mount a rw mount with the super block we'll reject
> > this.  However the old API was less descriptive and thus we allowed this
> > kind of behavior.
> > 
> > This patch preserves this behavior for the old api calls.  This is
> > inspired by Christians work, and includes one of his comments, and thus
> > is included in the link below.
> > 
> > Link: https://lore.kernel.org/all/20230626-fs-btrfs-mount-api-v1-2-045e9735a00b@kernel.org/
> > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > ---
> 
> Looks good to me,
> Reviewed-by: Christian Brauner <brauner@kernel.org>
> 
> Just note that all capitalization was removed from the comment
> preceeding btrfs_reconfigure_for_mount() by accident. You might want to
> fix that up/recopy that comment.
> 

Oops, I accidentally did something with vim that killed capitalization in the
whole file, I thought I undid all of it properly but apparently I didn't.  I'll
fix it up, thanks,

Josef
diff mbox series

Patch

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 4ace42e08bff..e2ac0801211d 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2513,13 +2513,15 @@  static int btrfs_reconfigure(struct fs_context *fc)
 	struct btrfs_fs_context *ctx = fc->fs_private;
 	struct btrfs_fs_context old_ctx;
 	int ret = 0;
+	bool mount_reconfigure = (fc->s_fs_info != NULL);
 
 	btrfs_info_to_ctx(fs_info, &old_ctx);
 
 	sync_filesystem(sb);
 	set_bit(BTRFS_FS_STATE_REMOUNTING, &fs_info->fs_state);
 
-	if (!check_options(fs_info, &ctx->mount_opt, fc->sb_flags))
+	if (!mount_reconfigure &&
+	    !check_options(fs_info, &ctx->mount_opt, fc->sb_flags))
 		return -EINVAL;
 
 	ret = btrfs_check_features(fs_info, !(fc->sb_flags & SB_RDONLY));
@@ -2922,6 +2924,133 @@  static int btrfs_get_tree_super(struct fs_context *fc)
 	return ret;
 }
 
+/*
+ * Christian wrote this long comment about what we're doing here, preserving it
+ * so the history of this change is preserved.
+ *
+ * ever since commit 0723a0473fb4 ("btrfs: allow * mounting btrfs subvolumes
+ * with different ro/rw * options") the following works:
+ *
+ *        (i) mount /dev/sda3 -o subvol=foo,ro /mnt/foo
+ *       (ii) mount /dev/sda3 -o subvol=bar,rw /mnt/bar
+ *
+ * which looks nice and innocent but is actually pretty * intricate and
+ * deserves a long comment.
+ *
+ * on another filesystem a subvolume mount is close to * something like:
+ *
+ *	(iii) # create rw superblock + initial mount
+ *	      mount -t xfs /dev/sdb /opt/
+ *
+ *	      # create ro bind mount
+ *	      mount --bind -o ro /opt/foo /mnt/foo
+ *
+ *	      # unmount initial mount
+ *	      umount /opt
+ *
+ * of course, there's some special subvolume sauce and there's the fact that the
+ * sb->s_root dentry is really swapped after mount_subtree(). but conceptually
+ * it's very close and will help us understand the issue.
+ *
+ * the old mount api didn't cleanly distinguish between a mount being made ro
+ * and a superblock being made ro.  the only way to change the ro state of
+ * either object was by passing ms_rdonly. if a new mount was created via
+ * mount(2) such as:
+ *
+ *      mount("/dev/sdb", "/mnt", "xfs", ms_rdonly, null);
+ *
+ * the ms_rdonly flag being specified had two effects:
+ *
+ * (1) mnt_readonly was raised -> the resulting mount got
+ *     @mnt->mnt_flags |= mnt_readonly raised.
+ *
+ * (2) ms_rdonly was passed to the filesystem's mount method and the filesystems
+ *     made the superblock ro. note, how sb_rdonly has the same value as
+ *     ms_rdonly and is raised whenever ms_rdonly is passed through mount(2).
+ *
+ * creating a subtree mount via (iii) ends up leaving a rw superblock with a
+ * subtree mounted ro.
+ *
+ * but consider the effect on the old mount api on btrfs subvolume mounting
+ * which combines the distinct step in (iii) into a a single step.
+ *
+ * by issuing (i) both the mount and the superblock are turned ro. now when (ii)
+ * is issued the superblock is ro and thus even if the mount created for (ii) is
+ * rw it wouldn't help. hence, btrfs needed to transition the superblock from ro
+ * to rw for (ii) which it did using an internal remount call (a bold
+ * choice...).
+ *
+ * iow, subvolume mounting was inherently messy due to the ambiguity of
+ * ms_rdonly in mount(2). note, this ambiguity has mount(8) always translate
+ * "ro" to ms_rdonly. iow, in both (i) and (ii) "ro" becomes ms_rdonly when
+ * passed by mount(8) to mount(2).
+ *
+ * enter the new mount api. the new mount api disambiguates making a mount ro
+ * and making a superblock ro.
+ *
+ * (3) to turn a mount ro the mount_attr_rdonly flag can be used with either
+ *     fsmount() or mount_setattr() this is a pure vfs level change for a
+ *     specific mount or mount tree that is never seen by the filesystem itself.
+ *
+ * (4) to turn a superblock ro the "ro" flag must be used with
+ *     fsconfig(fsconfig_set_flag, "ro"). this option is seen by the filesytem
+ *     in fc->sb_flags.
+ *
+ * this disambiguation has rather positive consequences.  mounting a subvolume
+ * ro will not also turn the superblock ro. only the mount for the subvolume
+ * will become ro.
+ *
+ * so, if the superblock creation request comes from the new mount api the
+ * caller must've explicitly done:
+ *
+ *      fsconfig(fsconfig_set_flag, "ro")
+ *      fsmount/mount_setattr(mount_attr_rdonly)
+ *
+ * iow, at some point the caller must have explicitly turned the whole
+ * superblock ro and we shouldn't just undo it like we did for the old mount
+ * api. in any case, it lets us avoid this nasty hack in the new mount api.
+ *
+ * Consequently, the remounting hack must only be used for requests originating
+ * from the old mount api and should be marked for full deprecation so it can be
+ * turned off in a couple of years.
+ *
+ * The new mount api has no reason to support this hack.
+ */
+static struct vfsmount *btrfs_reconfigure_for_mount(struct fs_context *fc)
+{
+	struct vfsmount *mnt;
+	int ret;
+	bool ro2rw = !(fc->sb_flags & SB_RDONLY);
+
+	/*
+	 * We got an EBUSY because our SB_RDONLY flag didn't match the existing
+	 * super block, so invert our setting here and re-try the mount so we
+	 * can get our vfsmount.
+	 */
+	if (ro2rw)
+		fc->sb_flags |= SB_RDONLY;
+	else
+		fc->sb_flags &= ~SB_RDONLY;
+
+	mnt = fc_mount(fc);
+	if (IS_ERR(mnt))
+		return mnt;
+
+	if (!fc->oldapi || !ro2rw)
+		return mnt;
+
+	/* We need to convert to rw, call reconfigure */
+	fc->sb_flags &= ~SB_RDONLY;
+	down_write(&mnt->mnt_sb->s_umount);
+	ret = btrfs_reconfigure(fc);
+	up_write(&mnt->mnt_sb->s_umount);
+	if (ret) {
+		mntput(mnt);
+		return ERR_PTR(ret);
+	}
+	return mnt;
+}
+
 static int btrfs_get_tree_subvol(struct fs_context *fc)
 {
 	struct btrfs_fs_info *fs_info = NULL;
@@ -2971,6 +3100,8 @@  static int btrfs_get_tree_subvol(struct fs_context *fc)
 	fc->security = NULL;
 
 	mnt = fc_mount(dup_fc);
+	if (PTR_ERR_OR_ZERO(mnt) == -EBUSY)
+		mnt = btrfs_reconfigure_for_mount(dup_fc);
 	put_fs_context(dup_fc);
 	if (IS_ERR(mnt))
 		return PTR_ERR(mnt);