btrfs: do not return EBUSY on concurrent subvolume mounts
diff mbox

Message ID 1461770076-13000-1-git-send-email-dsterba@suse.com
State New
Headers show

Commit Message

David Sterba April 27, 2016, 3:14 p.m. UTC
A user reported mount failures with EBUSY during boot, there's root
partition and many subvolumes, mounted via /etc/fstab.

The failure depends on timing, when multiple subvolumes reach the code
between superblock creation in RO mode, while the subvolumes are RW.
This discrepancy leads to EBUSY and the code has been there since ages.

If the subvolumes are mounted after a short delay, there's no EBUSY.
There's no missing locking, the supreblock creation is atomic and the
error code seems to be just artificial. We support different RO/RW
mounts in mount_subvol and do the relevant adjustments if the flags do
not match.

There are no apparent problems if the check and EBUSY are dropped, no
wrong refcounting, umount works etc.

Reproducer:

mkdir -p mnt $(eval echo mnt{1..10})
mkfs.btrfs -f /dev/sdx
mount /dev/sdx mnt
for i in `seq 10`; do btrfs subvol create subv$i; done
loops=0
while [ $loops -lt 10 ]; do
  umount mnt*
  mount -o ro /dev/sdx mnt &
  for i in `seq 10`; do
    mount -o subvol=subv$i /dev/sdx mnt$i || \
      echo "FAIL: mnt$i, loop $loops" > log-$i-$loops &
    fi
  done
  wait
  : $((loops++))
done

Do 10 iterations, try to run subvolume mounts concurrently, log mount
failures to files, the actual EBUSY errors are on screen.

CC: stable@vger.kernel.org
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/super.c | 2 --
 1 file changed, 2 deletions(-)

Comments

Josef Bacik April 27, 2016, 7:34 p.m. UTC | #1
On 04/27/2016 11:14 AM, David Sterba wrote:
> A user reported mount failures with EBUSY during boot, there's root
> partition and many subvolumes, mounted via /etc/fstab.
>
> The failure depends on timing, when multiple subvolumes reach the code
> between superblock creation in RO mode, while the subvolumes are RW.
> This discrepancy leads to EBUSY and the code has been there since ages.
>
> If the subvolumes are mounted after a short delay, there's no EBUSY.
> There's no missing locking, the supreblock creation is atomic and the
> error code seems to be just artificial. We support different RO/RW
> mounts in mount_subvol and do the relevant adjustments if the flags do
> not match.
>
> There are no apparent problems if the check and EBUSY are dropped, no
> wrong refcounting, umount works etc.
>
> Reproducer:
>
> mkdir -p mnt $(eval echo mnt{1..10})
> mkfs.btrfs -f /dev/sdx
> mount /dev/sdx mnt
> for i in `seq 10`; do btrfs subvol create subv$i; done
> loops=0
> while [ $loops -lt 10 ]; do
>   umount mnt*
>   mount -o ro /dev/sdx mnt &
>   for i in `seq 10`; do
>     mount -o subvol=subv$i /dev/sdx mnt$i || \
>       echo "FAIL: mnt$i, loop $loops" > log-$i-$loops &
>     fi
>   done
>   wait
>   : $((loops++))
> done
>
> Do 10 iterations, try to run subvolume mounts concurrently, log mount
> failures to files, the actual EBUSY errors are on screen.
>
> CC: stable@vger.kernel.org
> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
>  fs/btrfs/super.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 00b8f37cc306..0c398643d2a1 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -1588,8 +1588,6 @@ static struct dentry *btrfs_mount(struct file_system_type *fs_type, int flags,
>  	if (s->s_root) {
>  		btrfs_close_devices(fs_devices);
>  		free_fs_info(fs_info);
> -		if ((flags ^ s->s_flags) & MS_RDONLY)
> -			error = -EBUSY;
>  	} else {
>  		snprintf(s->s_id, sizeof(s->s_id), "%pg", bdev);
>  		btrfs_sb(s)->bdev_holder = fs_type;
>

Reviewed-by: Josef Bacik <jbacik@fb.com>

Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Liu Bo April 27, 2016, 11:22 p.m. UTC | #2
On Wed, Apr 27, 2016 at 05:14:36PM +0200, David Sterba wrote:
> A user reported mount failures with EBUSY during boot, there's root
> partition and many subvolumes, mounted via /etc/fstab.
> 
> The failure depends on timing, when multiple subvolumes reach the code
> between superblock creation in RO mode, while the subvolumes are RW.
> This discrepancy leads to EBUSY and the code has been there since ages.
> 
> If the subvolumes are mounted after a short delay, there's no EBUSY.
> There's no missing locking, the supreblock creation is atomic and the
> error code seems to be just artificial. We support different RO/RW
> mounts in mount_subvol and do the relevant adjustments if the flags do
> not match.

Looks good to me.

Reviewed-by: Liu Bo <bo.li.liu@oracle.com>

But What I'm worrying about is that mount_subvol() will help subvolume
mounting get correct mount flags by calling btrfs_remount(), and
btrfs_remount() can remove sb->s_flags's MS_RDONLY.  In all syscall cases
it's ok as we have mnt->mnt_flags, but for btrfs's add_dev ioctl, we
don't check mnt_want_write_file() while btrfs's rm_dev ioctl does the
check, should we add that for add_dev ioctl?

Thanks,

-liubo

> 
> There are no apparent problems if the check and EBUSY are dropped, no
> wrong refcounting, umount works etc.
> 
> Reproducer:
> 
> mkdir -p mnt $(eval echo mnt{1..10})
> mkfs.btrfs -f /dev/sdx
> mount /dev/sdx mnt
> for i in `seq 10`; do btrfs subvol create subv$i; done
> loops=0
> while [ $loops -lt 10 ]; do
>   umount mnt*
>   mount -o ro /dev/sdx mnt &
>   for i in `seq 10`; do
>     mount -o subvol=subv$i /dev/sdx mnt$i || \
>       echo "FAIL: mnt$i, loop $loops" > log-$i-$loops &
>     fi
>   done
>   wait
>   : $((loops++))
> done
> 
> Do 10 iterations, try to run subvolume mounts concurrently, log mount
> failures to files, the actual EBUSY errors are on screen.
> 
> CC: stable@vger.kernel.org
> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
>  fs/btrfs/super.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 00b8f37cc306..0c398643d2a1 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -1588,8 +1588,6 @@ static struct dentry *btrfs_mount(struct file_system_type *fs_type, int flags,
>  	if (s->s_root) {
>  		btrfs_close_devices(fs_devices);
>  		free_fs_info(fs_info);
> -		if ((flags ^ s->s_flags) & MS_RDONLY)
> -			error = -EBUSY;
>  	} else {
>  		snprintf(s->s_id, sizeof(s->s_id), "%pg", bdev);
>  		btrfs_sb(s)->bdev_holder = fs_type;
> -- 
> 2.7.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Sterba April 28, 2016, 9:12 a.m. UTC | #3
On Wed, Apr 27, 2016 at 04:22:17PM -0700, Liu Bo wrote:
> On Wed, Apr 27, 2016 at 05:14:36PM +0200, David Sterba wrote:
> > A user reported mount failures with EBUSY during boot, there's root
> > partition and many subvolumes, mounted via /etc/fstab.
> > 
> > The failure depends on timing, when multiple subvolumes reach the code
> > between superblock creation in RO mode, while the subvolumes are RW.
> > This discrepancy leads to EBUSY and the code has been there since ages.
> > 
> > If the subvolumes are mounted after a short delay, there's no EBUSY.
> > There's no missing locking, the supreblock creation is atomic and the
> > error code seems to be just artificial. We support different RO/RW
> > mounts in mount_subvol and do the relevant adjustments if the flags do
> > not match.
> 
> Looks good to me.
> 
> Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
> 
> But What I'm worrying about is that mount_subvol() will help subvolume
> mounting get correct mount flags by calling btrfs_remount(), and
> btrfs_remount() can remove sb->s_flags's MS_RDONLY.  In all syscall cases
> it's ok as we have mnt->mnt_flags, but for btrfs's add_dev ioctl, we
> don't check mnt_want_write_file() while btrfs's rm_dev ioctl does the
> check, should we add that for add_dev ioctl?

That's right, I have patches to fix that and will send them.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Filipe Manana April 28, 2016, 10:01 a.m. UTC | #4
On Wed, Apr 27, 2016 at 4:14 PM, David Sterba <dsterba@suse.com> wrote:
> A user reported mount failures with EBUSY during boot, there's root
> partition and many subvolumes, mounted via /etc/fstab.
>
> The failure depends on timing, when multiple subvolumes reach the code
> between superblock creation in RO mode, while the subvolumes are RW.
> This discrepancy leads to EBUSY and the code has been there since ages.
>
> If the subvolumes are mounted after a short delay, there's no EBUSY.
> There's no missing locking, the supreblock creation is atomic and the
> error code seems to be just artificial. We support different RO/RW
> mounts in mount_subvol and do the relevant adjustments if the flags do
> not match.
>
> There are no apparent problems if the check and EBUSY are dropped, no
> wrong refcounting, umount works etc.
>
> Reproducer:
>
> mkdir -p mnt $(eval echo mnt{1..10})
> mkfs.btrfs -f /dev/sdx
> mount /dev/sdx mnt
> for i in `seq 10`; do btrfs subvol create subv$i; done
> loops=0
> while [ $loops -lt 10 ]; do
>   umount mnt*
>   mount -o ro /dev/sdx mnt &
>   for i in `seq 10`; do
>     mount -o subvol=subv$i /dev/sdx mnt$i || \
>       echo "FAIL: mnt$i, loop $loops" > log-$i-$loops &
>     fi
>   done
>   wait
>   : $((loops++))
> done
>
> Do 10 iterations, try to run subvolume mounts concurrently, log mount
> failures to files, the actual EBUSY errors are on screen.

This is easy to reproduce, can you please make a test case for xfstests?
Thanks.

>
> CC: stable@vger.kernel.org
> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
>  fs/btrfs/super.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 00b8f37cc306..0c398643d2a1 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -1588,8 +1588,6 @@ static struct dentry *btrfs_mount(struct file_system_type *fs_type, int flags,
>         if (s->s_root) {
>                 btrfs_close_devices(fs_devices);
>                 free_fs_info(fs_info);
> -               if ((flags ^ s->s_flags) & MS_RDONLY)
> -                       error = -EBUSY;
>         } else {
>                 snprintf(s->s_id, sizeof(s->s_id), "%pg", bdev);
>                 btrfs_sb(s)->bdev_holder = fs_type;
> --
> 2.7.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Sterba April 28, 2016, 10:05 a.m. UTC | #5
On Thu, Apr 28, 2016 at 11:12:20AM +0200, David Sterba wrote:
> On Wed, Apr 27, 2016 at 04:22:17PM -0700, Liu Bo wrote:
> > On Wed, Apr 27, 2016 at 05:14:36PM +0200, David Sterba wrote:
> > > A user reported mount failures with EBUSY during boot, there's root
> > > partition and many subvolumes, mounted via /etc/fstab.
> > > 
> > > The failure depends on timing, when multiple subvolumes reach the code
> > > between superblock creation in RO mode, while the subvolumes are RW.
> > > This discrepancy leads to EBUSY and the code has been there since ages.
> > > 
> > > If the subvolumes are mounted after a short delay, there's no EBUSY.
> > > There's no missing locking, the supreblock creation is atomic and the
> > > error code seems to be just artificial. We support different RO/RW
> > > mounts in mount_subvol and do the relevant adjustments if the flags do
> > > not match.
> > 
> > Looks good to me.
> > 
> > Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
> > 
> > But What I'm worrying about is that mount_subvol() will help subvolume
> > mounting get correct mount flags by calling btrfs_remount(), and
> > btrfs_remount() can remove sb->s_flags's MS_RDONLY.  In all syscall cases
> > it's ok as we have mnt->mnt_flags, but for btrfs's add_dev ioctl, we
> > don't check mnt_want_write_file() while btrfs's rm_dev ioctl does the
> > check, should we add that for add_dev ioctl?
> 
> That's right, I have patches to fix that and will send them.

So, after a bit of refreshing, the patches are not finished, as add
device must respect read only mount and seeding devices, that can turn
RO to RW. And I got stuck in the device replace case, where we have to
do the drop_mnt_write some time later. Not sure when I'll get to it,
pushed to my git repos, branch fix/ioctl-want-write-local .
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Sterba April 28, 2016, 4:17 p.m. UTC | #6
On Wed, Apr 27, 2016 at 05:14:36PM +0200, David Sterba wrote:
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -1588,8 +1588,6 @@ static struct dentry *btrfs_mount(struct file_system_type *fs_type, int flags,
>  	if (s->s_root) {
>  		btrfs_close_devices(fs_devices);
>  		free_fs_info(fs_info);
> -		if ((flags ^ s->s_flags) & MS_RDONLY)
> -			error = -EBUSY;

And this is a wrong fix. It will break the existing semantics of mixed
ro/rw mounts. With this change, it's not possible to mount the first
subvol ro and the rest with rw. The mount succeeds, but does not lead to
the right result.

At least now I have a better idea where the race happens, btrfs_subvol
between vfs_kern_mount and the remount.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch
diff mbox

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 00b8f37cc306..0c398643d2a1 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1588,8 +1588,6 @@  static struct dentry *btrfs_mount(struct file_system_type *fs_type, int flags,
 	if (s->s_root) {
 		btrfs_close_devices(fs_devices);
 		free_fs_info(fs_info);
-		if ((flags ^ s->s_flags) & MS_RDONLY)
-			error = -EBUSY;
 	} else {
 		snprintf(s->s_id, sizeof(s->s_id), "%pg", bdev);
 		btrfs_sb(s)->bdev_holder = fs_type;