diff mbox series

[stable-5.4.y,stable-5.10.y] btrfs: scan device in non-exclusive mode

Message ID de2889bd0a9ea5446c3473fe7b2086fbd954b9ab.1680496851.git.anand.jain@oracle.com (mailing list archive)
State New, archived
Headers show
Series [stable-5.4.y,stable-5.10.y] btrfs: scan device in non-exclusive mode | expand

Commit Message

Anand Jain April 3, 2023, 5:46 a.m. UTC
commit 50d281fc434cb8e2497f5e70a309ccca6b1a09f0 upstream.

This fixes mkfs/mount/check failures due to race with systemd-udevd
scan.

During the device scan initiated by systemd-udevd, other user space
EXCL operations such as mkfs, mount, or check may get blocked and result
in a "Device or resource busy" error. This is because the device
scan process opens the device with the EXCL flag in the kernel.

Two reports were received:

 - btrfs/179 test case, where the fsck command failed with the -EBUSY
   error

 - LTP pwritev03 test case, where mkfs.vfs failed with
   the -EBUSY error, when mkfs.vfs tried to overwrite old btrfs filesystem
   on the device.

In both cases, fsck and mkfs (respectively) were racing with a
systemd-udevd device scan, and systemd-udevd won, resulting in the
-EBUSY error for fsck and mkfs.

Reproducing the problem has been difficult because there is a very
small window during which these userspace threads can race to
acquire the exclusive device open. Even on the system where the problem
was observed, the problem occurrences were anywhere between 10 to 400
iterations and chances of reproducing decreases with debug printk()s.

However, an exclusive device open is unnecessary for the scan process,
as there are no write operations on the device during scan. Furthermore,
during the mount process, the superblock is re-read in the below
function call chain:

  btrfs_mount_root
   btrfs_open_devices
    open_fs_devices
     btrfs_open_one_device
       btrfs_get_bdev_and_sb

So, to fix this issue, removes the FMODE_EXCL flag from the scan
operation, and add a comment.

The case where mkfs may still write to the device and a scan is running,
the btrfs signature is not written at that time so scan will not
recognize such device.

Reported-by: Sherry Yang <sherry.yang@oracle.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/oe-lkp/202303170839.fdf23068-oliver.sang@intel.com
CC: stable@vger.kernel.org # 5.4+
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---

The upstream commit 50d281fc434cb8e2497f5e70a309ccca6b1a09f0 can be
applied without conflict to LTS stable-5.15.y and stable-6.1.y. However,
on LTS stable-5.4.y and stable-5.15.y, a conflict fix is required since
the zoned device support commits are not present in these versions. This
patch resolves the conflicts.

 fs/btrfs/volumes.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

Comments

Greg KH April 3, 2023, 1:02 p.m. UTC | #1
On Mon, Apr 03, 2023 at 01:46:08PM +0800, Anand Jain wrote:
> commit 50d281fc434cb8e2497f5e70a309ccca6b1a09f0 upstream.
> 
> This fixes mkfs/mount/check failures due to race with systemd-udevd
> scan.
> 
> During the device scan initiated by systemd-udevd, other user space
> EXCL operations such as mkfs, mount, or check may get blocked and result
> in a "Device or resource busy" error. This is because the device
> scan process opens the device with the EXCL flag in the kernel.
> 
> Two reports were received:
> 
>  - btrfs/179 test case, where the fsck command failed with the -EBUSY
>    error
> 
>  - LTP pwritev03 test case, where mkfs.vfs failed with
>    the -EBUSY error, when mkfs.vfs tried to overwrite old btrfs filesystem
>    on the device.
> 
> In both cases, fsck and mkfs (respectively) were racing with a
> systemd-udevd device scan, and systemd-udevd won, resulting in the
> -EBUSY error for fsck and mkfs.
> 
> Reproducing the problem has been difficult because there is a very
> small window during which these userspace threads can race to
> acquire the exclusive device open. Even on the system where the problem
> was observed, the problem occurrences were anywhere between 10 to 400
> iterations and chances of reproducing decreases with debug printk()s.
> 
> However, an exclusive device open is unnecessary for the scan process,
> as there are no write operations on the device during scan. Furthermore,
> during the mount process, the superblock is re-read in the below
> function call chain:
> 
>   btrfs_mount_root
>    btrfs_open_devices
>     open_fs_devices
>      btrfs_open_one_device
>        btrfs_get_bdev_and_sb
> 
> So, to fix this issue, removes the FMODE_EXCL flag from the scan
> operation, and add a comment.
> 
> The case where mkfs may still write to the device and a scan is running,
> the btrfs signature is not written at that time so scan will not
> recognize such device.
> 
> Reported-by: Sherry Yang <sherry.yang@oracle.com>
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Link: https://lore.kernel.org/oe-lkp/202303170839.fdf23068-oliver.sang@intel.com
> CC: stable@vger.kernel.org # 5.4+
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> Reviewed-by: David Sterba <dsterba@suse.com>
> Signed-off-by: David Sterba <dsterba@suse.com>
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
> 
> The upstream commit 50d281fc434cb8e2497f5e70a309ccca6b1a09f0 can be
> applied without conflict to LTS stable-5.15.y and stable-6.1.y. However,
> on LTS stable-5.4.y and stable-5.15.y, a conflict fix is required since
> the zoned device support commits are not present in these versions. This
> patch resolves the conflicts.

Now queued up, thanks.

greg k-h
diff mbox series

Patch

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 3fa972a43b5e..c5944c61317f 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1579,8 +1579,17 @@  struct btrfs_device *btrfs_scan_one_device(const char *path, fmode_t flags,
 	 * later supers, using BTRFS_SUPER_MIRROR_MAX instead
 	 */
 	bytenr = btrfs_sb_offset(0);
-	flags |= FMODE_EXCL;
 
+	/*
+	 * Avoid using flag |= FMODE_EXCL here, as the systemd-udev may
+	 * initiate the device scan which may race with the user's mount
+	 * or mkfs command, resulting in failure.
+	 * Since the device scan is solely for reading purposes, there is
+	 * no need for FMODE_EXCL. Additionally, the devices are read again
+	 * during the mount process. It is ok to get some inconsistent
+	 * values temporarily, as the device paths of the fsid are the only
+	 * required information for assembling the volume.
+	 */
 	bdev = blkdev_get_by_path(path, flags, holder);
 	if (IS_ERR(bdev))
 		return ERR_CAST(bdev);