diff mbox series

block: Do not reread partition table on exclusively open device

Message ID 20221130175653.24299-1-jack@suse.cz (mailing list archive)
State New, archived
Headers show
Series block: Do not reread partition table on exclusively open device | expand

Commit Message

Jan Kara Nov. 30, 2022, 5:56 p.m. UTC
Since commit 10c70d95c0f2 ("block: remove the bd_openers checks in
blk_drop_partitions") we allow rereading of partition table although
there are users of the block device. This has an undesirable consequence
that e.g. if sda and sdb are assembled to a RAID1 device md0 with
partitions, BLKRRPART ioctl on sda will rescan partition table and
create sda1 device. This partition device under a raid device confuses
some programs (such as libstorage-ng used for initial partitioning for
distribution installation) leading to failures.

Fix the problem refusing to rescan partitions if there is another user
that has the block device exclusively open.

Link: https://lore.kernel.org/all/20221130135344.2ul4cyfstfs3znxg@quack3
Fixes: 10c70d95c0f2 ("block: remove the bd_openers checks in blk_drop_partitions")
Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/blk.h   |  2 +-
 block/genhd.c |  7 +++++--
 block/ioctl.c | 12 +++++++-----
 3 files changed, 13 insertions(+), 8 deletions(-)

Comments

Jens Axboe Nov. 30, 2022, 10:52 p.m. UTC | #1
On 11/30/22 10:56 AM, Jan Kara wrote:
> Since commit 10c70d95c0f2 ("block: remove the bd_openers checks in
> blk_drop_partitions") we allow rereading of partition table although
> there are users of the block device. This has an undesirable consequence
> that e.g. if sda and sdb are assembled to a RAID1 device md0 with
> partitions, BLKRRPART ioctl on sda will rescan partition table and
> create sda1 device. This partition device under a raid device confuses
> some programs (such as libstorage-ng used for initial partitioning for
> distribution installation) leading to failures.
> 
> Fix the problem refusing to rescan partitions if there is another user
> that has the block device exclusively open.

This looks nice and clean to me. Was pondering whether to queue this up
for 6.1, but it's old enough that I think we should just funnel it through
6.2 and mark it stable.
Jens Axboe Nov. 30, 2022, 10:54 p.m. UTC | #2
On Wed, 30 Nov 2022 18:56:53 +0100, Jan Kara wrote:
> Since commit 10c70d95c0f2 ("block: remove the bd_openers checks in
> blk_drop_partitions") we allow rereading of partition table although
> there are users of the block device. This has an undesirable consequence
> that e.g. if sda and sdb are assembled to a RAID1 device md0 with
> partitions, BLKRRPART ioctl on sda will rescan partition table and
> create sda1 device. This partition device under a raid device confuses
> some programs (such as libstorage-ng used for initial partitioning for
> distribution installation) leading to failures.
> 
> [...]

Applied, thanks!

[1/1] block: Do not reread partition table on exclusively open device
      commit: 8d67fc20caf8b08ef6c90a04970846a950d46dd1

Best regards,
Christoph Hellwig Dec. 1, 2022, 7:10 a.m. UTC | #3
Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>
Jan Kara Dec. 1, 2022, 11:44 a.m. UTC | #4
Hello!

I'm sorry but this patch has a brownpaper bag bug because BLKRRPART now
bails also if there is no exlusive opener (which didn't have any adverse
effects on my test system because the BLKRRPART calls issued by
systemd-udev are mostly a pointless exercise as I've found out). I'll send
a fixup patch.  Either fold it into this patch or just add on top. Thanks
and I'm sorry for the trouble!
								Honza

On Wed 30-11-22 23:10:02, Christoph Hellwig wrote:
> Looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
Yu Kuai Jan. 31, 2023, 2 p.m. UTC | #5
Hi, Jan

在 2022/12/01 1:56, Jan Kara 写道:

> -int disk_scan_partitions(struct gendisk *disk, fmode_t mode)
> +int disk_scan_partitions(struct gendisk *disk, fmode_t mode, void *owner)
>   {
>   	struct block_device *bdev;
>   
> @@ -366,6 +366,9 @@ int disk_scan_partitions(struct gendisk *disk, fmode_t mode)
>   		return -EINVAL;
>   	if (disk->open_partitions)
>   		return -EBUSY;
> +	/* Someone else has bdev exclusively open? */
> +	if (disk->part0->bd_holder != owner)
> +		return -EBUSY;
>   
>   	set_bit(GD_NEED_PART_SCAN, &disk->state);
>   	bdev = blkdev_get_by_dev(disk_devt(disk), mode, NULL);

It just by code review, but I think the above checking should be inside
open_mutex, which is used to protect bdev claming. Otherwise there can
be race that this check is pass while someone else exclusive open the
disk before blkdev_get_by_dev().

How do you think about open coding blkdev_get_by_dev() here, something
like:

diff --git a/block/genhd.c b/block/genhd.c
index 23cf83b3331c..341af4db7d54 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -358,7 +358,7 @@ EXPORT_SYMBOL_GPL(disk_uevent);

  int disk_scan_partitions(struct gendisk *disk, fmode_t mode, void *owner)
  {
-       struct block_device *bdev;
+       int ret = 0;

         if (disk->flags & (GENHD_FL_NO_PART | GENHD_FL_HIDDEN))
                 return -EINVAL;
@@ -366,16 +366,31 @@ int disk_scan_partitions(struct gendisk *disk, 
fmode_t mode, void *owner)
                 return -EINVAL;
         if (disk->open_partitions)
                 return -EBUSY;
+
+       disk_block_events(disk);
+       mutex_lock(&disk->open_mutex);
+
         /* Someone else has bdev exclusively open? */
-       if (disk->part0->bd_holder && disk->part0->bd_holder != owner)
-               return -EBUSY;
+       if (disk->part0->bd_holder && disk->part0->bd_holder != owner) {
+               ret = -EBUSY;
+               goto out;
+       }
+
+       if (!disk_live(disk)) {
+               ret = -ENODEV;
+               goto out;
+       }

         set_bit(GD_NEED_PART_SCAN, &disk->state);
-       bdev = blkdev_get_by_dev(disk_devt(disk), mode, NULL);
-       if (IS_ERR(bdev))
-               return PTR_ERR(bdev);
-       blkdev_put(bdev, mode);
-       return 0;
+       ret = blkdev_get_whole(disk->part0, mode);
+out:
+       mutex_unlock(&disk->open_mutex);
+       disk_unblock_events(disk);
+
+       if (!ret)
+               blkdev_put_whole(disk->part0, mode);
+
+       return ret;
  }
Christoph Hellwig Jan. 31, 2023, 2:15 p.m. UTC | #6
It might be easier to just move the check into blkdev_get_whole,
which also ensures that non-excluisve openers don't cause a partition
scan while someone else has the device exclusively open.

diff --git a/block/bdev.c b/block/bdev.c
index edc110d90df404..a831b6c9c627d7 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -666,25 +666,28 @@ static void blkdev_flush_mapping(struct block_device *bdev)
 static int blkdev_get_whole(struct block_device *bdev, fmode_t mode)
 {
 	struct gendisk *disk = bdev->bd_disk;
-	int ret;
+	int ret = 0;
 
-	if (disk->fops->open) {
+	if (disk->fops->open)
 		ret = disk->fops->open(bdev, mode);
-		if (ret) {
-			/* avoid ghost partitions on a removed medium */
-			if (ret == -ENOMEDIUM &&
-			     test_bit(GD_NEED_PART_SCAN, &disk->state))
-				bdev_disk_changed(disk, true);
+
+	if (ret) {
+		/* avoid ghost partitions on a removed medium */
+		if (ret != -ENOMEDIUM)
 			return ret;
-		}
+	} else {
+		if (!atomic_read(&bdev->bd_openers))
+			set_init_blocksize(bdev);
+		atomic_inc(&bdev->bd_openers);
 	}
 
-	if (!atomic_read(&bdev->bd_openers))
-		set_init_blocksize(bdev);
-	if (test_bit(GD_NEED_PART_SCAN, &disk->state))
+	/*
+	 * Skip the partition scan if someone else has the device exclusively
+	 * open.
+	 */
+	if (test_bit(GD_NEED_PART_SCAN, &disk->state) && !bdev->bd_holder)
 		bdev_disk_changed(disk, false);
-	atomic_inc(&bdev->bd_openers);
-	return 0;
+	return ret;
 }
 
 static void blkdev_put_whole(struct block_device *bdev, fmode_t mode)
diff --git a/block/genhd.c b/block/genhd.c
index 23cf83b3331cde..4a7b1b8b9efdb5 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -366,9 +366,6 @@ int disk_scan_partitions(struct gendisk *disk, fmode_t mode, void *owner)
 		return -EINVAL;
 	if (disk->open_partitions)
 		return -EBUSY;
-	/* Someone else has bdev exclusively open? */
-	if (disk->part0->bd_holder && disk->part0->bd_holder != owner)
-		return -EBUSY;
 
 	set_bit(GD_NEED_PART_SCAN, &disk->state);
 	bdev = blkdev_get_by_dev(disk_devt(disk), mode, NULL);
Yu Kuai Feb. 1, 2023, 1:04 a.m. UTC | #7
Hi, Christoph

在 2023/01/31 22:15, Christoph Hellwig 写道:
> It might be easier to just move the check into blkdev_get_whole,
> which also ensures that non-excluisve openers don't cause a partition
> scan while someone else has the device exclusively open.
> 
> diff --git a/block/bdev.c b/block/bdev.c
> index edc110d90df404..a831b6c9c627d7 100644
> --- a/block/bdev.c
> +++ b/block/bdev.c
> @@ -666,25 +666,28 @@ static void blkdev_flush_mapping(struct block_device *bdev)
>   static int blkdev_get_whole(struct block_device *bdev, fmode_t mode)
>   {
>   	struct gendisk *disk = bdev->bd_disk;
> -	int ret;
> +	int ret = 0;
>   
> -	if (disk->fops->open) {
> +	if (disk->fops->open)
>   		ret = disk->fops->open(bdev, mode);
> -		if (ret) {
> -			/* avoid ghost partitions on a removed medium */
> -			if (ret == -ENOMEDIUM &&
> -			     test_bit(GD_NEED_PART_SCAN, &disk->state))
> -				bdev_disk_changed(disk, true);
> +
> +	if (ret) {
> +		/* avoid ghost partitions on a removed medium */
> +		if (ret != -ENOMEDIUM)
>   			return ret;
> -		}
> +	} else {
> +		if (!atomic_read(&bdev->bd_openers))
> +			set_init_blocksize(bdev);
> +		atomic_inc(&bdev->bd_openers);
>   	}
>   
> -	if (!atomic_read(&bdev->bd_openers))
> -		set_init_blocksize(bdev);
> -	if (test_bit(GD_NEED_PART_SCAN, &disk->state))
> +	/*
> +	 * Skip the partition scan if someone else has the device exclusively
> +	 * open.
> +	 */
> +	if (test_bit(GD_NEED_PART_SCAN, &disk->state) && !bdev->bd_holder)
>   		bdev_disk_changed(disk, false);

I think this is wrong here... We should at least allow the exclusively
opener to scan partition, right?

Thanks,
Kuai
Christoph Hellwig Feb. 1, 2023, 6:22 a.m. UTC | #8
On Wed, Feb 01, 2023 at 09:04:12AM +0800, Yu Kuai wrote:
> > +	if (test_bit(GD_NEED_PART_SCAN, &disk->state) && !bdev->bd_holder)
> >   		bdev_disk_changed(disk, false);
> 
> I think this is wrong here... We should at least allow the exclusively
> opener to scan partition, right?

bd_holder is only set in bd_finish_claiming, which is called after
the partition rescan.
Yu Kuai Feb. 1, 2023, 7:20 a.m. UTC | #9
Hi,

在 2023/02/01 14:22, Christoph Hellwig 写道:
> On Wed, Feb 01, 2023 at 09:04:12AM +0800, Yu Kuai wrote:
>>> +	if (test_bit(GD_NEED_PART_SCAN, &disk->state) && !bdev->bd_holder)
>>>    		bdev_disk_changed(disk, false);
>>
>> I think this is wrong here... We should at least allow the exclusively
>> opener to scan partition, right?
> 
> bd_holder is only set in bd_finish_claiming, which is called after
> the partition rescan.
> .
> 

I mean that someone open bdev exclusively first, and then call ioctl to
rescan partition.

By the way, disk_scan_partitions() won't claim bdev in the first
place...

Thanks,
Kuai
Yu Kuai Feb. 2, 2023, 12:50 p.m. UTC | #10
Hi, Jan and Christoph

在 2022/12/01 1:56, Jan Kara 写道:
> Since commit 10c70d95c0f2 ("block: remove the bd_openers checks in
> blk_drop_partitions") we allow rereading of partition table although
> there are users of the block device. This has an undesirable consequence
> that e.g. if sda and sdb are assembled to a RAID1 device md0 with
> partitions, BLKRRPART ioctl on sda will rescan partition table and
> create sda1 device. This partition device under a raid device confuses
> some programs (such as libstorage-ng used for initial partitioning for
> distribution installation) leading to failures.
> 
> Fix the problem refusing to rescan partitions if there is another user
> that has the block device exclusively open.

Still by code review, I just found another race that cound cause disk
can't scan partition while creating:

t1: create device	t2: open device exclusively
device_add_disk()
  bdev_add
   insert_inode_hash
                         ...
                         bd_finish_claiming
                          bdev->bd_holder = holder
disk_scan_partitions
// check holder and failed

This race is artificial and unlikely to happend in practice, but this
is easy to fix by following:

diff --git a/block/genhd.c b/block/genhd.c
index 09f2ac548832..23e87753313b 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -497,6 +497,11 @@ int __must_check device_add_disk(struct device 
*parent, struct gendisk *disk,
                 if (ret)
                         goto out_unregister_bdi;

+               /*
+                * In case user open device before disk_scan_partitions(),
+                * set state here, so that blkdev_open() can scan 
partitions.
+                */
+               set_bit(GD_NEED_PART_SCAN, &disk->state);
                 bdev_add(disk->part0, ddev->devt);
                 if (get_capacity(disk))
                         disk_scan_partitions(disk, FMODE_READ, NULL);
diff mbox series

Patch

diff --git a/block/blk.h b/block/blk.h
index a186ea20f39d..8b75a95b28d6 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -436,7 +436,7 @@  static inline struct kmem_cache *blk_get_queue_kmem_cache(bool srcu)
 }
 struct request_queue *blk_alloc_queue(int node_id, bool alloc_srcu);
 
-int disk_scan_partitions(struct gendisk *disk, fmode_t mode);
+int disk_scan_partitions(struct gendisk *disk, fmode_t mode, void *owner);
 
 int disk_alloc_events(struct gendisk *disk);
 void disk_add_events(struct gendisk *disk);
diff --git a/block/genhd.c b/block/genhd.c
index 0f9769db2de8..012529d36f5b 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -356,7 +356,7 @@  void disk_uevent(struct gendisk *disk, enum kobject_action action)
 }
 EXPORT_SYMBOL_GPL(disk_uevent);
 
-int disk_scan_partitions(struct gendisk *disk, fmode_t mode)
+int disk_scan_partitions(struct gendisk *disk, fmode_t mode, void *owner)
 {
 	struct block_device *bdev;
 
@@ -366,6 +366,9 @@  int disk_scan_partitions(struct gendisk *disk, fmode_t mode)
 		return -EINVAL;
 	if (disk->open_partitions)
 		return -EBUSY;
+	/* Someone else has bdev exclusively open? */
+	if (disk->part0->bd_holder != owner)
+		return -EBUSY;
 
 	set_bit(GD_NEED_PART_SCAN, &disk->state);
 	bdev = blkdev_get_by_dev(disk_devt(disk), mode, NULL);
@@ -500,7 +503,7 @@  int __must_check device_add_disk(struct device *parent, struct gendisk *disk,
 
 		bdev_add(disk->part0, ddev->devt);
 		if (get_capacity(disk))
-			disk_scan_partitions(disk, FMODE_READ);
+			disk_scan_partitions(disk, FMODE_READ, NULL);
 
 		/*
 		 * Announce the disk and partitions after all partitions are
diff --git a/block/ioctl.c b/block/ioctl.c
index 60121e89052b..96617512982e 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -467,9 +467,10 @@  static int blkdev_bszset(struct block_device *bdev, fmode_t mode,
  * user space. Note the separate arg/argp parameters that are needed
  * to deal with the compat_ptr() conversion.
  */
-static int blkdev_common_ioctl(struct block_device *bdev, fmode_t mode,
-				unsigned cmd, unsigned long arg, void __user *argp)
+static int blkdev_common_ioctl(struct file *file, fmode_t mode, unsigned cmd,
+			       unsigned long arg, void __user *argp)
 {
+	struct block_device *bdev = I_BDEV(file->f_mapping->host);
 	unsigned int max_sectors;
 
 	switch (cmd) {
@@ -527,7 +528,8 @@  static int blkdev_common_ioctl(struct block_device *bdev, fmode_t mode,
 			return -EACCES;
 		if (bdev_is_partition(bdev))
 			return -EINVAL;
-		return disk_scan_partitions(bdev->bd_disk, mode & ~FMODE_EXCL);
+		return disk_scan_partitions(bdev->bd_disk, mode & ~FMODE_EXCL,
+					    file);
 	case BLKTRACESTART:
 	case BLKTRACESTOP:
 	case BLKTRACETEARDOWN:
@@ -605,7 +607,7 @@  long blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg)
 		break;
 	}
 
-	ret = blkdev_common_ioctl(bdev, mode, cmd, arg, argp);
+	ret = blkdev_common_ioctl(file, mode, cmd, arg, argp);
 	if (ret != -ENOIOCTLCMD)
 		return ret;
 
@@ -674,7 +676,7 @@  long compat_blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg)
 		break;
 	}
 
-	ret = blkdev_common_ioctl(bdev, mode, cmd, arg, argp);
+	ret = blkdev_common_ioctl(file, mode, cmd, arg, argp);
 	if (ret == -ENOIOCTLCMD && disk->fops->compat_ioctl)
 		ret = disk->fops->compat_ioctl(bdev, mode, cmd, arg);