Message ID | 20221130175653.24299-1-jack@suse.cz (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | block: Do not reread partition table on exclusively open device | expand |
On 11/30/22 10:56 AM, Jan Kara wrote: > Since commit 10c70d95c0f2 ("block: remove the bd_openers checks in > blk_drop_partitions") we allow rereading of partition table although > there are users of the block device. This has an undesirable consequence > that e.g. if sda and sdb are assembled to a RAID1 device md0 with > partitions, BLKRRPART ioctl on sda will rescan partition table and > create sda1 device. This partition device under a raid device confuses > some programs (such as libstorage-ng used for initial partitioning for > distribution installation) leading to failures. > > Fix the problem refusing to rescan partitions if there is another user > that has the block device exclusively open. This looks nice and clean to me. Was pondering whether to queue this up for 6.1, but it's old enough that I think we should just funnel it through 6.2 and mark it stable.
On Wed, 30 Nov 2022 18:56:53 +0100, Jan Kara wrote: > Since commit 10c70d95c0f2 ("block: remove the bd_openers checks in > blk_drop_partitions") we allow rereading of partition table although > there are users of the block device. This has an undesirable consequence > that e.g. if sda and sdb are assembled to a RAID1 device md0 with > partitions, BLKRRPART ioctl on sda will rescan partition table and > create sda1 device. This partition device under a raid device confuses > some programs (such as libstorage-ng used for initial partitioning for > distribution installation) leading to failures. > > [...] Applied, thanks! [1/1] block: Do not reread partition table on exclusively open device commit: 8d67fc20caf8b08ef6c90a04970846a950d46dd1 Best regards,
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
Hello! I'm sorry but this patch has a brownpaper bag bug because BLKRRPART now bails also if there is no exlusive opener (which didn't have any adverse effects on my test system because the BLKRRPART calls issued by systemd-udev are mostly a pointless exercise as I've found out). I'll send a fixup patch. Either fold it into this patch or just add on top. Thanks and I'm sorry for the trouble! Honza On Wed 30-11-22 23:10:02, Christoph Hellwig wrote: > Looks good: > > Reviewed-by: Christoph Hellwig <hch@lst.de>
Hi, Jan 在 2022/12/01 1:56, Jan Kara 写道: > -int disk_scan_partitions(struct gendisk *disk, fmode_t mode) > +int disk_scan_partitions(struct gendisk *disk, fmode_t mode, void *owner) > { > struct block_device *bdev; > > @@ -366,6 +366,9 @@ int disk_scan_partitions(struct gendisk *disk, fmode_t mode) > return -EINVAL; > if (disk->open_partitions) > return -EBUSY; > + /* Someone else has bdev exclusively open? */ > + if (disk->part0->bd_holder != owner) > + return -EBUSY; > > set_bit(GD_NEED_PART_SCAN, &disk->state); > bdev = blkdev_get_by_dev(disk_devt(disk), mode, NULL); It just by code review, but I think the above checking should be inside open_mutex, which is used to protect bdev claming. Otherwise there can be race that this check is pass while someone else exclusive open the disk before blkdev_get_by_dev(). How do you think about open coding blkdev_get_by_dev() here, something like: diff --git a/block/genhd.c b/block/genhd.c index 23cf83b3331c..341af4db7d54 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -358,7 +358,7 @@ EXPORT_SYMBOL_GPL(disk_uevent); int disk_scan_partitions(struct gendisk *disk, fmode_t mode, void *owner) { - struct block_device *bdev; + int ret = 0; if (disk->flags & (GENHD_FL_NO_PART | GENHD_FL_HIDDEN)) return -EINVAL; @@ -366,16 +366,31 @@ int disk_scan_partitions(struct gendisk *disk, fmode_t mode, void *owner) return -EINVAL; if (disk->open_partitions) return -EBUSY; + + disk_block_events(disk); + mutex_lock(&disk->open_mutex); + /* Someone else has bdev exclusively open? */ - if (disk->part0->bd_holder && disk->part0->bd_holder != owner) - return -EBUSY; + if (disk->part0->bd_holder && disk->part0->bd_holder != owner) { + ret = -EBUSY; + goto out; + } + + if (!disk_live(disk)) { + ret = -ENODEV; + goto out; + } set_bit(GD_NEED_PART_SCAN, &disk->state); - bdev = blkdev_get_by_dev(disk_devt(disk), mode, NULL); - if (IS_ERR(bdev)) - return PTR_ERR(bdev); - blkdev_put(bdev, mode); - return 0; + ret = blkdev_get_whole(disk->part0, mode); +out: + mutex_unlock(&disk->open_mutex); + disk_unblock_events(disk); + + if (!ret) + blkdev_put_whole(disk->part0, mode); + + return ret; }
It might be easier to just move the check into blkdev_get_whole, which also ensures that non-excluisve openers don't cause a partition scan while someone else has the device exclusively open. diff --git a/block/bdev.c b/block/bdev.c index edc110d90df404..a831b6c9c627d7 100644 --- a/block/bdev.c +++ b/block/bdev.c @@ -666,25 +666,28 @@ static void blkdev_flush_mapping(struct block_device *bdev) static int blkdev_get_whole(struct block_device *bdev, fmode_t mode) { struct gendisk *disk = bdev->bd_disk; - int ret; + int ret = 0; - if (disk->fops->open) { + if (disk->fops->open) ret = disk->fops->open(bdev, mode); - if (ret) { - /* avoid ghost partitions on a removed medium */ - if (ret == -ENOMEDIUM && - test_bit(GD_NEED_PART_SCAN, &disk->state)) - bdev_disk_changed(disk, true); + + if (ret) { + /* avoid ghost partitions on a removed medium */ + if (ret != -ENOMEDIUM) return ret; - } + } else { + if (!atomic_read(&bdev->bd_openers)) + set_init_blocksize(bdev); + atomic_inc(&bdev->bd_openers); } - if (!atomic_read(&bdev->bd_openers)) - set_init_blocksize(bdev); - if (test_bit(GD_NEED_PART_SCAN, &disk->state)) + /* + * Skip the partition scan if someone else has the device exclusively + * open. + */ + if (test_bit(GD_NEED_PART_SCAN, &disk->state) && !bdev->bd_holder) bdev_disk_changed(disk, false); - atomic_inc(&bdev->bd_openers); - return 0; + return ret; } static void blkdev_put_whole(struct block_device *bdev, fmode_t mode) diff --git a/block/genhd.c b/block/genhd.c index 23cf83b3331cde..4a7b1b8b9efdb5 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -366,9 +366,6 @@ int disk_scan_partitions(struct gendisk *disk, fmode_t mode, void *owner) return -EINVAL; if (disk->open_partitions) return -EBUSY; - /* Someone else has bdev exclusively open? */ - if (disk->part0->bd_holder && disk->part0->bd_holder != owner) - return -EBUSY; set_bit(GD_NEED_PART_SCAN, &disk->state); bdev = blkdev_get_by_dev(disk_devt(disk), mode, NULL);
Hi, Christoph 在 2023/01/31 22:15, Christoph Hellwig 写道: > It might be easier to just move the check into blkdev_get_whole, > which also ensures that non-excluisve openers don't cause a partition > scan while someone else has the device exclusively open. > > diff --git a/block/bdev.c b/block/bdev.c > index edc110d90df404..a831b6c9c627d7 100644 > --- a/block/bdev.c > +++ b/block/bdev.c > @@ -666,25 +666,28 @@ static void blkdev_flush_mapping(struct block_device *bdev) > static int blkdev_get_whole(struct block_device *bdev, fmode_t mode) > { > struct gendisk *disk = bdev->bd_disk; > - int ret; > + int ret = 0; > > - if (disk->fops->open) { > + if (disk->fops->open) > ret = disk->fops->open(bdev, mode); > - if (ret) { > - /* avoid ghost partitions on a removed medium */ > - if (ret == -ENOMEDIUM && > - test_bit(GD_NEED_PART_SCAN, &disk->state)) > - bdev_disk_changed(disk, true); > + > + if (ret) { > + /* avoid ghost partitions on a removed medium */ > + if (ret != -ENOMEDIUM) > return ret; > - } > + } else { > + if (!atomic_read(&bdev->bd_openers)) > + set_init_blocksize(bdev); > + atomic_inc(&bdev->bd_openers); > } > > - if (!atomic_read(&bdev->bd_openers)) > - set_init_blocksize(bdev); > - if (test_bit(GD_NEED_PART_SCAN, &disk->state)) > + /* > + * Skip the partition scan if someone else has the device exclusively > + * open. > + */ > + if (test_bit(GD_NEED_PART_SCAN, &disk->state) && !bdev->bd_holder) > bdev_disk_changed(disk, false); I think this is wrong here... We should at least allow the exclusively opener to scan partition, right? Thanks, Kuai
On Wed, Feb 01, 2023 at 09:04:12AM +0800, Yu Kuai wrote: > > + if (test_bit(GD_NEED_PART_SCAN, &disk->state) && !bdev->bd_holder) > > bdev_disk_changed(disk, false); > > I think this is wrong here... We should at least allow the exclusively > opener to scan partition, right? bd_holder is only set in bd_finish_claiming, which is called after the partition rescan.
Hi, 在 2023/02/01 14:22, Christoph Hellwig 写道: > On Wed, Feb 01, 2023 at 09:04:12AM +0800, Yu Kuai wrote: >>> + if (test_bit(GD_NEED_PART_SCAN, &disk->state) && !bdev->bd_holder) >>> bdev_disk_changed(disk, false); >> >> I think this is wrong here... We should at least allow the exclusively >> opener to scan partition, right? > > bd_holder is only set in bd_finish_claiming, which is called after > the partition rescan. > . > I mean that someone open bdev exclusively first, and then call ioctl to rescan partition. By the way, disk_scan_partitions() won't claim bdev in the first place... Thanks, Kuai
Hi, Jan and Christoph 在 2022/12/01 1:56, Jan Kara 写道: > Since commit 10c70d95c0f2 ("block: remove the bd_openers checks in > blk_drop_partitions") we allow rereading of partition table although > there are users of the block device. This has an undesirable consequence > that e.g. if sda and sdb are assembled to a RAID1 device md0 with > partitions, BLKRRPART ioctl on sda will rescan partition table and > create sda1 device. This partition device under a raid device confuses > some programs (such as libstorage-ng used for initial partitioning for > distribution installation) leading to failures. > > Fix the problem refusing to rescan partitions if there is another user > that has the block device exclusively open. Still by code review, I just found another race that cound cause disk can't scan partition while creating: t1: create device t2: open device exclusively device_add_disk() bdev_add insert_inode_hash ... bd_finish_claiming bdev->bd_holder = holder disk_scan_partitions // check holder and failed This race is artificial and unlikely to happend in practice, but this is easy to fix by following: diff --git a/block/genhd.c b/block/genhd.c index 09f2ac548832..23e87753313b 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -497,6 +497,11 @@ int __must_check device_add_disk(struct device *parent, struct gendisk *disk, if (ret) goto out_unregister_bdi; + /* + * In case user open device before disk_scan_partitions(), + * set state here, so that blkdev_open() can scan partitions. + */ + set_bit(GD_NEED_PART_SCAN, &disk->state); bdev_add(disk->part0, ddev->devt); if (get_capacity(disk)) disk_scan_partitions(disk, FMODE_READ, NULL);
diff --git a/block/blk.h b/block/blk.h index a186ea20f39d..8b75a95b28d6 100644 --- a/block/blk.h +++ b/block/blk.h @@ -436,7 +436,7 @@ static inline struct kmem_cache *blk_get_queue_kmem_cache(bool srcu) } struct request_queue *blk_alloc_queue(int node_id, bool alloc_srcu); -int disk_scan_partitions(struct gendisk *disk, fmode_t mode); +int disk_scan_partitions(struct gendisk *disk, fmode_t mode, void *owner); int disk_alloc_events(struct gendisk *disk); void disk_add_events(struct gendisk *disk); diff --git a/block/genhd.c b/block/genhd.c index 0f9769db2de8..012529d36f5b 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -356,7 +356,7 @@ void disk_uevent(struct gendisk *disk, enum kobject_action action) } EXPORT_SYMBOL_GPL(disk_uevent); -int disk_scan_partitions(struct gendisk *disk, fmode_t mode) +int disk_scan_partitions(struct gendisk *disk, fmode_t mode, void *owner) { struct block_device *bdev; @@ -366,6 +366,9 @@ int disk_scan_partitions(struct gendisk *disk, fmode_t mode) return -EINVAL; if (disk->open_partitions) return -EBUSY; + /* Someone else has bdev exclusively open? */ + if (disk->part0->bd_holder != owner) + return -EBUSY; set_bit(GD_NEED_PART_SCAN, &disk->state); bdev = blkdev_get_by_dev(disk_devt(disk), mode, NULL); @@ -500,7 +503,7 @@ int __must_check device_add_disk(struct device *parent, struct gendisk *disk, bdev_add(disk->part0, ddev->devt); if (get_capacity(disk)) - disk_scan_partitions(disk, FMODE_READ); + disk_scan_partitions(disk, FMODE_READ, NULL); /* * Announce the disk and partitions after all partitions are diff --git a/block/ioctl.c b/block/ioctl.c index 60121e89052b..96617512982e 100644 --- a/block/ioctl.c +++ b/block/ioctl.c @@ -467,9 +467,10 @@ static int blkdev_bszset(struct block_device *bdev, fmode_t mode, * user space. Note the separate arg/argp parameters that are needed * to deal with the compat_ptr() conversion. */ -static int blkdev_common_ioctl(struct block_device *bdev, fmode_t mode, - unsigned cmd, unsigned long arg, void __user *argp) +static int blkdev_common_ioctl(struct file *file, fmode_t mode, unsigned cmd, + unsigned long arg, void __user *argp) { + struct block_device *bdev = I_BDEV(file->f_mapping->host); unsigned int max_sectors; switch (cmd) { @@ -527,7 +528,8 @@ static int blkdev_common_ioctl(struct block_device *bdev, fmode_t mode, return -EACCES; if (bdev_is_partition(bdev)) return -EINVAL; - return disk_scan_partitions(bdev->bd_disk, mode & ~FMODE_EXCL); + return disk_scan_partitions(bdev->bd_disk, mode & ~FMODE_EXCL, + file); case BLKTRACESTART: case BLKTRACESTOP: case BLKTRACETEARDOWN: @@ -605,7 +607,7 @@ long blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg) break; } - ret = blkdev_common_ioctl(bdev, mode, cmd, arg, argp); + ret = blkdev_common_ioctl(file, mode, cmd, arg, argp); if (ret != -ENOIOCTLCMD) return ret; @@ -674,7 +676,7 @@ long compat_blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg) break; } - ret = blkdev_common_ioctl(bdev, mode, cmd, arg, argp); + ret = blkdev_common_ioctl(file, mode, cmd, arg, argp); if (ret == -ENOIOCTLCMD && disk->fops->compat_ioctl) ret = disk->fops->compat_ioctl(bdev, mode, cmd, arg);
Since commit 10c70d95c0f2 ("block: remove the bd_openers checks in blk_drop_partitions") we allow rereading of partition table although there are users of the block device. This has an undesirable consequence that e.g. if sda and sdb are assembled to a RAID1 device md0 with partitions, BLKRRPART ioctl on sda will rescan partition table and create sda1 device. This partition device under a raid device confuses some programs (such as libstorage-ng used for initial partitioning for distribution installation) leading to failures. Fix the problem refusing to rescan partitions if there is another user that has the block device exclusively open. Link: https://lore.kernel.org/all/20221130135344.2ul4cyfstfs3znxg@quack3 Fixes: 10c70d95c0f2 ("block: remove the bd_openers checks in blk_drop_partitions") Signed-off-by: Jan Kara <jack@suse.cz> --- block/blk.h | 2 +- block/genhd.c | 7 +++++-- block/ioctl.c | 12 +++++++----- 3 files changed, 13 insertions(+), 8 deletions(-)