diff mbox series

[RFC] block: reject I/O in BLKRRPART if block size changed

Message ID 20201226180232.12276-1-minwoo.im.dev@gmail.com (mailing list archive)
State New, archived
Headers show
Series [RFC] block: reject I/O in BLKRRPART if block size changed | expand

Commit Message

Minwoo Im Dec. 26, 2020, 6:02 p.m. UTC
Background:
  Let's say we have 2 LBA format for 4096B and 512B LBA size for a
NVMe namespace.  Assume that current LBA format is 4096B and in case
we convert namespace to 512B and 4096B back again:

  nvme format /dev/nvme0n1 --lbaf=1 --force  # to 512B LBA
  nvme format /dev/nvme0n1 --lbaf=0 --force  # to 4096B LBA

  Then we can see the following errors during the BLKRRPART ioctl from
the nvme-cli format subcommand:

  [   10.771740] blk_update_request: operation not supported error, dev nvme0n1, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
  [   10.780262] Buffer I/O error on dev nvme0n1, logical block 0, async page read
  ...

  Also, we can see the Read commands followed by the Format command due
to BLKRRPART ioctl with Number of LBAs to 65535(0xffff) which is
under-flowed because the request for the Read commands are coming with
512B and this is because it's playing around with i_blkbits from the
block_device inode which needs to be avoided as [1].

  kworker/0:1H-56      [000] ....   913.456922: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=1, cmdid=216, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=0, len=65535, ctrl=0x0, dsmgmt=0, reftag=0)
  ksoftirqd/0-9       [000] .Ns.   916.566351: nvme_complete_rq: nvme0: disk=nvme0n1, qid=1, cmdid=216, res=0x0, retries=0, flags=0x0, status=0x4002
  ...

  Before we have commit 5ff9f19231a0 ("block: simplify
set_init_blocksize"), block size used to be bumped up to the
4K(PAGE_SIZE) in this example and we have not seen these errors.  But
with this patch, we have to make sure that bdev->bd_inode->i_blkbits to
make sure that BLKRRPART ioctl pass proper request length based on the
changed logical block size.

Description:
  As the previous discussion [1], this patch introduced a gendisk flag
to indicate that block size has been changed in the runtime.  This flag
is set when logical block size is changed in the runtime with sector
capacity itself.  It will be cleared when the file descriptor for the
block devie is opened again which means __blkdev_get() updates the block
size via set_init_blocksize().
  This patch rejects I/O from the path of add_partitions() and
application should open the file descriptor again to update the block
size of the block device inode.

[1] https://lore.kernel.org/linux-nvme/20201223183143.GB13354@localhost.localdomain/T/#t

Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
---
 block/genhd.c           |  3 +++
 block/partitions/core.c | 11 +++++++++++
 fs/block_dev.c          |  6 ++++++
 include/linux/genhd.h   |  1 +
 4 files changed, 21 insertions(+)

Comments

Minwoo Im Dec. 30, 2020, 2:05 p.m. UTC | #1
On 20-12-27 03:02:32, Minwoo Im wrote:
> Background:
>   Let's say we have 2 LBA format for 4096B and 512B LBA size for a
> NVMe namespace.  Assume that current LBA format is 4096B and in case
> we convert namespace to 512B and 4096B back again:
> 
>   nvme format /dev/nvme0n1 --lbaf=1 --force  # to 512B LBA
>   nvme format /dev/nvme0n1 --lbaf=0 --force  # to 4096B LBA
> 
>   Then we can see the following errors during the BLKRRPART ioctl from
> the nvme-cli format subcommand:
> 
>   [   10.771740] blk_update_request: operation not supported error, dev nvme0n1, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
>   [   10.780262] Buffer I/O error on dev nvme0n1, logical block 0, async page read
>   ...
> 
>   Also, we can see the Read commands followed by the Format command due
> to BLKRRPART ioctl with Number of LBAs to 65535(0xffff) which is
> under-flowed because the request for the Read commands are coming with
> 512B and this is because it's playing around with i_blkbits from the
> block_device inode which needs to be avoided as [1].
> 
>   kworker/0:1H-56      [000] ....   913.456922: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=1, cmdid=216, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=0, len=65535, ctrl=0x0, dsmgmt=0, reftag=0)
>   ksoftirqd/0-9       [000] .Ns.   916.566351: nvme_complete_rq: nvme0: disk=nvme0n1, qid=1, cmdid=216, res=0x0, retries=0, flags=0x0, status=0x4002
>   ...
> 
>   Before we have commit 5ff9f19231a0 ("block: simplify
> set_init_blocksize"), block size used to be bumped up to the
> 4K(PAGE_SIZE) in this example and we have not seen these errors.  But
> with this patch, we have to make sure that bdev->bd_inode->i_blkbits to
> make sure that BLKRRPART ioctl pass proper request length based on the
> changed logical block size.
> 
> Description:
>   As the previous discussion [1], this patch introduced a gendisk flag
> to indicate that block size has been changed in the runtime.  This flag
> is set when logical block size is changed in the runtime with sector
> capacity itself.  It will be cleared when the file descriptor for the
> block devie is opened again which means __blkdev_get() updates the block
> size via set_init_blocksize().
>   This patch rejects I/O from the path of add_partitions() and
> application should open the file descriptor again to update the block
> size of the block device inode.
> 
> [1] https://lore.kernel.org/linux-nvme/20201223183143.GB13354@localhost.localdomain/T/#t
> 
> Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>

Hello,

Sorry for the noises here.  Please ignore this patch.  Will try to
prepare a new one for this issue.

Thanks,
diff mbox series

Patch

diff --git a/block/genhd.c b/block/genhd.c
index b84b8671e627..1f64907fac3d 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -79,6 +79,9 @@  bool set_capacity_and_notify(struct gendisk *disk, sector_t size)
 	 */
 	if (!capacity || !size)
 		return false;
+
+	disk->flags |= GENHD_FL_BLOCK_SIZE_CHANGED;
+
 	kobject_uevent_env(&disk_to_dev(disk)->kobj, KOBJ_CHANGE, envp);
 	return true;
 }
diff --git a/block/partitions/core.c b/block/partitions/core.c
index deca253583bd..7dfcda96be9e 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -617,6 +617,17 @@  int blk_add_partitions(struct gendisk *disk, struct block_device *bdev)
 	if (!disk_part_scan_enabled(disk))
 		return 0;
 
+	/*
+	 * Reject to check partition information if block size has been changed
+	 * in the runtime.  If block size of a block device has been changed,
+	 * the file descriptor should be opened agian to update the blkbits.
+	 */
+	if (disk->flags & GENHD_FL_BLOCK_SIZE_CHANGED) {
+		pr_warn("%s: rejecting checking partition. fd should be opened again.\n",
+				disk->disk_name);
+		return -EBADFD;
+	}
+
 	state = check_partition(disk, bdev);
 	if (!state)
 		return 0;
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 9e56ee1f2652..813361ad77c1 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -132,6 +132,12 @@  EXPORT_SYMBOL(truncate_bdev_range);
 static void set_init_blocksize(struct block_device *bdev)
 {
 	bdev->bd_inode->i_blkbits = blksize_bits(bdev_logical_block_size(bdev));
+
+	/*
+	 * Allow I/O commands for this block device.  We can say that this
+	 * block device has been set to a proper block size.
+	 */
+	bdev->bd_disk->flags &= ~GENHD_FL_BLOCK_SIZE_CHANGED;
 }
 
 int set_blocksize(struct block_device *bdev, int size)
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 809aaa32d53c..0e0e24917003 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -103,6 +103,7 @@  struct partition_meta_info {
 #define GENHD_FL_BLOCK_EVENTS_ON_EXCL_WRITE	0x0100
 #define GENHD_FL_NO_PART_SCAN			0x0200
 #define GENHD_FL_HIDDEN				0x0400
+#define GENHD_FL_BLOCK_SIZE_CHANGED		0x0800
 
 enum {
 	DISK_EVENT_MEDIA_CHANGE			= 1 << 0, /* media changed */