Message ID | 20250219024409.901186-1-ming.lei@redhat.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [V4] block: make segment size limit workable for > 4K PAGE_SIZE | expand |
On Wed, Feb 19, 2025 at 10:44:09AM +0800, Ming Lei wrote: > PAGE_SIZE is applied in validating block device queue limits, this way is > very fragile and is wrong: It's neither very fragily nor wrong. If you want to change it to suit your needs that might or might not be ok but this language isn't. > - queue limits are read from hardware, which is often one readonly hardware > property queues limits aren't read from hardware per definition. Very often they are software limits.
On Wed, Feb 19, 2025 at 10:13:24PM -0800, Christoph Hellwig wrote: > On Wed, Feb 19, 2025 at 10:44:09AM +0800, Ming Lei wrote: > > PAGE_SIZE is applied in validating block device queue limits, this way is > > very fragile and is wrong: > > It's neither very fragily nor wrong. If you want to change it to suit > your needs that might or might not be ok but this language isn't. max segment size is fixed value since it is read from hardware, now kernel validates it by variable PAGE_SIZE, not fragile? Well, it isn't my need only, there are lots of device which max segment size is < 64K, they work just fine in 4K page size kernel, however, they become unusable in 64K page size kernel. > > > - queue limits are read from hardware, which is often one readonly hardware > > property > > queues limits aren't read from hardware per definition. Very often they > are software limits. Fine, I will update the commit log to just mention max segment size limit. Thanks, Ming
On Wed, Feb 19, 2025 at 10:44:09AM +0800, Ming Lei wrote: > PAGE_SIZE is applied in validating block device queue limits, this way is > very fragile and is wrong: > > - queue limits are read from hardware, which is often one readonly hardware > property > > - PAGE_SIZE is one config option which can be changed during build time. > > In RH lab, it has been found that max segment size of some mmc card is > less than 64K, then this kind of card can't be probed successfully when > same kernel is re-built with 64K PAGE_SIZE. > > Fix this issue by adding BLK_MIN_SEGMENT_SIZE and lim->min_segment_size: > > - validate segment limits by BLK_MIN_SEGMENT_SIZE which is 4K(minimized PAGE_SIZE) > > - checking if one bvec can be one segment quickly by lim->min_segment_size > > commit 6aeb4f836480 ("block: remove bio_add_pc_page") > commit 02ee5d69e3ba ("block: remove blk_rq_bio_prep") > commit b7175e24d6ac ("block: add a dma mapping iterator") Let me try to help with this commit log message a bit, how about: Using PAGE_SIZE as a minimum expected DMA segment size in consideration of devices which have a max DMA segment size of 4k when used on 64k PAGE_SIZE systems leads to devices not being able to probe such as eMMC and Exynos UFS controller [0] [1] you can end up with a probe failure as follows: WARNING: CPU: 2 PID: 397 at block/blk-settings.c:339 blk_validate_limits+0x364/0x3c0 Modules linked in: mmc_block(+) rpmb_core crct10dif_ce ghash_ce sha2_ce dw_mmc_bluefield sha256_arm64 dw_mmc_pltfm sha1_ce dw_mmc mmc_core nfit i2c_mlxbf sbsa_gwdt gpio_mlxbf2 f_tmfifo dm_mirror dm_region_hash dm_log dm_mod CPU: 2 UID: 0 PID: 397 Comm: (udev-worker) Not tainted 6.12.0-39.el10.aarch64+64k #1 Hardware name: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS BlueField:3.5.1-1-g4078432 Jan 28 2021 ng pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : blk_validate_limits+0x364/0x3c0 p.service lr : blk_set_default_limits+0x20/0x40 Setup... sp : ffff80008688f2d0 x29: ffff80008688f2d0 x28: ffff000082acb600 x27: ffff80007bef02a8 x26: ffff80007bef0000 x25: ffff80008688f58e x24: ffff80008688f450 x23: ffff80008301b000 x22: 00000000ffffffff x21: ffff800082c39950 x20: 0000000000000000 x19: ffff0000930169e0 x18: 0000000000000014 x17: 00000000767472b1 x16: 0000000005a697e6 x15: 0000000002f42ca4 x11: 00000000de7f0111 x10: 000000005285b53a x9 : ffff800080752908 x8 : 0000000000000001 x7 : 0000000000000000 x6 : 0000000000000200 x5 : 0000000000000000 x4 : 000000000000ffff x3 : 0000000000004000 x2 : 0000000000000200 x1 : 0000000000001000 x0 : ffff80008688f450 Call trace: blk_validate_limits+0x364/0x3c0 blk_set_default_limits+0x20/0x40 blk_alloc_queue+0x84/0x240 blk_mq_alloc_queue+0x80/0x118 __blk_mq_alloc_disk+0x28/0x198 mmc_alloc_disk+0xe0/0x260 [mmc_block] ... mmcblk mmc0:0001: probe with driver mmcblk failed with error -22 To fix this provide a block sanity check to ensure we use the min of the block device's segment size and PAGE_SIZE. Link: https://lore.kernel.org/linux-block/20230612203314.17820-1-bvanassche@acm.org/ # [0] Link: https://lore.kernel.org/linux-block/1d55e942-5150-de4c-3a02-c3d066f87028@acm.org/ # [1] > diff --git a/block/blk-merge.c b/block/blk-merge.c > index 15cd231d560c..4fe2dfabfc9d 100644 > --- a/block/blk-merge.c > +++ b/block/blk-merge.c > @@ -329,7 +329,7 @@ int bio_split_rw_at(struct bio *bio, const struct queue_limits *lim, > > if (nsegs < lim->max_segments && > bytes + bv.bv_len <= max_bytes && > - bv.bv_offset + bv.bv_len <= PAGE_SIZE) { > + bv.bv_offset + bv.bv_len <= lim->min_segment_size) { > nsegs++; > bytes += bv.bv_len; > } else { Now that's certainly more in line with what I expected. > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h > index 248416ecd01c..1f7d492975c1 100644 > --- a/include/linux/blkdev.h > +++ b/include/linux/blkdev.h > @@ -367,6 +367,7 @@ struct queue_limits { > unsigned int max_sectors; > unsigned int max_user_sectors; > unsigned int max_segment_size; > + unsigned int min_segment_size; > unsigned int physical_block_size; > unsigned int logical_block_size; > unsigned int alignment_offset; > @@ -1163,6 +1164,8 @@ static inline bool bdev_is_partition(struct block_device *bdev) > enum blk_default_limits { > BLK_MAX_SEGMENTS = 128, > BLK_SAFE_MAX_SECTORS = 255, > + /* use minimized PAGE_SIZE as min segment size hint */ > + BLK_MIN_SEGMENT_SIZE = 4096, > BLK_MAX_SEGMENT_SIZE = 65536, > BLK_SEG_BOUNDARY_MASK = 0xFFFFFFFFUL, I think Bart provided a more sensible comment: Use 4 KiB as the smallest default supported DMA segment size limit instead of PAGE_SIZE. This is important if the page size is larger than 4 KiB since the maximum DMA segment size for some storage controllers is only 4 KiB. With that: Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Luis
diff --git a/block/blk-merge.c b/block/blk-merge.c index 15cd231d560c..4fe2dfabfc9d 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -329,7 +329,7 @@ int bio_split_rw_at(struct bio *bio, const struct queue_limits *lim, if (nsegs < lim->max_segments && bytes + bv.bv_len <= max_bytes && - bv.bv_offset + bv.bv_len <= PAGE_SIZE) { + bv.bv_offset + bv.bv_len <= lim->min_segment_size) { nsegs++; bytes += bv.bv_len; } else { diff --git a/block/blk-settings.c b/block/blk-settings.c index c44dadc35e1e..703a9217414e 100644 --- a/block/blk-settings.c +++ b/block/blk-settings.c @@ -246,6 +246,7 @@ int blk_validate_limits(struct queue_limits *lim) { unsigned int max_hw_sectors; unsigned int logical_block_sectors; + unsigned long seg_size; int err; /* @@ -303,7 +304,7 @@ int blk_validate_limits(struct queue_limits *lim) max_hw_sectors = min_not_zero(lim->max_hw_sectors, lim->max_dev_sectors); if (lim->max_user_sectors) { - if (lim->max_user_sectors < PAGE_SIZE / SECTOR_SIZE) + if (lim->max_user_sectors < BLK_MIN_SEGMENT_SIZE / SECTOR_SIZE) return -EINVAL; lim->max_sectors = min(max_hw_sectors, lim->max_user_sectors); } else if (lim->io_opt > (BLK_DEF_MAX_SECTORS_CAP << SECTOR_SHIFT)) { @@ -341,7 +342,7 @@ int blk_validate_limits(struct queue_limits *lim) */ if (!lim->seg_boundary_mask) lim->seg_boundary_mask = BLK_SEG_BOUNDARY_MASK; - if (WARN_ON_ONCE(lim->seg_boundary_mask < PAGE_SIZE - 1)) + if (WARN_ON_ONCE(lim->seg_boundary_mask < BLK_MIN_SEGMENT_SIZE - 1)) return -EINVAL; /* @@ -362,10 +363,17 @@ int blk_validate_limits(struct queue_limits *lim) */ if (!lim->max_segment_size) lim->max_segment_size = BLK_MAX_SEGMENT_SIZE; - if (WARN_ON_ONCE(lim->max_segment_size < PAGE_SIZE)) + if (WARN_ON_ONCE(lim->max_segment_size < BLK_MIN_SEGMENT_SIZE)) return -EINVAL; } + /* setup min segment size for building new segment in fast path */ + if (lim->seg_boundary_mask > lim->max_segment_size - 1) + seg_size = lim->max_segment_size; + else + seg_size = lim->seg_boundary_mask + 1; + lim->min_segment_size = min_t(unsigned, seg_size, PAGE_SIZE); + /* * We require drivers to at least do logical block aligned I/O, but * historically could not check for that due to the separate calls diff --git a/block/blk.h b/block/blk.h index 90fa5f28ccab..57fe8261e09f 100644 --- a/block/blk.h +++ b/block/blk.h @@ -358,8 +358,12 @@ struct bio *bio_split_zone_append(struct bio *bio, static inline bool bio_may_need_split(struct bio *bio, const struct queue_limits *lim) { - return lim->chunk_sectors || bio->bi_vcnt != 1 || - bio->bi_io_vec->bv_len + bio->bi_io_vec->bv_offset > PAGE_SIZE; + if (lim->chunk_sectors) + return true; + if (bio->bi_vcnt != 1) + return true; + return bio->bi_io_vec->bv_len + bio->bi_io_vec->bv_offset > + lim->min_segment_size; } /** diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 248416ecd01c..1f7d492975c1 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -367,6 +367,7 @@ struct queue_limits { unsigned int max_sectors; unsigned int max_user_sectors; unsigned int max_segment_size; + unsigned int min_segment_size; unsigned int physical_block_size; unsigned int logical_block_size; unsigned int alignment_offset; @@ -1163,6 +1164,8 @@ static inline bool bdev_is_partition(struct block_device *bdev) enum blk_default_limits { BLK_MAX_SEGMENTS = 128, BLK_SAFE_MAX_SECTORS = 255, + /* use minimized PAGE_SIZE as min segment size hint */ + BLK_MIN_SEGMENT_SIZE = 4096, BLK_MAX_SEGMENT_SIZE = 65536, BLK_SEG_BOUNDARY_MASK = 0xFFFFFFFFUL, };