
block: allow device to have both virt_boundary_mask and max segment size

Message ID 20240407131931.4055231-1-ming.lei@redhat.com (mailing list archive)
State New
Series block: allow device to have both virt_boundary_mask and max segment size

Commit Message

Ming Lei April 7, 2024, 1:19 p.m. UTC
When a stacking device sits on top of one device with virt_boundary_mask
and another with a max segment size, the stacking device has both limits
set. This was allowed before commit d690cb8ae14b ("block: add an API to
atomically update queue limits").

Relax the check so that we don't break this kind of stacking setup.

Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218687
Reported-by: janpieter.sollie@edpnet.be
Fixes: d690cb8ae14b ("block: add an API to atomically update queue limits")
Link: https://lore.kernel.org/linux-block/ZfGl8HzUpiOxCLm3@fedora/
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Snitzer <snitzer@kernel.org>
Cc: dm-devel@lists.linux.dev
Cc: Song Liu <song@kernel.org>
Cc: linux-raid@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-settings.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

Comments

Mike Snitzer April 7, 2024, 2:57 p.m. UTC | #1
On Sun, Apr 07 2024 at  9:19P -0400,
Ming Lei <ming.lei@redhat.com> wrote:

> When a stacking device sits on top of one device with virt_boundary_mask
> and another with a max segment size, the stacking device has both limits
> set. This was allowed before commit d690cb8ae14b ("block: add an API to
> atomically update queue limits").
> 
> Relax the check so that we don't break this kind of stacking setup.
> 
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218687
> Reported-by: janpieter.sollie@edpnet.be
> Fixes: d690cb8ae14b ("block: add an API to atomically update queue limits")
> Link: https://lore.kernel.org/linux-block/ZfGl8HzUpiOxCLm3@fedora/
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Mike Snitzer <snitzer@kernel.org>
> Cc: dm-devel@lists.linux.dev
> Cc: Song Liu <song@kernel.org>
> Cc: linux-raid@vger.kernel.org
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  block/blk-settings.c | 16 ++++++----------
>  1 file changed, 6 insertions(+), 10 deletions(-)
> 
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index cdbaef159c4b..d2731843f2fc 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -182,17 +182,13 @@ static int blk_validate_limits(struct queue_limits *lim)
>  		return -EINVAL;
>  
>  	/*
> -	 * Devices that require a virtual boundary do not support scatter/gather
> -	 * I/O natively, but instead require a descriptor list entry for each
> -	 * page (which might not be identical to the Linux PAGE_SIZE).  Because
> -	 * of that they are not limited by our notion of "segment size".
> +	 * Stacking device may have both virtual boundary and max segment
> +	 * size limit, so allow this setting now, and long-term the two
> +	 * might need to move out of stacking limits since we have immutable
> +	 * bvec and lower layer bio splitting is supposed to handle the two
> +	 * correctly.
>  	 */
> -	if (lim->virt_boundary_mask) {
> -		if (WARN_ON_ONCE(lim->max_segment_size &&
> -				 lim->max_segment_size != UINT_MAX))
> -			return -EINVAL;
> -		lim->max_segment_size = UINT_MAX;
> -	} else {
> +	if (!lim->virt_boundary_mask) {
>  		/*
>  		 * The maximum segment size has an odd historic 64k default that
>  		 * drivers probably should override.  Just like the I/O size we
> -- 
> 2.41.0
> 

Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Jens Axboe April 7, 2024, 9:50 p.m. UTC | #2
On Sun, 07 Apr 2024 21:19:31 +0800, Ming Lei wrote:
> When a stacking device sits on top of one device with virt_boundary_mask
> and another with a max segment size, the stacking device has both limits
> set. This was allowed before commit d690cb8ae14b ("block: add an API to
> atomically update queue limits").
> 
> Relax the check so that we don't break this kind of stacking setup.
> 
> [...]

Applied, thanks!

[1/1] block: allow device to have both virt_boundary_mask and max segment size
      commit: b561ea56a26415bf44ce8ca6a8e625c7c390f1ea

Best regards,
Christoph Hellwig April 8, 2024, 5:55 a.m. UTC | #3
On Sun, Apr 07, 2024 at 09:19:31PM +0800, Ming Lei wrote:
> When a stacking device sits on top of one device with virt_boundary_mask
> and another with a max segment size, the stacking device has both limits
> set. This was allowed before commit d690cb8ae14b ("block: add an API to
> atomically update queue limits").
> 
> Relax the check so that we don't break this kind of stacking setup.

No, this is broken as discussed before.  With a virt_boundary_mask
we create a segment for every page (that is, a device page, which is
usually but not always the same as the Linux page size).  If we now
also limit the segment size, we fail to produce valid I/O.

The problem is that neither the segment_size nor the virtual_boundary
should be inherited by a stackable device, and we need to fix that.
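
For reference, the rule described above reduces to a mask check on each
pair of adjacent bvecs. A minimal sketch, modeled loosely on the
kernel's __bvec_gap_to_prev() (simplified and renamed, not the verbatim
mainline code):

    /*
     * Two adjacent bvecs may share one hardware segment only if the
     * first ends exactly on the virt boundary and the second starts
     * exactly on it; otherwise the device needs one descriptor per
     * device page and the bio has to be split.
     */
    static inline bool sketch_bvec_gap_to_prev(const struct queue_limits *lim,
                    const struct bio_vec *bprv, unsigned int offset)
    {
            return (offset & lim->virt_boundary_mask) ||
                    ((bprv->bv_offset + bprv->bv_len) & lim->virt_boundary_mask);
    }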
Ming Lei April 8, 2024, 7:36 a.m. UTC | #4
On Mon, Apr 08, 2024 at 07:55:42AM +0200, Christoph Hellwig wrote:
> On Sun, Apr 07, 2024 at 09:19:31PM +0800, Ming Lei wrote:
> > When a stacking device sits on top of one device with virt_boundary_mask
> > and another with a max segment size, the stacking device has both limits
> > set. This was allowed before commit d690cb8ae14b ("block: add an API to
> > atomically update queue limits").
> > 
> > Relax the check so that we don't break this kind of stacking setup.
> 
> No, this is broken as discussed before.  With a virt_boundary_mask
> we create a segment for every page (that is, a device page, which is
> usually but not always the same as the Linux page size).  If we now
> also limit the segment size, we fail to produce valid I/O.

It isn't this patch that introduces the combination; stacking devices
have had both limits set since the beginning. The check rejecting it
was actually only added by commit d690cb8ae14b in v6.9-rc1.

If the max segment size isn't aligned with virt_boundary_mask,
bio_split_rw() will split the bio at the max segment size; that still
works, just less efficiently. And in practice the two are usually
aligned.
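
As a minimal sketch of that splitting arithmetic (illustrative only,
not the actual bio_split_rw() code): a run of data longer than
max_segment_size is emitted as full-size segments plus one possibly
short tail, and only that tail may end off the boundary:

    /* hypothetical helper, for illustration only */
    static unsigned int sketch_split_count(unsigned int len,
                                           unsigned int max_seg)
    {
            unsigned int tail = len % max_seg; /* only the last piece may be short */

            return len / max_seg + (tail != 0);
    }

If max_seg is a multiple of (virt_boundary_mask + 1), every interior
segment still ends on the boundary, which is the "usually aligned" case
mentioned above.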

> 
> The problem is that neither the segment_size nor the virtual_boundary
> should be inherited by a stackable device, and we need to fix that.

That would be a big change with regression risk, which may not be
appropriate after -rc3.



Thanks,
Ming
Christoph Hellwig April 8, 2024, 8:47 a.m. UTC | #5
On Mon, Apr 08, 2024 at 03:36:50PM +0800, Ming Lei wrote:
> It isn't this patch that introduces the combination; stacking devices
> have had both limits set since the beginning. The check rejecting it
> was actually only added by commit d690cb8ae14b in v6.9-rc1.
> 
> If the max segment size isn't aligned with virt_boundary_mask,
> bio_split_rw() will split the bio at the max segment size; that still
> works, just less efficiently. And in practice the two are usually
> aligned.

We've had real bugs due to this, which is why we have the check.  We
also had a warning before the commit; it just got skipped for stacking.
So even if we want to return to the broken pre-6.9-rc behavior it
should only be for stacking.  I don't think that is a good outcome,
though.
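
As far as I recall, the pre-6.9 setters looked roughly like this
(simplified from block/blk-settings.c of that era, not verbatim):

    void blk_queue_virt_boundary(struct request_queue *q, unsigned long mask)
    {
            q->limits.virt_boundary_mask = mask;

            /* a virt boundary implies an "unlimited" segment size */
            if (mask)
                    q->limits.max_segment_size = UINT_MAX;
    }

    void blk_queue_max_segment_size(struct request_queue *q, unsigned int max_size)
    {
            /* see blk_queue_virt_boundary() for the explanation */
            WARN_ON_ONCE(q->limits.virt_boundary_mask);

            q->limits.max_segment_size = max_size;
    }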
Ming Lei April 8, 2024, 9:48 a.m. UTC | #6
On Mon, Apr 08, 2024 at 10:47:39AM +0200, Christoph Hellwig wrote:
> On Mon, Apr 08, 2024 at 03:36:50PM +0800, Ming Lei wrote:
> > It isn't this patch that introduces the combination; stacking devices
> > have had both limits set since the beginning. The check rejecting it
> > was actually only added by commit d690cb8ae14b in v6.9-rc1.
> > 
> > If the max segment size isn't aligned with virt_boundary_mask,
> > bio_split_rw() will split the bio at the max segment size; that still
> > works, just less efficiently. And in practice the two are usually
> > aligned.
> 
> We've had real bugs due to this, which is why we have the check.  We
> also had a warning before the commit; it just got skipped for stacking.
> So even if we want to return to the broken pre-6.9-rc behavior it
> should only be for stacking.  I don't think that is a good outcome,
> though.

The limit comes from commit 09324d32d2a0 ("block: force an unlimited
segment size on queues with a virt boundary"), which claims to fix
f6970f83ef79 ("block: don't check if adjacent bvecs in one bio can be
mergeable").

However, commit f6970f83ef79 only covers the merge code, which isn't
used by bio-based drivers at all, so I'm not sure pre-6.9-rc is broken
for stacking drivers.

Also, commit 09324d32d2a0 mentioned that it did not cause problems;
actually the 64K default segment size limit always exists even when the
device doesn't provide one, so it looks like there were no reports of
'real bugs', or maybe I'm missing something?

commit 09324d32d2a0843e66652a087da6f77924358e62
Author: Christoph Hellwig <hch@lst.de>
Date:   Tue May 21 09:01:41 2019 +0200

    block: force an unlimited segment size on queues with a virt boundary

    We currently fail to update the front/back segment size in the bio when
    deciding to allow an otherwise gappy segment to a device with a
    virt boundary.  The reason why this did not cause problems is that
    devices with a virt boundary fundamentally don't use segments as we
    know it and thus don't care.  Make that assumption formal by forcing
    an unlimited segment size in this case.

    Fixes: f6970f83ef79 ("block: don't check if adjacent bvecs in one bio can be mergeable")
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Ming Lei <ming.lei@redhat.com>
    Reviewed-by: Hannes Reinecke <hare@suse.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
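
For reference, the 64K default mentioned above; a simplified sketch of
the relevant branch of blk_validate_limits() in block/blk-settings.c
(abridged, not the full function):

    if (!lim->max_segment_size)
            lim->max_segment_size = BLK_MAX_SEGMENT_SIZE; /* historic 64K, i.e. 65536 */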


Thanks,
Ming
janpieter.sollie@edpnet.be April 8, 2024, 12:48 p.m. UTC | #7
On 2024-04-08 09:36, Ming Lei wrote:
> 
> It isn't this patch that introduces the combination; stacking devices
> have had both limits set since the beginning. The check rejecting it
> was actually only added by commit d690cb8ae14b in v6.9-rc1.
> 
> If the max segment size isn't aligned with virt_boundary_mask,
> bio_split_rw() will split the bio at the max segment size; that still
> works, just less efficiently. And in practice the two are usually
> aligned.

I take it as a compliment; building exotic configurations is something
I'd love to be good at.
But, as far as I understand, this warning is caused by my raid config,
right?
How is it possible that a raid6 array has a queue/max_segment_size of
65536 (2^16) in sysfs while two others on the same system have a
queue/max_segment_size of 4294967295 (2^32 - 1)?
They are all rotational devices on the same SAS controller; just this
malfunctioning one uses SATA drives while the other two are SAS.
Understanding this would help me avoid this unwanted behavior.

Kind regards,

Janpieter Sollie
Christoph Hellwig April 9, 2024, 1:57 p.m. UTC | #8
On Mon, Apr 08, 2024 at 05:48:02PM +0800, Ming Lei wrote:
> The limit comes from commit 09324d32d2a0 ("block: force an unlimited
> segment size on queues with a virt boundary"), which claims to fix
> f6970f83ef79 ("block: don't check if adjacent bvecs in one bio can be
> mergeable").
> 
> However, commit f6970f83ef79 only covers the merge code, which isn't
> used by bio-based drivers at all, so I'm not sure pre-6.9-rc is broken
> for stacking drivers.

We can stack rq drivers as well.

> Also, commit 09324d32d2a0 mentioned that it did not cause problems;
> actually the 64K default segment size limit always exists even when the
> device doesn't provide one, so it looks like there were no reports of
> 'real bugs', or maybe I'm missing something?

The problem is when the segment size does not align to the boundary
mask, as you'll now start feeding malformed segments/entries to the
device.
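
A concrete illustration with hypothetical numbers: with a 4K virt
boundary, a 64K max_segment_size keeps every interior segment ending on
the boundary, while a misaligned value does not. A standalone sketch of
the arithmetic:

    #include <stdio.h>

    int main(void)
    {
            unsigned long virt_boundary_mask = 0xfff; /* 4K device pages */
            unsigned int aligned = 65536;     /* multiple of 4K */
            unsigned int misaligned = 60000;  /* not a multiple of 4K */

            printf("64K segment ends on boundary: %s\n",
                   (aligned & virt_boundary_mask) ? "no" : "yes");
            printf("60000B segment ends on boundary: %s\n",
                   (misaligned & virt_boundary_mask) ? "no" : "yes");
            return 0;
    }

An interior segment that ends off the boundary forces the next segment
to start off the boundary, which a virt-boundary device cannot
describe.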
Ming Lei April 9, 2024, 3:56 p.m. UTC | #9
On Tue, Apr 09, 2024 at 03:57:58PM +0200, Christoph Hellwig wrote:
> On Mon, Apr 08, 2024 at 05:48:02PM +0800, Ming Lei wrote:
> > The limit comes from commit 09324d32d2a0 ("block: force an unlimited
> > segment size on queues with a virt boundary"), which claims to fix
> > f6970f83ef79 ("block: don't check if adjacent bvecs in one bio can be
> > mergeable").
> > 
> > However, commit f6970f83ef79 only covers the merge code, which isn't
> > used by bio-based drivers at all, so I'm not sure pre-6.9-rc is broken
> > for stacking drivers.
> 
> We can stack rq drivers as well.
> 
> > Also, commit 09324d32d2a0 mentioned that it did not cause problems;
> > actually the 64K default segment size limit always exists even when the
> > device doesn't provide one, so it looks like there were no reports of
> > 'real bugs', or maybe I'm missing something?
> 
> The problem is when the segment size does not align to the boundary
> mask, as you'll now start feeding malformed segments/entries to the
> device.

In the single-bio case, the bio will be split at the max segment size
if the segment size doesn't align with the virt boundary, so the
resulting bio is still valid because the virt boundary allows the last
bvec to be unaligned.

In the bio-merge case, bio_will_gap() is always called to make sure
there is no gap between the two bios wrt. the virt boundary.

Can you explain a bit how a malformed bio could be produced?
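
For reference, the merge-time check mentioned above as a minimal
sketch, loosely modeled on bio_will_gap() in block/blk-merge.c (renamed
and simplified, not the verbatim code):

    static bool sketch_bios_will_gap(const struct queue_limits *lim,
                                     const struct bio_vec *prev_last,
                                     const struct bio_vec *next_first)
    {
            if (!lim->virt_boundary_mask)
                    return false;

            /* the previous bio must end on the boundary ... */
            if ((prev_last->bv_offset + prev_last->bv_len) &
                lim->virt_boundary_mask)
                    return true;

            /* ... and the next one must start on it */
            return next_first->bv_offset & lim->virt_boundary_mask;
    }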



Thanks, 
Ming
Geert Uytterhoeven April 24, 2024, 10:26 a.m. UTC | #10
Hi Ming,

On Sun, 7 Apr 2024, Ming Lei wrote:
> When a stacking device sits on top of one device with virt_boundary_mask
> and another with a max segment size, the stacking device has both limits
> set. This was allowed before commit d690cb8ae14b ("block: add an API to
> atomically update queue limits").
>
> Relax the check so that we don't break this kind of stacking setup.
>
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218687
> Reported-by: janpieter.sollie@edpnet.be
> Fixes: d690cb8ae14b ("block: add an API to atomically update queue limits")
> Link: https://lore.kernel.org/linux-block/ZfGl8HzUpiOxCLm3@fedora/
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Mike Snitzer <snitzer@kernel.org>
> Cc: dm-devel@lists.linux.dev
> Cc: Song Liu <song@kernel.org>
> Cc: linux-raid@vger.kernel.org
> Signed-off-by: Ming Lei <ming.lei@redhat.com>

Thanks for your patch, which is now commit b561ea56a26415bf ("block:
allow device to have both virt_boundary_mask and max segment size") in
v6.9-rc4.

With CONFIG_DMA_API_DEBUG_SG=y and IOMMU support enabled, this causes a
warning on R-Car Gen3/Gen4 platforms:

     DMA-API: renesas_sdhi_internal_dmac ee160000.mmc: mapping sg segment longer than device claims to support [len=86016] [max=65536]
     WARNING: CPU: 1 PID: 281 at kernel/dma/debug.c:1178 debug_dma_map_sg+0x2ac/0x330
     Modules linked in:
     CPU: 1 PID: 281 Comm: systemd-udevd Tainted: G        W          6.9.0-rc2-ebisu-00012-gb561ea56a264 #596
     Hardware name: Renesas Ebisu board based on r8a77990 (DT)
     pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
     pc : debug_dma_map_sg+0x2ac/0x330
     lr : debug_dma_map_sg+0x2ac/0x330
     sp : ffffffc083643470
     x29: ffffffc083643470 x28: 0000000000000080 x27: 0000000000010000
     x26: 0000000000000000 x25: 0000000000000001 x24: ffffffc0810afc30
     x23: ffffffffffffffff x22: ffffffc080c8366f x21: ffffff8008849f80
     x20: ffffff800cd24000 x19: ffffff80099a2810 x18: 0000000000000000
     x17: ffffff800801a000 x16: ffffffc080453f00 x15: ffffffc0836430f0
     x14: ffffffc08099fb50 x13: 0000000000000007 x12: 0000000000000000
     x11: 0000000000000202 x10: ffffffc0810d99d0 x9 : ffffffc081189bb0
     x8 : ffffffc083643178 x7 : ffffffc083643180 x6 : 00000000ffffdfff
     x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
     x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffffff800ebad280
     Call trace:
      debug_dma_map_sg+0x2ac/0x330
      __dma_map_sg_attrs+0xcc/0xd0
      dma_map_sg_attrs+0xc/0x1c
      renesas_sdhi_internal_dmac_map+0x64/0x94
      renesas_sdhi_internal_dmac_pre_req+0x20/0x2c
      mmc_blk_mq_issue_rq+0x62c/0x6c8
      mmc_mq_queue_rq+0x194/0x218
      blk_mq_dispatch_rq_list+0x36c/0x4d4
      __blk_mq_sched_dispatch_requests+0x344/0x4e0
      blk_mq_sched_dispatch_requests+0x28/0x5c
      blk_mq_run_hw_queue+0x1a4/0x218
      blk_mq_flush_plug_list+0x2fc/0x4a0
      __blk_flush_plug+0x70/0x134
      blk_finish_plug+0x24/0x34
      read_pages+0x60/0x158
      page_cache_ra_unbounded+0x98/0x184
      do_page_cache_ra+0x44/0x50
      force_page_cache_ra+0x98/0x9c
      page_cache_sync_ra+0x30/0x54
      filemap_get_pages+0xfc/0x4f8
      filemap_read+0xe8/0x2b8
      blkdev_read_iter+0x12c/0x144
      vfs_read+0x104/0x150
      ksys_read+0x6c/0xd4
      __arm64_sys_read+0x14/0x1c
      invoke_syscall+0x70/0xf4
      el0_svc_common.constprop.0+0xb0/0xcc
      do_el0_svc+0x1c/0x24
      el0_svc+0x34/0x8c
      el0t_64_sync_handler+0x88/0x124
      el0t_64_sync+0x150/0x154
     irq event stamp: 0
     hardirqs last  enabled at (0): [<0000000000000000>] 0x0
     hardirqs last disabled at (0): [<ffffffc08007efa4>] copy_process+0x6ac/0x18d4
     softirqs last  enabled at (0): [<ffffffc08007efa4>] copy_process+0x6ac/0x18d4
     softirqs last disabled at (0): [<0000000000000000>] 0x0

Reverting this commit, or disabling IOMMU support fixes the issue.
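
The warning comes from the scatterlist checks in the DMA debugging
code; a simplified sketch of the relevant test from check_sg_segment()
in kernel/dma/debug.c (abridged, not the full function):

    static void check_sg_segment(struct device *dev, struct scatterlist *sg)
    {
            unsigned int max_seg = dma_get_max_seg_size(dev);

            /* the block layer built a segment bigger than dma_parms allow */
            if (sg->length > max_seg)
                    err_printk(dev, NULL,
                               "mapping sg segment longer than device claims to support [len=%u] [max=%u]\n",
                               sg->length, max_seg);
    }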

> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -182,17 +182,13 @@ static int blk_validate_limits(struct queue_limits *lim)
> 		return -EINVAL;
>
> 	/*
> -	 * Devices that require a virtual boundary do not support scatter/gather
> -	 * I/O natively, but instead require a descriptor list entry for each
> -	 * page (which might not be identical to the Linux PAGE_SIZE).  Because
> -	 * of that they are not limited by our notion of "segment size".
> +	 * Stacking device may have both virtual boundary and max segment
> +	 * size limit, so allow this setting now, and long-term the two
> +	 * might need to move out of stacking limits since we have immutable
> +	 * bvec and lower layer bio splitting is supposed to handle the two
> +	 * correctly.
> 	 */
> -	if (lim->virt_boundary_mask) {
> -		if (WARN_ON_ONCE(lim->max_segment_size &&
> -				 lim->max_segment_size != UINT_MAX))
> -			return -EINVAL;
> -		lim->max_segment_size = UINT_MAX;
> -	} else {
> +	if (!lim->virt_boundary_mask) {
> 		/*
> 		 * The maximum segment size has an odd historic 64k default that
> 		 * drivers probably should override.  Just like the I/O size we

Gr{oetje,eeting}s,

 						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
 							    -- Linus Torvalds
Ming Lei April 24, 2024, 12:41 p.m. UTC | #11
On Wed, Apr 24, 2024 at 12:26:26PM +0200, Geert Uytterhoeven wrote:
> 	Hi Ming,
> 
> On Sun, 7 Apr 2024, Ming Lei wrote:
> > When a stacking device sits on top of one device with virt_boundary_mask
> > and another with a max segment size, the stacking device has both limits
> > set. This was allowed before commit d690cb8ae14b ("block: add an API to
> > atomically update queue limits").
> > 
> > Relax the check so that we don't break this kind of stacking setup.
> > 
> > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218687
> > Reported-by: janpieter.sollie@edpnet.be
> > Fixes: d690cb8ae14b ("block: add an API to atomically update queue limits")
> > Link: https://lore.kernel.org/linux-block/ZfGl8HzUpiOxCLm3@fedora/
> > Cc: Christoph Hellwig <hch@lst.de>
> > Cc: Mike Snitzer <snitzer@kernel.org>
> > Cc: dm-devel@lists.linux.dev
> > Cc: Song Liu <song@kernel.org>
> > Cc: linux-raid@vger.kernel.org
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> 
> Thanks for your patch, which is now commit b561ea56a26415bf ("block:
> allow device to have both virt_boundary_mask and max segment size") in
> v6.9-rc4.
> 
> With CONFIG_DMA_API_DEBUG_SG=y and IOMMU support enabled, this causes a
> warning on R-Car Gen3/Gen4 platforms:
> 
>     DMA-API: renesas_sdhi_internal_dmac ee160000.mmc: mapping sg segment longer than device claims to support [len=86016] [max=65536]
>     WARNING: CPU: 1 PID: 281 at kernel/dma/debug.c:1178 debug_dma_map_sg+0x2ac/0x330
> [...]
> 
> Reverting this commit, or disabling IOMMU support fixes the issue.

Hello Geert,

Can you test the following patch?


diff --git a/block/blk-settings.c b/block/blk-settings.c
index 8e1d7ed52fef..ebba05a2bc7f 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -188,7 +188,10 @@ static int blk_validate_limits(struct queue_limits *lim)
 	 * bvec and lower layer bio splitting is supposed to handle the two
 	 * correctly.
 	 */
-	if (!lim->virt_boundary_mask) {
+	if (lim->virt_boundary_mask) {
+		if (!lim->max_segment_size)
+			lim->max_segment_size = UINT_MAX;
+	} else {
 		/*
 		 * The maximum segment size has an odd historic 64k default that
 		 * drivers probably should override.  Just like the I/O size we

thanks
Ming
Geert Uytterhoeven April 24, 2024, 1:09 p.m. UTC | #12
Hi Ming,

On Wed, Apr 24, 2024 at 2:41 PM Ming Lei <ming.lei@redhat.com> wrote:
> On Wed, Apr 24, 2024 at 12:26:26PM +0200, Geert Uytterhoeven wrote:
> > On Sun, 7 Apr 2024, Ming Lei wrote:
> > > When a stacking device sits on top of one device with virt_boundary_mask
> > > and another with a max segment size, the stacking device has both limits
> > > set. This was allowed before commit d690cb8ae14b ("block: add an API to
> > > atomically update queue limits").
> > >
> > > Relax the check so that we don't break this kind of stacking setup.
> > >
> > > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218687
> > > Reported-by: janpieter.sollie@edpnet.be
> > > Fixes: d690cb8ae14b ("block: add an API to atomically update queue limits")
> > > Link: https://lore.kernel.org/linux-block/ZfGl8HzUpiOxCLm3@fedora/
> > > Cc: Christoph Hellwig <hch@lst.de>
> > > Cc: Mike Snitzer <snitzer@kernel.org>
> > > Cc: dm-devel@lists.linux.dev
> > > Cc: Song Liu <song@kernel.org>
> > > Cc: linux-raid@vger.kernel.org
> > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> >
> > Thanks for your patch, which is now commit b561ea56a26415bf ("block:
> > allow device to have both virt_boundary_mask and max segment size") in
> > v6.9-rc4.
> >
> > With CONFIG_DMA_API_DEBUG_SG=y and IOMMU support enabled, this causes a
> > warning on R-Car Gen3/Gen4 platforms:
> >
> >     DMA-API: renesas_sdhi_internal_dmac ee160000.mmc: mapping sg segment longer than device claims to support [len=86016] [max=65536]
> >     WARNING: CPU: 1 PID: 281 at kernel/dma/debug.c:1178 debug_dma_map_sg+0x2ac/0x330
> >
> > Reverting this commit, or disabling IOMMU support fixes the issue.
>
> Can you test the following patch?
>
>
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 8e1d7ed52fef..ebba05a2bc7f 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -188,7 +188,10 @@ static int blk_validate_limits(struct queue_limits *lim)
>          * bvec and lower layer bio splitting is supposed to handle the two
>          * correctly.
>          */
> -       if (!lim->virt_boundary_mask) {
> +       if (lim->virt_boundary_mask) {
> +               if (!lim->max_segment_size)
> +                       lim->max_segment_size = UINT_MAX;
> +       } else {
>                 /*
>                  * The maximum segment size has an odd historic 64k default that
>                  * drivers probably should override.  Just like the I/O size we

Thanks, that works for me (both with/without IOMMU support)!
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>

Gr{oetje,eeting}s,

                        Geert

Patch

diff --git a/block/blk-settings.c b/block/blk-settings.c
index cdbaef159c4b..d2731843f2fc 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -182,17 +182,13 @@  static int blk_validate_limits(struct queue_limits *lim)
 		return -EINVAL;
 
 	/*
-	 * Devices that require a virtual boundary do not support scatter/gather
-	 * I/O natively, but instead require a descriptor list entry for each
-	 * page (which might not be identical to the Linux PAGE_SIZE).  Because
-	 * of that they are not limited by our notion of "segment size".
+	 * Stacking device may have both virtual boundary and max segment
+	 * size limit, so allow this setting now, and long-term the two
+	 * might need to move out of stacking limits since we have immutable
+	 * bvec and lower layer bio splitting is supposed to handle the two
+	 * correctly.
 	 */
-	if (lim->virt_boundary_mask) {
-		if (WARN_ON_ONCE(lim->max_segment_size &&
-				 lim->max_segment_size != UINT_MAX))
-			return -EINVAL;
-		lim->max_segment_size = UINT_MAX;
-	} else {
+	if (!lim->virt_boundary_mask) {
 		/*
 		 * The maximum segment size has an odd historic 64k default that
 		 * drivers probably should override.  Just like the I/O size we