diff mbox series

[V2] dm-raid: fix updating of max_discard_sectors limit

Message ID 20190911113133.837-1-ming.lei@redhat.com (mailing list archive)
State Accepted, archived
Delegated to: Mike Snitzer
Headers show
Series [V2] dm-raid: fix updating of max_discard_sectors limit | expand

Commit Message

Ming Lei Sept. 11, 2019, 11:31 a.m. UTC
Unit of 'chunk_size' is byte, instead of sector, so fix it.

Without this fix, too big max_discard_sectors is applied on the request queue
of dm-raid, finally raid code has to split the bio again.

This re-split done by raid causes the following nested clone_endio:

1) one big bio 'A' is submitted to dm queue, and served as the original
bio

2) one new bio 'B' is cloned from the original bio 'A', and .map()
is run on this bio of 'B', and B's original bio points to 'A'

3) raid code sees that 'B' is too big, and split 'B' and re-submit
the remainded part of 'B' to dm-raid queue via generic_make_request().

4) now dm will hanlde 'B' as new original bio, then allocate a new
clone bio of 'C' and run .map() on 'C'. Meantime C's original bio
points to 'B'.

5) suppose now 'C' is completed by raid direclty, then the following
clone_endio() is called recursively:

	clone_endio(C)
		->clone_endio(B)		#B is original bio of 'C'
			->bio_endio(A)

'A' can be big enough to make handreds of nested clone_endio(), then
stack can be corrupted easily.

Cc: <stable@vger.kernel.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
V2:
	- fix commit log a bit

 drivers/md/dm-raid.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Mike Snitzer Sept. 11, 2019, 1:35 p.m. UTC | #1
On Wed, Sep 11 2019 at  7:31am -0400,
Ming Lei <ming.lei@redhat.com> wrote:

> Unit of 'chunk_size' is byte, instead of sector, so fix it.
> 
> Without this fix, too big max_discard_sectors is applied on the request queue
> of dm-raid, finally raid code has to split the bio again.
> 
> This re-split done by raid causes the following nested clone_endio:
> 
> 1) one big bio 'A' is submitted to dm queue, and served as the original
> bio
> 
> 2) one new bio 'B' is cloned from the original bio 'A', and .map()
> is run on this bio of 'B', and B's original bio points to 'A'
> 
> 3) raid code sees that 'B' is too big, and split 'B' and re-submit
> the remainded part of 'B' to dm-raid queue via generic_make_request().
> 
> 4) now dm will hanlde 'B' as new original bio, then allocate a new
> clone bio of 'C' and run .map() on 'C'. Meantime C's original bio
> points to 'B'.
> 
> 5) suppose now 'C' is completed by raid direclty, then the following
> clone_endio() is called recursively:
> 
> 	clone_endio(C)
> 		->clone_endio(B)		#B is original bio of 'C'
> 			->bio_endio(A)
> 
> 'A' can be big enough to make handreds of nested clone_endio(), then
> stack can be corrupted easily.
> 
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
> V2:
> 	- fix commit log a bit
> 
>  drivers/md/dm-raid.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
> index 8a60a4a070ac..c26aa4e8207a 100644
> --- a/drivers/md/dm-raid.c
> +++ b/drivers/md/dm-raid.c
> @@ -3749,7 +3749,7 @@ static void raid_io_hints(struct dm_target *ti, struct queue_limits *limits)
>  	 */
>  	if (rs_is_raid1(rs) || rs_is_raid10(rs)) {
>  		limits->discard_granularity = chunk_size;
> -		limits->max_discard_sectors = chunk_size;
> +		limits->max_discard_sectors = chunk_size >> 9;
>  	}
>  }
>  
> -- 
> 2.20.1
> 

Thanks a lot Ming!  But oof, really embarassing oversight on my part!

FYI, I added a "Fixes:" tag to the commit header and switched to
shifting by SECTOR_SHIFT instead of 9, staged commit for 5.4 is here:

https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.4&id=509818079bf1fefff4ed02d6a1b994e20efc0480

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Mike Snitzer Sept. 11, 2019, 1:43 p.m. UTC | #2
On Wed, Sep 11 2019 at  9:35am -0400,
Mike Snitzer <snitzer@redhat.com> wrote:
 
> FYI, I added a "Fixes:" tag to the commit header and switched to
> shifting by SECTOR_SHIFT instead of 9, staged commit for 5.4 is here:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.4&id=509818079bf1fefff4ed02d6a1b994e20efc0480

I just fixed a few typos in your patch header, so updated commit is:
https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.4&id=2b7aa6bde032c56e00245ed438922529251c9e13

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
John Stoffel Sept. 11, 2019, 6:59 p.m. UTC | #3
>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:

Mike> On Wed, Sep 11 2019 at  7:31am -0400,
Mike> Ming Lei <ming.lei@redhat.com> wrote:

>> Unit of 'chunk_size' is byte, instead of sector, so fix it.
>> 
>> Without this fix, too big max_discard_sectors is applied on the request queue
>> of dm-raid, finally raid code has to split the bio again.
>> 
>> This re-split done by raid causes the following nested clone_endio:
>> 
>> 1) one big bio 'A' is submitted to dm queue, and served as the original
>> bio
>> 
>> 2) one new bio 'B' is cloned from the original bio 'A', and .map()
>> is run on this bio of 'B', and B's original bio points to 'A'
>> 
>> 3) raid code sees that 'B' is too big, and split 'B' and re-submit
>> the remainded part of 'B' to dm-raid queue via generic_make_request().
>> 
>> 4) now dm will hanlde 'B' as new original bio, then allocate a new
>> clone bio of 'C' and run .map() on 'C'. Meantime C's original bio
>> points to 'B'.
>> 
>> 5) suppose now 'C' is completed by raid direclty, then the following
>> clone_endio() is called recursively:
>> 
>> clone_endio(C)
-> clone_endio(B)		#B is original bio of 'C'
-> bio_endio(A)
>> 
>> 'A' can be big enough to make handreds of nested clone_endio(), then
>> stack can be corrupted easily.
>> 
>> Cc: <stable@vger.kernel.org>
>> Signed-off-by: Ming Lei <ming.lei@redhat.com>
>> ---
>> V2:
>> - fix commit log a bit
>> 
>> drivers/md/dm-raid.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
>> index 8a60a4a070ac..c26aa4e8207a 100644
>> --- a/drivers/md/dm-raid.c
>> +++ b/drivers/md/dm-raid.c
>> @@ -3749,7 +3749,7 @@ static void raid_io_hints(struct dm_target *ti, struct queue_limits *limits)
>> */
>> if (rs_is_raid1(rs) || rs_is_raid10(rs)) {
limits-> discard_granularity = chunk_size;
>> -		limits->max_discard_sectors = chunk_size;
>> +		limits->max_discard_sectors = chunk_size >> 9;
>> }
>> }
>> 
>> -- 
>> 2.20.1
>> 

Mike> Thanks a lot Ming!  But oof, really embarassing oversight on my part!

Mike> FYI, I added a "Fixes:" tag to the commit header and switched to
Mike> shifting by SECTOR_SHIFT instead of 9, staged commit for 5.4 is here:

Mike> https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.4&id=509818079bf1fefff4ed02d6a1b994e20efc0480

Mike> --
Mike> dm-devel mailing list
Mike> dm-devel@redhat.com
Mike> https://www.redhat.com/mailman/listinfo/dm-devel



Would it make sense to re-name the variable to chunk_size_bytes as
well?  

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Sasha Levin Sept. 12, 2019, 7:30 a.m. UTC | #4
Hi,

[This is an automated email]

This commit has been processed because it contains a -stable tag.
The stable tag indicates that it's relevant for the following trees: all

The bot has tested the following trees: v5.2.14, v4.19.72, v4.14.143, v4.9.192, v4.4.192.

v5.2.14: Build OK!
v4.19.72: Failed to apply! Possible dependencies:
    53b471687012 ("dm: remove indirect calls from __send_changing_extent_only()")
    61697a6abd24 ("dm: eliminate 'split_discard_bios' flag from DM target interface")

v4.14.143: Failed to apply! Possible dependencies:
    00716545c894 ("dm: add support for secure erase forwarding")
    0519c71e8d46 ("dm: backfill abnormal IO support to non-splitting IO submission")
    0776aa0e30aa ("dm: ensure bio-based DM's bioset and io_pool support targets' maximum IOs")
    18a25da84354 ("dm: ensure bio submission follows a depth-first tree walk")
    318716ddea08 ("dm: safely allocate multiple bioset bios")
    3d7f45625a84 ("dm: fix __send_changing_extent_only() to send first bio and chain remainder")
    53b471687012 ("dm: remove indirect calls from __send_changing_extent_only()")
    552aa679f265 ("dm raid: use rs_is_raid*()")
    61697a6abd24 ("dm: eliminate 'split_discard_bios' flag from DM target interface")
    64f52b0e3148 ("dm: improve performance by moving dm_io structure to per-bio-data")
    745dc570b2c3 ("dm: rename 'bio' member of dm_io structure to 'orig_bio'")
    978e51ba38e0 ("dm: optimize bio-based NVMe IO submission")
    f31c21e4365c ("dm: remove unused 'num_write_bios' target interface")

v4.9.192: Failed to apply! Possible dependencies:
    124d6db07c3b ("nbd: use our own workqueue for recv threads")
    19372e276917 ("loop: implement REQ_OP_WRITE_ZEROES")
    3b1a94c88b79 ("dm zoned: drive-managed zoned block device target")
    48920ff2a5a9 ("block: remove the discard_zeroes_data flag")
    552aa679f265 ("dm raid: use rs_is_raid*()")
    61697a6abd24 ("dm: eliminate 'split_discard_bios' flag from DM target interface")
    7ab84db64f11 ("dm integrity: improve the Kconfig help text for DM_INTEGRITY")
    9561a7ade0c2 ("nbd: add multi-connection support")
    b0d9111a2d53 ("nbd: use an idr to keep track of nbd devices")

v4.4.192: Failed to apply! Possible dependencies:
    33e53f06850f ("dm raid: introduce extended superblock and new raid types to support takeover/reshaping")
    4c9971ca6a17 ("dm raid: make sure no feature flags are set in metadata")
    552aa679f265 ("dm raid: use rs_is_raid*()")
    61697a6abd24 ("dm: eliminate 'split_discard_bios' flag from DM target interface")
    676fa5ad6e96 ("dm raid: use rt_is_raid*() in all appropriate checks")
    702108d194e3 ("dm raid: cleanup / provide infrastructure")
    73c6f239a862 ("dm raid: rename variable 'ret' to 'r' to conform to other dm code")
    92c83d79b07e ("dm raid: use dm_arg_set API in constructor")
    f090279eaff8 ("dm raid: check constructor arguments for invalid raid level/argument combinations")


NOTE: The patch will not be queued to stable trees until it is upstream.

How should we proceed with this patch?

--
Thanks,
Sasha

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
diff mbox series

Patch

diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index 8a60a4a070ac..c26aa4e8207a 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3749,7 +3749,7 @@  static void raid_io_hints(struct dm_target *ti, struct queue_limits *limits)
 	 */
 	if (rs_is_raid1(rs) || rs_is_raid10(rs)) {
 		limits->discard_granularity = chunk_size;
-		limits->max_discard_sectors = chunk_size;
+		limits->max_discard_sectors = chunk_size >> 9;
 	}
 }