
block/mq-deadline: Fix WARN when async_depth is set via sysfs

Message ID 1711680261-5789-1-git-send-email-zhiguo.niu@unisoc.com (mailing list archive)
State New, archived
Series block/mq-deadline: Fix WARN when async_depth is set via sysfs

Commit Message

Zhiguo Niu March 29, 2024, 2:44 a.m. UTC
A WARN may be triggered when async_depth is set by the user via sysfs.
The warning log is as follows:

[  623.848659] WARNING: CPU: 0 PID: 7798 at lib/sbitmap.c:537 sbitmap_queue_get_shallow+0x2c/0x38
[  623.878550] CPU: 0 PID: 7798 Comm: kworker/u16:2 Tainted: G        W  OE      6.6.0-mainline-g8d9254e6f4a0-dirty-ab000013 #1
[  623.880091] Hardware name: Unisoc UMS9621-base Board (DT)
[  623.880906] Workqueue: writeback wb_workfn (flush-254:48)
[  623.881748] pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  623.882763] pc : sbitmap_queue_get_shallow+0x2c/0x38
[  623.883525] lr : __blk_mq_get_tag+0x50/0xd4
[  623.884198] sp : ffffffc08a073230
[  623.884745] x29: ffffffc08a073230 x28: ffffffc0821445e0 x27: 0000000000000000
[  623.885799] x26: ffffff8087de8000 x25: 0000000000000000 x24: 0000000000000002
[  623.886849] x23: ffffffc0820f2008 x22: ffffff8088ac3918 x21: ffffff808c358f10
[  623.887897] x20: ffffff808c358f00 x19: ffffffc08a073360 x18: ffffffc08bde70a8
[  623.888946] x17: 000000007e57c819 x16: 000000007e57c819 x15: fffffffdfe000000
[  623.889993] x14: 0000000000000001 x13: 0000000000000004 x12: 00000003c6c931ed
[  623.891038] x11: ffffff80939a3800 x10: ffffffc0801ac88c x9 : 0000000000000000
[  623.892086] x8 : 0000000000000006 x7 : 0000000000000000 x6 : ffffffc080765204
[  623.893131] x5 : 0000000000000000 x4 : 0000000000000001 x3 : 0000000000000000
[  623.894174] x2 : ffffffc080765224 x1 : 0000000000000005 x0 : ffffff808c358f10
[  623.895221] Call trace:
[  623.895660] sbitmap_queue_get_shallow+0x2c/0x38
[  623.896379] blk_mq_get_tag+0xa0/0x350
[  623.896992] __blk_mq_alloc_requests+0x218/0x300
[  623.897715] blk_mq_submit_bio+0x314/0x774
[  623.898369] __submit_bio+0xb4/0xe0
[  623.898950] submit_bio_noacct_nocheck+0x110/0x324
[  623.899692] submit_bio_noacct+0x278/0x3f8
[  623.900344] submit_bio+0xcc/0xe8
[  623.900900] f2fs_submit_write_bio+0x100/0x428
[  623.901605] __submit_merged_bio+0x74/0x1ac
[  623.902269] __submit_merged_write_cond+0x188/0x1f4
[  623.903022] f2fs_write_data_pages+0xb10/0xc2c
[  623.903727] do_writepages+0xf4/0x618
[  623.904332] __writeback_single_inode+0x78/0x60c
[  623.905055] writeback_sb_inodes+0x294/0x520
[  623.905734] __writeback_inodes_wb+0xa0/0xf4
[  623.906413] wb_writeback+0x188/0x4e8
[  623.907014] wb_workfn+0x420/0x608
[  623.907582] process_one_work+0x23c/0x55c
[  623.908227] worker_thread+0x2ac/0x3e4
[  623.908838] kthread+0x108/0x12c
[  623.909389] ret_from_fork+0x10/0x20

The root cause is that the user may set async_depth to a value smaller
than the initial value computed in dd_init_hctx->dd_depth_updated, which
is also the value assigned to sbq->min_shallow_depth. When async_depth
is later modified via sysfs, sbq->min_shallow_depth is not updated
accordingly, and the tag sbitmap information is not easily accessible
from deadline_async_depth_store.

So set a suitable value for min_shallow_depth in dd_depth_updated.
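
For context, the warning comes from the shallow-depth sanity check in
lib/sbitmap.c, which is assumed to look roughly like the sketch below (not a
verbatim copy): dd_limit_depth() passes dd->async_depth as shallow_depth,
while sbq->min_shallow_depth still holds the larger value computed from the
initial async_depth.

	int sbitmap_queue_get_shallow(struct sbitmap_queue *sbq,
				      unsigned int shallow_depth)
	{
		/* Fires once async_depth is lowered below min_shallow_depth. */
		WARN_ON_ONCE(shallow_depth < sbq->min_shallow_depth);

		return sbitmap_get_shallow(&sbq->sb, shallow_depth);
	}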

Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
---
 block/mq-deadline.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Damien Le Moal March 29, 2024, 3:40 a.m. UTC | #1
On 3/29/24 11:44, Zhiguo Niu wrote:
> A WARN may occur when async_depth is set from user by sysfs,
> the warning log is as following:
> 
> [  623.848659] WARNING: CPU: 0 PID: 7798 at lib/sbitmap.c:537 sbitmap_queue_get_shallow+0x2c/0x38
> [  623.878550] CPU: 0 PID: 7798 Comm: kworker/u16:2 Tainted: G        W  OE      6.6.0-mainline-g8d9254e6f4a0-dirty-ab000013 #1
> [  623.880091] Hardware name: Unisoc UMS9621-base Board (DT)
> [  623.880906] Workqueue: writeback wb_workfn (flush-254:48)
> [  623.881748] pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [  623.882763] pc : sbitmap_queue_get_shallow+0x2c/0x38
> [  623.883525] lr : __blk_mq_get_tag+0x50/0xd4
> [  623.884198] sp : ffffffc08a073230
> [  623.884745] x29: ffffffc08a073230 x28: ffffffc0821445e0 x27: 0000000000000000
> [  623.885799] x26: ffffff8087de8000 x25: 0000000000000000 x24: 0000000000000002
> [  623.886849] x23: ffffffc0820f2008 x22: ffffff8088ac3918 x21: ffffff808c358f10
> [  623.887897] x20: ffffff808c358f00 x19: ffffffc08a073360 x18: ffffffc08bde70a8
> [  623.888946] x17: 000000007e57c819 x16: 000000007e57c819 x15: fffffffdfe000000
> [  623.889993] x14: 0000000000000001 x13: 0000000000000004 x12: 00000003c6c931ed
> [  623.891038] x11: ffffff80939a3800 x10: ffffffc0801ac88c x9 : 0000000000000000
> [  623.892086] x8 : 0000000000000006 x7 : 0000000000000000 x6 : ffffffc080765204
> [  623.893131] x5 : 0000000000000000 x4 : 0000000000000001 x3 : 0000000000000000
> [  623.894174] x2 : ffffffc080765224 x1 : 0000000000000005 x0 : ffffff808c358f10
> [  623.895221] Call trace:
> [  623.895660] sbitmap_queue_get_shallow+0x2c/0x38
> [  623.896379] blk_mq_get_tag+0xa0/0x350
> [  623.896992] __blk_mq_alloc_requests+0x218/0x300
> [  623.897715] blk_mq_submit_bio+0x314/0x774
> [  623.898369] __submit_bio+0xb4/0xe0
> [  623.898950] submit_bio_noacct_nocheck+0x110/0x324
> [  623.899692] submit_bio_noacct+0x278/0x3f8
> [  623.900344] submit_bio+0xcc/0xe8
> [  623.900900] f2fs_submit_write_bio+0x100/0x428
> [  623.901605] __submit_merged_bio+0x74/0x1ac
> [  623.902269] __submit_merged_write_cond+0x188/0x1f4
> [  623.903022] f2fs_write_data_pages+0xb10/0xc2c
> [  623.903727] do_writepages+0xf4/0x618
> [  623.904332] __writeback_single_inode+0x78/0x60c
> [  623.905055] writeback_sb_inodes+0x294/0x520
> [  623.905734] __writeback_inodes_wb+0xa0/0xf4
> [  623.906413] wb_writeback+0x188/0x4e8
> [  623.907014] wb_workfn+0x420/0x608
> [  623.907582] process_one_work+0x23c/0x55c
> [  623.908227] worker_thread+0x2ac/0x3e4
> [  623.908838] kthread+0x108/0x12c
> [  623.909389] ret_from_fork+0x10/0x20
> 
> The rootcause is user may set async_depth to a value which is less
> than its initial value from dd_init_hctx->dd_depth_updated, and this
> initial value is set to sbq->min_shallow_depth, when async_depth is
> modified by user from sysfs, sbq->min_shallow_depth will not be changed
> simultaneously, and it is also not easy to obtain tag sbitmap information
> in deadline_async_depth_store.
> 
> So a suitable value should be set to min_shallow_depth in dd_depth_updated.
> 
> Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
> ---
>  block/mq-deadline.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index 02a916b..89c516e 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -646,10 +646,12 @@ static void dd_depth_updated(struct blk_mq_hw_ctx *hctx)
>  	struct request_queue *q = hctx->queue;
>  	struct deadline_data *dd = q->elevator->elevator_data;
>  	struct blk_mq_tags *tags = hctx->sched_tags;
> +	unsigned int shift = tags->bitmap_tags.sb.shift;
> +	unsigned int dd_min_depth = max(1U, 3 * (1U << shift)  / 4);

Extra blank space before "/".
That division could also be replaced with ">> 2".
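
For example, the quoted line could become something like the following
(a hypothetical rewrite following this suggestion, not part of the posted
patch); for unsigned operands the shift yields the same value as dividing
by 4:

	unsigned int dd_min_depth = max(1U, (3 * (1U << shift)) >> 2);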

>  
>  	dd->async_depth = max(1UL, 3 * q->nr_requests / 4);
>  
> -	sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, dd->async_depth);
> +	sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, dd_min_depth);
>  }
>  
>  /* Called by blk_mq_init_hctx() and blk_mq_init_sched(). */
Zhiguo Niu March 29, 2024, 9:43 a.m. UTC | #2
On Fri, Mar 29, 2024 at 11:40 AM Damien Le Moal <dlemoal@kernel.org> wrote:
>
> On 3/29/24 11:44, Zhiguo Niu wrote:
> > A WARN may occur when async_depth is set from user by sysfs,
> > the warning log is as following:
> >
> > [  623.848659] WARNING: CPU: 0 PID: 7798 at lib/sbitmap.c:537 sbitmap_queue_get_shallow+0x2c/0x38
> > [  623.878550] CPU: 0 PID: 7798 Comm: kworker/u16:2 Tainted: G        W  OE      6.6.0-mainline-g8d9254e6f4a0-dirty-ab000013 #1
> > [  623.880091] Hardware name: Unisoc UMS9621-base Board (DT)
> > [  623.880906] Workqueue: writeback wb_workfn (flush-254:48)
> > [  623.881748] pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [  623.882763] pc : sbitmap_queue_get_shallow+0x2c/0x38
> > [  623.883525] lr : __blk_mq_get_tag+0x50/0xd4
> > [  623.884198] sp : ffffffc08a073230
> > [  623.884745] x29: ffffffc08a073230 x28: ffffffc0821445e0 x27: 0000000000000000
> > [  623.885799] x26: ffffff8087de8000 x25: 0000000000000000 x24: 0000000000000002
> > [  623.886849] x23: ffffffc0820f2008 x22: ffffff8088ac3918 x21: ffffff808c358f10
> > [  623.887897] x20: ffffff808c358f00 x19: ffffffc08a073360 x18: ffffffc08bde70a8
> > [  623.888946] x17: 000000007e57c819 x16: 000000007e57c819 x15: fffffffdfe000000
> > [  623.889993] x14: 0000000000000001 x13: 0000000000000004 x12: 00000003c6c931ed
> > [  623.891038] x11: ffffff80939a3800 x10: ffffffc0801ac88c x9 : 0000000000000000
> > [  623.892086] x8 : 0000000000000006 x7 : 0000000000000000 x6 : ffffffc080765204
> > [  623.893131] x5 : 0000000000000000 x4 : 0000000000000001 x3 : 0000000000000000
> > [  623.894174] x2 : ffffffc080765224 x1 : 0000000000000005 x0 : ffffff808c358f10
> > [  623.895221] Call trace:
> > [  623.895660] sbitmap_queue_get_shallow+0x2c/0x38
> > [  623.896379] blk_mq_get_tag+0xa0/0x350
> > [  623.896992] __blk_mq_alloc_requests+0x218/0x300
> > [  623.897715] blk_mq_submit_bio+0x314/0x774
> > [  623.898369] __submit_bio+0xb4/0xe0
> > [  623.898950] submit_bio_noacct_nocheck+0x110/0x324
> > [  623.899692] submit_bio_noacct+0x278/0x3f8
> > [  623.900344] submit_bio+0xcc/0xe8
> > [  623.900900] f2fs_submit_write_bio+0x100/0x428
> > [  623.901605] __submit_merged_bio+0x74/0x1ac
> > [  623.902269] __submit_merged_write_cond+0x188/0x1f4
> > [  623.903022] f2fs_write_data_pages+0xb10/0xc2c
> > [  623.903727] do_writepages+0xf4/0x618
> > [  623.904332] __writeback_single_inode+0x78/0x60c
> > [  623.905055] writeback_sb_inodes+0x294/0x520
> > [  623.905734] __writeback_inodes_wb+0xa0/0xf4
> > [  623.906413] wb_writeback+0x188/0x4e8
> > [  623.907014] wb_workfn+0x420/0x608
> > [  623.907582] process_one_work+0x23c/0x55c
> > [  623.908227] worker_thread+0x2ac/0x3e4
> > [  623.908838] kthread+0x108/0x12c
> > [  623.909389] ret_from_fork+0x10/0x20
> >
> > The rootcause is user may set async_depth to a value which is less
> > than its initial value from dd_init_hctx->dd_depth_updated, and this
> > initial value is set to sbq->min_shallow_depth, when async_depth is
> > modified by user from sysfs, sbq->min_shallow_depth will not be changed
> > simultaneously, and it is also not easy to obtain tag sbitmap information
> > in deadline_async_depth_store.
> >
> > So a suitable value should be set to min_shallow_depth in dd_depth_updated.
> >
> > Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
> > ---
> >  block/mq-deadline.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> > index 02a916b..89c516e 100644
> > --- a/block/mq-deadline.c
> > +++ b/block/mq-deadline.c
> > @@ -646,10 +646,12 @@ static void dd_depth_updated(struct blk_mq_hw_ctx *hctx)
> >       struct request_queue *q = hctx->queue;
> >       struct deadline_data *dd = q->elevator->elevator_data;
> >       struct blk_mq_tags *tags = hctx->sched_tags;
> > +     unsigned int shift = tags->bitmap_tags.sb.shift;
> > +     unsigned int dd_min_depth = max(1U, 3 * (1U << shift)  / 4);
>
> Extra blank space before "/".
Hi Damien Le Moal
Thank you for the detailed review. I will fix this along with the
suggestions from other reviewers.
> That division could also be replaced with ">> 2".
Yes, I just followed the original code "dd->async_depth = max(1UL, 3 *
q->nr_requests / 4);".
thanks!
>
> >
> >       dd->async_depth = max(1UL, 3 * q->nr_requests / 4);
> >
> > -     sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, dd->async_depth);
> > +     sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, dd_min_depth);
> >  }
> >
> >  /* Called by blk_mq_init_hctx() and blk_mq_init_sched(). */
>
> --
> Damien Le Moal
> Western Digital Research
>
Bart Van Assche March 29, 2024, 6:08 p.m. UTC | #3
On 3/28/24 7:44 PM, Zhiguo Niu wrote:
> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index 02a916b..89c516e 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -646,10 +646,12 @@ static void dd_depth_updated(struct blk_mq_hw_ctx *hctx)
>   	struct request_queue *q = hctx->queue;
>   	struct deadline_data *dd = q->elevator->elevator_data;
>   	struct blk_mq_tags *tags = hctx->sched_tags;
> +	unsigned int shift = tags->bitmap_tags.sb.shift;
> +	unsigned int dd_min_depth = max(1U, 3 * (1U << shift)  / 4);
>   
>   	dd->async_depth = max(1UL, 3 * q->nr_requests / 4);
>   
> -	sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, dd->async_depth);
> +	sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, dd_min_depth);
>   }

The above patch sets min_shallow_depth to the same value as commit
d47f9717e5cf ("block/mq-deadline: use correct way to throttling write
requests"). That commit got reverted because it was causing performance
problems. So the above patch reintroduces the performance problem that
has been fixed by commit 256aab46e316 ("Revert "block/mq-deadline: use
correct way to throttling write requests"").

Thank you for attempting to reintroduce a problem that just got fixed
without even mentioning that this is an attempt to reintroduce a
performance problem.

Bart.
Zhiguo Niu April 1, 2024, 12:58 a.m. UTC | #4
On Sat, Mar 30, 2024 at 2:08 AM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On 3/28/24 7:44 PM, Zhiguo Niu wrote:
> > diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> > index 02a916b..89c516e 100644
> > --- a/block/mq-deadline.c
> > +++ b/block/mq-deadline.c
> > @@ -646,10 +646,12 @@ static void dd_depth_updated(struct blk_mq_hw_ctx *hctx)
> >       struct request_queue *q = hctx->queue;
> >       struct deadline_data *dd = q->elevator->elevator_data;
> >       struct blk_mq_tags *tags = hctx->sched_tags;
> > +     unsigned int shift = tags->bitmap_tags.sb.shift;
> > +     unsigned int dd_min_depth = max(1U, 3 * (1U << shift)  / 4);
> >
> >       dd->async_depth = max(1UL, 3 * q->nr_requests / 4);
> >
> > -     sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, dd->async_depth);
> > +     sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, dd_min_depth);
> >   }
>
> The above patch sets min_shallow_depth to the same value as commit
> d47f9717e5cf ("block/mq-deadline: use correct way to throttling write
> requests"). That commit got reverted because it was causing performance
> problems. So the above patch reintroduces the performance problem that
> has been fixed by commit 256aab46e316 ("Revert "block/mq-deadline: use
> correct way to throttling write requests"").
Hi Bart Van Assche,

This patch only modifies the initial minimum value of
min_shallow_depth and does not change "dd->async_depth",
so it will not cause performance problems like the previous patch
(d47f9717e5cf ("block/mq-deadline: use correct way to throttling write
requests")).
>
> Thank you for attempting to reintroduce a problem that just got fixed
> without even mentioning that this is an attempt to reintroduce a
> performance problem.

So what are your suggestions for fixing the warning shown in the commit
message if dd->async_depth is set by the user via sysfs?
thanks
>
> Bart.
>
>
Bart Van Assche April 1, 2024, 9:23 p.m. UTC | #5
On 3/31/24 17:58, Zhiguo Niu wrote:
> On Sat, Mar 30, 2024 at 2:08 AM Bart Van Assche <bvanassche@acm.org> wrote:
>>
>> On 3/28/24 7:44 PM, Zhiguo Niu wrote:
>>> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
>>> index 02a916b..89c516e 100644
>>> --- a/block/mq-deadline.c
>>> +++ b/block/mq-deadline.c
>>> @@ -646,10 +646,12 @@ static void dd_depth_updated(struct blk_mq_hw_ctx *hctx)
>>>        struct request_queue *q = hctx->queue;
>>>        struct deadline_data *dd = q->elevator->elevator_data;
>>>        struct blk_mq_tags *tags = hctx->sched_tags;
>>> +     unsigned int shift = tags->bitmap_tags.sb.shift;
>>> +     unsigned int dd_min_depth = max(1U, 3 * (1U << shift)  / 4);
>>>
>>>        dd->async_depth = max(1UL, 3 * q->nr_requests / 4);
>>>
>>> -     sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, dd->async_depth);
>>> +     sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, dd_min_depth);
>>>    }
>>
>> The above patch sets min_shallow_depth to the same value as commit
>> d47f9717e5cf ("block/mq-deadline: use correct way to throttling write
>> requests"). That commit got reverted because it was causing performance
>> problems. So the above patch reintroduces the performance problem that
>> has been fixed by commit 256aab46e316 ("Revert "block/mq-deadline: use
>> correct way to throttling write requests"").
> Hi Bart Van Assche,
> 
> This  patch only modifies the initial minimum value of
> min_shallow_depth and does not change "dd->async_depth",
> so it will not cause performance problems like the previous patch
> (d47f9717e5cf ("block/mq-deadline: use correct way to throttling write
> requests")).

Oops, I misread your patch. After having taken another look, my
conclusions are as follows:
* sbitmap_queue_min_shallow_depth() is called. This causes
   sbq->wake_batch to be modified but I don't think that it is a proper
   fix for dd_limit_depth().
* dd_limit_depth() still assigns a number in the range 1..nr_requests to
   data->shallow_depth while a number in the range 1..(1<<bt->sb.shift)
   should be assigned.

> So what are your suggestions for fixing the warning shown in commit
> msg if dd->async_depth is set by the user from sysfs?
> thanks

How about the two untested patches below?

Thanks,

Bart.


Subject: [PATCH 1/2] block: Call .limit_depth() after .hctx has been set

Prepare for using .hctx in dd_limit_depth().

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
  block/blk-mq.c | 8 +++++---
  1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 34060d885c5a..d0db9252bb71 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -435,6 +435,7 @@ __blk_mq_alloc_requests_batch(struct blk_mq_alloc_data *data)
  static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
  {
  	struct request_queue *q = data->q;
+	struct elevator_mq_ops *ops = NULL;
  	u64 alloc_time_ns = 0;
  	struct request *rq;
  	unsigned int tag;
@@ -459,13 +460,11 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
  		 */
  		if ((data->cmd_flags & REQ_OP_MASK) != REQ_OP_FLUSH &&
  		    !blk_op_is_passthrough(data->cmd_flags)) {
-			struct elevator_mq_ops *ops = &q->elevator->type->ops;
+			ops = &q->elevator->type->ops;

  			WARN_ON_ONCE(data->flags & BLK_MQ_REQ_RESERVED);

  			data->rq_flags |= RQF_USE_SCHED;
-			if (ops->limit_depth)
-				ops->limit_depth(data->cmd_flags, data);
  		}
  	}

@@ -478,6 +477,9 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
  	if (data->flags & BLK_MQ_REQ_RESERVED)
  		data->rq_flags |= RQF_RESV;

+	if (ops && ops->limit_depth)
+		ops->limit_depth(data->cmd_flags, data);
+
  	/*
  	 * Try batched alloc if we want more than 1 tag.
  	 */



Subject: [PATCH 2/2] block/mq-deadline: Fix the tag reservation code

Fixes: 07757588e507 ("block/mq-deadline: Reserve 25% of scheduler tags for synchronous requests")
---
  block/mq-deadline.c | 18 ++++++++++++++++--
  1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 02a916ba62ee..8e780069d91b 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -621,6 +621,20 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
  	return rq;
  }

+/*
+ * 'depth' is a number in the range 0..q->nr_requests. Convert it to a number
+ * in the range 0..(1 << bt->sb.shift) since that is the range expected by
+ * sbitmap_get_shallow().
+ */
+static int dd_to_word_depth(struct blk_mq_hw_ctx *hctx, unsigned int qdepth)
+{
+	struct sbitmap_queue *bt = &hctx->sched_tags->bitmap_tags;
+	const unsigned int nrr = hctx->queue->nr_requests;
+
+	return max(((qdepth << bt->sb.shift) + nrr - 1) / nrr,
+		   bt->min_shallow_depth);
+}
+
  /*
   * Called by __blk_mq_alloc_request(). The shallow_depth value set by this
   * function is used by __blk_mq_get_tag().
@@ -637,7 +651,7 @@ static void dd_limit_depth(blk_opf_t opf, struct blk_mq_alloc_data *data)
  	 * Throttle asynchronous requests and writes such that these requests
  	 * do not block the allocation of synchronous requests.
  	 */
-	data->shallow_depth = dd->async_depth;
+	data->shallow_depth = dd_to_word_depth(data->hctx, dd->async_depth);
  }

  /* Called by blk_mq_update_nr_requests(). */
@@ -649,7 +663,7 @@ static void dd_depth_updated(struct blk_mq_hw_ctx *hctx)

  	dd->async_depth = max(1UL, 3 * q->nr_requests / 4);

-	sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, dd->async_depth);
+	sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, 1);
  }

  /* Called by blk_mq_init_hctx() and blk_mq_init_sched(). */
Zhiguo Niu April 2, 2024, 5:44 a.m. UTC | #6
On Tue, Apr 2, 2024 at 5:23 AM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On 3/31/24 17:58, Zhiguo Niu wrote:
> > On Sat, Mar 30, 2024 at 2:08 AM Bart Van Assche <bvanassche@acm.org> wrote:
> >>
> >> On 3/28/24 7:44 PM, Zhiguo Niu wrote:
> >>> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> >>> index 02a916b..89c516e 100644
> >>> --- a/block/mq-deadline.c
> >>> +++ b/block/mq-deadline.c
> >>> @@ -646,10 +646,12 @@ static void dd_depth_updated(struct blk_mq_hw_ctx *hctx)
> >>>        struct request_queue *q = hctx->queue;
> >>>        struct deadline_data *dd = q->elevator->elevator_data;
> >>>        struct blk_mq_tags *tags = hctx->sched_tags;
> >>> +     unsigned int shift = tags->bitmap_tags.sb.shift;
> >>> +     unsigned int dd_min_depth = max(1U, 3 * (1U << shift)  / 4);
> >>>
> >>>        dd->async_depth = max(1UL, 3 * q->nr_requests / 4);
> >>>
> >>> -     sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, dd->async_depth);
> >>> +     sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, dd_min_depth);
> >>>    }
> >>
> >> The above patch sets min_shallow_depth to the same value as commit
> >> d47f9717e5cf ("block/mq-deadline: use correct way to throttling write
> >> requests"). That commit got reverted because it was causing performance
> >> problems. So the above patch reintroduces the performance problem that
> >> has been fixed by commit 256aab46e316 ("Revert "block/mq-deadline: use
> >> correct way to throttling write requests"").
> > Hi Bart Van Assche,
> >
> > This  patch only modifies the initial minimum value of
> > min_shallow_depth and does not change "dd->async_depth",
> > so it will not cause performance problems like the previous patch
> > (d47f9717e5cf ("block/mq-deadline: use correct way to throttling write
> > requests")).
>
Hi Bart Van Assche,
thanks for your reply.

> Oops, I misread your patch. After having taken another look, my
> conclusions are as follows:
> * sbitmap_queue_min_shallow_depth() is called. This causes
>    sbq->wake_batch to be modified but I don't think that it is a proper
>    fix for dd_limit_depth().
Yes, it will affect sbq->wake_batch. But judging from the following code:
wake_batch = clamp_t(unsigned int, depth / SBQ_WAIT_QUEUES, 1, SBQ_WAKE_BATCH);
wake_batch will be between 1 and 8, and my experiment results are:
1. hw conditions: 8 cpus, emmc flash, one hw queue, and
queue_depth=64, sched_tag/nr_request=128, init dd->async_depth=96
--before this patch, sched_tag info from
sys/kernel/debug/block/mmcblk0/hctx0
---depth=128
---bits_per_word=32
---map_nr=4
---wake_batch=8
---min_shallow_depth=96
--after this patch, sched_tag info from
sys/kernel/debug/block/mmcblk0/hctx0
---bits_per_word=32
---map_nr=4
---wake_batch=8
---min_shallow_depth=24
wake_batch not changed.

2. hw conditions: 8 cpus, ufs flash, one hw queue, and queue_depth=31,
sched_tag/nr_request=62, init dd->async_depth=46
--before this patch, sched_tag info from sys/kernel/debug/block/sda/hctx0
---depth=62
---bits_per_word=8
---map_nr=8
---wake_batch=7
---min_shallow_depth=46
--after this patch, sched_tag info from sys/kernel/debug/block/sda/hctx0
---bits_per_word=8
---map_nr=8
---wake_batch=6
---min_shallow_depth=6
wake_batch changed from 7 to 6, so it seems the patch has little
impact on wake_batch?

> * dd_limit_depth() still assigns a number in the range 1..nr_requests to
>    data->shallow_depth while a number in the range 1..(1<<bt->sb.shift)
>    should be assigned.
Yes. In order to avoid the performance regression that Harshit
Mogalapalli reported, this patch does not directly modify
dd->async_depth,
but the user can still modify dd->async_depth from sysfs as needed,
which in turn changes data->shallow_depth once the new value is
applied.
>
> > So what are your suggestions for fixing the warning shown in commit
> > msg if dd->async_depth is set by the user from sysfs?
> > thanks
>
> How about the two untested patches below?
>
> Thanks,
>
> Bart.
>
>
> Subject: [PATCH 1/2] block: Call .limit_depth() after .hctx has been set
>
> Prepare for using .hctx in dd_limit_depth().
>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>   block/blk-mq.c | 8 +++++---
>   1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 34060d885c5a..d0db9252bb71 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -435,6 +435,7 @@ __blk_mq_alloc_requests_batch(struct blk_mq_alloc_data *data)
>   static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
>   {
>         struct request_queue *q = data->q;
> +       struct elevator_mq_ops *ops = NULL;
>         u64 alloc_time_ns = 0;
>         struct request *rq;
>         unsigned int tag;
> @@ -459,13 +460,11 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
>                  */
>                 if ((data->cmd_flags & REQ_OP_MASK) != REQ_OP_FLUSH &&
>                     !blk_op_is_passthrough(data->cmd_flags)) {
> -                       struct elevator_mq_ops *ops = &q->elevator->type->ops;
> +                       ops = &q->elevator->type->ops;
>
>                         WARN_ON_ONCE(data->flags & BLK_MQ_REQ_RESERVED);
>
>                         data->rq_flags |= RQF_USE_SCHED;
> -                       if (ops->limit_depth)
> -                               ops->limit_depth(data->cmd_flags, data);
>                 }
>         }
>
> @@ -478,6 +477,9 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
>         if (data->flags & BLK_MQ_REQ_RESERVED)
>                 data->rq_flags |= RQF_RESV;
>
> +       if (ops && ops->limit_depth)
> +               ops->limit_depth(data->cmd_flags, data);
> +
>         /*
>          * Try batched alloc if we want more than 1 tag.
>          */
>
>
>
I think this part is OK.
> Subject: [PATCH 2/2] block/mq-deadline: Fix the tag reservation code
>
> Fixes: 07757588e507 ("block/mq-deadline: Reserve 25% of scheduler tags for synchronous requests")
> ---
>   block/mq-deadline.c | 18 ++++++++++++++++--
>   1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index 02a916ba62ee..8e780069d91b 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -621,6 +621,20 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
>         return rq;
>   }
>
> +/*
> + * 'depth' is a number in the range 0..q->nr_requests. Convert it to a number
> + * in the range 0..(1 << bt->sb.shift) since that is the range expected by
> + * sbitmap_get_shallow().
> + */
> +static int dd_to_word_depth(struct blk_mq_hw_ctx *hctx, unsigned int qdepth)
> +{
> +       struct sbitmap_queue *bt = &hctx->sched_tags->bitmap_tags;
> +       const unsigned int nrr = hctx->queue->nr_requests;
> +
> +       return max(((qdepth << bt->sb.shift) + nrr - 1) / nrr,
> +                  bt->min_shallow_depth);
Will this still cause the performance regression mentioned by Harshit
Mogalapalli?
This will change data->shallow_depth to a value that is less than
dd->async_depth ("dd->async_depth = max(1UL, 3 * q->nr_requests /
4);").
> +}
> +
>   /*
>    * Called by __blk_mq_alloc_request(). The shallow_depth value set by this
>    * function is used by __blk_mq_get_tag().
> @@ -637,7 +651,7 @@ static void dd_limit_depth(blk_opf_t opf, struct blk_mq_alloc_data *data)
>          * Throttle asynchronous requests and writes such that these requests
>          * do not block the allocation of synchronous requests.
>          */
> -       data->shallow_depth = dd->async_depth;
> +       data->shallow_depth = dd_to_word_depth(data->hctx, dd->async_depth);
>   }
>
>   /* Called by blk_mq_update_nr_requests(). */
> @@ -649,7 +663,7 @@ static void dd_depth_updated(struct blk_mq_hw_ctx *hctx)
>
>         dd->async_depth = max(1UL, 3 * q->nr_requests / 4);
>
> -       sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, dd->async_depth);
> +       sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, 1);
This will also change min_shallow_depth and affect wake_batch?

My personal opinion is to keep the current dd->async_depth unchanged to
avoid causing a performance regression,
but users should still be allowed to set it via sysfs, and the WARN
mentioned above should also be resolved.
So just change this part only?
 -       sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, dd->async_depth);
 +       sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, 1);
thanks!
>   }
>
>   /* Called by blk_mq_init_hctx() and blk_mq_init_sched(). */
>
Bart Van Assche April 2, 2024, 11:20 p.m. UTC | #7
On 4/1/24 10:44 PM, Zhiguo Niu wrote:
> On Tue, Apr 2, 2024 at 5:23 AM Bart Van Assche <bvanassche@acm.org> wrote:
>> Oops, I misread your patch. After having taken another look, my
>> conclusions are as follows:
>> * sbitmap_queue_min_shallow_depth() is called. This causes
>>     sbq->wake_batch to be modified but I don't think that it is a proper
>>     fix for dd_limit_depth().
> yes, it will affect sbq->wake_batch,  But judging from the following code:
> [ ... ]

If we want to allow small values of dd->async_depth, min_shallow_depth
must be 1. The BFQ I/O scheduler also follows this approach.
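
For reference, BFQ's depth-update hook is sketched below from memory; the
helper bfq_update_depths() and the exact signature are assumptions, but the
relevant point is the fixed minimum of 1 passed to
sbitmap_queue_min_shallow_depth():

	/* Rough sketch of block/bfq-iosched.c, not a verbatim copy. */
	static void bfq_depth_updated(struct blk_mq_hw_ctx *hctx)
	{
		struct bfq_data *bfqd = hctx->queue->elevator->elevator_data;
		struct blk_mq_tags *tags = hctx->sched_tags;

		bfq_update_depths(bfqd, &tags->bitmap_tags);
		sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, 1U);
	}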

>> * dd_limit_depth() still assigns a number in the range 1..nr_requests to
>>     data->shallow_depth while a number in the range 1..(1<<bt->sb.shift)
>>     should be assigned.
> yes, In order to avoid the performance regression problem that Harshit
> Mogalapalli reported, this patch will not directly modify
> dd->async_depth,
> but user can modify dd->async_depth from sysfs according to user's
> request. which will modify data->shallow_depth after user modify it by
> sysfs.

It seems like there is no other option than keeping the current default
depth limit for async requests ...

> My personal opinion is to keep the current dd->aync_depth unchanged to
> avoid causing performance regression,
> but it should  allow users to set it by sysfs, and the WARN mentioned
> best to be solved.
> and just only change this part?
>   -       sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, dd->async_depth);
>   +       sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, 1);
> thanks!

The above change will suppress the kernel warning. I think the other
changes from patch 2/2 are also necessary. Otherwise the unit of
"async_depth" will depend on sbitmap word shift parameter. I don't think
that users should have to worry about which shift value has been chosen
by the sbitmap implementation.
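
To make the unit issue concrete, here is a hypothetical worked example using
the UFS numbers quoted earlier in this thread (nr_requests = 62, sb.shift = 3,
i.e. 8 bits per sbitmap word); it is plain user-space arithmetic, not kernel
code:

	#include <stdio.h>

	int main(void)
	{
		unsigned int nrr = 62, shift = 3, async_depth = 46;

		/*
		 * Passing async_depth (46) straight to sbitmap_get_shallow()
		 * exceeds the 8 bits of every word, so the async throttling is
		 * effectively a no-op. Converting it to the per-word range
		 * (the rounded-up division used by dd_to_word_depth(), ignoring
		 * the max() with min_shallow_depth) keeps the intended ~3/4 ratio.
		 */
		unsigned int word_depth = ((async_depth << shift) + nrr - 1) / nrr;

		printf("%u of %u bits per word\n", word_depth, 1U << shift);
		return 0;	/* prints "6 of 8 bits per word" */
	}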

Thanks,

Bart.
Zhiguo Niu April 3, 2024, 5:41 a.m. UTC | #8
On Wed, Apr 3, 2024 at 7:20 AM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On 4/1/24 10:44 PM, Zhiguo Niu wrote:
> > On Tue, Apr 2, 2024 at 5:23 AM Bart Van Assche <bvanassche@acm.org> wrote:
> >> Oops, I misread your patch. After having taken another look, my
> >> conclusions are as follows:
> >> * sbitmap_queue_min_shallow_depth() is called. This causes
> >>     sbq->wake_batch to be modified but I don't think that it is a proper
> >>     fix for dd_limit_depth().
> > yes, it will affect sbq->wake_batch,  But judging from the following code:
> > [ ... ]
>
> If we want to allow small values of dd->async_depth, min_shallow_depth
> must be 1. The BFQ I/O scheduler also follows this approach.
>
> >> * dd_limit_depth() still assigns a number in the range 1..nr_requests to
> >>     data->shallow_depth while a number in the range 1..(1<<bt->sb.shift)
> >>     should be assigned.
> > yes, In order to avoid the performance regression problem that Harshit
> > Mogalapalli reported, this patch will not directly modify
> > dd->async_depth,
> > but user can modify dd->async_depth from sysfs according to user's
> > request. which will modify data->shallow_depth after user modify it by
> > sysfs.
>
> It seems like there is no other option than keeping the current default
> depth limit for async requests ...
>
> > My personal opinion is to keep the current dd->aync_depth unchanged to
> > avoid causing performance regression,
> > but it should  allow users to set it by sysfs, and the WARN mentioned
> > best to be solved.
> > and just only change this part?
> >   -       sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, dd->async_depth);
> >   +       sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, 1);
> > thanks!
>
> The above change will suppress the kernel warning. I think the other
> changes from patch 2/2 are also necessary. Otherwise the unit of
> "async_depth" will depend on sbitmap word shift parameter. I don't think
> that users should have to worry about which shift value has been chosen
> by the sbitmap implementation.
Hi Bart Van Assche,
So will you send an official version of patch 1 and patch 2?

Besides, I am not sure whether the following part will cause a
performance regression or not:
+static int dd_to_word_depth(struct blk_mq_hw_ctx *hctx, unsigned int qdepth)
+{
+       struct sbitmap_queue *bt = &hctx->sched_tags->bitmap_tags;
+       const unsigned int nrr = hctx->queue->nr_requests;
+
+       return max(((qdepth << bt->sb.shift) + nrr - 1) / nrr,
+                  bt->min_shallow_depth);
+}
+
which is somewhat similar to the previous commit d47f9717e5cf
("block/mq-deadline: use correct way to throttling write
requests"). Just an example:
hw conditions: 8 cpus, emmc flash, one hw queue, and queue_depth=64,
sched_tag/nr_request=128, init dd->async_depth=96, bt->sb.shift=5,
so max(((qdepth << bt->sb.shift) + nrr - 1) / nrr, bt->min_shallow_depth)
will get 24, and commit d47f9717e5cf ("block/mq-deadline: use correct
way to throttling write requests") will also get 24, but your revert
commit 256aab46e316 ("Revert "block/mq-deadline: use correct way to
throttling write requests"") will get 96,
or am I missing something else?
thanks!
>
> Thanks,
>
> Bart.
>

Patch

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 02a916b..89c516e 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -646,10 +646,12 @@  static void dd_depth_updated(struct blk_mq_hw_ctx *hctx)
 	struct request_queue *q = hctx->queue;
 	struct deadline_data *dd = q->elevator->elevator_data;
 	struct blk_mq_tags *tags = hctx->sched_tags;
+	unsigned int shift = tags->bitmap_tags.sb.shift;
+	unsigned int dd_min_depth = max(1U, 3 * (1U << shift)  / 4);
 
 	dd->async_depth = max(1UL, 3 * q->nr_requests / 4);
 
-	sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, dd->async_depth);
+	sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, dd_min_depth);
 }
 
 /* Called by blk_mq_init_hctx() and blk_mq_init_sched(). */