Message ID | 20220316061134.3812309-1-shinichiro.kawasaki@wdc.com (mailing list archive)
---|---
State | New, archived
Series | block: limit request dispatch loop duration
On 3/16/22 15:11, Shin'ichiro Kawasaki wrote:
> When IO requests are made continuously and the target block device
> handles requests faster than request arrival, the request dispatch loop
> keeps on repeating to dispatch the arriving requests very long time,
> more than a minute. Since the loop runs as a workqueue worker task, the
> very long loop duration triggers workqueue watchdog timeout and BUG [1].
>
> To avoid the very long loop duration, break the loop periodically. When
> opportunity to dispatch requests still exists, check need_resched(). If
> need_resched() returns true, the dispatch loop already consumed its time
> slice, then reschedule the dispatch work and break the loop. With heavy
> IO load, need_resched() does not return true for 20~30 seconds. To cover
> such case, check time spent in the dispatch loop with jiffies. If more
> than 1 second is spent, reschedule the dispatch work and break the loop.
>
> [1]
>
> [ 609.691437] BUG: workqueue lockup - pool cpus=10 node=1 flags=0x0 nice=-20 stuck for 35s!
> [ 609.701820] Showing busy workqueues and worker pools:
> [ 609.707915] workqueue events: flags=0x0
> [ 609.712615] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> [ 609.712626] pending: drm_fb_helper_damage_work [drm_kms_helper]
> [ 609.712687] workqueue events_freezable: flags=0x4
> [ 609.732943] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> [ 609.732952] pending: pci_pme_list_scan
> [ 609.732968] workqueue events_power_efficient: flags=0x80
> [ 609.751947] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> [ 609.751955] pending: neigh_managed_work
> [ 609.752018] workqueue kblockd: flags=0x18
> [ 609.769480] pwq 21: cpus=10 node=1 flags=0x0 nice=-20 active=3/256 refcnt=4
> [ 609.769488] in-flight: 1020:blk_mq_run_work_fn
> [ 609.769498] pending: blk_mq_timeout_work, blk_mq_run_work_fn
> [ 609.769744] pool 21: cpus=10 node=1 flags=0x0 nice=-20 hung=35s workers=2 idle: 67
> [ 639.899730] BUG: workqueue lockup - pool cpus=10 node=1 flags=0x0 nice=-20 stuck for 66s!
> [ 639.909513] Showing busy workqueues and worker pools:
> [ 639.915404] workqueue events: flags=0x0
> [ 639.920197] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> [ 639.920215] pending: drm_fb_helper_damage_work [drm_kms_helper]
> [ 639.920365] workqueue kblockd: flags=0x18
> [ 639.939932] pwq 21: cpus=10 node=1 flags=0x0 nice=-20 active=3/256 refcnt=4
> [ 639.939942] in-flight: 1020:blk_mq_run_work_fn
> [ 639.939955] pending: blk_mq_timeout_work, blk_mq_run_work_fn
> [ 639.940212] pool 21: cpus=10 node=1 flags=0x0 nice=-20 hung=66s workers=2 idle: 67
>
> Fixes: 6e6fcbc27e778 ("blk-mq: support batching dispatch in case of io")
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> Cc: stable@vger.kernel.org # v5.10+
> Link: https://lore.kernel.org/linux-block/20220310091649.zypaem5lkyfadymg@shindev/
> Reviewed-by: Ming Lei <ming.lei@redhat.com>
> Tested-by: Ming Lei <ming.lei@redhat.com>
> ---
>  block/blk-mq-sched.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index 55488ba97823..64941615befc 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -181,9 +181,15 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
>  static int blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
>  {
>  	int ret;
> +	unsigned long end = jiffies + HZ;
>
>  	do {
>  		ret = __blk_mq_do_dispatch_sched(hctx);
> +		if (ret == 1 &&
> +		    (need_resched() || time_is_after_jiffies(end))) {
> +			blk_mq_delay_run_hw_queue(hctx, 0);
> +			break;

Minor nit: you could "return 1;" directly here.

> +		}
>  	} while (ret == 1);
>
>  	return ret;

Otherwise, looks good.

Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
On 3/16/22 12:11 AM, Shin'ichiro Kawasaki wrote:
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index 55488ba97823..64941615befc 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -181,9 +181,15 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
>  static int blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
>  {
>  	int ret;
> +	unsigned long end = jiffies + HZ;
>
>  	do {
>  		ret = __blk_mq_do_dispatch_sched(hctx);
> +		if (ret == 1 &&
> +		    (need_resched() || time_is_after_jiffies(end))) {
> +			blk_mq_delay_run_hw_queue(hctx, 0);
> +			break;
> +		}
>  	} while (ret == 1);
>
>  	return ret;

I think it'd look cleaner as:

static int blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
{
	unsigned long end = jiffies + HZ;
	int ret;

	do {
		ret = __blk_mq_do_dispatch_sched(hctx);
		if (ret != 1)
			break;
		if (need_resched() || time_is_after_jiffies(end)) {
			blk_mq_delay_run_hw_queue(hctx, 0);
			break;
		}
	} while (1);

	return ret;
}
On Mar 16, 2022 / 15:11, Shin'ichiro Kawasaki wrote:

(snip)

> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index 55488ba97823..64941615befc 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -181,9 +181,15 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
>  static int blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
>  {
>  	int ret;
> +	unsigned long end = jiffies + HZ;
>
>  	do {
>  		ret = __blk_mq_do_dispatch_sched(hctx);
> +		if (ret == 1 &&
> +		    (need_resched() || time_is_after_jiffies(end))) {

I've noticed that I made a horrible mistake. The time_is_after_jiffies(end)
in the line above is wrong. It should be time_is_before_jiffies(end). My bad.
Before I post v2 with this fix, I will confirm the fix by checking the call
frequency of blk_mq_delay_run_hw_queue using kprobe.

> +			blk_mq_delay_run_hw_queue(hctx, 0);
> +			break;
> +		}
>  	} while (ret == 1);
>
>  	return ret;
> --
> 2.34.1
>
On Mar 16, 2022 / 06:20, Jens Axboe wrote:
> On 3/16/22 12:11 AM, Shin'ichiro Kawasaki wrote:
> > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> > index 55488ba97823..64941615befc 100644
> > --- a/block/blk-mq-sched.c
> > +++ b/block/blk-mq-sched.c
> > @@ -181,9 +181,15 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> >  static int blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> >  {
> >  	int ret;
> > +	unsigned long end = jiffies + HZ;
> >
> >  	do {
> >  		ret = __blk_mq_do_dispatch_sched(hctx);
> > +		if (ret == 1 &&
> > +		    (need_resched() || time_is_after_jiffies(end))) {
> > +			blk_mq_delay_run_hw_queue(hctx, 0);
> > +			break;
> > +		}
> >  	} while (ret == 1);
> >
> >  	return ret;
>
> I think it'd look cleaner as:
>
> static int blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> {
> 	unsigned long end = jiffies + HZ;
> 	int ret;
>
> 	do {
> 		ret = __blk_mq_do_dispatch_sched(hctx);
> 		if (ret != 1)
> 			break;
> 		if (need_resched() || time_is_after_jiffies(end)) {
> 			blk_mq_delay_run_hw_queue(hctx, 0);
> 			break;
> 		}
> 	} while (1);
>
> 	return ret;
> }

Thank you for this suggestion. It looks nice since it avoids the duplicated
'ret == 1' check. Will reflect it in v2.
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 55488ba97823..64941615befc 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -181,9 +181,15 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 static int blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 {
 	int ret;
+	unsigned long end = jiffies + HZ;
 
 	do {
 		ret = __blk_mq_do_dispatch_sched(hctx);
+		if (ret == 1 &&
+		    (need_resched() || time_is_after_jiffies(end))) {
+			blk_mq_delay_run_hw_queue(hctx, 0);
+			break;
+		}
 	} while (ret == 1);
 
 	return ret;