Message ID | 20200619084214.337449-1-ming.lei@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | dm-rq: don't call blk_mq_queue_stopped in dm_stop_queue() | expand |
Hi Ming, Thanks for the patch! But I'm having a hard time understanding what you've written in the patch header, On Fri, Jun 19 2020 at 4:42am -0400, Ming Lei <ming.lei@redhat.com> wrote: > dm-rq won't stop queue, meantime blk-mq won't stop one queue too, so > remove the check. It'd be helpful if you could unpack this with more detail before going on to explain why using blk_queue_quiesced, despite dm-rq using blk_mq_queue_stopped, would also be ineffective. SO: > dm-rq won't stop queue 1) why won't dm-rq stop the queue? Do you mean it won't reliably _always_ stop the queue because of the blk_mq_queue_stopped() check? > meantime blk-mq won't stop one queue too, so remove the check. 2) Meaning?: blk_mq_queue_stopped() will return true even if only one hw queue is stopped, given blk-mq must stop all hw queues a positive return from this blk_mq_queue_stopped() check is incorrectly assuming it meanss all hw queues are stopped. > dm_stop_queue() actually tries to quiesce hw queues via blk_mq_quiesce_queue(), > we can't check via blk_queue_quiesced for avoiding unnecessary queue > quiesce because the flag is set before synchronize_rcu() and dm_stop_queue > may be called when synchronize_rcu from another blk_mq_quiesce_queue is > in-progress. But I'm left with questions/confusion on this too: 1) you mention blk_queue_quiesced instead of blk_mq_queue_stopped, so I assume you mean that: not only is blk_mq_queue_stopped() ineffective, blk_queue_quiesced() would be too? 2) the race you detail (with competing blk_mq_quiesce_queue) relative to synchronize_rcu() and testing "the flag" is very detailed yet vague. Anyway, once we get this heaader cleaned up a bit more I'll be happy to get this staged as a stable@ fix for 5.8 inclusion ASAP. Thanks! Mike > > Cc: linux-block@vger.kernel.org > Signed-off-by: Ming Lei <ming.lei@redhat.com> > --- > drivers/md/dm-rq.c | 3 --- > 1 file changed, 3 deletions(-) > > diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c > index f60c02512121..ed4d5ea66ccc 100644 > --- a/drivers/md/dm-rq.c > +++ b/drivers/md/dm-rq.c > @@ -70,9 +70,6 @@ void dm_start_queue(struct request_queue *q) > > void dm_stop_queue(struct request_queue *q) > { > - if (blk_mq_queue_stopped(q)) > - return; > - > blk_mq_quiesce_queue(q); > } > > -- > 2.25.2 >
Hi Mike, On Fri, Jun 19, 2020 at 05:42:50AM -0400, Mike Snitzer wrote: > Hi Ming, > > Thanks for the patch! But I'm having a hard time understanding what > you've written in the patch header, > > On Fri, Jun 19 2020 at 4:42am -0400, > Ming Lei <ming.lei@redhat.com> wrote: > > > dm-rq won't stop queue, meantime blk-mq won't stop one queue too, so > > remove the check. > > It'd be helpful if you could unpack this with more detail before going on > to explain why using blk_queue_quiesced, despite dm-rq using > blk_mq_queue_stopped, would also be ineffective. > > SO: > > > dm-rq won't stop queue > > 1) why won't dm-rq stop the queue? Do you mean it won't reliably > _always_ stop the queue because of the blk_mq_queue_stopped() check? device mapper doesn't call blk_mq_stop_hw_queue or blk_mq_stop_hw_queues. > > > meantime blk-mq won't stop one queue too, so remove the check. > > 2) Meaning?: blk_mq_queue_stopped() will return true even if only one hw > queue is stopped, given blk-mq must stop all hw queues a positive return > from this blk_mq_queue_stopped() check is incorrectly assuming it meanss > all hw queues are stopped. blk-mq won't call blk_mq_stop_hw_queue or blk_mq_stop_hw_queues for dm-rq's queue too, so dm-rq's hw queue won't be stopped. BTW blk_mq_stop_hw_queue or blk_mq_stop_hw_queues are supposed to be used for throttling queue. > > > dm_stop_queue() actually tries to quiesce hw queues via blk_mq_quiesce_queue(), > > we can't check via blk_queue_quiesced for avoiding unnecessary queue > > quiesce because the flag is set before synchronize_rcu() and dm_stop_queue > > may be called when synchronize_rcu from another blk_mq_quiesce_queue is > > in-progress. > > But I'm left with questions/confusion on this too: > > 1) you mention blk_queue_quiesced instead of blk_mq_queue_stopped, so I > assume you mean that: not only is blk_mq_queue_stopped() > ineffective, blk_queue_quiesced() would be too? blk_mq_queue_stopped isn't necessary because dm-rq's hw queue won't be stopped by anyone, meantime replacing it with blk_queue_quiesced() is wrong. > > 2) the race you detail (with competing blk_mq_quiesce_queue) relative to > synchronize_rcu() and testing "the flag" is very detailed yet vague. If two code paths are calling dm_stop_queue() at the same time, one path may return immediately and it is wrong, sine synchronize_rcu() from another path may not be done. > > Anyway, once we get this heaader cleaned up a bit more I'll be happy to > get this staged as a stable@ fix for 5.8 inclusion ASAP. This patch isn't a fix, and it shouldn't be related with rhel8's issue. Thanks, Ming > > Thanks! > Mike > > > > > Cc: linux-block@vger.kernel.org > > Signed-off-by: Ming Lei <ming.lei@redhat.com> > > --- > > drivers/md/dm-rq.c | 3 --- > > 1 file changed, 3 deletions(-) > > > > diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c > > index f60c02512121..ed4d5ea66ccc 100644 > > --- a/drivers/md/dm-rq.c > > +++ b/drivers/md/dm-rq.c > > @@ -70,9 +70,6 @@ void dm_start_queue(struct request_queue *q) > > > > void dm_stop_queue(struct request_queue *q) > > { > > - if (blk_mq_queue_stopped(q)) > > - return; > > - > > blk_mq_quiesce_queue(q); > > } > > > > -- > > 2.25.2 > >
On Fri, Jun 19 2020 at 6:11am -0400, Ming Lei <ming.lei@redhat.com> wrote: > Hi Mike, > > On Fri, Jun 19, 2020 at 05:42:50AM -0400, Mike Snitzer wrote: > > Hi Ming, > > > > Thanks for the patch! But I'm having a hard time understanding what > > you've written in the patch header, > > > > On Fri, Jun 19 2020 at 4:42am -0400, > > Ming Lei <ming.lei@redhat.com> wrote: > > > > > dm-rq won't stop queue, meantime blk-mq won't stop one queue too, so > > > remove the check. > > > > It'd be helpful if you could unpack this with more detail before going on > > to explain why using blk_queue_quiesced, despite dm-rq using > > blk_mq_queue_stopped, would also be ineffective. > > > > SO: > > > > > dm-rq won't stop queue > > > > 1) why won't dm-rq stop the queue? Do you mean it won't reliably > > _always_ stop the queue because of the blk_mq_queue_stopped() check? > > device mapper doesn't call blk_mq_stop_hw_queue or blk_mq_stop_hw_queues. > > > > > > meantime blk-mq won't stop one queue too, so remove the check. > > > > 2) Meaning?: blk_mq_queue_stopped() will return true even if only one hw > > queue is stopped, given blk-mq must stop all hw queues a positive return > > from this blk_mq_queue_stopped() check is incorrectly assuming it meanss > > all hw queues are stopped. > > blk-mq won't call blk_mq_stop_hw_queue or blk_mq_stop_hw_queues for > dm-rq's queue too, so dm-rq's hw queue won't be stopped. > > BTW blk_mq_stop_hw_queue or blk_mq_stop_hw_queues are supposed to be > used for throttling queue. I'm going to look at actually stopping the queue (using one of these interfaces). I didn't realize I wasn't actually stopping the queue. The intent was to do so. In speaking with Jens yesterday about freeze vs stop: it is clear that dm-rq needs to still be able to allocate new requests, but _not_ call the queue_rq to issue the requests, while "stopped" (due to dm-mpath potentially deferring retries of failed requests because of path failure while quiescing the queue during DM device suspend). But that freezing the queue goes too far because it won't allow such request allocation. > > > dm_stop_queue() actually tries to quiesce hw queues via blk_mq_quiesce_queue(), > > > we can't check via blk_queue_quiesced for avoiding unnecessary queue > > > quiesce because the flag is set before synchronize_rcu() and dm_stop_queue > > > may be called when synchronize_rcu from another blk_mq_quiesce_queue is > > > in-progress. > > > > But I'm left with questions/confusion on this too: > > > > 1) you mention blk_queue_quiesced instead of blk_mq_queue_stopped, so I > > assume you mean that: not only is blk_mq_queue_stopped() > > ineffective, blk_queue_quiesced() would be too? > > blk_mq_queue_stopped isn't necessary because dm-rq's hw queue won't be > stopped by anyone, meantime replacing it with blk_queue_quiesced() is wrong. > > > > > 2) the race you detail (with competing blk_mq_quiesce_queue) relative to > > synchronize_rcu() and testing "the flag" is very detailed yet vague. > > If two code paths are calling dm_stop_queue() at the same time, one path may > return immediately and it is wrong, sine synchronize_rcu() from another path > may not be done. > > > > > Anyway, once we get this heaader cleaned up a bit more I'll be happy to > > get this staged as a stable@ fix for 5.8 inclusion ASAP. > > This patch isn't a fix, and it shouldn't be related with rhel8's issue. I realize that now. I've changed the patch header to be a bit clearer and staged it for 5.9, see: https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.9&id=06e788ed59e0095b679bdce9e39c1a251032ae62 Thanks, Mike
On Fri, Jun 19 2020 at 12:06pm -0400, Mike Snitzer <snitzer@redhat.com> wrote: > On Fri, Jun 19 2020 at 6:11am -0400, > Ming Lei <ming.lei@redhat.com> wrote: > > > Hi Mike, > > > > On Fri, Jun 19, 2020 at 05:42:50AM -0400, Mike Snitzer wrote: > > > Hi Ming, > > > > > > Thanks for the patch! But I'm having a hard time understanding what > > > you've written in the patch header, > > > > > > On Fri, Jun 19 2020 at 4:42am -0400, > > > Ming Lei <ming.lei@redhat.com> wrote: > > > > > > > dm-rq won't stop queue, meantime blk-mq won't stop one queue too, so > > > > remove the check. > > > > > > It'd be helpful if you could unpack this with more detail before going on > > > to explain why using blk_queue_quiesced, despite dm-rq using > > > blk_mq_queue_stopped, would also be ineffective. > > > > > > SO: > > > > > > > dm-rq won't stop queue > > > > > > 1) why won't dm-rq stop the queue? Do you mean it won't reliably > > > _always_ stop the queue because of the blk_mq_queue_stopped() check? > > > > device mapper doesn't call blk_mq_stop_hw_queue or blk_mq_stop_hw_queues. > > > > > > > > > meantime blk-mq won't stop one queue too, so remove the check. > > > > > > 2) Meaning?: blk_mq_queue_stopped() will return true even if only one hw > > > queue is stopped, given blk-mq must stop all hw queues a positive return > > > from this blk_mq_queue_stopped() check is incorrectly assuming it meanss > > > all hw queues are stopped. > > > > blk-mq won't call blk_mq_stop_hw_queue or blk_mq_stop_hw_queues for > > dm-rq's queue too, so dm-rq's hw queue won't be stopped. > > > > BTW blk_mq_stop_hw_queue or blk_mq_stop_hw_queues are supposed to be > > used for throttling queue. > > I'm going to look at actually stopping the queue (using one of these > interfaces). I didn't realize I wasn't actually stopping the queue. > The intent was to do so. > > In speaking with Jens yesterday about freeze vs stop: it is clear that > dm-rq needs to still be able to allocate new requests, but _not_ call > the queue_rq to issue the requests, while "stopped" (due to dm-mpath > potentially deferring retries of failed requests because of path failure > while quiescing the queue during DM device suspend). But that freezing > the queue goes too far because it won't allow such request allocation. Seems I'm damned if I do (stop) or damned if I don't (new reports of requests completing after DM device suspend's blk_mq_quiesce_queue()+dm_wait_for_completion()). I'm left at something of a loss about what to do! Bart? Jens? Ming? Looking closer at the git history, commit 7b17c2f7292ba takes center stage: commit 7b17c2f7292ba1f3f98dae3f7077f9e569653276 Author: Bart Van Assche <bart.vanassche@sandisk.com> Date: Fri Oct 28 17:22:16 2016 -0700 dm: Fix a race condition related to stopping and starting queues Ensure that all ongoing dm_mq_queue_rq() and dm_mq_requeue_request() calls have stopped before setting the "queue stopped" flag. This allows to remove the "queue stopped" test from dm_mq_queue_rq() and dm_mq_requeue_request(). This patch fixes a race condition because dm_mq_queue_rq() is called without holding the queue lock and hence BLK_MQ_S_STOPPED can be set at any time while dm_mq_queue_rq() is in progress. This patch prevents that the following hang occurs sporadically when using dm-mq: INFO: task systemd-udevd:10111 blocked for more than 480 seconds. Call Trace: [<ffffffff8161f397>] schedule+0x37/0x90 [<ffffffff816239ef>] schedule_timeout+0x27f/0x470 [<ffffffff8161e76f>] io_schedule_timeout+0x9f/0x110 [<ffffffff8161fb36>] bit_wait_io+0x16/0x60 [<ffffffff8161f929>] __wait_on_bit_lock+0x49/0xa0 [<ffffffff8114fe69>] __lock_page+0xb9/0xc0 [<ffffffff81165d90>] truncate_inode_pages_range+0x3e0/0x760 [<ffffffff81166120>] truncate_inode_pages+0x10/0x20 [<ffffffff81212a20>] kill_bdev+0x30/0x40 [<ffffffff81213d41>] __blkdev_put+0x71/0x360 [<ffffffff81214079>] blkdev_put+0x49/0x170 [<ffffffff812141c0>] blkdev_close+0x20/0x30 [<ffffffff811d48e8>] __fput+0xe8/0x1f0 [<ffffffff811d4a29>] ____fput+0x9/0x10 [<ffffffff810842d3>] task_work_run+0x83/0xb0 [<ffffffff8106606e>] do_exit+0x3ee/0xc40 [<ffffffff8106694b>] do_group_exit+0x4b/0xc0 [<ffffffff81073d9a>] get_signal+0x2ca/0x940 [<ffffffff8101bf43>] do_signal+0x23/0x660 [<ffffffff810022b3>] exit_to_usermode_loop+0x73/0xb0 [<ffffffff81002cb0>] syscall_return_slowpath+0xb0/0xc0 [<ffffffff81624e33>] entry_SYSCALL_64_fastpath+0xa6/0xa8 Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com> diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c index 09c958b6f038..8b92e066bb69 100644 --- a/drivers/md/dm-rq.c +++ b/drivers/md/dm-rq.c @@ -102,7 +102,7 @@ static void dm_mq_stop_queue(struct request_queue *q) if (blk_mq_queue_stopped(q)) return; - blk_mq_stop_hw_queues(q); + blk_mq_quiesce_queue(q); } void dm_stop_queue(struct request_queue *q) @@ -880,17 +880,6 @@ static int dm_mq_queue_rq(struct blk_mq_hw_ctx *hctx, dm_put_live_table(md, srcu_idx); } - /* - * On suspend dm_stop_queue() handles stopping the blk-mq - * request_queue BUT: even though the hw_queues are marked - * BLK_MQ_S_STOPPED at that point there is still a race that - * is allowing block/blk-mq.c to call ->queue_rq against a - * hctx that it really shouldn't. The following check guards - * against this rarity (albeit _not_ race-free). - */ - if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state))) - return BLK_MQ_RQ_QUEUE_BUSY; - if (ti->type->busy && ti->type->busy(ti)) return BLK_MQ_RQ_QUEUE_BUSY;
On Fri, Jun 19, 2020 at 12:06:57PM -0400, Mike Snitzer wrote: > On Fri, Jun 19 2020 at 6:11am -0400, > Ming Lei <ming.lei@redhat.com> wrote: > > > Hi Mike, > > > > On Fri, Jun 19, 2020 at 05:42:50AM -0400, Mike Snitzer wrote: > > > Hi Ming, > > > > > > Thanks for the patch! But I'm having a hard time understanding what > > > you've written in the patch header, > > > > > > On Fri, Jun 19 2020 at 4:42am -0400, > > > Ming Lei <ming.lei@redhat.com> wrote: > > > > > > > dm-rq won't stop queue, meantime blk-mq won't stop one queue too, so > > > > remove the check. > > > > > > It'd be helpful if you could unpack this with more detail before going on > > > to explain why using blk_queue_quiesced, despite dm-rq using > > > blk_mq_queue_stopped, would also be ineffective. > > > > > > SO: > > > > > > > dm-rq won't stop queue > > > > > > 1) why won't dm-rq stop the queue? Do you mean it won't reliably > > > _always_ stop the queue because of the blk_mq_queue_stopped() check? > > > > device mapper doesn't call blk_mq_stop_hw_queue or blk_mq_stop_hw_queues. > > > > > > > > > meantime blk-mq won't stop one queue too, so remove the check. > > > > > > 2) Meaning?: blk_mq_queue_stopped() will return true even if only one hw > > > queue is stopped, given blk-mq must stop all hw queues a positive return > > > from this blk_mq_queue_stopped() check is incorrectly assuming it meanss > > > all hw queues are stopped. > > > > blk-mq won't call blk_mq_stop_hw_queue or blk_mq_stop_hw_queues for > > dm-rq's queue too, so dm-rq's hw queue won't be stopped. > > > > BTW blk_mq_stop_hw_queue or blk_mq_stop_hw_queues are supposed to be > > used for throttling queue. > > I'm going to look at actually stopping the queue (using one of these > interfaces). I didn't realize I wasn't actually stopping the queue. > The intent was to do so. > > In speaking with Jens yesterday about freeze vs stop: it is clear that > dm-rq needs to still be able to allocate new requests, but _not_ call > the queue_rq to issue the requests, while "stopped" (due to dm-mpath > potentially deferring retries of failed requests because of path failure > while quiescing the queue during DM device suspend). But that freezing > the queue goes too far because it won't allow such request allocation. Freezing shouldn't be a good choice for driver usually, and quiesce is exactly what you expect: request allocation is allowed, meantime, no .queue_rq is possible. > > > > > dm_stop_queue() actually tries to quiesce hw queues via blk_mq_quiesce_queue(), > > > > we can't check via blk_queue_quiesced for avoiding unnecessary queue > > > > quiesce because the flag is set before synchronize_rcu() and dm_stop_queue > > > > may be called when synchronize_rcu from another blk_mq_quiesce_queue is > > > > in-progress. > > > > > > But I'm left with questions/confusion on this too: > > > > > > 1) you mention blk_queue_quiesced instead of blk_mq_queue_stopped, so I > > > assume you mean that: not only is blk_mq_queue_stopped() > > > ineffective, blk_queue_quiesced() would be too? > > > > blk_mq_queue_stopped isn't necessary because dm-rq's hw queue won't be > > stopped by anyone, meantime replacing it with blk_queue_quiesced() is wrong. > > > > > > > > 2) the race you detail (with competing blk_mq_quiesce_queue) relative to > > > synchronize_rcu() and testing "the flag" is very detailed yet vague. > > > > If two code paths are calling dm_stop_queue() at the same time, one path may > > return immediately and it is wrong, sine synchronize_rcu() from another path > > may not be done. > > > > > > > > Anyway, once we get this heaader cleaned up a bit more I'll be happy to > > > get this staged as a stable@ fix for 5.8 inclusion ASAP. > > > > This patch isn't a fix, and it shouldn't be related with rhel8's issue. > > I realize that now. I've changed the patch header to be a bit clearer > and staged it for 5.9, see: > https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.9&id=06e788ed59e0095b679bdce9e39c1a251032ae62 Thanks!
On Fri, Jun 19, 2020 at 01:40:41PM -0400, Mike Snitzer wrote: > On Fri, Jun 19 2020 at 12:06pm -0400, > Mike Snitzer <snitzer@redhat.com> wrote: > > > On Fri, Jun 19 2020 at 6:11am -0400, > > Ming Lei <ming.lei@redhat.com> wrote: > > > > > Hi Mike, > > > > > > On Fri, Jun 19, 2020 at 05:42:50AM -0400, Mike Snitzer wrote: > > > > Hi Ming, > > > > > > > > Thanks for the patch! But I'm having a hard time understanding what > > > > you've written in the patch header, > > > > > > > > On Fri, Jun 19 2020 at 4:42am -0400, > > > > Ming Lei <ming.lei@redhat.com> wrote: > > > > > > > > > dm-rq won't stop queue, meantime blk-mq won't stop one queue too, so > > > > > remove the check. > > > > > > > > It'd be helpful if you could unpack this with more detail before going on > > > > to explain why using blk_queue_quiesced, despite dm-rq using > > > > blk_mq_queue_stopped, would also be ineffective. > > > > > > > > SO: > > > > > > > > > dm-rq won't stop queue > > > > > > > > 1) why won't dm-rq stop the queue? Do you mean it won't reliably > > > > _always_ stop the queue because of the blk_mq_queue_stopped() check? > > > > > > device mapper doesn't call blk_mq_stop_hw_queue or blk_mq_stop_hw_queues. > > > > > > > > > > > > meantime blk-mq won't stop one queue too, so remove the check. > > > > > > > > 2) Meaning?: blk_mq_queue_stopped() will return true even if only one hw > > > > queue is stopped, given blk-mq must stop all hw queues a positive return > > > > from this blk_mq_queue_stopped() check is incorrectly assuming it meanss > > > > all hw queues are stopped. > > > > > > blk-mq won't call blk_mq_stop_hw_queue or blk_mq_stop_hw_queues for > > > dm-rq's queue too, so dm-rq's hw queue won't be stopped. > > > > > > BTW blk_mq_stop_hw_queue or blk_mq_stop_hw_queues are supposed to be > > > used for throttling queue. > > > > I'm going to look at actually stopping the queue (using one of these > > interfaces). I didn't realize I wasn't actually stopping the queue. > > The intent was to do so. > > > > In speaking with Jens yesterday about freeze vs stop: it is clear that > > dm-rq needs to still be able to allocate new requests, but _not_ call > > the queue_rq to issue the requests, while "stopped" (due to dm-mpath > > potentially deferring retries of failed requests because of path failure > > while quiescing the queue during DM device suspend). But that freezing > > the queue goes too far because it won't allow such request allocation. > > Seems I'm damned if I do (stop) or damned if I don't (new reports of > requests completing after DM device suspend's > blk_mq_quiesce_queue()+dm_wait_for_completion()). request(but not new) completing is possible after blk_mq_quiesce_queue()+ dm_wait_for_completion, because blk_mq_rq_inflight() only checks INFLIGHT request. If all requests are marked as MQ_RQ_COMPLETE, blk_mq_rq_inflight() still may return false. However, MQ_RQ_COMPLETE is one transient state. So what does dm-rq expect from dm_wait_for_completion()? If it is just no new request entering dm_queue_rq(), there shouldn't be issue. If dm-rq hopes there aren't any real inflight request(MQ_RQ_COMPLETE & MQ_RQ_INFLIGHT), we can change blk_mq_rq_inflight to support that. Thanks, Ming
On Sat, Jun 20, 2020 at 06:37:44AM +0800, Ming Lei wrote: > On Fri, Jun 19, 2020 at 01:40:41PM -0400, Mike Snitzer wrote: > > On Fri, Jun 19 2020 at 12:06pm -0400, > > Mike Snitzer <snitzer@redhat.com> wrote: > > > > > On Fri, Jun 19 2020 at 6:11am -0400, > > > Ming Lei <ming.lei@redhat.com> wrote: > > > > > > > Hi Mike, > > > > > > > > On Fri, Jun 19, 2020 at 05:42:50AM -0400, Mike Snitzer wrote: > > > > > Hi Ming, > > > > > > > > > > Thanks for the patch! But I'm having a hard time understanding what > > > > > you've written in the patch header, > > > > > > > > > > On Fri, Jun 19 2020 at 4:42am -0400, > > > > > Ming Lei <ming.lei@redhat.com> wrote: > > > > > > > > > > > dm-rq won't stop queue, meantime blk-mq won't stop one queue too, so > > > > > > remove the check. > > > > > > > > > > It'd be helpful if you could unpack this with more detail before going on > > > > > to explain why using blk_queue_quiesced, despite dm-rq using > > > > > blk_mq_queue_stopped, would also be ineffective. > > > > > > > > > > SO: > > > > > > > > > > > dm-rq won't stop queue > > > > > > > > > > 1) why won't dm-rq stop the queue? Do you mean it won't reliably > > > > > _always_ stop the queue because of the blk_mq_queue_stopped() check? > > > > > > > > device mapper doesn't call blk_mq_stop_hw_queue or blk_mq_stop_hw_queues. > > > > > > > > > > > > > > > meantime blk-mq won't stop one queue too, so remove the check. > > > > > > > > > > 2) Meaning?: blk_mq_queue_stopped() will return true even if only one hw > > > > > queue is stopped, given blk-mq must stop all hw queues a positive return > > > > > from this blk_mq_queue_stopped() check is incorrectly assuming it meanss > > > > > all hw queues are stopped. > > > > > > > > blk-mq won't call blk_mq_stop_hw_queue or blk_mq_stop_hw_queues for > > > > dm-rq's queue too, so dm-rq's hw queue won't be stopped. > > > > > > > > BTW blk_mq_stop_hw_queue or blk_mq_stop_hw_queues are supposed to be > > > > used for throttling queue. > > > > > > I'm going to look at actually stopping the queue (using one of these > > > interfaces). I didn't realize I wasn't actually stopping the queue. > > > The intent was to do so. > > > > > > In speaking with Jens yesterday about freeze vs stop: it is clear that > > > dm-rq needs to still be able to allocate new requests, but _not_ call > > > the queue_rq to issue the requests, while "stopped" (due to dm-mpath > > > potentially deferring retries of failed requests because of path failure > > > while quiescing the queue during DM device suspend). But that freezing > > > the queue goes too far because it won't allow such request allocation. > > > > Seems I'm damned if I do (stop) or damned if I don't (new reports of > > requests completing after DM device suspend's > > blk_mq_quiesce_queue()+dm_wait_for_completion()). > > request(but not new) completing is possible after blk_mq_quiesce_queue()+ > dm_wait_for_completion, because blk_mq_rq_inflight() only checks INFLIGHT > request. If all requests are marked as MQ_RQ_COMPLETE, blk_mq_rq_inflight() > still may return false. However, MQ_RQ_COMPLETE is one transient state. > > So what does dm-rq expect from dm_wait_for_completion()? > > If it is just no new request entering dm_queue_rq(), there shouldn't be > issue. > > If dm-rq hopes there aren't any real inflight request(MQ_RQ_COMPLETE & > MQ_RQ_INFLIGHT), we can change blk_mq_rq_inflight to support that. Hi Mike, Please test the following patch and see if the issue can be fixed: From faf0f9f15627446e8c35db518e37a4a2e4323eb2 Mon Sep 17 00:00:00 2001 From: Ming Lei <ming.lei@redhat.com> Date: Sat, 20 Jun 2020 06:45:49 +0800 Subject: [PATCH] blk-mq: cover request of MQ_RQ_COMPLETE as inflight in blk_mq_rq_inflight When request is marked as MQ_RQ_COMPLETE, ->complete isn't called & done yet, and driver may expect that there isn't any driver related activity since blk_mq_queue_inflight() returns. Fixes it by covering request of MQ_RQ_COMPLETE as inflight in blk_mq_rq_inflight(). Cc: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> --- block/blk-mq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/block/blk-mq.c b/block/blk-mq.c index 4f57d27bfa73..7bc084b5bc37 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -831,7 +831,7 @@ static bool blk_mq_rq_inflight(struct blk_mq_hw_ctx *hctx, struct request *rq, * If we find a request that is inflight and the queue matches, * we know the queue is busy. Return false to stop the iteration. */ - if (rq->state == MQ_RQ_IN_FLIGHT && rq->q == hctx->queue) { + if (rq->state != MQ_RQ_IDLE && rq->q == hctx->queue) { bool *busy = priv; *busy = true;
On Fri, Jun 19 2020 at 6:52pm -0400, Ming Lei <ming.lei@redhat.com> wrote: > On Sat, Jun 20, 2020 at 06:37:44AM +0800, Ming Lei wrote: > > On Fri, Jun 19, 2020 at 01:40:41PM -0400, Mike Snitzer wrote: > > > On Fri, Jun 19 2020 at 12:06pm -0400, > > > Mike Snitzer <snitzer@redhat.com> wrote: > > > > > > > On Fri, Jun 19 2020 at 6:11am -0400, > > > > Ming Lei <ming.lei@redhat.com> wrote: > > > > > > > > > Hi Mike, > > > > > > > > > > On Fri, Jun 19, 2020 at 05:42:50AM -0400, Mike Snitzer wrote: > > > > > > Hi Ming, > > > > > > > > > > > > Thanks for the patch! But I'm having a hard time understanding what > > > > > > you've written in the patch header, > > > > > > > > > > > > On Fri, Jun 19 2020 at 4:42am -0400, > > > > > > Ming Lei <ming.lei@redhat.com> wrote: > > > > > > > > > > > > > dm-rq won't stop queue, meantime blk-mq won't stop one queue too, so > > > > > > > remove the check. > > > > > > > > > > > > It'd be helpful if you could unpack this with more detail before going on > > > > > > to explain why using blk_queue_quiesced, despite dm-rq using > > > > > > blk_mq_queue_stopped, would also be ineffective. > > > > > > > > > > > > SO: > > > > > > > > > > > > > dm-rq won't stop queue > > > > > > > > > > > > 1) why won't dm-rq stop the queue? Do you mean it won't reliably > > > > > > _always_ stop the queue because of the blk_mq_queue_stopped() check? > > > > > > > > > > device mapper doesn't call blk_mq_stop_hw_queue or blk_mq_stop_hw_queues. > > > > > > > > > > > > > > > > > > meantime blk-mq won't stop one queue too, so remove the check. > > > > > > > > > > > > 2) Meaning?: blk_mq_queue_stopped() will return true even if only one hw > > > > > > queue is stopped, given blk-mq must stop all hw queues a positive return > > > > > > from this blk_mq_queue_stopped() check is incorrectly assuming it meanss > > > > > > all hw queues are stopped. > > > > > > > > > > blk-mq won't call blk_mq_stop_hw_queue or blk_mq_stop_hw_queues for > > > > > dm-rq's queue too, so dm-rq's hw queue won't be stopped. > > > > > > > > > > BTW blk_mq_stop_hw_queue or blk_mq_stop_hw_queues are supposed to be > > > > > used for throttling queue. > > > > > > > > I'm going to look at actually stopping the queue (using one of these > > > > interfaces). I didn't realize I wasn't actually stopping the queue. > > > > The intent was to do so. > > > > > > > > In speaking with Jens yesterday about freeze vs stop: it is clear that > > > > dm-rq needs to still be able to allocate new requests, but _not_ call > > > > the queue_rq to issue the requests, while "stopped" (due to dm-mpath > > > > potentially deferring retries of failed requests because of path failure > > > > while quiescing the queue during DM device suspend). But that freezing > > > > the queue goes too far because it won't allow such request allocation. > > > > > > Seems I'm damned if I do (stop) or damned if I don't (new reports of > > > requests completing after DM device suspend's > > > blk_mq_quiesce_queue()+dm_wait_for_completion()). > > > > request(but not new) completing is possible after blk_mq_quiesce_queue()+ > > dm_wait_for_completion, because blk_mq_rq_inflight() only checks INFLIGHT > > request. If all requests are marked as MQ_RQ_COMPLETE, blk_mq_rq_inflight() > > still may return false. However, MQ_RQ_COMPLETE is one transient state. > > > > So what does dm-rq expect from dm_wait_for_completion()? > > > > If it is just no new request entering dm_queue_rq(), there shouldn't be > > issue. > > > > If dm-rq hopes there aren't any real inflight request(MQ_RQ_COMPLETE & > > MQ_RQ_INFLIGHT), we can change blk_mq_rq_inflight to support that. > > Hi Mike, > > Please test the following patch and see if the issue can be fixed: > > From faf0f9f15627446e8c35db518e37a4a2e4323eb2 Mon Sep 17 00:00:00 2001 > From: Ming Lei <ming.lei@redhat.com> > Date: Sat, 20 Jun 2020 06:45:49 +0800 > Subject: [PATCH] blk-mq: cover request of MQ_RQ_COMPLETE as inflight in > blk_mq_rq_inflight > > When request is marked as MQ_RQ_COMPLETE, ->complete isn't called & done > yet, and driver may expect that there isn't any driver related activity since > blk_mq_queue_inflight() returns. > > Fixes it by covering request of MQ_RQ_COMPLETE as inflight in blk_mq_rq_inflight(). > > Cc: Mike Snitzer <snitzer@redhat.com> > Signed-off-by: Ming Lei <ming.lei@redhat.com> > --- > block/blk-mq.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/block/blk-mq.c b/block/blk-mq.c > index 4f57d27bfa73..7bc084b5bc37 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -831,7 +831,7 @@ static bool blk_mq_rq_inflight(struct blk_mq_hw_ctx *hctx, struct request *rq, > * If we find a request that is inflight and the queue matches, > * we know the queue is busy. Return false to stop the iteration. > */ > - if (rq->state == MQ_RQ_IN_FLIGHT && rq->q == hctx->queue) { > + if (rq->state != MQ_RQ_IDLE && rq->q == hctx->queue) { > bool *busy = priv; > > *busy = true; > -- > 2.25.2 > I was going to ask if being more explit would be better: diff --git a/block/blk-mq.c b/block/blk-mq.c index 4f57d27bfa73..96816ce57eb1 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -828,10 +828,11 @@ static bool blk_mq_rq_inflight(struct blk_mq_hw_ctx *hctx, struct request *rq, void *priv, bool reserved) { /* - * If we find a request that is inflight and the queue matches, + * If we find a request that is inflight or complete and the queue matches, * we know the queue is busy. Return false to stop the iteration. */ - if (rq->state == MQ_RQ_IN_FLIGHT && rq->q == hctx->queue) { + if ((rq->state == MQ_RQ_IN_FLIGHT || rq->state == MQ_RQ_COMPLETE) && + rq->q == hctx->queue) { bool *busy = priv; *busy = true; But is your patch more forgiving of any future blk-mq states that might also consistitute outstanding work? Seems likely. Thanks, Mike
On Fri, Jun 19, 2020 at 07:04:05PM -0400, Mike Snitzer wrote: > On Fri, Jun 19 2020 at 6:52pm -0400, > Ming Lei <ming.lei@redhat.com> wrote: > > > On Sat, Jun 20, 2020 at 06:37:44AM +0800, Ming Lei wrote: > > > On Fri, Jun 19, 2020 at 01:40:41PM -0400, Mike Snitzer wrote: > > > > On Fri, Jun 19 2020 at 12:06pm -0400, > > > > Mike Snitzer <snitzer@redhat.com> wrote: > > > > > > > > > On Fri, Jun 19 2020 at 6:11am -0400, > > > > > Ming Lei <ming.lei@redhat.com> wrote: > > > > > > > > > > > Hi Mike, > > > > > > > > > > > > On Fri, Jun 19, 2020 at 05:42:50AM -0400, Mike Snitzer wrote: > > > > > > > Hi Ming, > > > > > > > > > > > > > > Thanks for the patch! But I'm having a hard time understanding what > > > > > > > you've written in the patch header, > > > > > > > > > > > > > > On Fri, Jun 19 2020 at 4:42am -0400, > > > > > > > Ming Lei <ming.lei@redhat.com> wrote: > > > > > > > > > > > > > > > dm-rq won't stop queue, meantime blk-mq won't stop one queue too, so > > > > > > > > remove the check. > > > > > > > > > > > > > > It'd be helpful if you could unpack this with more detail before going on > > > > > > > to explain why using blk_queue_quiesced, despite dm-rq using > > > > > > > blk_mq_queue_stopped, would also be ineffective. > > > > > > > > > > > > > > SO: > > > > > > > > > > > > > > > dm-rq won't stop queue > > > > > > > > > > > > > > 1) why won't dm-rq stop the queue? Do you mean it won't reliably > > > > > > > _always_ stop the queue because of the blk_mq_queue_stopped() check? > > > > > > > > > > > > device mapper doesn't call blk_mq_stop_hw_queue or blk_mq_stop_hw_queues. > > > > > > > > > > > > > > > > > > > > > meantime blk-mq won't stop one queue too, so remove the check. > > > > > > > > > > > > > > 2) Meaning?: blk_mq_queue_stopped() will return true even if only one hw > > > > > > > queue is stopped, given blk-mq must stop all hw queues a positive return > > > > > > > from this blk_mq_queue_stopped() check is incorrectly assuming it meanss > > > > > > > all hw queues are stopped. > > > > > > > > > > > > blk-mq won't call blk_mq_stop_hw_queue or blk_mq_stop_hw_queues for > > > > > > dm-rq's queue too, so dm-rq's hw queue won't be stopped. > > > > > > > > > > > > BTW blk_mq_stop_hw_queue or blk_mq_stop_hw_queues are supposed to be > > > > > > used for throttling queue. > > > > > > > > > > I'm going to look at actually stopping the queue (using one of these > > > > > interfaces). I didn't realize I wasn't actually stopping the queue. > > > > > The intent was to do so. > > > > > > > > > > In speaking with Jens yesterday about freeze vs stop: it is clear that > > > > > dm-rq needs to still be able to allocate new requests, but _not_ call > > > > > the queue_rq to issue the requests, while "stopped" (due to dm-mpath > > > > > potentially deferring retries of failed requests because of path failure > > > > > while quiescing the queue during DM device suspend). But that freezing > > > > > the queue goes too far because it won't allow such request allocation. > > > > > > > > Seems I'm damned if I do (stop) or damned if I don't (new reports of > > > > requests completing after DM device suspend's > > > > blk_mq_quiesce_queue()+dm_wait_for_completion()). > > > > > > request(but not new) completing is possible after blk_mq_quiesce_queue()+ > > > dm_wait_for_completion, because blk_mq_rq_inflight() only checks INFLIGHT > > > request. If all requests are marked as MQ_RQ_COMPLETE, blk_mq_rq_inflight() > > > still may return false. However, MQ_RQ_COMPLETE is one transient state. > > > > > > So what does dm-rq expect from dm_wait_for_completion()? > > > > > > If it is just no new request entering dm_queue_rq(), there shouldn't be > > > issue. > > > > > > If dm-rq hopes there aren't any real inflight request(MQ_RQ_COMPLETE & > > > MQ_RQ_INFLIGHT), we can change blk_mq_rq_inflight to support that. > > > > Hi Mike, > > > > Please test the following patch and see if the issue can be fixed: > > > > From faf0f9f15627446e8c35db518e37a4a2e4323eb2 Mon Sep 17 00:00:00 2001 > > From: Ming Lei <ming.lei@redhat.com> > > Date: Sat, 20 Jun 2020 06:45:49 +0800 > > Subject: [PATCH] blk-mq: cover request of MQ_RQ_COMPLETE as inflight in > > blk_mq_rq_inflight > > > > When request is marked as MQ_RQ_COMPLETE, ->complete isn't called & done > > yet, and driver may expect that there isn't any driver related activity since > > blk_mq_queue_inflight() returns. > > > > Fixes it by covering request of MQ_RQ_COMPLETE as inflight in blk_mq_rq_inflight(). > > > > Cc: Mike Snitzer <snitzer@redhat.com> > > Signed-off-by: Ming Lei <ming.lei@redhat.com> > > --- > > block/blk-mq.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/block/blk-mq.c b/block/blk-mq.c > > index 4f57d27bfa73..7bc084b5bc37 100644 > > --- a/block/blk-mq.c > > +++ b/block/blk-mq.c > > @@ -831,7 +831,7 @@ static bool blk_mq_rq_inflight(struct blk_mq_hw_ctx *hctx, struct request *rq, > > * If we find a request that is inflight and the queue matches, > > * we know the queue is busy. Return false to stop the iteration. > > */ > > - if (rq->state == MQ_RQ_IN_FLIGHT && rq->q == hctx->queue) { > > + if (rq->state != MQ_RQ_IDLE && rq->q == hctx->queue) { > > bool *busy = priv; > > > > *busy = true; > > -- > > 2.25.2 > > > > I was going to ask if being more explit would be better: > > diff --git a/block/blk-mq.c b/block/blk-mq.c > index 4f57d27bfa73..96816ce57eb1 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -828,10 +828,11 @@ static bool blk_mq_rq_inflight(struct blk_mq_hw_ctx *hctx, struct request *rq, > void *priv, bool reserved) > { > /* > - * If we find a request that is inflight and the queue matches, > + * If we find a request that is inflight or complete and the queue matches, > * we know the queue is busy. Return false to stop the iteration. > */ > - if (rq->state == MQ_RQ_IN_FLIGHT && rq->q == hctx->queue) { > + if ((rq->state == MQ_RQ_IN_FLIGHT || rq->state == MQ_RQ_COMPLETE) && > + rq->q == hctx->queue) { > bool *busy = priv; > > *busy = true; > > But is your patch more forgiving of any future blk-mq states that might > also consistitute outstanding work? Seems likely. I am fine with either way since it is called in slow path. But what matters is if it can fix this issue? Thanks, Ming
On Sat, Jun 20, 2020 at 07:14:51AM +0800, Ming Lei wrote: > On Fri, Jun 19, 2020 at 07:04:05PM -0400, Mike Snitzer wrote: > > On Fri, Jun 19 2020 at 6:52pm -0400, > > Ming Lei <ming.lei@redhat.com> wrote: > > > > > On Sat, Jun 20, 2020 at 06:37:44AM +0800, Ming Lei wrote: > > > > On Fri, Jun 19, 2020 at 01:40:41PM -0400, Mike Snitzer wrote: > > > > > On Fri, Jun 19 2020 at 12:06pm -0400, > > > > > Mike Snitzer <snitzer@redhat.com> wrote: > > > > > > > > > > > On Fri, Jun 19 2020 at 6:11am -0400, > > > > > > Ming Lei <ming.lei@redhat.com> wrote: > > > > > > > > > > > > > Hi Mike, > > > > > > > > > > > > > > On Fri, Jun 19, 2020 at 05:42:50AM -0400, Mike Snitzer wrote: > > > > > > > > Hi Ming, > > > > > > > > > > > > > > > > Thanks for the patch! But I'm having a hard time understanding what > > > > > > > > you've written in the patch header, > > > > > > > > > > > > > > > > On Fri, Jun 19 2020 at 4:42am -0400, > > > > > > > > Ming Lei <ming.lei@redhat.com> wrote: > > > > > > > > > > > > > > > > > dm-rq won't stop queue, meantime blk-mq won't stop one queue too, so > > > > > > > > > remove the check. > > > > > > > > > > > > > > > > It'd be helpful if you could unpack this with more detail before going on > > > > > > > > to explain why using blk_queue_quiesced, despite dm-rq using > > > > > > > > blk_mq_queue_stopped, would also be ineffective. > > > > > > > > > > > > > > > > SO: > > > > > > > > > > > > > > > > > dm-rq won't stop queue > > > > > > > > > > > > > > > > 1) why won't dm-rq stop the queue? Do you mean it won't reliably > > > > > > > > _always_ stop the queue because of the blk_mq_queue_stopped() check? > > > > > > > > > > > > > > device mapper doesn't call blk_mq_stop_hw_queue or blk_mq_stop_hw_queues. > > > > > > > > > > > > > > > > > > > > > > > > meantime blk-mq won't stop one queue too, so remove the check. > > > > > > > > > > > > > > > > 2) Meaning?: blk_mq_queue_stopped() will return true even if only one hw > > > > > > > > queue is stopped, given blk-mq must stop all hw queues a positive return > > > > > > > > from this blk_mq_queue_stopped() check is incorrectly assuming it meanss > > > > > > > > all hw queues are stopped. > > > > > > > > > > > > > > blk-mq won't call blk_mq_stop_hw_queue or blk_mq_stop_hw_queues for > > > > > > > dm-rq's queue too, so dm-rq's hw queue won't be stopped. > > > > > > > > > > > > > > BTW blk_mq_stop_hw_queue or blk_mq_stop_hw_queues are supposed to be > > > > > > > used for throttling queue. > > > > > > > > > > > > I'm going to look at actually stopping the queue (using one of these > > > > > > interfaces). I didn't realize I wasn't actually stopping the queue. > > > > > > The intent was to do so. > > > > > > > > > > > > In speaking with Jens yesterday about freeze vs stop: it is clear that > > > > > > dm-rq needs to still be able to allocate new requests, but _not_ call > > > > > > the queue_rq to issue the requests, while "stopped" (due to dm-mpath > > > > > > potentially deferring retries of failed requests because of path failure > > > > > > while quiescing the queue during DM device suspend). But that freezing > > > > > > the queue goes too far because it won't allow such request allocation. > > > > > > > > > > Seems I'm damned if I do (stop) or damned if I don't (new reports of > > > > > requests completing after DM device suspend's > > > > > blk_mq_quiesce_queue()+dm_wait_for_completion()). > > > > > > > > request(but not new) completing is possible after blk_mq_quiesce_queue()+ > > > > dm_wait_for_completion, because blk_mq_rq_inflight() only checks INFLIGHT > > > > request. If all requests are marked as MQ_RQ_COMPLETE, blk_mq_rq_inflight() > > > > still may return false. However, MQ_RQ_COMPLETE is one transient state. > > > > > > > > So what does dm-rq expect from dm_wait_for_completion()? > > > > > > > > If it is just no new request entering dm_queue_rq(), there shouldn't be > > > > issue. > > > > > > > > If dm-rq hopes there aren't any real inflight request(MQ_RQ_COMPLETE & > > > > MQ_RQ_INFLIGHT), we can change blk_mq_rq_inflight to support that. > > > > > > Hi Mike, > > > > > > Please test the following patch and see if the issue can be fixed: > > > > > > From faf0f9f15627446e8c35db518e37a4a2e4323eb2 Mon Sep 17 00:00:00 2001 > > > From: Ming Lei <ming.lei@redhat.com> > > > Date: Sat, 20 Jun 2020 06:45:49 +0800 > > > Subject: [PATCH] blk-mq: cover request of MQ_RQ_COMPLETE as inflight in > > > blk_mq_rq_inflight > > > > > > When request is marked as MQ_RQ_COMPLETE, ->complete isn't called & done > > > yet, and driver may expect that there isn't any driver related activity since > > > blk_mq_queue_inflight() returns. > > > > > > Fixes it by covering request of MQ_RQ_COMPLETE as inflight in blk_mq_rq_inflight(). > > > > > > Cc: Mike Snitzer <snitzer@redhat.com> > > > Signed-off-by: Ming Lei <ming.lei@redhat.com> > > > --- > > > block/blk-mq.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/block/blk-mq.c b/block/blk-mq.c > > > index 4f57d27bfa73..7bc084b5bc37 100644 > > > --- a/block/blk-mq.c > > > +++ b/block/blk-mq.c > > > @@ -831,7 +831,7 @@ static bool blk_mq_rq_inflight(struct blk_mq_hw_ctx *hctx, struct request *rq, > > > * If we find a request that is inflight and the queue matches, > > > * we know the queue is busy. Return false to stop the iteration. > > > */ > > > - if (rq->state == MQ_RQ_IN_FLIGHT && rq->q == hctx->queue) { > > > + if (rq->state != MQ_RQ_IDLE && rq->q == hctx->queue) { > > > bool *busy = priv; > > > > > > *busy = true; > > > -- > > > 2.25.2 > > > > > > > I was going to ask if being more explit would be better: > > > > diff --git a/block/blk-mq.c b/block/blk-mq.c > > index 4f57d27bfa73..96816ce57eb1 100644 > > --- a/block/blk-mq.c > > +++ b/block/blk-mq.c > > @@ -828,10 +828,11 @@ static bool blk_mq_rq_inflight(struct blk_mq_hw_ctx *hctx, struct request *rq, > > void *priv, bool reserved) > > { > > /* > > - * If we find a request that is inflight and the queue matches, > > + * If we find a request that is inflight or complete and the queue matches, > > * we know the queue is busy. Return false to stop the iteration. > > */ > > - if (rq->state == MQ_RQ_IN_FLIGHT && rq->q == hctx->queue) { > > + if ((rq->state == MQ_RQ_IN_FLIGHT || rq->state == MQ_RQ_COMPLETE) && > > + rq->q == hctx->queue) { > > bool *busy = priv; > > > > *busy = true; > > > > But is your patch more forgiving of any future blk-mq states that might > > also consistitute outstanding work? Seems likely. > > I am fine with either way since it is called in slow path. BTW, another candidate is to use blk_mq_request_started(). Thanks, Ming
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c index f60c02512121..ed4d5ea66ccc 100644 --- a/drivers/md/dm-rq.c +++ b/drivers/md/dm-rq.c @@ -70,9 +70,6 @@ void dm_start_queue(struct request_queue *q) void dm_stop_queue(struct request_queue *q) { - if (blk_mq_queue_stopped(q)) - return; - blk_mq_quiesce_queue(q); }
dm-rq won't stop queue, meantime blk-mq won't stop one queue too, so remove the check. dm_stop_queue() actually tries to quiesce hw queues via blk_mq_quiesce_queue(), we can't check via blk_queue_quiesced for avoiding unnecessary queue quiesce because the flag is set before synchronize_rcu() and dm_stop_queue may be called when synchronize_rcu from another blk_mq_quiesce_queue is in-progress. Cc: linux-block@vger.kernel.org Signed-off-by: Ming Lei <ming.lei@redhat.com> --- drivers/md/dm-rq.c | 3 --- 1 file changed, 3 deletions(-)