diff mbox series

scsi: core: only re-run queue in scsi_end_request() if device queue is busy

Message ID 20191117080818.2664-1-ming.lei@redhat.com (mailing list archive)
State Superseded
Headers show
Series scsi: core: only re-run queue in scsi_end_request() if device queue is busy | expand

Commit Message

Ming Lei Nov. 17, 2019, 8:08 a.m. UTC
Now the requeue queue is run in scsi_end_request() unconditionally if both
target queue and host queue is ready. We should have re-run request queue
only after this device queue becomes busy for restarting this LUN only.

Recently Long Li reported that cost of run queue may be very heavy in
case of high queue depth. So improve this situation by only running
requesut queue when this LUN is busy.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ewan D. Milne <emilne@redhat.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-block@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/scsi_lib.c    | 29 +++++++++++++++++++++++++++--
 include/scsi/scsi_device.h |  1 +
 2 files changed, 28 insertions(+), 2 deletions(-)

Comments

Damien Le Moal Nov. 18, 2019, 12:18 a.m. UTC | #1
On 2019/11/17 17:08, Ming Lei wrote:
> Now the requeue queue is run in scsi_end_request() unconditionally if both
> target queue and host queue is ready. We should have re-run request queue
> only after this device queue becomes busy for restarting this LUN only.
> 
> Recently Long Li reported that cost of run queue may be very heavy in
> case of high queue depth. So improve this situation by only running
> requesut queue when this LUN is busy.

s/requesut/request

Also, shouldn't this patch have the tag:

Reported-by: Long Li <longli@microsoft.com>

?

Another remark is that Long's approach is generic to the block layer
while your patch here is scsi specific. I wonder if the same problem
cannot happen with other drivers too ?

> 
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Ewan D. Milne <emilne@redhat.com>
> Cc: Kashyap Desai <kashyap.desai@broadcom.com>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Bart Van Assche <bvanassche@acm.org>
> Cc: Damien Le Moal <damien.lemoal@wdc.com>
> Cc: Long Li <longli@microsoft.com>
> Cc: linux-block@vger.kernel.org
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  drivers/scsi/scsi_lib.c    | 29 +++++++++++++++++++++++++++--
>  include/scsi/scsi_device.h |  1 +
>  2 files changed, 28 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 379533ce8661..212903d5f43c 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -612,7 +612,7 @@ static bool scsi_end_request(struct request *req, blk_status_t error,
>  	if (scsi_target(sdev)->single_lun ||
>  	    !list_empty(&sdev->host->starved_list))
>  		kblockd_schedule_work(&sdev->requeue_work);
> -	else
> +	else if (READ_ONCE(sdev->restart))
>  		blk_mq_run_hw_queues(q, true);
>  
>  	percpu_ref_put(&q->q_usage_counter);
> @@ -1632,8 +1632,33 @@ static bool scsi_mq_get_budget(struct blk_mq_hw_ctx *hctx)
>  	struct request_queue *q = hctx->queue;
>  	struct scsi_device *sdev = q->queuedata;
>  
> -	if (scsi_dev_queue_ready(q, sdev))
> +	if (scsi_dev_queue_ready(q, sdev)) {
> +		WRITE_ONCE(sdev->restart, 0);
>  		return true;
> +	}
> +
> +	/*
> +	 * If all in-flight requests originated from this LUN are completed
> +	 * before setting .restart, sdev->device_busy will be observed as
> +	 * zero, then blk_mq_delay_run_hw_queue() will dispatch this request
> +	 * soon. Otherwise, completion of one of these request will observe
> +	 * the .restart flag, and the request queue will be run for handling
> +	 * this request, see scsi_end_request().
> +	 *
> +	 * However, the .restart flag may be cleared from other dispatch code
> +	 * path after one inflight request is completed, then:
> +	 *
> +	 * 1) if this reqquest is dispatched from scheduler queue or sw queue one
> +	 * by one, this request will be handled in that dispatch path too given
> +	 * the request still stays at scheduler/sw queue when calling .get_budget()
> +	 * callback.
> +	 *
> +	 * 2) if this request is dispatched from hctx->dispatch or
> +	 * blk_mq_flush_busy_ctxs(), this request will be put into hctx->dispatch
> +	 * list soon, and blk-mq will be responsible for covering it, see
> +	 * blk_mq_dispatch_rq_list().
> +	 */
> +	WRITE_ONCE(sdev->restart, 1);
>  
>  	if (atomic_read(&sdev->device_busy) == 0 && !scsi_device_blocked(sdev))
>  		blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY);
> diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
> index 202f4d6a4342..9d8ca662ae86 100644
> --- a/include/scsi/scsi_device.h
> +++ b/include/scsi/scsi_device.h
> @@ -109,6 +109,7 @@ struct scsi_device {
>  	atomic_t device_busy;		/* commands actually active on LLDD */
>  	atomic_t device_blocked;	/* Device returned QUEUE_FULL. */
>  
> +	unsigned int restart;
>  	spinlock_t list_lock;
>  	struct list_head cmd_list;	/* queue of in use SCSI Command structures */
>  	struct list_head starved_entry;
>
Ming Lei Nov. 18, 2019, 12:47 a.m. UTC | #2
Hi Damine,

On Mon, Nov 18, 2019 at 12:18:48AM +0000, Damien Le Moal wrote:
> On 2019/11/17 17:08, Ming Lei wrote:
> > Now the requeue queue is run in scsi_end_request() unconditionally if both
> > target queue and host queue is ready. We should have re-run request queue
> > only after this device queue becomes busy for restarting this LUN only.
> > 
> > Recently Long Li reported that cost of run queue may be very heavy in
> > case of high queue depth. So improve this situation by only running
> > requesut queue when this LUN is busy.
> 
> s/requesut/request
> 
> Also, shouldn't this patch have the tag:
> 
> Reported-by: Long Li <longli@microsoft.com>
> 
> ?

Will do that in V2.

> 
> Another remark is that Long's approach is generic to the block layer
> while your patch here is scsi specific. I wonder if the same problem
> cannot happen with other drivers too ?

Not sure what you meant about the same problem.

It is definitely bad to unconditionally call blk_mq_run_hw_queues()
in driver's IO completion handler from performance viewpoint.

This patch simply addresses this SCSI specific issue, since blk_mq_run_hw_queues()
shouldn't be called from scsi_end_request() when the device queue isn't busy.

If other driver has such kind of issue, I believe it should have been fixed in
driver too.

So my patch isn't contradictory with Long's patch which improves generic blk-mq's
run queue.

Thanks,
Ming
Damien Le Moal Nov. 18, 2019, 1:10 a.m. UTC | #3
On 2019/11/18 9:47, Ming Lei wrote:
> Hi Damine,
> 
> On Mon, Nov 18, 2019 at 12:18:48AM +0000, Damien Le Moal wrote:
>> On 2019/11/17 17:08, Ming Lei wrote:
>>> Now the requeue queue is run in scsi_end_request() unconditionally if both
>>> target queue and host queue is ready. We should have re-run request queue
>>> only after this device queue becomes busy for restarting this LUN only.
>>>
>>> Recently Long Li reported that cost of run queue may be very heavy in
>>> case of high queue depth. So improve this situation by only running
>>> requesut queue when this LUN is busy.
>>
>> s/requesut/request
>>
>> Also, shouldn't this patch have the tag:
>>
>> Reported-by: Long Li <longli@microsoft.com>
>>
>> ?
> 
> Will do that in V2.
> 
>>
>> Another remark is that Long's approach is generic to the block layer
>> while your patch here is scsi specific. I wonder if the same problem
>> cannot happen with other drivers too ?
> 
> Not sure what you meant about the same problem.
> 
> It is definitely bad to unconditionally call blk_mq_run_hw_queues()
> in driver's IO completion handler from performance viewpoint.
> 
> This patch simply addresses this SCSI specific issue, since blk_mq_run_hw_queues()
> shouldn't be called from scsi_end_request() when the device queue isn't busy.
> 
> If other driver has such kind of issue, I believe it should have been fixed in
> driver too.
> 
> So my patch isn't contradictory with Long's patch which improves generic blk-mq's
> run queue.

OK. Thanks.

> 
> Thanks,
> Ming
> 
>
Jens Axboe Nov. 18, 2019, 2:46 a.m. UTC | #4
On 11/17/19 5:18 PM, Damien Le Moal wrote:
> On 2019/11/17 17:08, Ming Lei wrote:
>> Now the requeue queue is run in scsi_end_request() unconditionally if both
>> target queue and host queue is ready. We should have re-run request queue
>> only after this device queue becomes busy for restarting this LUN only.
>>
>> Recently Long Li reported that cost of run queue may be very heavy in
>> case of high queue depth. So improve this situation by only running
>> requesut queue when this LUN is busy.
> 
> s/requesut/request
> 
> Also, shouldn't this patch have the tag:
> 
> Reported-by: Long Li <longli@microsoft.com>
> 
> ?
> 
> Another remark is that Long's approach is generic to the block layer
> while your patch here is scsi specific. I wonder if the same problem
> cannot happen with other drivers too ?

The block layer is just doing what it's told, and I doubt many drivers
would have the crazy kind of re-run logic that the SCSI midlayer does.
As far as I'm concerned, the fix belongs on the SCSI side of things
instead of being papered over on the block side.
Bart Van Assche Nov. 18, 2019, 5:30 a.m. UTC | #5
On 2019-11-17 00:08, Ming Lei wrote:
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 379533ce8661..212903d5f43c 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -612,7 +612,7 @@ static bool scsi_end_request(struct request *req, blk_status_t error,
>  	if (scsi_target(sdev)->single_lun ||
>  	    !list_empty(&sdev->host->starved_list))
>  		kblockd_schedule_work(&sdev->requeue_work);
> -	else
> +	else if (READ_ONCE(sdev->restart))
>  		blk_mq_run_hw_queues(q, true);
>  

Rerunning the hardware queues is not only necessary after
scsi_dev_queue_ready() returns false but also after .queuecommand()
returns SCSI_MLQUEUE_*_BUSY. Can this patch cause queue stalls if
.queuecommand() returns SCSI_MLQUEUE_*_BUSY?

Thanks,

Bart.
Ming Lei Nov. 18, 2019, 7:08 a.m. UTC | #6
On Sun, Nov 17, 2019 at 09:30:39PM -0800, Bart Van Assche wrote:
> On 2019-11-17 00:08, Ming Lei wrote:
> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > index 379533ce8661..212903d5f43c 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -612,7 +612,7 @@ static bool scsi_end_request(struct request *req, blk_status_t error,
> >  	if (scsi_target(sdev)->single_lun ||
> >  	    !list_empty(&sdev->host->starved_list))
> >  		kblockd_schedule_work(&sdev->requeue_work);
> > -	else
> > +	else if (READ_ONCE(sdev->restart))
> >  		blk_mq_run_hw_queues(q, true);
> >  
> 
> Rerunning the hardware queues is not only necessary after
> scsi_dev_queue_ready() returns false but also after .queuecommand()
> returns SCSI_MLQUEUE_*_BUSY. Can this patch cause queue stalls if
> .queuecommand() returns SCSI_MLQUEUE_*_BUSY?

No, that isn't why blk_mq_run_hw_queues is called in scsi_end_request(),
and you should see that it is just this LUN to be re-run.

Also if .queuecommand() returns any non-zero value, BLK_STS_RESOURCE
will be returned to blk-mq, then blk-mq will cover the re-run.

thanks,
Ming
diff mbox series

Patch

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 379533ce8661..212903d5f43c 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -612,7 +612,7 @@  static bool scsi_end_request(struct request *req, blk_status_t error,
 	if (scsi_target(sdev)->single_lun ||
 	    !list_empty(&sdev->host->starved_list))
 		kblockd_schedule_work(&sdev->requeue_work);
-	else
+	else if (READ_ONCE(sdev->restart))
 		blk_mq_run_hw_queues(q, true);
 
 	percpu_ref_put(&q->q_usage_counter);
@@ -1632,8 +1632,33 @@  static bool scsi_mq_get_budget(struct blk_mq_hw_ctx *hctx)
 	struct request_queue *q = hctx->queue;
 	struct scsi_device *sdev = q->queuedata;
 
-	if (scsi_dev_queue_ready(q, sdev))
+	if (scsi_dev_queue_ready(q, sdev)) {
+		WRITE_ONCE(sdev->restart, 0);
 		return true;
+	}
+
+	/*
+	 * If all in-flight requests originated from this LUN are completed
+	 * before setting .restart, sdev->device_busy will be observed as
+	 * zero, then blk_mq_delay_run_hw_queue() will dispatch this request
+	 * soon. Otherwise, completion of one of these request will observe
+	 * the .restart flag, and the request queue will be run for handling
+	 * this request, see scsi_end_request().
+	 *
+	 * However, the .restart flag may be cleared from other dispatch code
+	 * path after one inflight request is completed, then:
+	 *
+	 * 1) if this reqquest is dispatched from scheduler queue or sw queue one
+	 * by one, this request will be handled in that dispatch path too given
+	 * the request still stays at scheduler/sw queue when calling .get_budget()
+	 * callback.
+	 *
+	 * 2) if this request is dispatched from hctx->dispatch or
+	 * blk_mq_flush_busy_ctxs(), this request will be put into hctx->dispatch
+	 * list soon, and blk-mq will be responsible for covering it, see
+	 * blk_mq_dispatch_rq_list().
+	 */
+	WRITE_ONCE(sdev->restart, 1);
 
 	if (atomic_read(&sdev->device_busy) == 0 && !scsi_device_blocked(sdev))
 		blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY);
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 202f4d6a4342..9d8ca662ae86 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -109,6 +109,7 @@  struct scsi_device {
 	atomic_t device_busy;		/* commands actually active on LLDD */
 	atomic_t device_blocked;	/* Device returned QUEUE_FULL. */
 
+	unsigned int restart;
 	spinlock_t list_lock;
 	struct list_head cmd_list;	/* queue of in use SCSI Command structures */
 	struct list_head starved_entry;