diff mbox series

[10/18] scsi: implement reserved command handling

Message ID 20210503150333.130310-11-hare@suse.de (mailing list archive)
State Changes Requested
Headers show
Series scsi: enabled reserved commands for LLDDs | expand

Commit Message

Hannes Reinecke May 3, 2021, 3:03 p.m. UTC
Quite some drivers are using management commands internally, which
typically use the same hardware tag pool (ie they are being allocated
from the same hardware resources) as the 'normal' I/O commands.
These commands are set aside before allocating the block-mq tag bitmap,
so they'll never show up as busy in the tag map.
The block-layer, OTOH, already has 'reserved_tags' to handle precisely
this situation.
So this patch adds a new field 'nr_reserved_cmds' to the SCSI host
template to instruct the block layer to set aside a tag space for these
management commands by using reserved tags.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/hosts.c      |  3 +++
 drivers/scsi/scsi_lib.c   | 10 +++++++++-
 drivers/scsi/scsi_sysfs.c |  2 ++
 include/scsi/scsi_host.h  | 22 +++++++++++++++++++++-
 4 files changed, 35 insertions(+), 2 deletions(-)

Comments

Bart Van Assche May 4, 2021, 3:20 a.m. UTC | #1
On 5/3/21 8:03 AM, Hannes Reinecke wrote:
> These commands are set aside before allocating the block-mq tag bitmap,
> so they'll never show up as busy in the tag map.

That doesn't sound correct to me. Should the above perhaps be changed
into "blk_mq_start_request() is never called for internal commands so
they'll never show up as busy in the tag map"?

Thanks,

Bart.
Hannes Reinecke May 4, 2021, 6:17 a.m. UTC | #2
On 5/4/21 5:20 AM, Bart Van Assche wrote:
> On 5/3/21 8:03 AM, Hannes Reinecke wrote:
>> These commands are set aside before allocating the block-mq tag bitmap,
>> so they'll never show up as busy in the tag map.
> 
> That doesn't sound correct to me. Should the above perhaps be changed
> into "blk_mq_start_request() is never called for internal commands so
> they'll never show up as busy in the tag map"?
> 
Yes, will do.

Cheers,

Hannes
John Garry May 4, 2021, 10:55 a.m. UTC | #3
On 04/05/2021 07:17, Hannes Reinecke wrote:
> On 5/4/21 5:20 AM, Bart Van Assche wrote:
>> On 5/3/21 8:03 AM, Hannes Reinecke wrote:
>>> These commands are set aside before allocating the block-mq tag bitmap,
>>> so they'll never show up as busy in the tag map.
>>
>> That doesn't sound correct to me. Should the above perhaps be changed
>> into "blk_mq_start_request() is never called for internal commands so
>> they'll never show up as busy in the tag map"?
>>
> Yes, will do.

So why don't these - or shouldn't these - turn up in the busy tag map?

One of the motivations to use these block requests for internal commands 
is that we can take advantage of the block layer handling for CPU 
hotplug for MQ hosts, i.e. if blk-mq can't see these are inflight, then 
they would be missed in blk_mq_hctx_notify_offline() -> 
blk_mq_hctx_has_requests(), right? And who knows what else...

Thanks,
John
Hannes Reinecke May 4, 2021, 1:12 p.m. UTC | #4
On 5/4/21 12:55 PM, John Garry wrote:
> On 04/05/2021 07:17, Hannes Reinecke wrote:
>> On 5/4/21 5:20 AM, Bart Van Assche wrote:
>>> On 5/3/21 8:03 AM, Hannes Reinecke wrote:
>>>> These commands are set aside before allocating the block-mq tag bitmap,
>>>> so they'll never show up as busy in the tag map.
>>>
>>> That doesn't sound correct to me. Should the above perhaps be changed
>>> into "blk_mq_start_request() is never called for internal commands so
>>> they'll never show up as busy in the tag map"?
>>>
>> Yes, will do.
> 
> So why don't these - or shouldn't these - turn up in the busy tag map?
> 
> One of the motivations to use these block requests for internal commands 
> is that we can take advantage of the block layer handling for CPU 
> hotplug for MQ hosts, i.e. if blk-mq can't see these are inflight, then 
> they would be missed in blk_mq_hctx_notify_offline() -> 
> blk_mq_hctx_has_requests(), right? And who knows what else...
> 
Oh, but of course it's possible to call 'start' on these requests to 
have them counted in the busy map.
I just didn't see the need for it until now, that's all.

Cheers,

Hannes
Bart Van Assche May 4, 2021, 4:59 p.m. UTC | #5
On 5/4/21 6:12 AM, Hannes Reinecke wrote:
> On 5/4/21 12:55 PM, John Garry wrote:
>> On 04/05/2021 07:17, Hannes Reinecke wrote:
>>> On 5/4/21 5:20 AM, Bart Van Assche wrote:
>>>> On 5/3/21 8:03 AM, Hannes Reinecke wrote:
>>>>> These commands are set aside before allocating the block-mq tag
>>>>> bitmap,
>>>>> so they'll never show up as busy in the tag map.
>>>>
>>>> That doesn't sound correct to me. Should the above perhaps be changed
>>>> into "blk_mq_start_request() is never called for internal commands so
>>>> they'll never show up as busy in the tag map"?
>>>>
>>> Yes, will do.
>>
>> So why don't these - or shouldn't these - turn up in the busy tag map?
>>
>> One of the motivations to use these block requests for internal
>> commands is that we can take advantage of the block layer handling for
>> CPU hotplug for MQ hosts, i.e. if blk-mq can't see these are inflight,
>> then they would be missed in blk_mq_hctx_notify_offline() ->
>> blk_mq_hctx_has_requests(), right? And who knows what else...
>>
> Oh, but of course it's possible to call 'start' on these requests to
> have them counted in the busy map.
> I just didn't see the need for it until now, that's all.

This is possible but this will require careful review of at least the
following code paths such that nothing unexpected happens for internal
commands:
* The SCSI timeout code.
* All blk_mq_tagset_busy_iter() and scsi_host_busy_iter() callers. As an
example, scsi_host_busy() must not include LLD-internal commands.

Thanks,

Bart.
Hannes Reinecke May 4, 2021, 6:09 p.m. UTC | #6
On 5/4/21 6:59 PM, Bart Van Assche wrote:
> On 5/4/21 6:12 AM, Hannes Reinecke wrote:
>> On 5/4/21 12:55 PM, John Garry wrote:
>>> On 04/05/2021 07:17, Hannes Reinecke wrote:
>>>> On 5/4/21 5:20 AM, Bart Van Assche wrote:
>>>>> On 5/3/21 8:03 AM, Hannes Reinecke wrote:
>>>>>> These commands are set aside before allocating the block-mq tag
>>>>>> bitmap,
>>>>>> so they'll never show up as busy in the tag map.
>>>>>
>>>>> That doesn't sound correct to me. Should the above perhaps be changed
>>>>> into "blk_mq_start_request() is never called for internal commands so
>>>>> they'll never show up as busy in the tag map"?
>>>>>
>>>> Yes, will do.
>>>
>>> So why don't these - or shouldn't these - turn up in the busy tag map?
>>>
>>> One of the motivations to use these block requests for internal
>>> commands is that we can take advantage of the block layer handling for
>>> CPU hotplug for MQ hosts, i.e. if blk-mq can't see these are inflight,
>>> then they would be missed in blk_mq_hctx_notify_offline() ->
>>> blk_mq_hctx_has_requests(), right? And who knows what else...
>>>
>> Oh, but of course it's possible to call 'start' on these requests to
>> have them counted in the busy map.
>> I just didn't see the need for it until now, that's all.
> 
> This is possible but this will require careful review of at least the
> following code paths such that nothing unexpected happens for internal
> commands:
> * The SCSI timeout code.
> * All blk_mq_tagset_busy_iter() and scsi_host_busy_iter() callers. As an
> example, scsi_host_busy() must not include LLD-internal commands.
> 
Oh, _that_ is easy. These are reserved commands, which will have the
last bool argument to the iter functions set to 'true'.

 bool (*fn)(struct scsi_cmnd *, void *, bool)

So we just need to return from the iter if the last argument is true.

Cheers,

Hannes
Bart Van Assche May 5, 2021, 12:45 a.m. UTC | #7
On 5/3/21 8:03 AM, Hannes Reinecke wrote:
>  struct scsi_cmnd *scsi_get_internal_cmd(struct scsi_device *sdev,
>  	unsigned int op, blk_mq_req_flags_t flags)
> @@ -2005,6 +2009,10 @@ struct scsi_cmnd *scsi_get_internal_cmd(struct scsi_device *sdev,
>  
>  	WARN_ON_ONCE(((op & REQ_OP_MASK) != REQ_OP_SCSI_IN) &&
>  		     ((op & REQ_OP_MASK) != REQ_OP_SCSI_OUT));
> +
> +	if (sdev->host->nr_reserved_cmds)
> +		flags |= BLK_MQ_REQ_RESERVED;
> +
>  	rq = blk_mq_alloc_request(sdev->request_queue, op, flags);
>  	if (IS_ERR(rq))
>  		return NULL;

Can the if-statement be removed such that scsi_get_internal_cmd() fails
if sdev->host->nr_reserved_cmds == 0? I'm concerned that otherwise it
will be very hard to determine which requests are internal and which
ones not from inside a blk_mq_tagset_busy_iter() callback.

Thanks,

Bart.
Hannes Reinecke May 5, 2021, 5:56 a.m. UTC | #8
On 5/5/21 2:45 AM, Bart Van Assche wrote:
> On 5/3/21 8:03 AM, Hannes Reinecke wrote:
>>  struct scsi_cmnd *scsi_get_internal_cmd(struct scsi_device *sdev,
>>  	unsigned int op, blk_mq_req_flags_t flags)
>> @@ -2005,6 +2009,10 @@ struct scsi_cmnd *scsi_get_internal_cmd(struct scsi_device *sdev,
>>  
>>  	WARN_ON_ONCE(((op & REQ_OP_MASK) != REQ_OP_SCSI_IN) &&
>>  		     ((op & REQ_OP_MASK) != REQ_OP_SCSI_OUT));
>> +
>> +	if (sdev->host->nr_reserved_cmds)
>> +		flags |= BLK_MQ_REQ_RESERVED;
>> +
>>  	rq = blk_mq_alloc_request(sdev->request_queue, op, flags);
>>  	if (IS_ERR(rq))
>>  		return NULL;
> 
> Can the if-statement be removed such that scsi_get_internal_cmd() fails
> if sdev->host->nr_reserved_cmds == 0? I'm concerned that otherwise it
> will be very hard to determine which requests are internal and which
> ones not from inside a blk_mq_tagset_busy_iter() callback.
> 
Original idea was that one could use scsi_get_internal_cmd() even with
nr_reserved_cmds == 0, but you are right that this will probably just
lead to confusion.

Will be modifying it for the next round.

Cheers,

Hannes
diff mbox series

Patch

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 2f162603876f..661ed7696562 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -469,6 +469,9 @@  struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
 	if (sht->virt_boundary_mask)
 		shost->virt_boundary_mask = sht->virt_boundary_mask;
 
+	if (sht->nr_reserved_cmds)
+		shost->nr_reserved_cmds = sht->nr_reserved_cmds;
+
 	device_initialize(&shost->shost_gendev);
 	dev_set_name(&shost->shost_gendev, "host%d", shost->host_no);
 	shost->shost_gendev.bus = &scsi_bus_type;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index f83b04e49bae..3c83b0fabefb 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1971,7 +1971,8 @@  int scsi_mq_setup_tags(struct Scsi_Host *shost)
 		tag_set->ops = &scsi_mq_ops_no_commit;
 	tag_set->nr_hw_queues = shost->nr_hw_queues ? : 1;
 	tag_set->nr_maps = shost->nr_maps ? : 1;
-	tag_set->queue_depth = shost->can_queue;
+	tag_set->queue_depth = shost->can_queue + shost->nr_reserved_cmds;
+	tag_set->reserved_tags = shost->nr_reserved_cmds;
 	tag_set->cmd_size = cmd_size;
 	tag_set->numa_node = NUMA_NO_NODE;
 	tag_set->flags = BLK_MQ_F_SHOULD_MERGE;
@@ -1996,6 +1997,9 @@  void scsi_mq_destroy_tags(struct Scsi_Host *shost)
  * @flags: BLK_MQ_REQ_* flags, e.g. BLK_MQ_REQ_NOWAIT.
  *
  * Allocates a SCSI command for internal LLDD use.
+ * If 'nr_reserved_commands' is specified by the host the
+ * command will be allocated from the reserved tag pool;
+ * otherwise the normal tag pool will be used.
  */
 struct scsi_cmnd *scsi_get_internal_cmd(struct scsi_device *sdev,
 	unsigned int op, blk_mq_req_flags_t flags)
@@ -2005,6 +2009,10 @@  struct scsi_cmnd *scsi_get_internal_cmd(struct scsi_device *sdev,
 
 	WARN_ON_ONCE(((op & REQ_OP_MASK) != REQ_OP_SCSI_IN) &&
 		     ((op & REQ_OP_MASK) != REQ_OP_SCSI_OUT));
+
+	if (sdev->host->nr_reserved_cmds)
+		flags |= BLK_MQ_REQ_RESERVED;
+
 	rq = blk_mq_alloc_request(sdev->request_queue, op, flags);
 	if (IS_ERR(rq))
 		return NULL;
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index d5260d1b7b38..f4119999a402 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -371,6 +371,7 @@  static DEVICE_ATTR(eh_deadline, S_IRUGO | S_IWUSR, show_shost_eh_deadline, store
 shost_rd_attr(unique_id, "%u\n");
 shost_rd_attr(cmd_per_lun, "%hd\n");
 shost_rd_attr(can_queue, "%d\n");
+shost_rd_attr(nr_reserved_cmds, "%d\n");
 shost_rd_attr(sg_tablesize, "%hu\n");
 shost_rd_attr(sg_prot_tablesize, "%hu\n");
 shost_rd_attr(unchecked_isa_dma, "%d\n");
@@ -422,6 +423,7 @@  static struct attribute *scsi_sysfs_shost_attrs[] = {
 	&dev_attr_host_reset.attr,
 	&dev_attr_eh_deadline.attr,
 	&dev_attr_nr_hw_queues.attr,
+	&dev_attr_nr_reserved_cmds.attr,
 	NULL
 };
 
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index f115150559ca..0831b33ee186 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -367,10 +367,19 @@  struct scsi_host_template {
 	/*
 	 * This determines if we will use a non-interrupt driven
 	 * or an interrupt driven scheme.  It is set to the maximum number
-	 * of simultaneous commands a single hw queue in HBA will accept.
+	 * of simultaneous commands a single hw queue in HBA will accept
+	 * excluding internal commands.
 	 */
 	int can_queue;
 
+	/*
+	 * This determines how many commands the HBA will set aside
+	 * for internal commands. This number will be added to
+	 * @can_queue to calculate the maximum number of simultaneous
+	 * commands sent to the host.
+	 */
+	int nr_reserved_cmds;
+
 	/*
 	 * In many instances, especially where disconnect / reconnect are
 	 * supported, our host also has an ID on the SCSI bus.  If this is
@@ -614,6 +623,11 @@  struct Scsi_Host {
 	unsigned short max_cmd_len;
 
 	int this_id;
+
+	/*
+	 * Number of commands this host can handle at the same time.
+	 * This excludes reserved commands as specified by nr_reserved_cmds.
+	 */
 	int can_queue;
 	short cmd_per_lun;
 	short unsigned int sg_tablesize;
@@ -632,6 +646,12 @@  struct Scsi_Host {
 	 */
 	unsigned nr_hw_queues;
 	unsigned nr_maps;
+
+	/*
+	 * Number of reserved commands to allocate, if any.
+	 */
+	unsigned nr_reserved_cmds;
+
 	unsigned active_mode:2;
 	unsigned unchecked_isa_dma:1;