[RFC,v3,22/41] block: implement persistent commands

Message ID	20200430131904.5847-23-hare@suse.de (mailing list archive)
State	Changes Requested
Headers	Return-Path: <SRS0=Cu6i=6O=vger.kernel.org=linux-scsi-owner@kernel.org>
	From: Hannes Reinecke <hare@suse.de>
	To: "Martin K. Petersen" <martin.petersen@oracle.com>
	Cc: Christoph Hellwig <hch@lst.de>, James Bottomley <james.bottomley@hansenpartnership.com>, John Garry <john.garry@huawei.com>, Ming Lei <ming.lei@redhat.com>, Bart van Assche <bvanassche@acm.org>, linux-scsi@vger.kernel.org, Hannes Reinecke <hare@suse.de>
	Subject: [PATCH RFC v3 22/41] block: implement persistent commands
	Date: Thu, 30 Apr 2020 15:18:45 +0200
	Message-Id: <20200430131904.5847-23-hare@suse.de>
	In-Reply-To: <20200430131904.5847-1-hare@suse.de>
	References: <20200430131904.5847-1-hare@suse.de>
	Sender: linux-scsi-owner@vger.kernel.org
	Precedence: bulk
Series	scsi: enable reserved commands for LLDDs

Hannes Reinecke April 30, 2020, 1:18 p.m. UTC

Some LLDDs implement event handling by sending a command to the
firmware, which then will be completed once the firmware wants
to register an event.
So worst case a command is being sent to the firmware then the
driver initializes, and will be returned once the driver unloads.
To avoid these commands to block the queues during freezing or
quiescing this patch implements support for 'persistent' commands,
which will be excluded from blk_queue_enter() and blk_queue_exit()
calls.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 block/blk-mq.c            | 12 +++++++++---
 include/linux/blk-mq.h    |  2 ++
 include/linux/blk_types.h |  4 ++++
 3 files changed, 15 insertions(+), 3 deletions(-)

Bart Van Assche May 1, 2020, 4:59 a.m. UTC | #1

On 2020-04-30 06:18, Hannes Reinecke wrote:
> Some LLDDs implement event handling by sending a command to the
> firmware, which then will be completed once the firmware wants
> to register an event.
     ^^^^^^^^
     report?

> So worst case a command is being sent to the firmware then the
                                                        ^^^^
                                                        when?
> driver initializes, and will be returned once the driver unloads.
> To avoid these commands to block the queues during freezing or
> quiescing this patch implements support for 'persistent' commands,
> which will be excluded from blk_queue_enter() and blk_queue_exit()
> calls.

How is it prevented that the SCSI timeout handler is activated for
persistent commands?

> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  block/blk-mq.c            | 12 +++++++++---
>  include/linux/blk-mq.h    |  2 ++
>  include/linux/blk_types.h |  4 ++++
>  3 files changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 44482aaed11e..402cf104d183 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -402,9 +402,14 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
>  {
>  	struct blk_mq_alloc_data alloc_data = { .flags = flags, .cmd_flags = op };
>  	struct request *rq;
> -	int ret;
> +	int ret = 0;
>  
> -	ret = blk_queue_enter(q, flags);
> +	if (flags & BLK_MQ_REQ_PERSISTENT) {
> +		if (blk_queue_dying(q))
> +			ret = -ENODEV;
> +		alloc_data.cmd_flags |= REQ_PERSISTENT;
> +	} else
> +		ret = blk_queue_enter(q, flags);
>  	if (ret)
>  		return ERR_PTR(ret);
>  

I think that not calling blk_queue_enter() for persistent commands means
opening a giant can of worms. There is quite some code in the block
layer that assumes that neither .queue_rq() nor the request completion
code will be called if q_usage_counter == 0. Skipping the
blk_queue_enter() call for persistent commands breaks that assumption. I
think we need a better solution.

Thanks,

Bart.
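
The assumption Bart describes can be illustrated with a small userspace model. This is only a sketch: `struct toy_queue` and every `toy_*` function are hypothetical stand-ins invented for illustration, not kernel APIs. The model treats a freeze as complete once the usage counter reaches zero and mirrors the patch's branch that skips the enter call for persistent allocations.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy userspace model; NOT the kernel's struct request_queue. */
struct toy_queue {
	int usage_counter;	/* stands in for q_usage_counter */
	bool frozen;		/* a freeze has been started */
	bool dying;		/* stands in for blk_queue_dying() */
	int in_flight;		/* requests the driver may still complete */
};

/* Models blk_queue_enter(): take a reference unless frozen or dying. */
int toy_queue_enter(struct toy_queue *q)
{
	if (q->dying)
		return -ENODEV;
	if (q->frozen)
		return -EBUSY;	/* the toy model simply fails instead of waiting */
	q->usage_counter++;
	return 0;
}

/* Models blk_queue_exit(): drop the reference taken on entry. */
void toy_queue_exit(struct toy_queue *q)
{
	assert(q->usage_counter > 0);
	q->usage_counter--;
}

/* Mirrors the patch: persistent allocations bypass the enter call. */
int toy_alloc(struct toy_queue *q, bool persistent)
{
	int ret = 0;

	if (persistent) {
		if (q->dying)
			ret = -ENODEV;
	} else {
		ret = toy_queue_enter(q);
	}
	if (ret)
		return ret;
	q->in_flight++;
	return 0;
}

/* In this model a freeze is complete once no references remain. */
bool toy_freeze_done(const struct toy_queue *q)
{
	return q->frozen && q->usage_counter == 0;
}

/* The invariant other code relies on: frozen implies nothing in flight. */
bool toy_invariant_holds(const struct toy_queue *q)
{
	return !toy_freeze_done(q) || q->in_flight == 0;
}
```

In this model, a queue holding one persistent request reports a completed freeze while that request can still complete, which is the broken assumption. A regular request keeps the counter non-zero, so the freeze does not complete until the request exits.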

Ming Lei May 1, 2020, 8:33 a.m. UTC | #2

On Thu, Apr 30, 2020 at 03:18:45PM +0200, Hannes Reinecke wrote:
> Some LLDDs implement event handling by sending a command to the
> firmware, which then will be completed once the firmware wants
> to register an event.
> So worst case a command is being sent to the firmware then the
> driver initializes, and will be returned once the driver unloads.
> To avoid these commands to block the queues during freezing or
> quiescing this patch implements support for 'persistent' commands,
> which will be excluded from blk_queue_enter() and blk_queue_exit()
> calls.

This approach is quite dangerous from the block layer's point of view.
It should be done in a driver/device-specific way instead of polluting
the block layer.


thanks, 
Ming

Hannes Reinecke May 2, 2020, 12:11 p.m. UTC | #3

On 5/1/20 6:59 AM, Bart Van Assche wrote:
> On 2020-04-30 06:18, Hannes Reinecke wrote:
>> Some LLDDs implement event handling by sending a command to the
>> firmware, which then will be completed once the firmware wants
>> to register an event.
>       ^^^^^^^^
>       report?
> 
>> So worst case a command is being sent to the firmware then the
>                                                          ^^^^
>                                                          when?
>> driver initializes, and will be returned once the driver unloads.
>> To avoid these commands to block the queues during freezing or
>> quiescing this patch implements support for 'persistent' commands,
>> which will be excluded from blk_queue_enter() and blk_queue_exit()
>> calls.
> 
> How is it prevented that the SCSI timeout handler is activated for
> persistent commands?
> 
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> ---
>>   block/blk-mq.c            | 12 +++++++++---
>>   include/linux/blk-mq.h    |  2 ++
>>   include/linux/blk_types.h |  4 ++++
>>   3 files changed, 15 insertions(+), 3 deletions(-)
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 44482aaed11e..402cf104d183 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -402,9 +402,14 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
>>   {
>>   	struct blk_mq_alloc_data alloc_data = { .flags = flags, .cmd_flags = op };
>>   	struct request *rq;
>> -	int ret;
>> +	int ret = 0;
>>   
>> -	ret = blk_queue_enter(q, flags);
>> +	if (flags & BLK_MQ_REQ_PERSISTENT) {
>> +		if (blk_queue_dying(q))
>> +			ret = -ENODEV;
>> +		alloc_data.cmd_flags |= REQ_PERSISTENT;
>> +	} else
>> +		ret = blk_queue_enter(q, flags);
>>   	if (ret)
>>   		return ERR_PTR(ret);
>>   
> 
> I think that not calling blk_queue_enter() for persistent commands means
> opening a giant can of worms. There is quite some code in the block
> layer that assumes that neither .queue_rq() nor the request completion
> code will be called if q_usage_counter == 0. Skipping the
> blk_queue_enter() call for persistent commands breaks that assumption. I
> think we need a better solution.
> 
Well, yeah, maybe.
My aim here is that _all_ I/O requiring a tag from the hardware will be 
tracked by the block layer tagset. Only that will give the block layer 
accurate information about outstanding commands, such that the ongoing 
CPU hotplug discussion can make the correct decisions and implement 
functions really covering all outstanding I/O.
It also allows us to use the scsi_host_busy_iter() functions within the 
driver, and will get rid of the hand-crafted iterations the driver has 
to do now.

It worked reasonably well, until I encountered the infamous AEN 
commands, which actually require the opposite: _not_ to be tracked by 
the block layer at all, as the commands themselves are just placeholders
to be returned by the firmware once an event occurs.
(And yes, I _do_ think this is a quite dangerous operation, because I 
can't quite see how one could reliably return this command in case of a 
firmware crash ...)
(But anyhow, that's how the firmware is written and we have to live with 
it.)

So I implemented this approach, to have tags which are ignored by the 
block layer. But I have to admit that this approach relies on quite a 
few assumptions (e.g. that these tags are never actually submitted to 
the block layer itself, are never started, etc.), none of which are 
spelled out clearly (yet).
An alternative approach would be to arbitrarily decrease the tagset size 
by one (e.g. by shifting the tags by one), and use the free tag for AENs.
That would have to be coded within the driver, though.

If that's a solution which you like better I could give it a go.

Cheers,

Hannes
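
The alternative Hannes mentions (shrinking the tagset by one and shifting the tags) could look roughly like the sketch below. It is only an illustration under stated assumptions: the choice of hardware tag 0 for the AEN and all `toy_*` names are hypothetical, and no real driver or block-layer API is used.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption for illustration: hardware tag 0 is set aside for the AEN. */
#define TOY_AEN_HW_TAG 0u

/* Queue depth advertised to the block layer: one less than the hardware has. */
unsigned int toy_blk_queue_depth(unsigned int hw_tags)
{
	assert(hw_tags > 1);
	return hw_tags - 1;
}

/* Block-layer tags 0..depth-1 map to hardware tags 1..depth. */
unsigned int toy_blk_to_hw_tag(unsigned int blk_tag)
{
	return blk_tag + 1;
}

/* True if a completion carries the tag reserved for the AEN. */
bool toy_hw_tag_is_aen(unsigned int hw_tag)
{
	return hw_tag == TOY_AEN_HW_TAG;
}

/* Reverse mapping for completions; returns -1 for the AEN tag. */
int toy_hw_to_blk_tag(unsigned int hw_tag)
{
	if (toy_hw_tag_is_aen(hw_tag))
		return -1;
	return (int)hw_tag - 1;
}
```

With this mapping the block layer never sees the reserved tag, so freezing or quiescing only has to wait for the tags it handed out itself. The cost is that the AEN command is again invisible to any block-layer iteration over outstanding commands.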

Hannes Reinecke May 2, 2020, 12:22 p.m. UTC | #4

On 5/1/20 10:33 AM, Ming Lei wrote:
> On Thu, Apr 30, 2020 at 03:18:45PM +0200, Hannes Reinecke wrote:
>> Some LLDDs implement event handling by sending a command to the
>> firmware, which then will be completed once the firmware wants
>> to register an event.
>> So worst case a command is being sent to the firmware then the
>> driver initializes, and will be returned once the driver unloads.
>> To avoid these commands to block the queues during freezing or
>> quiescing this patch implements support for 'persistent' commands,
>> which will be excluded from blk_queue_enter() and blk_queue_exit()
>> calls.
> 
> This way is quite dangerous from block layer viewpoint, and it should
> have been done in driver/device specific way instead of polluting block
> layer.
> 
As already outlined in the reply to Bart, I'll rewrite this so that 
drivers set aside a separate tag and decrease the tag space by one.
That should work as well.

Cheers,

Hannes

Bart Van Assche May 2, 2020, 4:22 p.m. UTC | #5

On 2020-05-02 05:11, Hannes Reinecke wrote:
> On 5/1/20 6:59 AM, Bart Van Assche wrote:
>> On 2020-04-30 06:18, Hannes Reinecke wrote:
>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>> index 44482aaed11e..402cf104d183 100644
>>> --- a/block/blk-mq.c
>>> +++ b/block/blk-mq.c
>>> @@ -402,9 +402,14 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
>>>   {
>>>       struct blk_mq_alloc_data alloc_data = { .flags = flags, .cmd_flags = op };
>>>       struct request *rq;
>>> -    int ret;
>>> +    int ret = 0;
>>>   
>>> -    ret = blk_queue_enter(q, flags);
>>> +    if (flags & BLK_MQ_REQ_PERSISTENT) {
>>> +        if (blk_queue_dying(q))
>>> +            ret = -ENODEV;
>>> +        alloc_data.cmd_flags |= REQ_PERSISTENT;
>>> +    } else
>>> +        ret = blk_queue_enter(q, flags);
>>>       if (ret)
>>>           return ERR_PTR(ret);
>>>   
>>
>> I think that not calling blk_queue_enter() for persistent commands means
>> opening a giant can of worms. There is quite some code in the block
>> layer that assumes that neither .queue_rq() nor the request completion
>> code will be called if q_usage_counter == 0. Skipping the
>> blk_queue_enter() call for persistent commands breaks that assumption. I
>> think we need a better solution.
>>
> Well, yeah, maybe.
> My aim here is that _all_ I/O requiring a tag from the hardware will be
> tracked by the blocklayer tagset. Only that will give the block-layer
> accurate information about outstanding commands, such that the ongoing
> CPU hotplug discussion can make the correct decisions and implement
> functions really covering all outstanding I/O.
> It also allows us to use the scsi_host_busy_iter() functions within the
> driver, and will get rid of the hand-crafted iterations the driver has
> to do now.
> 
> It worked reasonably well, until I encountered the infamous AEN
> commands, which actually require the opposite: _not_ to be tracked by
> the block layer at all, as the commands themselves are just placeholders
> to be returned by the firmware once an event occurs.
> (And yes, I _do_ think this is a quite dangerous operation, because I
> can't quite see how one could reliably return this command in case of a
> firmware crash ...)
> (But anyhow, that's how the firmware is written and we have to live with
> it.)
> 
> So I implemented this approach, to have tags which are ignored by the
> block layer. But I have to admit that this approach relies on quite some
> assumptions (like these tags are never actually submitted to the
> blocklayer itself, are never started etc), none of which are spelled out
> clearly (yet).
> An alternative approach would be to arbitrarily decrease the tagset size
> by one (e.g. by shifting the tags by one), and use the free tag for AENs.
> That would have to be coded within the driver, though.
> 
> If that's a solution which you like better I could give it a go.

How about dropping support for AEN commands entirely? As far as I know
such a command has never been standardized. Additionally, all SCSI core
code I'm familiar with supports unit attentions and does not rely on
asynchronous events to be reported immediately.

If dropping support for AEN commands is not an option, how about
aborting these commands before freezing a request queue?

Thanks,

Bart.
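
Bart's second suggestion, aborting the AEN before freezing, can be sketched with a similar toy model. Everything here is an assumption made for illustration: `struct toy_host` and the `toy_*` functions are hypothetical, the AEN is modelled as an ordinary request that holds a queue reference, and the re-issue on unfreeze is one possible policy rather than something proposed in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model of a host with at most one outstanding AEN command. */
struct toy_host {
	int usage_counter;	/* references held by in-flight requests */
	bool frozen;
	bool aen_outstanding;
};

/* Issue the AEN as a regular request: it takes a queue reference. */
int toy_issue_aen(struct toy_host *h)
{
	if (h->frozen)
		return -EBUSY;
	if (h->aen_outstanding)
		return 0;
	h->usage_counter++;
	h->aen_outstanding = true;
	return 0;
}

/* Abort the AEN, dropping the reference it held. */
void toy_abort_aen(struct toy_host *h)
{
	if (!h->aen_outstanding)
		return;
	h->aen_outstanding = false;
	assert(h->usage_counter > 0);
	h->usage_counter--;
}

/* Abort the AEN first, then freeze only if nothing else is in flight. */
bool toy_try_freeze(struct toy_host *h)
{
	toy_abort_aen(h);
	if (h->usage_counter != 0)
		return false;
	h->frozen = true;
	return true;
}

/* Unfreeze and re-arm event reporting by issuing the AEN again. */
void toy_unfreeze(struct toy_host *h)
{
	h->frozen = false;
	(void)toy_issue_aen(h);
}
```

The design point is that the AEN stays fully tracked, so the usage-counter invariant holds, at the price of a window during the freeze in which the firmware has no command to return an event on.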
