diff mbox series

[02/15] scsi: add scsi_{get,put}_internal_cmd() helper

Message ID 20211125151048.103910-3-hare@suse.de (mailing list archive)
State Changes Requested
Headers show
Series scsi: enabled reserved commands for LLDDs | expand

Commit Message

Hannes Reinecke Nov. 25, 2021, 3:10 p.m. UTC
Add helper functions to allow LLDDs to allocate and free
internal commands.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/scsi_lib.c    | 44 ++++++++++++++++++++++++++++++++++++++
 include/scsi/scsi_device.h |  3 +++
 2 files changed, 47 insertions(+)

Comments

John Garry Nov. 26, 2021, 9:58 a.m. UTC | #1
On 25/11/2021 15:10, Hannes Reinecke wrote:
> +/**
> + * scsi_get_internal_cmd - allocate an internal SCSI command
> + * @sdev: SCSI device from which to allocate the command
> + * @data_direction: Data direction for the allocated command
> + * @nowait: do not wait for command allocation to succeed.
> + *
> + * Allocates a SCSI command for internal LLDD use.
> + */
> +struct scsi_cmnd *scsi_get_internal_cmd(struct scsi_device *sdev,
> +	int data_direction, bool nowait)
> +{
> +	struct request *rq;
> +	struct scsi_cmnd *scmd;
> +	blk_mq_req_flags_t flags = 0;
> +	int op;
> +
> +	if (nowait)
> +		flags |= BLK_MQ_REQ_NOWAIT;
> +	op = (data_direction == DMA_TO_DEVICE) ?
> +		REQ_OP_DRV_OUT : REQ_OP_DRV_IN;
> +	rq = blk_mq_alloc_request(sdev->request_queue, op, flags);
> +	if (IS_ERR(rq))
> +		return NULL;
> +	scmd = blk_mq_rq_to_pdu(rq);
> +	scmd->device = sdev;
> +	return scmd;
> +}
> +EXPORT_SYMBOL_GPL(scsi_get_internal_cmd);

So there are a couple of generally-accepted grievances about this approach:
a. we're being allocated a scsi_cmnd, but not using what is being 
allocated as a scsi_cmnd, but rather just a holder as a reference to an 
allocated tag
b. we're being allocated a request, which is not being sent through the 
block layer*

It just seems to me that what the block layer is providing is not suitable.

How about these:
a. allow block driver to specify size of reserved request PDU separately 
to regular requests, so we can use something like this for rsvd commands:
struct scsi_rsvd_cmnd {
	struct scsi_device *sdev;
}
And fix up SCSI iter functions and LLDs to deal with it.
b. provide block layer API to provide just same as is returned from 
blk_mq_unique_tag(), but no request is provided. This just gives what we 
need but would be disruptive in scsi layer and LLDs.
c. as alternative to b., send all rsvd requests through the block layer, 
but can be very difficult+disruptive for users

*For polling rsvd commands on a poll queue (which we will need for 
hisi_sas driver and maybe others for poll mode support), we would need 
to send the request through the block layer, but block layer polling 
requires a request with a bio, which is a problem.

Thanks,
John
Bart Van Assche Nov. 26, 2021, 11:12 p.m. UTC | #2
On 11/25/21 07:10, Hannes Reinecke wrote:
> +struct scsi_cmnd *scsi_get_internal_cmd(struct scsi_device *sdev,
> +	int data_direction, bool nowait)

Please use enum dma_data_direction instead of 'int' for the data direction.
> +{
> +	struct request *rq;
> +	struct scsi_cmnd *scmd;
> +	blk_mq_req_flags_t flags = 0;
> +	int op;

The name 'op' is confusing since that variable is a bitfield that 
includes the operation and operation flags. Consider one of the names 
'opf', 'op_and_flags' or 'cmd_and_flags'. Please also change the data 
type from 'int' into 'unsigned int' or 'u32'.

> +	op = (data_direction == DMA_TO_DEVICE) ?
> +		REQ_OP_DRV_OUT : REQ_OP_DRV_IN;

Please leave out the parentheses from the above assignment.

Thanks,

Bart.
Bart Van Assche Nov. 26, 2021, 11:13 p.m. UTC | #3
On 11/26/21 01:58, John Garry wrote:
> How about these:
> a. allow block driver to specify size of reserved request PDU separately 
> to regular requests, so we can use something like this for rsvd commands:
> struct scsi_rsvd_cmnd {
>      struct scsi_device *sdev;
> }
> And fix up SCSI iter functions and LLDs to deal with it.
> b. provide block layer API to provide just same as is returned from 
> blk_mq_unique_tag(), but no request is provided. This just gives what we 
> need but would be disruptive in scsi layer and LLDs.
> c. as alternative to b., send all rsvd requests through the block layer, 
> but can be very difficult+disruptive for users
> 
> *For polling rsvd commands on a poll queue (which we will need for 
> hisi_sas driver and maybe others for poll mode support), we would need 
> to send the request through the block layer, but block layer polling 
> requires a request with a bio, which is a problem.

How about postponing these changes until a patch is ready that converts 
at least one SCSI LLD such that it uses the above functionality?

Thanks,

Bart.
Bart Van Assche Nov. 28, 2021, 3:33 a.m. UTC | #4
On 11/25/21 07:10, Hannes Reinecke wrote:
> Add helper functions to allow LLDDs to allocate and free
> internal commands.

Are the changes for the SCSI timeout handler perhaps missing from this 
patch? In the UFS driver we need the ability not to trigger the SCSI 
error handler if an internal command times out.

Thanks,

Bart.
Hannes Reinecke Nov. 28, 2021, 10:36 a.m. UTC | #5
On 11/26/21 10:58 AM, John Garry wrote:
> On 25/11/2021 15:10, Hannes Reinecke wrote:
>> +/**
>> + * scsi_get_internal_cmd - allocate an internal SCSI command
>> + * @sdev: SCSI device from which to allocate the command
>> + * @data_direction: Data direction for the allocated command
>> + * @nowait: do not wait for command allocation to succeed.
>> + *
>> + * Allocates a SCSI command for internal LLDD use.
>> + */
>> +struct scsi_cmnd *scsi_get_internal_cmd(struct scsi_device *sdev,
>> +    int data_direction, bool nowait)
>> +{
>> +    struct request *rq;
>> +    struct scsi_cmnd *scmd;
>> +    blk_mq_req_flags_t flags = 0;
>> +    int op;
>> +
>> +    if (nowait)
>> +        flags |= BLK_MQ_REQ_NOWAIT;
>> +    op = (data_direction == DMA_TO_DEVICE) ?
>> +        REQ_OP_DRV_OUT : REQ_OP_DRV_IN;
>> +    rq = blk_mq_alloc_request(sdev->request_queue, op, flags);
>> +    if (IS_ERR(rq))
>> +        return NULL;
>> +    scmd = blk_mq_rq_to_pdu(rq);
>> +    scmd->device = sdev;
>> +    return scmd;
>> +}
>> +EXPORT_SYMBOL_GPL(scsi_get_internal_cmd);
> 
> So there are a couple of generally-accepted grievances about this approach:
> a. we're being allocated a scsi_cmnd, but not using what is being 
> allocated as a scsi_cmnd, but rather just a holder as a reference to an 
> allocated tag
> b. we're being allocated a request, which is not being sent through the 
> block layer*
> 
And while being true in general, it does make some assumptions:
- Reserved commands will never being sent via the block layer
- None of the drivers will need to use the additional 'scsi_cmnd' payload.

Here I'm not sure if this is true in general.
While it doesn't seem to be necessary to send reserved commands via the 
block layer (ie calling 'queue_rq' on them), we shouldn't exclude the 
possibility.
Didn't we speak about that in the context of converting libata?

And I have some driver conversions queued (fnic in particular), which 
encapsulate everything into a scsi command.

> It just seems to me that what the block layer is providing is not suitable.
> 
> How about these:
> a. allow block driver to specify size of reserved request PDU separately 
> to regular requests, so we can use something like this for rsvd commands:
> struct scsi_rsvd_cmnd {
>      struct scsi_device *sdev;
> }
> And fix up SCSI iter functions and LLDs to deal with it.

That's what Bart suggested a while back, but then we have to problem 
that the reserved tags are completed with the same interrupt routine, 
and _that_ currently assumes that everything is a scsi command.
Trying to fix up that assumption would mean to audit the midlayer 
(scmd->device is a particular common pattern), _and_ all drivers wanting 
to make use of reserved commands.
For me that's too high an risk to introduce errors; audits are always 
painful and error-prone.

> b. provide block layer API to provide just same as is returned from 
> blk_mq_unique_tag(), but no request is provided. This just gives what we 
> need but would be disruptive in scsi layer and LLDs.

Having looked at the block layer and how tags are allocated I found it 
too deeply interlinked with the request queue and requests in general.
Plus I've suggested that with a previous patchset, which got vetoed by 
Bart as he couldn't use such an API for UFS.

> c. as alternative to b., send all rsvd requests through the block layer, 
> but can be very difficult+disruptive for users
> 
And, indeed, not possible when we will need to send these requests 
during error handling, where the queue might be blocked/frozen/quiesced 
precisely because we are in error handling ...

> *For polling rsvd commands on a poll queue (which we will need for 
> hisi_sas driver and maybe others for poll mode support), we would need 
> to send the request through the block layer, but block layer polling 
> requires a request with a bio, which is a problem.
> 
Allocating a bio is a relatively trivial task. But as soon as we ever 
want to be able to implement polling support for reserved tags we 
essentially _have_ to use requests, and that means we'll have to use the 
provided interfaces from the block layer.

Cheers,

Hannes
Hannes Reinecke Nov. 28, 2021, 12:44 p.m. UTC | #6
On 11/27/21 12:12 AM, Bart Van Assche wrote:
> On 11/25/21 07:10, Hannes Reinecke wrote:
>> +struct scsi_cmnd *scsi_get_internal_cmd(struct scsi_device *sdev,
>> +    int data_direction, bool nowait)
> 
> Please use enum dma_data_direction instead of 'int' for the data direction.

I have oriented myself at __scsi_execute(), which also has 
'data_direction' as an integer.
Presumably to avoid header clutter.
Martin?

>> +{
>> +    struct request *rq;
>> +    struct scsi_cmnd *scmd;
>> +    blk_mq_req_flags_t flags = 0;
>> +    int op;
> 
> The name 'op' is confusing since that variable is a bitfield that 
> includes the operation and operation flags. Consider one of the names 
> 'opf', 'op_and_flags' or 'cmd_and_flags'. Please also change the data 
> type from 'int' into 'unsigned int' or 'u32'.
> 
'op' is the name the variable has in blk_mq_alloc_request(), so I prefer 
to stay with that name. Agree with the 'unsigned int', though.

>> +    op = (data_direction == DMA_TO_DEVICE) ?
>> +        REQ_OP_DRV_OUT : REQ_OP_DRV_IN;
> 
> Please leave out the parentheses from the above assignment.
> 
Okay.

Cheers,

Hannes
Hannes Reinecke Nov. 28, 2021, 1:05 p.m. UTC | #7
On 11/28/21 4:33 AM, Bart Van Assche wrote:
> On 11/25/21 07:10, Hannes Reinecke wrote:
>> Add helper functions to allow LLDDs to allocate and free
>> internal commands.
> 
> Are the changes for the SCSI timeout handler perhaps missing from this 
> patch? In the UFS driver we need the ability not to trigger the SCSI 
> error handler if an internal command times out.
> 
It's based on 5.17/scsi-queue; so if the SCSI timeout handler changes 
are not present in there I won't have it.

Cheers,

Hannes
Martin K. Petersen Nov. 30, 2021, 4:17 a.m. UTC | #8
Hannes,

> I have oriented myself at __scsi_execute(), which also has
> 'data_direction' as an integer.  Presumably to avoid header clutter.
> Martin?

Just a vestige from ancient times. I really hate scsi_execute() and its
10,000 randomly ordered arguments. The more sanity checking we have in
that department, the better.

At some point I proposed having scsi_execute() take a single struct as
argument to get better input validation. I've lost count how many things
have been broken because of misordered arguments to this function.
Backporting patches almost inevitably causes regressions because of this
interface.
Hannes Reinecke Nov. 30, 2021, 6:51 a.m. UTC | #9
On 11/30/21 5:17 AM, Martin K. Petersen wrote:
> 
> Hannes,
> 
>> I have oriented myself at __scsi_execute(), which also has
>> 'data_direction' as an integer.  Presumably to avoid header clutter.
>> Martin?
> 
> Just a vestige from ancient times. I really hate scsi_execute() and its
> 10,000 randomly ordered arguments. The more sanity checking we have in
> that department, the better.
> 
> At some point I proposed having scsi_execute() take a single struct as
> argument to get better input validation. I've lost count how many things
> have been broken because of misordered arguments to this function.
> Backporting patches almost inevitably causes regressions because of this
> interface.
> 
Right. As it so happens, I've already created a patch to include 
<linux/dma-direction.h> here.
But yeah, the arguments to __scsi_execute are patently horrible.

Cheers,

Hannes
John Garry Dec. 6, 2021, 5:15 p.m. UTC | #10
On 28/11/2021 10:36, Hannes Reinecke wrote:
>>>   * Allocates a SCSI command for internal LLDD use.
>>> + */
>>> +struct scsi_cmnd *scsi_get_internal_cmd(struct scsi_device *sdev,
>>> +    int data_direction, bool nowait)
>>> +{
>>> +    struct request *rq;
>>> +    struct scsi_cmnd *scmd;
>>> +    blk_mq_req_flags_t flags = 0;
>>> +    int op;
>>> +
>>> +    if (nowait)
>>> +        flags |= BLK_MQ_REQ_NOWAIT;
>>> +    op = (data_direction == DMA_TO_DEVICE) ?
>>> +        REQ_OP_DRV_OUT : REQ_OP_DRV_IN;
>>> +    rq = blk_mq_alloc_request(sdev->request_queue, op, flags);
>>> +    if (IS_ERR(rq))
>>> +        return NULL;
>>> +    scmd = blk_mq_rq_to_pdu(rq);
>>> +    scmd->device = sdev;
>>> +    return scmd;
>>> +}
>>> +EXPORT_SYMBOL_GPL(scsi_get_internal_cmd);
>> So there are a couple of generally-accepted grievances about this approach:
>> a. we're being allocated a scsi_cmnd, but not using what is being
>> allocated as a scsi_cmnd, but rather just a holder as a reference to an
>> allocated tag
>> b. we're being allocated a request, which is not being sent through the
>> block layer*
>>
> And while being true in general, it does make some assumptions:
> - Reserved commands will never being sent via the block layer
> - None of the drivers will need to use the additional 'scsi_cmnd' payload.
> 
> Here I'm not sure if this is true in general.
> While it doesn't seem to be necessary to send reserved commands via the
> block layer (ie calling 'queue_rq' on them), we shouldn't exclude the
> possibility.

Agreed

> Didn't we speak about that in the context of converting libata?
> 

We did discuss libata before, but I'm not sure on the context you mean.

One thing that I know is that libata-core has "internal" commands in 
ata_exec_internal(). I could not see how that function could be 
converted to use queue_rq. The problem is that it calls ->qc_issue() 
with IRQs disabled, which is not permitted for calling blk_execute_rq() 
instead.

> And I have some driver conversions queued (fnic in particular), which
> encapsulate everything into a scsi command.
> 
>> It just seems to me that what the block layer is providing is not suitable.
>>
>> How about these:
>> a. allow block driver to specify size of reserved request PDU separately
>> to regular requests, so we can use something like this for rsvd commands:
>> struct scsi_rsvd_cmnd {
>>       struct scsi_device *sdev;
>> }
>> And fix up SCSI iter functions and LLDs to deal with it.
> That's what Bart suggested a while back, 

I don't recall that one.

> but then we have to problem
> that the reserved tags are completed with the same interrupt routine,
> and_that_  currently assumes that everything is a scsi command.

I think that any driver which uses reserved commands needs to be thought 
that not all commands are "real" scsi commands, i.e. we don't call 
scsi_done() in the LLD ISR always. As such, they should be able to deal 
with something like struct scsi_rsvd_cmnd.

BTW, for this current series, please ensure that we can't call 
scsi_host_complete_all_commands() which could iter reserved tags, as we 
call stuff like scsi_done() there. I don't think it's an issue here, but 
just thought that it was worth mentioning.

> Trying to fix up that assumption would mean to audit the midlayer
> (scmd->device is a particular common pattern),_and_  all drivers wanting
> to make use of reserved commands.
> For me that's too high an risk to introduce errors; audits are always
> painful and error-prone.
> 
>> b. provide block layer API to provide just same as is returned from
>> blk_mq_unique_tag(), but no request is provided. This just gives what we
>> need but would be disruptive in scsi layer and LLDs.
> Having looked at the block layer and how tags are allocated I found it
> too deeply interlinked with the request queue and requests in general.

They are indeed interlinked in the block layer, but we don't need expose 
requests or anything else.

Such an interface could just be a wrapper for 
blk_mq_alloc_request()+_start_request().

> Plus I've suggested that with a previous patchset, which got vetoed by
> Bart as he couldn't use such an API for UFS.
>  >> c. as alternative to b., send all rsvd requests through the block layer,
>> but can be very difficult+disruptive for users
>>
> And, indeed, not possible when we will need to send these requests
> during error handling, where the queue might be blocked/frozen/quiesced
> precisely because we are in error handling ...

If we send for the host request queue, would it ever be 
blocked/frozen/quiesced?

> 
>> *For polling rsvd commands on a poll queue (which we will need for
>> hisi_sas driver and maybe others for poll mode support), we would need
>> to send the request through the block layer, but block layer polling
>> requires a request with a bio, which is a problem.
>>
> Allocating a bio is a relatively trivial task.

So do you suggest a dummy bio for that request? I hacked something 
locally to get it to work as a PoC, but no idea on a real solution yet.

> But as soon as we ever
> want to be able to implement polling support for reserved tags we
> essentially_have_  to use requests, 

Agreed

> and that means we'll have to use the
> provided interfaces from the block layer.

Thanks,
John
Hannes Reinecke Dec. 6, 2021, 5:46 p.m. UTC | #11
On 12/6/21 6:15 PM, John Garry wrote:
> On 28/11/2021 10:36, Hannes Reinecke wrote:
>>>>   * Allocates a SCSI command for internal LLDD use.
>>>> + */
>>>> +struct scsi_cmnd *scsi_get_internal_cmd(struct scsi_device *sdev,
>>>> +    int data_direction, bool nowait)
>>>> +{
>>>> +    struct request *rq;
>>>> +    struct scsi_cmnd *scmd;
>>>> +    blk_mq_req_flags_t flags = 0;
>>>> +    int op;
>>>> +
>>>> +    if (nowait)
>>>> +        flags |= BLK_MQ_REQ_NOWAIT;
>>>> +    op = (data_direction == DMA_TO_DEVICE) ?
>>>> +        REQ_OP_DRV_OUT : REQ_OP_DRV_IN;
>>>> +    rq = blk_mq_alloc_request(sdev->request_queue, op, flags);
>>>> +    if (IS_ERR(rq))
>>>> +        return NULL;
>>>> +    scmd = blk_mq_rq_to_pdu(rq);
>>>> +    scmd->device = sdev;
>>>> +    return scmd;
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(scsi_get_internal_cmd);
>>> So there are a couple of generally-accepted grievances about this 
>>> approach:
>>> a. we're being allocated a scsi_cmnd, but not using what is being
>>> allocated as a scsi_cmnd, but rather just a holder as a reference to an
>>> allocated tag
>>> b. we're being allocated a request, which is not being sent through the
>>> block layer*
>>>
>> And while being true in general, it does make some assumptions:
>> - Reserved commands will never being sent via the block layer
>> - None of the drivers will need to use the additional 'scsi_cmnd' 
>> payload.
>>
>> Here I'm not sure if this is true in general.
>> While it doesn't seem to be necessary to send reserved commands via the
>> block layer (ie calling 'queue_rq' on them), we shouldn't exclude the
>> possibility.
> 
> Agreed
> 
>> Didn't we speak about that in the context of converting libata?
>>
> 
> We did discuss libata before, but I'm not sure on the context you mean.
> 
> One thing that I know is that libata-core has "internal" commands in 
> ata_exec_internal(). I could not see how that function could be 
> converted to use queue_rq. The problem is that it calls ->qc_issue() 
> with IRQs disabled, which is not permitted for calling blk_execute_rq() 
> instead.
> 

We're conflating two issues here.

The one issue is to use 'reserved' tags (as defined by the block layer) 
to allocate commands which are internal within some drivers (like hpsa 
etc). That is what my patchset does.
As these commands are internal within specific drivers these commands 
will never show up in the upper layers, precisely because they are internal.

The other issue is to allow these 'reserved' tags (and the attached 
requests/commands) to be routed via the normal block-layer execution 
path. This is _not_ part of the above patchset, as that just deals with 
driver internal commands and will never execute ->queue_rq.
For that we would need an additional patchset, on top of the first one.

>> And I have some driver conversions queued (fnic in particular), which
>> encapsulate everything into a scsi command.
>>
>>> It just seems to me that what the block layer is providing is not 
>>> suitable.
>>>
>>> How about these:
>>> a. allow block driver to specify size of reserved request PDU separately
>>> to regular requests, so we can use something like this for rsvd 
>>> commands:
>>> struct scsi_rsvd_cmnd {
>>>       struct scsi_device *sdev;
>>> }
>>> And fix up SCSI iter functions and LLDs to deal with it.
>> That's what Bart suggested a while back, 
> 
> I don't recall that one.
> 
>> but then we have to problem
>> that the reserved tags are completed with the same interrupt routine,
>> and_that_  currently assumes that everything is a scsi command.
> 
> I think that any driver which uses reserved commands needs to be thought 
> that not all commands are "real" scsi commands, i.e. we don't call 
> scsi_done() in the LLD ISR always. As such, they should be able to deal 
> with something like struct scsi_rsvd_cmnd.
> 
See above. My patchset is strictly for driver internal commands, which 
will never be send nor completed like 'real' commands.
And the main point (well, the 'other' main point aside from not having 
to allocate a tag yourself) is that the driver can be _simplified_, as 
now every tag references to the _same_ structure.
If we start using different structure we'll lose that ability 
completely, and really haven't gained that much.

> BTW, for this current series, please ensure that we can't call 
> scsi_host_complete_all_commands() which could iter reserved tags, as we 
> call stuff like scsi_done() there. I don't think it's an issue here, but 
> just thought that it was worth mentioning.
> 
See above. These are driver internal commands, for which scsi_done() is 
never called.

I deliberately did _not_ add checks to scsi_done() or queue_rq(), as 
there presumably will be an additional patch which allows precisely 
this, eg when converting libsas.

>> Trying to fix up that assumption would mean to audit the midlayer
>> (scmd->device is a particular common pattern),_and_  all drivers wanting
>> to make use of reserved commands.
>> For me that's too high an risk to introduce errors; audits are always
>> painful and error-prone.
>>
>>> b. provide block layer API to provide just same as is returned from
>>> blk_mq_unique_tag(), but no request is provided. This just gives what we
>>> need but would be disruptive in scsi layer and LLDs.
>> Having looked at the block layer and how tags are allocated I found it
>> too deeply interlinked with the request queue and requests in general.
> 
> They are indeed interlinked in the block layer, but we don't need expose 
> requests or anything else.
> 
Correct. And this is one of the drawbacks of this approach, in that 
we're always allocating a 'struct request' and a 'struct scsi_cmnd' 
payload even if we don't actually use them.
But then again, we _currently_ don't use them.
If we ever want to sent these 'reserved' commands via queue_rqs() and be 
able to call 'scsi_done()' on them (again, the libsas case) then we need 
these payloads.

> Such an interface could just be a wrapper for 
> blk_mq_alloc_request()+_start_request().
> 
I do agree that currently I don't start requests, and consequently they 
won't show up in any iterators.
But then (currently) it doesn't matter, as none of the iterators in any 
of the drivers I've been converting needed to look at those requests.

>> Plus I've suggested that with a previous patchset, which got vetoed by
>> Bart as he couldn't use such an API for UFS.
>>  >> c. as alternative to b., send all rsvd requests through the block 
>> layer,
>>> but can be very difficult+disruptive for users
>>>
>> And, indeed, not possible when we will need to send these requests
>> during error handling, where the queue might be blocked/frozen/quiesced
>> precisely because we are in error handling ...
> 
> If we send for the host request queue, would it ever be 
> blocked/frozen/quiesced?
> 
Possibly not. But be aware that 'reserved' tags is actually independent 
on the host request queue; it's perfectly possible to use 'reserved' 
tags without a host request queue. Again, fnic is such an example.

>>
>>> *For polling rsvd commands on a poll queue (which we will need for
>>> hisi_sas driver and maybe others for poll mode support), we would need
>>> to send the request through the block layer, but block layer polling
>>> requires a request with a bio, which is a problem.
>>>
>> Allocating a bio is a relatively trivial task.
> 
> So do you suggest a dummy bio for that request? I hacked something 
> locally to get it to work as a PoC, but no idea on a real solution yet.
> 
We're allocating bios for all kind of purposes, even 'dummy' bios in the 
case of bio_clone() or bio_split(). So that's nothing new, and should be 
relatively easy.

Cheers,

Hannes
John Garry Dec. 7, 2021, 12:50 p.m. UTC | #12
On 06/12/2021 17:46, Hannes Reinecke wrote:
>> We did discuss libata before, but I'm not sure on the context you mean.
>>
>> One thing that I know is that libata-core has "internal" commands in 
>> ata_exec_internal(). I could not see how that function could be 
>> converted to use queue_rq. The problem is that it calls ->qc_issue() 
>> with IRQs disabled, which is not permitted for calling 
>> blk_execute_rq() instead.
>>
> 
> We're conflating two issues here.

I'm just trying to see the final picture, and not just this first step,
please

> 
> The one issue is to use 'reserved' tags (as defined by the block layer) 
> to allocate commands which are internal within some drivers (like hpsa 
> etc). That is what my patchset does.
> As these commands are internal within specific drivers these commands 
> will never show up in the upper layers, precisely because they are 
> internal.
> 
> The other issue is to allow these 'reserved' tags (and the attached 
> requests/commands) to be routed via the normal block-layer execution 
> path. This is _not_ part of the above patchset, as that just deals with 
> driver internal commands and will never execute ->queue_rq.
> For that we would need an additional patchset, on top of the first one.
> 
>>> And I have some driver conversions queued (fnic in particular), which
>>> encapsulate everything into a scsi command.
>>>
>>>> It just seems to me that what the block layer is providing is not 
>>>> suitable.
>>>>
>>>> How about these:
>>>> a. allow block driver to specify size of reserved request PDU 
>>>> separately
>>>> to regular requests, so we can use something like this for rsvd 
>>>> commands:
>>>> struct scsi_rsvd_cmnd {
>>>>       struct scsi_device *sdev;
>>>> }
>>>> And fix up SCSI iter functions and LLDs to deal with it.
>>> That's what Bart suggested a while back, 
>>
>> I don't recall that one.
>>
>>> but then we have to problem
>>> that the reserved tags are completed with the same interrupt routine,
>>> and_that_  currently assumes that everything is a scsi command.
>>
>> I think that any driver which uses reserved commands needs to be 
>> thought that not all commands are "real" scsi commands, i.e. we don't 
>> call scsi_done() in the LLD ISR always. As such, they should be able 
>> to deal with something like struct scsi_rsvd_cmnd.
>>
> See above. My patchset is strictly for driver internal commands, which 
> will never be send nor completed like 'real' commands.
> And the main point (well, the 'other' main point aside from not having 
> to allocate a tag yourself) is that the driver can be _simplified_, as 
> now every tag references to the _same_ structure.

Sure, but I like the distinction of a separate struct scsi_rsvd_cmnd,
even if it is at the cost of some simplification.

> If we start using different structure we'll lose that ability 
> completely, and really haven't gained that much.
> 
>> BTW, for this current series, please ensure that we can't call 
>> scsi_host_complete_all_commands() which could iter reserved tags, as 
>> we call stuff like scsi_done() there. I don't think it's an issue 
>> here, but just thought that it was worth mentioning.
>>
> See above. These are driver internal commands, for which scsi_done() is 
> never called.

Right, but I am just saying that we need to be careful that it will not be
called possibly in the future. Ignore it for now.

> 
> I deliberately did _not_ add checks to scsi_done() or queue_rq(), as 
> there presumably will be an additional patch which allows precisely 
> this, eg when converting libsas.
> 
>>> Trying to fix up that assumption would mean to audit the midlayer
>>> (scmd->device is a particular common pattern),_and_  all drivers wanting
>>> to make use of reserved commands.
>>> For me that's too high an risk to introduce errors; audits are always
>>> painful and error-prone.
>>>
>>>> b. provide block layer API to provide just same as is returned from
>>>> blk_mq_unique_tag(), but no request is provided. This just gives 
>>>> what we
>>>> need but would be disruptive in scsi layer and LLDs.
>>> Having looked at the block layer and how tags are allocated I found it
>>> too deeply interlinked with the request queue and requests in general.
>>
>> They are indeed interlinked in the block layer, but we don't need 
>> expose requests or anything else.
>>
> Correct. And this is one of the drawbacks of this approach, in that 
> we're always allocating a 'struct request' and a 'struct scsi_cmnd' 
> payload even if we don't actually use them.
> But then again, we _currently_ don't use them.
> If we ever want to sent these 'reserved' commands via queue_rqs() and be 
> able to call 'scsi_done()' on them (again, the libsas case) then we need 
> these payloads.
> 
>> Such an interface could just be a wrapper for 
>> blk_mq_alloc_request()+_start_request().
>>
> I do agree that currently I don't start requests, and consequently they 
> won't show up in any iterators.
> But then (currently) it doesn't matter, as none of the iterators in any 
> of the drivers I've been converting needed to look at those requests.

ok, fine

> 
>>> Plus I've suggested that with a previous patchset, which got vetoed by
>>> Bart as he couldn't use such an API for UFS.
>>>  >> c. as alternative to b., send all rsvd requests through the block 
>>> layer,
>>>> but can be very difficult+disruptive for users
>>>>
>>> And, indeed, not possible when we will need to send these requests
>>> during error handling, where the queue might be blocked/frozen/quiesced
>>> precisely because we are in error handling ...
>>
>> If we send for the host request queue, would it ever be 
>> blocked/frozen/quiesced?
>>
> Possibly not. But be aware that 'reserved' tags is actually independent 
> on the host request queue; it's perfectly possible to use 'reserved' 
> tags without a host request queue. Again, fnic is such an example.

So considering a scsi_device request queue may be blocked/frozen/quiesced,
how should we decide which request queue to use when allocating a
reserved command for some error handling IO? I assume that request
allocation/queuing is failed or blocked in this mentioned scenario for the
sdev request queue.

> 
>>>
>>>> *For polling rsvd commands on a poll queue (which we will need for
>>>> hisi_sas driver and maybe others for poll mode support), we would need
>>>> to send the request through the block layer, but block layer polling
>>>> requires a request with a bio, which is a problem.
>>>>
>>> Allocating a bio is a relatively trivial task.
>>
>> So do you suggest a dummy bio for that request? I hacked something 
>> locally to get it to work as a PoC, but no idea on a real solution yet.
>>
> We're allocating bios for all kind of purposes, even 'dummy' bios in the 
> case of bio_clone() or bio_split(). So that's nothing new, and should be 
> relatively easy.

But these APIs need some bio to begin with, right? Where does that come
from?

Thanks,
John
diff mbox series

Patch

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 621d841d819a..6fbd36c9c416 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1957,6 +1957,50 @@  void scsi_mq_destroy_tags(struct Scsi_Host *shost)
 	blk_mq_free_tag_set(&shost->tag_set);
 }
 
+/**
+ * scsi_get_internal_cmd - allocate an internal SCSI command
+ * @sdev: SCSI device from which to allocate the command
+ * @data_direction: Data direction for the allocated command
+ * @nowait: do not wait for command allocation to succeed.
+ *
+ * Allocates a SCSI command for internal LLDD use.
+ */
+struct scsi_cmnd *scsi_get_internal_cmd(struct scsi_device *sdev,
+	int data_direction, bool nowait)
+{
+	struct request *rq;
+	struct scsi_cmnd *scmd;
+	blk_mq_req_flags_t flags = 0;
+	int op;
+
+	if (nowait)
+		flags |= BLK_MQ_REQ_NOWAIT;
+	op = (data_direction == DMA_TO_DEVICE) ?
+		REQ_OP_DRV_OUT : REQ_OP_DRV_IN;
+	rq = blk_mq_alloc_request(sdev->request_queue, op, flags);
+	if (IS_ERR(rq))
+		return NULL;
+	scmd = blk_mq_rq_to_pdu(rq);
+	scmd->device = sdev;
+	return scmd;
+}
+EXPORT_SYMBOL_GPL(scsi_get_internal_cmd);
+
+/**
+ * scsi_put_internal_cmd - free an internal SCSI command
+ * @scmd: SCSI command to be freed
+ *
+ * Check if @scmd is an internal command, and call
+ * blk_mq_free_request() if true.
+ */
+void scsi_put_internal_cmd(struct scsi_cmnd *scmd)
+{
+	struct request *rq = blk_mq_rq_from_pdu(scmd);
+
+	blk_mq_free_request(rq);
+}
+EXPORT_SYMBOL_GPL(scsi_put_internal_cmd);
+
 /**
  * scsi_device_from_queue - return sdev associated with a request_queue
  * @q: The request queue to return the sdev from
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 5d7204186831..218b541515bf 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -470,6 +470,9 @@  static inline int scsi_execute_req(struct scsi_device *sdev,
 	return scsi_execute(sdev, cmd, data_direction, buffer,
 		bufflen, NULL, sshdr, timeout, retries,  0, 0, resid);
 }
+struct scsi_cmnd *scsi_get_internal_cmd(struct scsi_device *sdev,
+	int data_direction, bool nowait);
+void scsi_put_internal_cmd(struct scsi_cmnd *scmd);
 extern void sdev_disable_disk_events(struct scsi_device *sdev);
 extern void sdev_enable_disk_events(struct scsi_device *sdev);
 extern int scsi_vpd_lun_id(struct scsi_device *, char *, size_t);