diff mbox series

scsi: core: Add a precondition check in scsi_eh_scmd_add()

Message ID 20231115193343.2262013-1-bvanassche@acm.org (mailing list archive)
State Accepted
Commit 10b53db2db8dfda84b25833043f2b63123572af6
Headers show
Series scsi: core: Add a precondition check in scsi_eh_scmd_add() | expand

Commit Message

Bart Van Assche Nov. 15, 2023, 7:33 p.m. UTC
Calling scsi_eh_scmd_add() may cause the error handler never to be woken
up because this may result in shost->host_failed to become larger than
scsi_host_busy(shost). Hence complain if scsi_eh_scmd_add() is called
after SCMD_STATE_INFLIGHT has been cleared.

Cc: Hannes Reinecke <hare@suse.de>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Mike Christie <michael.christie@oracle.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/scsi_error.c | 1 +
 1 file changed, 1 insertion(+)

Comments

John Garry Nov. 16, 2023, 9:45 a.m. UTC | #1
On 15/11/2023 19:33, Bart Van Assche wrote:
> Calling scsi_eh_scmd_add() may cause the error handler never to be woken
> up because this may result in shost->host_failed to become larger than
> scsi_host_busy(shost). 

This is oddly worded. I think that you need to mention how calling 
scsi_eh_scmd_add() may lead to this scenario occurring.

> Hence complain if scsi_eh_scmd_add() is called
> after SCMD_STATE_INFLIGHT has been cleared.

Now you hint that this mentioned scenario may occur if 
SCMD_STATE_INFLIGHT was cleared.

Can you provide some info on when scsi_eh_scmd_add() could be called for 
SCMD_STATE_INFLIGHT cleared? Or is it that you don't know how (it may 
occur), but it is fatal if it does and we should guard against or warn 
about it.

> 
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Mike Christie <michael.christie@oracle.com>
> Cc: John Garry <john.g.garry@oracle.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>   drivers/scsi/scsi_error.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index d7f2d90719fd..0734b3f30ef5 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -290,6 +290,7 @@ void scsi_eh_scmd_add(struct scsi_cmnd *scmd)
>   	int ret;
>   
>   	WARN_ON_ONCE(!shost->ehandler);
> +	WARN_ON_ONCE(!test_bit(SCMD_STATE_INFLIGHT, &scmd->state));

What about if SCMD_STATE_COMPLETE is set - should we also warn about that?

>   
>   	spin_lock_irqsave(shost->host_lock, flags);
>   	if (scsi_host_set_state(shost, SHOST_RECOVERY)) {


Thanks,
John
Bart Van Assche Nov. 16, 2023, 7:35 p.m. UTC | #2
On 11/16/23 01:45, John Garry wrote:
> On 15/11/2023 19:33, Bart Van Assche wrote:
>> Calling scsi_eh_scmd_add() may cause the error handler never to be woken
>> up because this may result in shost->host_failed to become larger than
>> scsi_host_busy(shost). 
> 
> This is oddly worded. I think that you need to mention how calling
> scsi_eh_scmd_add() may lead to this scenario occurring.

This happened in development code that was never posted on any mailing list.
It took me a while to root-cause that issue. Hence this patch.

>> Hence complain if scsi_eh_scmd_add() is called
>> after SCMD_STATE_INFLIGHT has been cleared.
> 
> Now you hint that this mentioned scenario may occur if SCMD_STATE_INFLIGHT was cleared.
> 
> Can you provide some info on when scsi_eh_scmd_add() could be called for SCMD_STATE_INFLIGHT cleared? Or is it that you don't know how (it may occur), but it is fatal if it does and we should guard against or warn about it.

If anyone would add a call to scsi_eh_scmd_add() for a command for which
SCMD_STATE_INFLIGHT is not set.

>>       WARN_ON_ONCE(!shost->ehandler);
>> +    WARN_ON_ONCE(!test_bit(SCMD_STATE_INFLIGHT, &scmd->state));
> 
> What about if SCMD_STATE_COMPLETE is set - should we also warn about that?

Calls to scsi_eh_scmd_add() for commands for which SCMD_STATE_COMPLETE has
been set are much easier to find by reviewing the code than calls for commands
for which neither SCMD_STATE_COMPLETE nor SCMD_STATE_INFLIGHT have been set.

Thanks,

Bart.
Martin K. Petersen Nov. 25, 2023, 12:17 a.m. UTC | #3
Bart,

> Calling scsi_eh_scmd_add() may cause the error handler never to be
> woken up because this may result in shost->host_failed to become
> larger than scsi_host_busy(shost). Hence complain if
> scsi_eh_scmd_add() is called after SCMD_STATE_INFLIGHT has been
> cleared.

Applied to 6.8/scsi-staging, thanks!
diff mbox series

Patch

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index d7f2d90719fd..0734b3f30ef5 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -290,6 +290,7 @@  void scsi_eh_scmd_add(struct scsi_cmnd *scmd)
 	int ret;
 
 	WARN_ON_ONCE(!shost->ehandler);
+	WARN_ON_ONCE(!test_bit(SCMD_STATE_INFLIGHT, &scmd->state));
 
 	spin_lock_irqsave(shost->host_lock, flags);
 	if (scsi_host_set_state(shost, SHOST_RECOVERY)) {