diff mbox series

RDMA/srp: Do not call scsi_done() from srp_abort()

Message ID 20230823205727.505681-1-bvanassche@acm.org (mailing list archive)
State Accepted
Headers show
Series RDMA/srp: Do not call scsi_done() from srp_abort() | expand

Commit Message

Bart Van Assche Aug. 23, 2023, 8:57 p.m. UTC
After scmd_eh_abort_handler() has called the SCSI LLD eh_abort_handler
callback, it performs one of the following actions:
* Call scsi_queue_insert().
* Call scsi_finish_command().
* Call scsi_eh_scmd_add().
Hence, SCSI abort handlers must not call scsi_done(). Otherwise all
the above actions would trigger a use-after-free. Hence remove the
scsi_done() call from srp_abort(). Keep the srp_free_req() call
before returning SUCCESS because we may not see the command again if
SUCCESS is returned.

Cc: Bob Pearson <rpearsonhpe@gmail.com>
Cc: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Fixes: d8536670916a ("IB/srp: Avoid having aborted requests hang")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

Comments

Shinichiro Kawasaki Aug. 24, 2023, 2:46 a.m. UTC | #1
On Aug 23, 2023 / 13:57, Bart Van Assche wrote:
> After scmd_eh_abort_handler() has called the SCSI LLD eh_abort_handler
> callback, it performs one of the following actions:
> * Call scsi_queue_insert().
> * Call scsi_finish_command().
> * Call scsi_eh_scmd_add().
> Hence, SCSI abort handlers must not call scsi_done(). Otherwise all
> the above actions would trigger a use-after-free. Hence remove the
> scsi_done() call from srp_abort(). Keep the srp_free_req() call
> before returning SUCCESS because we may not see the command again if
> SUCCESS is returned.
> 
> Cc: Bob Pearson <rpearsonhpe@gmail.com>
> Cc: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> Fixes: d8536670916a ("IB/srp: Avoid having aborted requests hang")
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Thanks Bart. Do you think this patch fixes the hangs at blktests srp/002 and
srp/011? I tried this patch and still see the hang at srp/002, but the hang at
srp/011 looks disappearing.
Bart Van Assche Aug. 24, 2023, 2:38 p.m. UTC | #2
On 8/23/23 19:46, Shinichiro Kawasaki wrote:
> On Aug 23, 2023 / 13:57, Bart Van Assche wrote:
>> After scmd_eh_abort_handler() has called the SCSI LLD eh_abort_handler
>> callback, it performs one of the following actions:
>> * Call scsi_queue_insert().
>> * Call scsi_finish_command().
>> * Call scsi_eh_scmd_add().
>> Hence, SCSI abort handlers must not call scsi_done(). Otherwise all
>> the above actions would trigger a use-after-free. Hence remove the
>> scsi_done() call from srp_abort(). Keep the srp_free_req() call
>> before returning SUCCESS because we may not see the command again if
>> SUCCESS is returned.
>>
>> Cc: Bob Pearson <rpearsonhpe@gmail.com>
>> Cc: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
>> Fixes: d8536670916a ("IB/srp: Avoid having aborted requests hang")
>> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> 
> Thanks Bart. Do you think this patch fixes the hangs at blktests srp/002 and
> srp/011? I tried this patch and still see the hang at srp/002, but the hang at
> srp/011 looks disappearing.

I have not yet tried to verify whether the above patch affects any blktests.
The above patch is a patch that I developed a few weeks ago but that had not
yet been published on the linux-rdma mailing list.

Thanks,

Bart.
Shinichiro Kawasaki Aug. 25, 2023, 6:50 a.m. UTC | #3
On Aug 24, 2023 / 07:38, Bart Van Assche wrote:
> On 8/23/23 19:46, Shinichiro Kawasaki wrote:
> > On Aug 23, 2023 / 13:57, Bart Van Assche wrote:
> > > After scmd_eh_abort_handler() has called the SCSI LLD eh_abort_handler
> > > callback, it performs one of the following actions:
> > > * Call scsi_queue_insert().
> > > * Call scsi_finish_command().
> > > * Call scsi_eh_scmd_add().
> > > Hence, SCSI abort handlers must not call scsi_done(). Otherwise all
> > > the above actions would trigger a use-after-free. Hence remove the
> > > scsi_done() call from srp_abort(). Keep the srp_free_req() call
> > > before returning SUCCESS because we may not see the command again if
> > > SUCCESS is returned.
> > > 
> > > Cc: Bob Pearson <rpearsonhpe@gmail.com>
> > > Cc: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> > > Fixes: d8536670916a ("IB/srp: Avoid having aborted requests hang")
> > > Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> > 
> > Thanks Bart. Do you think this patch fixes the hangs at blktests srp/002 and
> > srp/011? I tried this patch and still see the hang at srp/002, but the hang at
> > srp/011 looks disappearing.
> 
> I have not yet tried to verify whether the above patch affects any blktests.
> The above patch is a patch that I developed a few weeks ago but that had not
> yet been published on the linux-rdma mailing list.

Thanks for the explanation. I do not observe regression with this patch on my
blktests test runs. It looks good that it avoids the srp/011 hang, but not sure
why it avoids it.
Leon Romanovsky Sept. 11, 2023, 11:16 a.m. UTC | #4
On Wed, 23 Aug 2023 13:57:27 -0700, Bart Van Assche wrote:
> After scmd_eh_abort_handler() has called the SCSI LLD eh_abort_handler
> callback, it performs one of the following actions:
> * Call scsi_queue_insert().
> * Call scsi_finish_command().
> * Call scsi_eh_scmd_add().
> Hence, SCSI abort handlers must not call scsi_done(). Otherwise all
> the above actions would trigger a use-after-free. Hence remove the
> scsi_done() call from srp_abort(). Keep the srp_free_req() call
> before returning SUCCESS because we may not see the command again if
> SUCCESS is returned.
> 
> [...]

Applied, thanks!

[1/1] RDMA/srp: Do not call scsi_done() from srp_abort()
      https://git.kernel.org/rdma/rdma/c/e193b7955dfad6

Best regards,
diff mbox series

Patch

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 0e513a7e5ac8..be003fa0168c 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -2788,7 +2788,6 @@  static int srp_abort(struct scsi_cmnd *scmnd)
 	u32 tag;
 	u16 ch_idx;
 	struct srp_rdma_ch *ch;
-	int ret;
 
 	shost_printk(KERN_ERR, target->scsi_host, "SRP abort called\n");
 
@@ -2802,19 +2801,14 @@  static int srp_abort(struct scsi_cmnd *scmnd)
 	shost_printk(KERN_ERR, target->scsi_host,
 		     "Sending SRP abort for tag %#x\n", tag);
 	if (srp_send_tsk_mgmt(ch, tag, scmnd->device->lun,
-			      SRP_TSK_ABORT_TASK, NULL) == 0)
-		ret = SUCCESS;
-	else if (target->rport->state == SRP_RPORT_LOST)
-		ret = FAST_IO_FAIL;
-	else
-		ret = FAILED;
-	if (ret == SUCCESS) {
+			      SRP_TSK_ABORT_TASK, NULL) == 0) {
 		srp_free_req(ch, req, scmnd, 0);
-		scmnd->result = DID_ABORT << 16;
-		scsi_done(scmnd);
+		return SUCCESS;
 	}
+	if (target->rport->state == SRP_RPORT_LOST)
+		return FAST_IO_FAIL;
 
-	return ret;
+	return FAILED;
 }
 
 static int srp_reset_device(struct scsi_cmnd *scmnd)