diff mbox series

[6/8] block: drivers: complete request locally from blk_mq_tagset_busy_iter

Message ID 20210425085753.2617424-7-ming.lei@redhat.com (mailing list archive)
State New, archived
Headers show
Series blk-mq: fix request UAF related with iterating over tagset requests | expand

Commit Message

Ming Lei April 25, 2021, 8:57 a.m. UTC
It can be a bit hard for driver to avoid request UAF between normal
completion and completion via blk_mq_tagset_busy_iter() if async
completion is done in blk_mq_tagset_busy_iter(). Cause request->tag
is only freed after .mq_ops->complete() is called, and rquest->tag
may still be valid after blk_mq_complete_request() is returned from
normal completion path, so this request is still visible in
blk_mq_tagset_busy_iter().

This patch itself can't avoid such request UAF completely. We will
grab a request reference in next patch when walking request via
blk_mq_tagset_busy_iter() for fixing such race, that is why we have
to convert to blk_mq_complete_request_locally() first.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/block/mtip32xx/mtip32xx.c | 2 +-
 drivers/block/nbd.c               | 2 +-
 drivers/nvme/host/core.c          | 2 +-
 drivers/scsi/scsi_lib.c           | 6 +++++-
 4 files changed, 8 insertions(+), 4 deletions(-)

Comments

Bart Van Assche April 26, 2021, 3:02 a.m. UTC | #1
On 4/25/21 1:57 AM, Ming Lei wrote:
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index c289991ffaed..7cbaee282b6d 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1568,7 +1568,11 @@ static void scsi_mq_done(struct scsi_cmnd *cmd)
>  	if (unlikely(test_and_set_bit(SCMD_STATE_COMPLETE, &cmd->state)))
>  		return;
>  	trace_scsi_dispatch_cmd_done(cmd);
> -	blk_mq_complete_request(cmd->request);
> +
> +	if (unlikely(host_byte(cmd->result) != DID_OK))
> +		blk_mq_complete_request_locally(cmd->request);
> +	else
> +		blk_mq_complete_request(cmd->request);
>  }

This change is so tricky that it deserves a comment.

An even better approach would be *not* to export
blk_mq_complete_request_locally() from the block layer to block drivers
and instead modify the block layer such that it completes a request on
the same CPU if request completion happens from inside the context of a
tag iteration function. That would save driver writers the trouble of
learning yet another block layer API.

Bart.
Ming Lei April 26, 2021, 6:24 a.m. UTC | #2
On Sun, Apr 25, 2021 at 08:02:10PM -0700, Bart Van Assche wrote:
> On 4/25/21 1:57 AM, Ming Lei wrote:
> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > index c289991ffaed..7cbaee282b6d 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -1568,7 +1568,11 @@ static void scsi_mq_done(struct scsi_cmnd *cmd)
> >  	if (unlikely(test_and_set_bit(SCMD_STATE_COMPLETE, &cmd->state)))
> >  		return;
> >  	trace_scsi_dispatch_cmd_done(cmd);
> > -	blk_mq_complete_request(cmd->request);
> > +
> > +	if (unlikely(host_byte(cmd->result) != DID_OK))
> > +		blk_mq_complete_request_locally(cmd->request);
> > +	else
> > +		blk_mq_complete_request(cmd->request);
> >  }
> 
> This change is so tricky that it deserves a comment.
> 
> An even better approach would be *not* to export
> blk_mq_complete_request_locally() from the block layer to block drivers
> and instead modify the block layer such that it completes a request on
> the same CPU if request completion happens from inside the context of a
> tag iteration function. That would save driver writers the trouble of
> learning yet another block layer API.

Yeah, that is possible, and one request flag(eg. RQF_ITERATED) can be added.
The flag is set before calling ->fn(), and evaluated in
blk_mq_complete_request_remote().

Thanks,
Ming
Ming Lei April 27, 2021, 8:54 a.m. UTC | #3
On Mon, Apr 26, 2021 at 02:24:56PM +0800, Ming Lei wrote:
> On Sun, Apr 25, 2021 at 08:02:10PM -0700, Bart Van Assche wrote:
> > On 4/25/21 1:57 AM, Ming Lei wrote:
> > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > > index c289991ffaed..7cbaee282b6d 100644
> > > --- a/drivers/scsi/scsi_lib.c
> > > +++ b/drivers/scsi/scsi_lib.c
> > > @@ -1568,7 +1568,11 @@ static void scsi_mq_done(struct scsi_cmnd *cmd)
> > >  	if (unlikely(test_and_set_bit(SCMD_STATE_COMPLETE, &cmd->state)))
> > >  		return;
> > >  	trace_scsi_dispatch_cmd_done(cmd);
> > > -	blk_mq_complete_request(cmd->request);
> > > +
> > > +	if (unlikely(host_byte(cmd->result) != DID_OK))
> > > +		blk_mq_complete_request_locally(cmd->request);
> > > +	else
> > > +		blk_mq_complete_request(cmd->request);
> > >  }
> > 
> > This change is so tricky that it deserves a comment.
> > 
> > An even better approach would be *not* to export
> > blk_mq_complete_request_locally() from the block layer to block drivers
> > and instead modify the block layer such that it completes a request on
> > the same CPU if request completion happens from inside the context of a
> > tag iteration function. That would save driver writers the trouble of
> > learning yet another block layer API.
> 
> Yeah, that is possible, and one request flag(eg. RQF_ITERATED) can be added.
> The flag is set before calling ->fn(), and evaluated in
> blk_mq_complete_request_remote().

Thinking of the issue further, we have grabbed rq->refcnt before calling
->fn(), not only request UAF is fixed, but also driver gets valid
request instance to check if the request has been completed, so we needn't
to consider the double completion issue[1] any more, which is supposed
to be covered by driver, such as, scsi uses SCMD_STATE_COMPLETE in
scsi_mq_done() for avoiding double completion.

[1] https://lore.kernel.org/linux-block/YIdWz8C5eklPvEiV@T590/T/#u

Thanks,
Ming
diff mbox series

Patch

diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c
index 3be0dbc674bd..05f5e36ee608 100644
--- a/drivers/block/mtip32xx/mtip32xx.c
+++ b/drivers/block/mtip32xx/mtip32xx.c
@@ -3748,7 +3748,7 @@  static bool mtip_no_dev_cleanup(struct request *rq, void *data, bool reserv)
 	struct mtip_cmd *cmd = blk_mq_rq_to_pdu(rq);
 
 	cmd->status = BLK_STS_IOERR;
-	blk_mq_complete_request(rq);
+	blk_mq_complete_request_locally(rq);
 	return true;
 }
 
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 4ff71b579cfc..3dcf3288efa8 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -809,7 +809,7 @@  static bool nbd_clear_req(struct request *req, void *data, bool reserved)
 	cmd->status = BLK_STS_IOERR;
 	mutex_unlock(&cmd->lock);
 
-	blk_mq_complete_request(req);
+	blk_mq_complete_request_locally(req);
 	return true;
 }
 
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 0896e21642be..a605954477da 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -381,7 +381,7 @@  bool nvme_cancel_request(struct request *req, void *data, bool reserved)
 
 	nvme_req(req)->status = NVME_SC_HOST_ABORTED_CMD;
 	nvme_req(req)->flags |= NVME_REQ_CANCELLED;
-	blk_mq_complete_request(req);
+	blk_mq_complete_request_locally(req);
 	return true;
 }
 EXPORT_SYMBOL_GPL(nvme_cancel_request);
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index c289991ffaed..7cbaee282b6d 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1568,7 +1568,11 @@  static void scsi_mq_done(struct scsi_cmnd *cmd)
 	if (unlikely(test_and_set_bit(SCMD_STATE_COMPLETE, &cmd->state)))
 		return;
 	trace_scsi_dispatch_cmd_done(cmd);
-	blk_mq_complete_request(cmd->request);
+
+	if (unlikely(host_byte(cmd->result) != DID_OK))
+		blk_mq_complete_request_locally(cmd->request);
+	else
+		blk_mq_complete_request(cmd->request);
 }
 
 static void scsi_mq_put_budget(struct request_queue *q)