Message ID | 20210425085753.2617424-7-ming.lei@redhat.com (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
Series | blk-mq: fix request UAF related with iterating over tagset requests | expand |
On 4/25/21 1:57 AM, Ming Lei wrote: > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > index c289991ffaed..7cbaee282b6d 100644 > --- a/drivers/scsi/scsi_lib.c > +++ b/drivers/scsi/scsi_lib.c > @@ -1568,7 +1568,11 @@ static void scsi_mq_done(struct scsi_cmnd *cmd) > if (unlikely(test_and_set_bit(SCMD_STATE_COMPLETE, &cmd->state))) > return; > trace_scsi_dispatch_cmd_done(cmd); > - blk_mq_complete_request(cmd->request); > + > + if (unlikely(host_byte(cmd->result) != DID_OK)) > + blk_mq_complete_request_locally(cmd->request); > + else > + blk_mq_complete_request(cmd->request); > } This change is so tricky that it deserves a comment. An even better approach would be *not* to export blk_mq_complete_request_locally() from the block layer to block drivers and instead modify the block layer such that it completes a request on the same CPU if request completion happens from inside the context of a tag iteration function. That would save driver writers the trouble of learning yet another block layer API. Bart.
On Sun, Apr 25, 2021 at 08:02:10PM -0700, Bart Van Assche wrote: > On 4/25/21 1:57 AM, Ming Lei wrote: > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > > index c289991ffaed..7cbaee282b6d 100644 > > --- a/drivers/scsi/scsi_lib.c > > +++ b/drivers/scsi/scsi_lib.c > > @@ -1568,7 +1568,11 @@ static void scsi_mq_done(struct scsi_cmnd *cmd) > > if (unlikely(test_and_set_bit(SCMD_STATE_COMPLETE, &cmd->state))) > > return; > > trace_scsi_dispatch_cmd_done(cmd); > > - blk_mq_complete_request(cmd->request); > > + > > + if (unlikely(host_byte(cmd->result) != DID_OK)) > > + blk_mq_complete_request_locally(cmd->request); > > + else > > + blk_mq_complete_request(cmd->request); > > } > > This change is so tricky that it deserves a comment. > > An even better approach would be *not* to export > blk_mq_complete_request_locally() from the block layer to block drivers > and instead modify the block layer such that it completes a request on > the same CPU if request completion happens from inside the context of a > tag iteration function. That would save driver writers the trouble of > learning yet another block layer API. Yeah, that is possible, and one request flag(eg. RQF_ITERATED) can be added. The flag is set before calling ->fn(), and evaluated in blk_mq_complete_request_remote(). Thanks, Ming
On Mon, Apr 26, 2021 at 02:24:56PM +0800, Ming Lei wrote: > On Sun, Apr 25, 2021 at 08:02:10PM -0700, Bart Van Assche wrote: > > On 4/25/21 1:57 AM, Ming Lei wrote: > > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > > > index c289991ffaed..7cbaee282b6d 100644 > > > --- a/drivers/scsi/scsi_lib.c > > > +++ b/drivers/scsi/scsi_lib.c > > > @@ -1568,7 +1568,11 @@ static void scsi_mq_done(struct scsi_cmnd *cmd) > > > if (unlikely(test_and_set_bit(SCMD_STATE_COMPLETE, &cmd->state))) > > > return; > > > trace_scsi_dispatch_cmd_done(cmd); > > > - blk_mq_complete_request(cmd->request); > > > + > > > + if (unlikely(host_byte(cmd->result) != DID_OK)) > > > + blk_mq_complete_request_locally(cmd->request); > > > + else > > > + blk_mq_complete_request(cmd->request); > > > } > > > > This change is so tricky that it deserves a comment. > > > > An even better approach would be *not* to export > > blk_mq_complete_request_locally() from the block layer to block drivers > > and instead modify the block layer such that it completes a request on > > the same CPU if request completion happens from inside the context of a > > tag iteration function. That would save driver writers the trouble of > > learning yet another block layer API. > > Yeah, that is possible, and one request flag(eg. RQF_ITERATED) can be added. > The flag is set before calling ->fn(), and evaluated in > blk_mq_complete_request_remote(). Thinking of the issue further, we have grabbed rq->refcnt before calling ->fn(), not only request UAF is fixed, but also driver gets valid request instance to check if the request has been completed, so we needn't to consider the double completion issue[1] any more, which is supposed to be covered by driver, such as, scsi uses SCMD_STATE_COMPLETE in scsi_mq_done() for avoiding double completion. [1] https://lore.kernel.org/linux-block/YIdWz8C5eklPvEiV@T590/T/#u Thanks, Ming
diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c index 3be0dbc674bd..05f5e36ee608 100644 --- a/drivers/block/mtip32xx/mtip32xx.c +++ b/drivers/block/mtip32xx/mtip32xx.c @@ -3748,7 +3748,7 @@ static bool mtip_no_dev_cleanup(struct request *rq, void *data, bool reserv) struct mtip_cmd *cmd = blk_mq_rq_to_pdu(rq); cmd->status = BLK_STS_IOERR; - blk_mq_complete_request(rq); + blk_mq_complete_request_locally(rq); return true; } diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index 4ff71b579cfc..3dcf3288efa8 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -809,7 +809,7 @@ static bool nbd_clear_req(struct request *req, void *data, bool reserved) cmd->status = BLK_STS_IOERR; mutex_unlock(&cmd->lock); - blk_mq_complete_request(req); + blk_mq_complete_request_locally(req); return true; } diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 0896e21642be..a605954477da 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -381,7 +381,7 @@ bool nvme_cancel_request(struct request *req, void *data, bool reserved) nvme_req(req)->status = NVME_SC_HOST_ABORTED_CMD; nvme_req(req)->flags |= NVME_REQ_CANCELLED; - blk_mq_complete_request(req); + blk_mq_complete_request_locally(req); return true; } EXPORT_SYMBOL_GPL(nvme_cancel_request); diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index c289991ffaed..7cbaee282b6d 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -1568,7 +1568,11 @@ static void scsi_mq_done(struct scsi_cmnd *cmd) if (unlikely(test_and_set_bit(SCMD_STATE_COMPLETE, &cmd->state))) return; trace_scsi_dispatch_cmd_done(cmd); - blk_mq_complete_request(cmd->request); + + if (unlikely(host_byte(cmd->result) != DID_OK)) + blk_mq_complete_request_locally(cmd->request); + else + blk_mq_complete_request(cmd->request); } static void scsi_mq_put_budget(struct request_queue *q)
It can be a bit hard for driver to avoid request UAF between normal completion and completion via blk_mq_tagset_busy_iter() if async completion is done in blk_mq_tagset_busy_iter(). Cause request->tag is only freed after .mq_ops->complete() is called, and rquest->tag may still be valid after blk_mq_complete_request() is returned from normal completion path, so this request is still visible in blk_mq_tagset_busy_iter(). This patch itself can't avoid such request UAF completely. We will grab a request reference in next patch when walking request via blk_mq_tagset_busy_iter() for fixing such race, that is why we have to convert to blk_mq_complete_request_locally() first. Signed-off-by: Ming Lei <ming.lei@redhat.com> --- drivers/block/mtip32xx/mtip32xx.c | 2 +- drivers/block/nbd.c | 2 +- drivers/nvme/host/core.c | 2 +- drivers/scsi/scsi_lib.c | 6 +++++- 4 files changed, 8 insertions(+), 4 deletions(-)