Message ID | 9698ab08-3f36-42f1-b412-e2190d2e5b6b@kernel.dk (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | io_uring/uring_cmd: unconditionally copy SQEs at prep time | expand |
On Thu, Feb 13, 2025 at 7:30 AM Jens Axboe <axboe@kernel.dk> wrote: > > This isn't generally necessary, but conditions have been observed where > SQE data is accessed from the original SQE after prep has been done and > outside of the initial issue. Opcode prep handlers must ensure that any > SQE related data is stable beyond the prep phase, but uring_cmd is a bit > special in how it handles the SQE which makes it susceptible to reading > stale data. If the application has reused the SQE before the original > completes, then that can lead to data corruption. > > Down the line we can relax this again once uring_cmd has been sanitized > a bit, and avoid unnecessarily copying the SQE. > > Reported-by: Caleb Sander Mateos <csander@purestorage.com> > Signed-off-by: Jens Axboe <axboe@kernel.dk> > > --- > > Let's just do the unconditional copy for now. I kept it on top of the > other patches deliberately, as they tell a story of how we got there. > This will 100% cover all cases, obviously, and then we can focus on > future work on avoiding the copy when unnecessary without having any > rush on that front. Thanks, we appreciate you quickly addressing the corruption issue. Reviewed-by: Caleb Sander Mateos <csander@purestorage.com> > > diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c > index 8af7780407b7..b78d050aaa3f 100644 > --- a/io_uring/uring_cmd.c > +++ b/io_uring/uring_cmd.c > @@ -186,9 +186,14 @@ static int io_uring_cmd_prep_setup(struct io_kiocb *req, > cache->op_data = NULL; > > ioucmd->sqe = sqe; > - /* defer memcpy until we need it */ > - if (unlikely(req->flags & REQ_F_FORCE_ASYNC)) > - io_uring_cmd_cache_sqes(req); > + /* > + * Unconditionally cache the SQE for now - this is only needed for > + * requests that go async, but prep handlers must ensure that any > + * sqe data is stable beyond prep. Since uring_cmd is special in > + * that it doesn't read in per-op data, play it safe and ensure that > + * any SQE data is stable beyond prep. This can later get relaxed. > + */ > + io_uring_cmd_cache_sqes(req); If you wanted to micro-optimize this, you could probably avoid the double store to ioucmd->sqe (ioucmd->sqe = sqe above and ioucmd->sqe = cache->sqes in io_uring_cmd_cache_sqes()). Because of the intervening memcpy(), the compiler probably won't be able to eliminate the first store. Before my change, ioucmd->sqe was only assigned once in io_uring_cmd_prep_setup(). You could pass sqe into io_uring_cmd_cache_sqes() instead of obtaining it from ioucmd->sqe. The cost of an additional (cached) store is probably negligible, though, so I am fine with the patch as is. > return 0; > } > > @@ -251,16 +256,8 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags) > } > > ret = file->f_op->uring_cmd(ioucmd, issue_flags); > - if (ret == -EAGAIN) { > - struct io_uring_cmd_data *cache = req->async_data; > - > - if (ioucmd->sqe != cache->sqes) > - io_uring_cmd_cache_sqes(req); > - return -EAGAIN; > - } else if (ret == -EIOCBQUEUED) { > - return -EIOCBQUEUED; > - } > - > + if (ret == -EAGAIN || ret == -EIOCBQUEUED) > + return ret; > if (ret < 0) > req_set_fail(req); > io_req_uring_cleanup(req, issue_flags); > -- > Jens Axboe >
On 2/13/25 9:16 AM, Caleb Sander Mateos wrote: > On Thu, Feb 13, 2025 at 7:30?AM Jens Axboe <axboe@kernel.dk> wrote: >> >> This isn't generally necessary, but conditions have been observed where >> SQE data is accessed from the original SQE after prep has been done and >> outside of the initial issue. Opcode prep handlers must ensure that any >> SQE related data is stable beyond the prep phase, but uring_cmd is a bit >> special in how it handles the SQE which makes it susceptible to reading >> stale data. If the application has reused the SQE before the original >> completes, then that can lead to data corruption. >> >> Down the line we can relax this again once uring_cmd has been sanitized >> a bit, and avoid unnecessarily copying the SQE. >> >> Reported-by: Caleb Sander Mateos <csander@purestorage.com> >> Signed-off-by: Jens Axboe <axboe@kernel.dk> >> >> --- >> >> Let's just do the unconditional copy for now. I kept it on top of the >> other patches deliberately, as they tell a story of how we got there. >> This will 100% cover all cases, obviously, and then we can focus on >> future work on avoiding the copy when unnecessary without having any >> rush on that front. > > Thanks, we appreciate you quickly addressing the corruption issue. > > Reviewed-by: Caleb Sander Mateos <csander@purestorage.com> > >> >> diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c >> index 8af7780407b7..b78d050aaa3f 100644 >> --- a/io_uring/uring_cmd.c >> +++ b/io_uring/uring_cmd.c >> @@ -186,9 +186,14 @@ static int io_uring_cmd_prep_setup(struct io_kiocb *req, >> cache->op_data = NULL; >> >> ioucmd->sqe = sqe; >> - /* defer memcpy until we need it */ >> - if (unlikely(req->flags & REQ_F_FORCE_ASYNC)) >> - io_uring_cmd_cache_sqes(req); >> + /* >> + * Unconditionally cache the SQE for now - this is only needed for >> + * requests that go async, but prep handlers must ensure that any >> + * sqe data is stable beyond prep. Since uring_cmd is special in >> + * that it doesn't read in per-op data, play it safe and ensure that >> + * any SQE data is stable beyond prep. This can later get relaxed. >> + */ >> + io_uring_cmd_cache_sqes(req); > > If you wanted to micro-optimize this, you could probably avoid the > double store to ioucmd->sqe (ioucmd->sqe = sqe above and ioucmd->sqe = > cache->sqes in io_uring_cmd_cache_sqes()). Because of the intervening > memcpy(), the compiler probably won't be able to eliminate the first > store. Before my change, ioucmd->sqe was only assigned once in > io_uring_cmd_prep_setup(). You could pass sqe into > io_uring_cmd_cache_sqes() instead of obtaining it from ioucmd->sqe. > The cost of an additional (cached) store is probably negligible, > though, so I am fine with the patch as is. I did ponder that (passing in the sqe and storing once), but didn't figure it'd be worthwhile to do. But on second thought, maybe we just do it, passing in the sqe and letting it be assigned in there. I'll send out a v2.
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c index 8af7780407b7..b78d050aaa3f 100644 --- a/io_uring/uring_cmd.c +++ b/io_uring/uring_cmd.c @@ -186,9 +186,14 @@ static int io_uring_cmd_prep_setup(struct io_kiocb *req, cache->op_data = NULL; ioucmd->sqe = sqe; - /* defer memcpy until we need it */ - if (unlikely(req->flags & REQ_F_FORCE_ASYNC)) - io_uring_cmd_cache_sqes(req); + /* + * Unconditionally cache the SQE for now - this is only needed for + * requests that go async, but prep handlers must ensure that any + * sqe data is stable beyond prep. Since uring_cmd is special in + * that it doesn't read in per-op data, play it safe and ensure that + * any SQE data is stable beyond prep. This can later get relaxed. + */ + io_uring_cmd_cache_sqes(req); return 0; } @@ -251,16 +256,8 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags) } ret = file->f_op->uring_cmd(ioucmd, issue_flags); - if (ret == -EAGAIN) { - struct io_uring_cmd_data *cache = req->async_data; - - if (ioucmd->sqe != cache->sqes) - io_uring_cmd_cache_sqes(req); - return -EAGAIN; - } else if (ret == -EIOCBQUEUED) { - return -EIOCBQUEUED; - } - + if (ret == -EAGAIN || ret == -EIOCBQUEUED) + return ret; if (ret < 0) req_set_fail(req); io_req_uring_cleanup(req, issue_flags);
This isn't generally necessary, but conditions have been observed where SQE data is accessed from the original SQE after prep has been done and outside of the initial issue. Opcode prep handlers must ensure that any SQE related data is stable beyond the prep phase, but uring_cmd is a bit special in how it handles the SQE which makes it susceptible to reading stale data. If the application has reused the SQE before the original completes, then that can lead to data corruption. Down the line we can relax this again once uring_cmd has been sanitized a bit, and avoid unnecessarily copying the SQE. Reported-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> --- Let's just do the unconditional copy for now. I kept it on top of the other patches deliberately, as they tell a story of how we got there. This will 100% cover all cases, obviously, and then we can focus on future work on avoiding the copy when unnecessary without having any rush on that front.