io_uring/uring_cmd: unconditionally copy SQEs at prep time

Message ID 9698ab08-3f36-42f1-b412-e2190d2e5b6b@kernel.dk (mailing list archive)
State New
Series io_uring/uring_cmd: unconditionally copy SQEs at prep time

Commit Message

Jens Axboe Feb. 13, 2025, 3:30 p.m. UTC
This isn't generally necessary, but conditions have been observed where
SQE data is accessed from the original SQE after prep has been done and
outside of the initial issue. Opcode prep handlers must ensure that any
SQE-related data is stable beyond the prep phase, but uring_cmd is a bit
special in how it handles the SQE, which makes it susceptible to reading
stale data. If the application has reused the SQE before the original
completes, then that can lead to data corruption.

Down the line we can relax this again once uring_cmd has been sanitized
a bit, and avoid unnecessarily copying the SQE.

Reported-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

---

Let's just do the unconditional copy for now. I kept it on top of the
other patches deliberately, as they tell a story of how we got there.
This will 100% cover all cases, obviously, and then we can focus on
future work on avoiding the copy when unnecessary without having any
rush on that front.
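
To make the failure mode concrete, here is a minimal userspace sketch of
the reuse scenario described above (illustration only: error handling is
omitted, and fd/cmd_op are placeholders). It assumes a single-entry ring,
so the second io_uring_get_sqe() hands back the same SQE slot the in-flight
command was submitted from:

#include <liburing.h>

static void reuse_example(int fd, unsigned int cmd_op)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;

	io_uring_queue_init(1, &ring, 0);

	/* submit a passthrough command; it may complete asynchronously */
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_rw(IORING_OP_URING_CMD, sqe, fd, NULL, 0, 0);
	sqe->cmd_op = cmd_op;
	io_uring_submit(&ring);

	/*
	 * The SQ slot has been consumed by the kernel and is free to reuse,
	 * even though the command itself may still be in flight. If the
	 * kernel later re-reads the original SQE memory instead of a private
	 * copy, it sees whatever the application writes here next.
	 */
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_nop(sqe);
	io_uring_submit(&ring);

	io_uring_queue_exit(&ring);
}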

Comments

Caleb Sander Mateos Feb. 13, 2025, 4:16 p.m. UTC | #1
On Thu, Feb 13, 2025 at 7:30 AM Jens Axboe <axboe@kernel.dk> wrote:
>
> This isn't generally necessary, but conditions have been observed where
> SQE data is accessed from the original SQE after prep has been done and
> outside of the initial issue. Opcode prep handlers must ensure that any
> SQE-related data is stable beyond the prep phase, but uring_cmd is a bit
> special in how it handles the SQE, which makes it susceptible to reading
> stale data. If the application has reused the SQE before the original
> completes, then that can lead to data corruption.
>
> Down the line we can relax this again once uring_cmd has been sanitized
> a bit, and avoid unnecessarily copying the SQE.
>
> Reported-by: Caleb Sander Mateos <csander@purestorage.com>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>
> ---
>
> Let's just do the unconditional copy for now. I kept it on top of the
> other patches deliberately, as they tell a story of how we got there.
> This will 100% cover all cases, obviously, and then we can focus on
> future work on avoiding the copy when unnecessary without having any
> rush on that front.

Thanks, we appreciate you quickly addressing the corruption issue.

Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>

>
> diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
> index 8af7780407b7..b78d050aaa3f 100644
> --- a/io_uring/uring_cmd.c
> +++ b/io_uring/uring_cmd.c
> @@ -186,9 +186,14 @@ static int io_uring_cmd_prep_setup(struct io_kiocb *req,
>         cache->op_data = NULL;
>
>         ioucmd->sqe = sqe;
> -       /* defer memcpy until we need it */
> -       if (unlikely(req->flags & REQ_F_FORCE_ASYNC))
> -               io_uring_cmd_cache_sqes(req);
> +       /*
> +        * Unconditionally cache the SQE for now - this is only needed for
> +        * requests that go async, but prep handlers must ensure that any
> +        * sqe data is stable beyond prep. Since uring_cmd is special in
> +        * that it doesn't read in per-op data, play it safe and ensure that
> +        * any SQE data is stable beyond prep. This can later get relaxed.
> +        */
> +       io_uring_cmd_cache_sqes(req);

If you wanted to micro-optimize this, you could probably avoid the
double store to ioucmd->sqe (ioucmd->sqe = sqe above and ioucmd->sqe =
cache->sqes in io_uring_cmd_cache_sqes()). Because of the intervening
memcpy(), the compiler probably won't be able to eliminate the first
store. Before my change, ioucmd->sqe was only assigned once in
io_uring_cmd_prep_setup(). You could pass sqe into
io_uring_cmd_cache_sqes() instead of obtaining it from ioucmd->sqe.
The cost of an additional (cached) store is probably negligible,
though, so I am fine with the patch as is.
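
For context (not part of the patch), the helper being discussed looks
roughly like the below in this tree; the exact body may differ, but the
second store mentioned above is the final assignment:

static void io_uring_cmd_cache_sqes(struct io_kiocb *req)
{
	struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
	struct io_uring_cmd_data *cache = req->async_data;

	/* copy the full SQE into the per-request cache and point at the copy */
	memcpy(cache->sqes, ioucmd->sqe, uring_sqe_size(req->ctx));
	ioucmd->sqe = cache->sqes;
}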

>         return 0;
>  }
>
> @@ -251,16 +256,8 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
>         }
>
>         ret = file->f_op->uring_cmd(ioucmd, issue_flags);
> -       if (ret == -EAGAIN) {
> -               struct io_uring_cmd_data *cache = req->async_data;
> -
> -               if (ioucmd->sqe != cache->sqes)
> -                       io_uring_cmd_cache_sqes(req);
> -               return -EAGAIN;
> -       } else if (ret == -EIOCBQUEUED) {
> -               return -EIOCBQUEUED;
> -       }
> -
> +       if (ret == -EAGAIN || ret == -EIOCBQUEUED)
> +               return ret;
>         if (ret < 0)
>                 req_set_fail(req);
>         io_req_uring_cleanup(req, issue_flags);
> --
> Jens Axboe
>
Jens Axboe Feb. 13, 2025, 4:19 p.m. UTC | #2
On 2/13/25 9:16 AM, Caleb Sander Mateos wrote:
> On Thu, Feb 13, 2025 at 7:30 AM Jens Axboe <axboe@kernel.dk> wrote:
>>
>> This isn't generally necessary, but conditions have been observed where
>> SQE data is accessed from the original SQE after prep has been done and
>> outside of the initial issue. Opcode prep handlers must ensure that any
>> SQE-related data is stable beyond the prep phase, but uring_cmd is a bit
>> special in how it handles the SQE, which makes it susceptible to reading
>> stale data. If the application has reused the SQE before the original
>> completes, then that can lead to data corruption.
>>
>> Down the line we can relax this again once uring_cmd has been sanitized
>> a bit, and avoid unnecessarily copying the SQE.
>>
>> Reported-by: Caleb Sander Mateos <csander@purestorage.com>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>
>> ---
>>
>> Let's just do the unconditional copy for now. I kept it on top of the
>> other patches deliberately, as they tell a story of how we got there.
>> This will 100% cover all cases, obviously, and then we can focus on
>> future work on avoiding the copy when unnecessary without having any
>> rush on that front.
> 
> Thanks, we appreciate you quickly addressing the corruption issue.
> 
> Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
> 
>>
>> diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
>> index 8af7780407b7..b78d050aaa3f 100644
>> --- a/io_uring/uring_cmd.c
>> +++ b/io_uring/uring_cmd.c
>> @@ -186,9 +186,14 @@ static int io_uring_cmd_prep_setup(struct io_kiocb *req,
>>         cache->op_data = NULL;
>>
>>         ioucmd->sqe = sqe;
>> -       /* defer memcpy until we need it */
>> -       if (unlikely(req->flags & REQ_F_FORCE_ASYNC))
>> -               io_uring_cmd_cache_sqes(req);
>> +       /*
>> +        * Unconditionally cache the SQE for now - this is only needed for
>> +        * requests that go async, but prep handlers must ensure that any
>> +        * sqe data is stable beyond prep. Since uring_cmd is special in
>> +        * that it doesn't read in per-op data, play it safe and ensure that
>> +        * any SQE data is stable beyond prep. This can later get relaxed.
>> +        */
>> +       io_uring_cmd_cache_sqes(req);
> 
> If you wanted to micro-optimize this, you could probably avoid the
> double store to ioucmd->sqe (ioucmd->sqe = sqe above and ioucmd->sqe =
> cache->sqes in io_uring_cmd_cache_sqes()). Because of the intervening
> memcpy(), the compiler probably won't be able to eliminate the first
> store. Before my change, ioucmd->sqe was only assigned once in
> io_uring_cmd_prep_setup(). You could pass sqe into
> io_uring_cmd_cache_sqes() instead of obtaining it from ioucmd->sqe.
> The cost of an additional (cached) store is probably negligible,
> though, so I am fine with the patch as is.

I did ponder that (passing in the sqe and storing once), but didn't
figure it'd be worthwhile to do. But on second thought, maybe we just do
it, passing in the sqe and letting it be assigned in there. I'll send
out a v2.
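
Purely as a sketch of that direction (not necessarily what v2 ends up
looking like), the single-store variant could be shaped roughly like this,
with io_uring_cmd_prep_setup() dropping its own "ioucmd->sqe = sqe;"
assignment and calling io_uring_cmd_cache_sqes(req, sqe) instead:

static void io_uring_cmd_cache_sqes(struct io_kiocb *req,
				    const struct io_uring_sqe *sqe)
{
	struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
	struct io_uring_cmd_data *cache = req->async_data;

	/* single store: ioucmd->sqe only ever points at the cached copy */
	memcpy(cache->sqes, sqe, uring_sqe_size(req->ctx));
	ioucmd->sqe = cache->sqes;
}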

Patch

diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 8af7780407b7..b78d050aaa3f 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -186,9 +186,14 @@  static int io_uring_cmd_prep_setup(struct io_kiocb *req,
 	cache->op_data = NULL;
 
 	ioucmd->sqe = sqe;
-	/* defer memcpy until we need it */
-	if (unlikely(req->flags & REQ_F_FORCE_ASYNC))
-		io_uring_cmd_cache_sqes(req);
+	/*
+	 * Unconditionally cache the SQE for now - this is only needed for
+	 * requests that go async, but prep handlers must ensure that any
+	 * sqe data is stable beyond prep. Since uring_cmd is special in
+	 * that it doesn't read in per-op data, play it safe and ensure that
+	 * any SQE data is stable beyond prep. This can later get relaxed.
+	 */
+	io_uring_cmd_cache_sqes(req);
 	return 0;
 }
 
@@ -251,16 +256,8 @@  int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
 	}
 
 	ret = file->f_op->uring_cmd(ioucmd, issue_flags);
-	if (ret == -EAGAIN) {
-		struct io_uring_cmd_data *cache = req->async_data;
-
-		if (ioucmd->sqe != cache->sqes)
-			io_uring_cmd_cache_sqes(req);
-		return -EAGAIN;
-	} else if (ret == -EIOCBQUEUED) {
-		return -EIOCBQUEUED;
-	}
-
+	if (ret == -EAGAIN || ret == -EIOCBQUEUED)
+		return ret;
 	if (ret < 0)
 		req_set_fail(req);
 	io_req_uring_cleanup(req, issue_flags);