mbox series

[GIT,PULL] io_uring updates for 6.5

Message ID 6e13b6a5-aa10-75f8-973d-023b7aa7f440@kernel.dk (mailing list archive)
State New
Headers show
Series [GIT,PULL] io_uring updates for 6.5 | expand

Pull-request

git://git.kernel.dk/linux.git tags/for-6.5/io_uring-2023-06-23

Message

Jens Axboe June 26, 2023, 2:39 a.m. UTC
Hi Linus,

Nothing major in this release, just a bunch of cleanups and some
optimizations around networking mostly.

Will throw a minor conflict in io_uring/net.c due to the late fixes in
mainline, you'll want to keep the kmsg->msg.msg_inq = -1U; assignment
there.

Please pull!


- Series cleaning up file request flags handling (Christoph)

- Series cleaning up request freeing and CQ locking (Pavel)

- Support for using pre-registering the io_uring fd at setup time (Josh)

- Add support for user allocated ring memory, rather than having the
  kernel allocate it. Mostly for packing rings into a huge page (me)

- Series avoiding an unnecessary double retry on receive (me)

- Maintain ordering for task_work, which also improves performance (me)

- Misc cleanups/fixes (Pavel, me)

The following changes since commit f1fcbaa18b28dec10281551dfe6ed3a3ed80e3d6:

  Linux 6.4-rc2 (2023-05-14 12:51:40 -0700)

are available in the Git repository at:

  git://git.kernel.dk/linux.git tags/for-6.5/io_uring-2023-06-23

for you to fetch changes up to c98c81a4ac37b651be7eb9d16f562fc4acc5f867:

  io_uring: merge conditional unlock flush helpers (2023-06-23 08:19:40 -0600)

----------------------------------------------------------------
for-6.5/io_uring-2023-06-23

----------------------------------------------------------------
Christoph Hellwig (8):
      io_uring: remove __io_file_supports_nowait
      io_uring: remove the mode variable in io_file_get_flags
      io_uring: remove a confusing comment above io_file_get_flags
      io_uring: remove io_req_ffs_set
      io_uring: return REQ_F_ flags from io_file_get_flags
      io_uring: use io_file_from_index in __io_sync_cancel
      io_uring: use io_file_from_index in io_msg_grab_file
      io_uring: add helpers to decode the fixed file file_ptr

Jens Axboe (16):
      net: set FMODE_NOWAIT for sockets
      block: mark bdev files as FMODE_NOWAIT if underlying device supports it
      io_uring: rely solely on FMODE_NOWAIT
      io_uring: remove sq/cq_off memset
      io_uring: return error pointer from io_mem_alloc()
      io_uring: add ring freeing helper
      io_uring: support for user allocated memory for rings/sqes
      io_uring/net: initialize struct msghdr more sanely for io_recv()
      io_uring/net: initalize msghdr->msg_inq to known value
      io_uring/net: push IORING_CQE_F_SOCK_NONEMPTY into io_recv_finish()
      io_uring/net: don't retry recvmsg() unnecessarily
      io_uring: maintain ordering for DEFER_TASKRUN tw list
      io_uring: avoid indirect function calls for the hottest task_work
      io_uring: cleanup io_aux_cqe() API
      io_uring: get rid of unnecessary 'length' variable
      io_uring: wait interruptibly for request completions on exit

Josh Triplett (1):
      io_uring: Add io_uring_setup flag to pre-register ring fd and never install it

Pavel Begunkov (14):
      io_uring: annotate offset timeout races
      io_uring/cmd: add cmd lazy tw wake helper
      nvme: optimise io_uring passthrough completion
      io_uring: open code io_put_req_find_next
      io_uring: remove io_free_req_tw
      io_uring: inline io_dismantle_req()
      io_uring: move io_clean_op()
      io_uring: don't batch task put on reqs free
      io_uring: remove IOU_F_TWQ_FORCE_NORMAL
      io_uring: kill io_cq_unlock()
      io_uring: fix acquire/release annotations
      io_uring: inline __io_cq_unlock
      io_uring: make io_cq_unlock_post static
      io_uring: merge conditional unlock flush helpers

 block/fops.c                   |   5 +-
 drivers/nvme/host/ioctl.c      |   4 +-
 include/linux/io_uring.h       |  18 +-
 include/linux/io_uring_types.h |  10 +
 include/uapi/linux/io_uring.h  |  16 +-
 io_uring/cancel.c              |   5 +-
 io_uring/filetable.c           |  11 +-
 io_uring/filetable.h           |  28 ++-
 io_uring/io_uring.c            | 497 ++++++++++++++++++++++-------------------
 io_uring/io_uring.h            |  17 +-
 io_uring/msg_ring.c            |   4 +-
 io_uring/net.c                 |  58 ++---
 io_uring/poll.c                |   6 +-
 io_uring/poll.h                |   2 +
 io_uring/rsrc.c                |   8 +-
 io_uring/rw.c                  |   6 +-
 io_uring/rw.h                  |   1 +
 io_uring/tctx.c                |  31 ++-
 io_uring/timeout.c             |   6 +-
 io_uring/uring_cmd.c           |  16 +-
 net/socket.c                   |   1 +
 21 files changed, 416 insertions(+), 334 deletions(-)

Comments

Linus Torvalds June 26, 2023, 7:40 p.m. UTC | #1
On Sun, 25 Jun 2023 at 19:39, Jens Axboe <axboe@kernel.dk> wrote:
>
> Will throw a minor conflict in io_uring/net.c due to the late fixes in
> mainline, you'll want to keep the kmsg->msg.msg_inq = -1U; assignment
> there.

Can you please share some of those drugs you are on?

Or, better yet, admit you have a problem, and flush those things down
the toilet.

That

        kmsg->msg.msg_inq = -1U;

pattern is complete insanity.

It's truly completely insane, because:

 (a) the whole concept of "-1U" is broken garbage

 (b) msg_inq is an 'int', not "unsigned int"

I want to note that assigning "-1" to an unsigned variable is fine,
and makes perfect sense. "-1" is signed, so if the unsigned variable
is larger, then the sign extension means that assigning "-1" is the
same as setting all bits. Look, no need to worry about the size of the
end result, it always JustWorks(tm).

Ergo: -1 is fine - regardless of whether the end result is signed or unsigned.

But doing the same with "-1U" is *dangerous". Because "-1U" is an
unsigned int, if you assign it to some larger entity, you basically
get a random end result that depends on the size of 'int' and the size
of the destination.

So any time you see something like "-1U", you should go "those are
some bad bad drugs".

It doesn't just look odd - it's actively *WRONG*. It's *STUPID*. And
it's *DANGEROUS*.

Lookie here: the same completely bogus pattern exists in some testing too:

io_uring/net.c:

        if (msg->msg_inq && msg->msg_inq != -1U)

and it all happens to work, but it happens to work for all the wrong
reasons. Because  -1U is unsigned, the "msg->msg_inq != -1U"
comparison is done as "unsigned int", and msg->msg_inq (which contains
a *signed* -1) becomes 4294967295, and it all matches.

But while it happens to work, it's entirely illogical and makes no sense at all.

And if you ever end up in the situation that something is extended to
'long', it will break horribly on 64-bit architectures, since now
"-1U" will literally be 4294967295, while "msg->msg_inq" will become
-1l, and the two are *not* the same.

I don't know who came up with this crazy pattern, but it must die.
"-1U" is garbage. Yes, it means "all bits set in an 'unsigned int'",
so it does have real semantics, but dammit, those semantics are very
seldom the ones you want.

They most *definitely* aren't the ones you want if you then mix things
with a signed type.

And yes, greping for this I found some truly disgusting things elsewhere too:

mm/zsmalloc.c:
        set_freeobj(zspage, (unsigned int)(-1UL));

net/core/rtnetlink.c:
        filters->mask[i] = -1U;

both of which are impressively confused. There's sadly a number of
other cases too.

That zsmalloc.c case is impressively crazy. First you use the wrong
type, then you use the wrong operation, and then you use a cast to fix
it all up. Absolutely none of which makes any sense, and it should
just have used "-1".

The rtnetlink case is also impressive, since mask[] isn't even an
array of 'unsigned int', but a 'u32'. They may be the same thing in
the end, but it's a sign of true confusion to think that "-1u" is
somehow fine.

Again, if you want to assign "all bits set" to a random unsigned type,
just use "-1". Sign extension is _literally_ your friend, the whole
world is 2's complement, and it's the only reason "-1" works in the
first place.

And if what you want is a particular unsigned type with all bits set
(say, unsigned long), I suggest you use "~0ul" to indicate that.

Because if you don't want to rely on sign extension, just use the
actual bit operation, for chrissake!

Ok, that was a rant about code that happens to work, but that is crap
regardless.

I'm doing this pull, but I want this idiocy fixed.

Writing illogical code, and then relying (probably without realizing
it) on some of the stranger parts of the implicit integer type
conversions in C to make that code work, that is just not good.

           Linus
Jens Axboe June 26, 2023, 7:53 p.m. UTC | #2
On 6/26/23 1:40?PM, Linus Torvalds wrote:
> On Sun, 25 Jun 2023 at 19:39, Jens Axboe <axboe@kernel.dk> wrote:
>>
>> Will throw a minor conflict in io_uring/net.c due to the late fixes in
>> mainline, you'll want to keep the kmsg->msg.msg_inq = -1U; assignment
>> there.
> 
> Can you please share some of those drugs you are on?
> 
> Or, better yet, admit you have a problem, and flush those things down
> the toilet.
> 
> That
> 
>         kmsg->msg.msg_inq = -1U;
> 
> pattern is complete insanity.
> 
> It's truly completely insane, because:
> 
>  (a) the whole concept of "-1U" is broken garbage
> 
>  (b) msg_inq is an 'int', not "unsigned int"
> 
> I want to note that assigning "-1" to an unsigned variable is fine,
> and makes perfect sense. "-1" is signed, so if the unsigned variable
> is larger, then the sign extension means that assigning "-1" is the
> same as setting all bits. Look, no need to worry about the size of the
> end result, it always JustWorks(tm).
> 
> Ergo: -1 is fine - regardless of whether the end result is signed or unsigned.
> 
> But doing the same with "-1U" is *dangerous". Because "-1U" is an
> unsigned int, if you assign it to some larger entity, you basically
> get a random end result that depends on the size of 'int' and the size
> of the destination.
> 
> So any time you see something like "-1U", you should go "those are
> some bad bad drugs".
> 
> It doesn't just look odd - it's actively *WRONG*. It's *STUPID*. And
> it's *DANGEROUS*.
> 
> Lookie here: the same completely bogus pattern exists in some testing too:
> 
> io_uring/net.c:
> 
>         if (msg->msg_inq && msg->msg_inq != -1U)
> 
> and it all happens to work, but it happens to work for all the wrong
> reasons. Because  -1U is unsigned, the "msg->msg_inq != -1U"
> comparison is done as "unsigned int", and msg->msg_inq (which contains
> a *signed* -1) becomes 4294967295, and it all matches.
> 
> But while it happens to work, it's entirely illogical and makes no sense at all.
> 
> And if you ever end up in the situation that something is extended to
> 'long', it will break horribly on 64-bit architectures, since now
> "-1U" will literally be 4294967295, while "msg->msg_inq" will become
> -1l, and the two are *not* the same.

Oops yes, I can get that cleaned up. It doesn't really matter in here,
as all we need to know is "did someone assign this value or not", to
avoid relying on msg_inq _always_ returning something when we ask for
it. The actual value of it is way less interesting. Worst case scenario
here is an extra round trip if the available data just happened to
match, which seems basically impossible.

But do agree that it's confusing and bogus, will just change it to -1
in assignment and test that should be it.

> I'm doing this pull, but I want this idiocy fixed.

Thanks! I'll address it in the pre-rc1 pull.
pr-tracker-bot@kernel.org June 26, 2023, 8:02 p.m. UTC | #3
The pull request you sent on Sun, 25 Jun 2023 20:39:02 -0600:

> git://git.kernel.dk/linux.git tags/for-6.5/io_uring-2023-06-23

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/0aa69d53ac7c30f6184f88f2e310d808b32b35a5

Thank you!