Message ID | 3dafafab822b1c66308bb58a0ac738b1e3f53f74.1666346426.git.asml.silence@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | fail io_uring zc with sockets not supporting it | expand |
On Fri, 21 Oct 2022 11:16:39 +0100 Pavel Begunkov wrote: > We need an efficient way in io_uring to check whether a socket supports > zerocopy with msghdr provided ubuf_info. Add a new flag into the struct > socket flags fields. > > Cc: <stable@vger.kernel.org> # 6.0 > Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> > --- > include/linux/net.h | 1 + > net/ipv4/tcp.c | 1 + > net/ipv4/udp.c | 1 + > 3 files changed, 3 insertions(+) > > diff --git a/include/linux/net.h b/include/linux/net.h > index 711c3593c3b8..18d942bbdf6e 100644 > --- a/include/linux/net.h > +++ b/include/linux/net.h > @@ -41,6 +41,7 @@ struct net; > #define SOCK_NOSPACE 2 > #define SOCK_PASSCRED 3 > #define SOCK_PASSSEC 4 > +#define SOCK_SUPPORT_ZC 5 Acked-by: Jakub Kicinski <kuba@kernel.org> Any idea on when this will make it to Linus? If within a week we can probably delay: https://lore.kernel.org/all/dc549a4d5c1d2031c64794c8e12bed55afb85c3e.1666287924.git.pabeni@redhat.com/ and avoid the conflict.
On 10/21/22 17:14, Jakub Kicinski wrote: > On Fri, 21 Oct 2022 11:16:39 +0100 Pavel Begunkov wrote: >> We need an efficient way in io_uring to check whether a socket supports >> zerocopy with msghdr provided ubuf_info. Add a new flag into the struct >> socket flags fields. >> >> Cc: <stable@vger.kernel.org> # 6.0 >> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> >> --- >> include/linux/net.h | 1 + >> net/ipv4/tcp.c | 1 + >> net/ipv4/udp.c | 1 + >> 3 files changed, 3 insertions(+) >> >> diff --git a/include/linux/net.h b/include/linux/net.h >> index 711c3593c3b8..18d942bbdf6e 100644 >> --- a/include/linux/net.h >> +++ b/include/linux/net.h >> @@ -41,6 +41,7 @@ struct net; >> #define SOCK_NOSPACE 2 >> #define SOCK_PASSCRED 3 >> #define SOCK_PASSSEC 4 >> +#define SOCK_SUPPORT_ZC 5 > > Acked-by: Jakub Kicinski <kuba@kernel.org> > > Any idea on when this will make it to Linus? If within a week we can > probably delay: After a chat with Jens, IIUC he can take it and send out to Linus early. So, sounds like a good plan > https://lore.kernel.org/all/dc549a4d5c1d2031c64794c8e12bed55afb85c3e.1666287924.git.pabeni@redhat.com/ > and avoid the conflict.
On 10/22/22 9:57 AM, Pavel Begunkov wrote: > On 10/21/22 17:14, Jakub Kicinski wrote: >> On Fri, 21 Oct 2022 11:16:39 +0100 Pavel Begunkov wrote: >>> We need an efficient way in io_uring to check whether a socket supports >>> zerocopy with msghdr provided ubuf_info. Add a new flag into the struct >>> socket flags fields. >>> >>> Cc: <stable@vger.kernel.org> # 6.0 >>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> >>> --- >>> ? include/linux/net.h | 1 + >>> ? net/ipv4/tcp.c????? | 1 + >>> ? net/ipv4/udp.c????? | 1 + >>> ? 3 files changed, 3 insertions(+) >>> >>> diff --git a/include/linux/net.h b/include/linux/net.h >>> index 711c3593c3b8..18d942bbdf6e 100644 >>> --- a/include/linux/net.h >>> +++ b/include/linux/net.h >>> @@ -41,6 +41,7 @@ struct net; >>> ? #define SOCK_NOSPACE??????? 2 >>> ? #define SOCK_PASSCRED??????? 3 >>> ? #define SOCK_PASSSEC??????? 4 >>> +#define SOCK_SUPPORT_ZC??????? 5 >> >> Acked-by: Jakub Kicinski <kuba@kernel.org> >> >> Any idea on when this will make it to Linus? If within a week we can >> probably delay: > > After a chat with Jens, IIUC he can take it and send out to > Linus early. So, sounds like a good plan Yes, and let's retain the name for now, can always be changed if we need to make it more granular. I'll ship this off before -rc2.
Am 22.10.22 um 18:07 schrieb Jens Axboe: > On 10/22/22 9:57 AM, Pavel Begunkov wrote: >> On 10/21/22 17:14, Jakub Kicinski wrote: >>> On Fri, 21 Oct 2022 11:16:39 +0100 Pavel Begunkov wrote: >>>> We need an efficient way in io_uring to check whether a socket supports >>>> zerocopy with msghdr provided ubuf_info. Add a new flag into the struct >>>> socket flags fields. >>>> >>>> Cc: <stable@vger.kernel.org> # 6.0 >>>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> >>>> --- >>>> ? include/linux/net.h | 1 + >>>> ? net/ipv4/tcp.c????? | 1 + >>>> ? net/ipv4/udp.c????? | 1 + >>>> ? 3 files changed, 3 insertions(+) >>>> >>>> diff --git a/include/linux/net.h b/include/linux/net.h >>>> index 711c3593c3b8..18d942bbdf6e 100644 >>>> --- a/include/linux/net.h >>>> +++ b/include/linux/net.h >>>> @@ -41,6 +41,7 @@ struct net; >>>> ? #define SOCK_NOSPACE??????? 2 >>>> ? #define SOCK_PASSCRED??????? 3 >>>> ? #define SOCK_PASSSEC??????? 4 >>>> +#define SOCK_SUPPORT_ZC??????? 5 >>> >>> Acked-by: Jakub Kicinski <kuba@kernel.org> >>> >>> Any idea on when this will make it to Linus? If within a week we can >>> probably delay: >> >> After a chat with Jens, IIUC he can take it and send out to >> Linus early. So, sounds like a good plan > > Yes, and let's retain the name for now, can always be changed if we need > to make it more granular. I'll ship this off before -rc2. I'm now always getting -EOPNOTSUPP from SENDMSG_ZC for tcp connections... metze
Am 24.10.22 um 14:49 schrieb Stefan Metzmacher: > Am 22.10.22 um 18:07 schrieb Jens Axboe: >> On 10/22/22 9:57 AM, Pavel Begunkov wrote: >>> On 10/21/22 17:14, Jakub Kicinski wrote: >>>> On Fri, 21 Oct 2022 11:16:39 +0100 Pavel Begunkov wrote: >>>>> We need an efficient way in io_uring to check whether a socket supports >>>>> zerocopy with msghdr provided ubuf_info. Add a new flag into the struct >>>>> socket flags fields. >>>>> >>>>> Cc: <stable@vger.kernel.org> # 6.0 >>>>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> >>>>> --- >>>>> ? include/linux/net.h | 1 + >>>>> ? net/ipv4/tcp.c????? | 1 + >>>>> ? net/ipv4/udp.c????? | 1 + >>>>> ? 3 files changed, 3 insertions(+) >>>>> >>>>> diff --git a/include/linux/net.h b/include/linux/net.h >>>>> index 711c3593c3b8..18d942bbdf6e 100644 >>>>> --- a/include/linux/net.h >>>>> +++ b/include/linux/net.h >>>>> @@ -41,6 +41,7 @@ struct net; >>>>> ? #define SOCK_NOSPACE??????? 2 >>>>> ? #define SOCK_PASSCRED??????? 3 >>>>> ? #define SOCK_PASSSEC??????? 4 >>>>> +#define SOCK_SUPPORT_ZC??????? 5 >>>> >>>> Acked-by: Jakub Kicinski <kuba@kernel.org> >>>> >>>> Any idea on when this will make it to Linus? If within a week we can >>>> probably delay: >>> >>> After a chat with Jens, IIUC he can take it and send out to >>> Linus early. So, sounds like a good plan >> >> Yes, and let's retain the name for now, can always be changed if we need >> to make it more granular. I'll ship this off before -rc2. > > I'm now always getting -EOPNOTSUPP from SENDMSG_ZC for tcp connections... The problem is that inet_accept() doesn't set SOCK_SUPPORT_ZC... This hack below fixes it for me, but I'm sure there's a nicer way... The natural way would be to overload inet_csk_accept for tcp, but sk1->sk_prot->accept() is called with sk2->sk_socket == NULL, because sock_graft() is called later... metze diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index 3dd02396517d..0f35f2a32756 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -736,6 +736,7 @@ EXPORT_SYMBOL(inet_stream_connect); * Accept a pending connection. The TCP layer now gives BSD semantics. */ +#include <net/transp_v6.h> int inet_accept(struct socket *sock, struct socket *newsock, int flags, bool kern) { @@ -754,6 +755,8 @@ int inet_accept(struct socket *sock, struct socket *newsock, int flags, (TCPF_ESTABLISHED | TCPF_SYN_RECV | TCPF_CLOSE_WAIT | TCPF_CLOSE))); + if (sk2->sk_prot == &tcp_prot || sk2->sk_prot == &tcpv6_prot) + set_bit(SOCK_SUPPORT_ZC, &newsock->flags); sock_graft(sk2, newsock); newsock->state = SS_CONNECTED;
On 10/24/22 15:12, Stefan Metzmacher wrote: > Am 24.10.22 um 14:49 schrieb Stefan Metzmacher: >> Am 22.10.22 um 18:07 schrieb Jens Axboe: >>> On 10/22/22 9:57 AM, Pavel Begunkov wrote: >>>> On 10/21/22 17:14, Jakub Kicinski wrote: >>>>> On Fri, 21 Oct 2022 11:16:39 +0100 Pavel Begunkov wrote: >>>>>> We need an efficient way in io_uring to check whether a socket supports >>>>>> zerocopy with msghdr provided ubuf_info. Add a new flag into the struct >>>>>> socket flags fields. >>>>>> >>>>>> Cc: <stable@vger.kernel.org> # 6.0 >>>>>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> >>>>>> --- >>>>>> ? include/linux/net.h | 1 + >>>>>> ? net/ipv4/tcp.c????? | 1 + >>>>>> ? net/ipv4/udp.c????? | 1 + >>>>>> ? 3 files changed, 3 insertions(+) >>>>>> >>>>>> diff --git a/include/linux/net.h b/include/linux/net.h >>>>>> index 711c3593c3b8..18d942bbdf6e 100644 >>>>>> --- a/include/linux/net.h >>>>>> +++ b/include/linux/net.h >>>>>> @@ -41,6 +41,7 @@ struct net; >>>>>> ? #define SOCK_NOSPACE??????? 2 >>>>>> ? #define SOCK_PASSCRED??????? 3 >>>>>> ? #define SOCK_PASSSEC??????? 4 >>>>>> +#define SOCK_SUPPORT_ZC??????? 5 >>>>> >>>>> Acked-by: Jakub Kicinski <kuba@kernel.org> >>>>> >>>>> Any idea on when this will make it to Linus? If within a week we can >>>>> probably delay: >>>> >>>> After a chat with Jens, IIUC he can take it and send out to >>>> Linus early. So, sounds like a good plan >>> >>> Yes, and let's retain the name for now, can always be changed if we need >>> to make it more granular. I'll ship this off before -rc2. >> >> I'm now always getting -EOPNOTSUPP from SENDMSG_ZC for tcp connections... > > The problem is that inet_accept() doesn't set SOCK_SUPPORT_ZC... Yeah, you're right, accept is not handled, not great. Thanks for tracking it down. And I'll add a test for it > This hack below fixes it for me, but I'm sure there's a nicer way... > > The natural way would be to overload inet_csk_accept for tcp, > but sk1->sk_prot->accept() is called with sk2->sk_socket == NULL, > because sock_graft() is called later... I think it's acceptable for a fix. sk_is_tcp() sounds a bit better assuming that sk_type/protocol are set there. > diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c > index 3dd02396517d..0f35f2a32756 100644 > --- a/net/ipv4/af_inet.c > +++ b/net/ipv4/af_inet.c > @@ -736,6 +736,7 @@ EXPORT_SYMBOL(inet_stream_connect); > * Accept a pending connection. The TCP layer now gives BSD semantics. > */ > > +#include <net/transp_v6.h> > int inet_accept(struct socket *sock, struct socket *newsock, int flags, > bool kern) > { > @@ -754,6 +755,8 @@ int inet_accept(struct socket *sock, struct socket *newsock, int flags, > (TCPF_ESTABLISHED | TCPF_SYN_RECV | > TCPF_CLOSE_WAIT | TCPF_CLOSE))); > > + if (sk2->sk_prot == &tcp_prot || sk2->sk_prot == &tcpv6_prot) > + set_bit(SOCK_SUPPORT_ZC, &newsock->flags); > sock_graft(sk2, newsock); > > newsock->state = SS_CONNECTED;
Am 24.10.22 um 19:00 schrieb Pavel Begunkov: > On 10/24/22 15:12, Stefan Metzmacher wrote: >> Am 24.10.22 um 14:49 schrieb Stefan Metzmacher: >>> Am 22.10.22 um 18:07 schrieb Jens Axboe: >>>> On 10/22/22 9:57 AM, Pavel Begunkov wrote: >>>>> On 10/21/22 17:14, Jakub Kicinski wrote: >>>>>> On Fri, 21 Oct 2022 11:16:39 +0100 Pavel Begunkov wrote: >>>>>>> We need an efficient way in io_uring to check whether a socket supports >>>>>>> zerocopy with msghdr provided ubuf_info. Add a new flag into the struct >>>>>>> socket flags fields. >>>>>>> >>>>>>> Cc: <stable@vger.kernel.org> # 6.0 >>>>>>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> >>>>>>> --- >>>>>>> ? include/linux/net.h | 1 + >>>>>>> ? net/ipv4/tcp.c????? | 1 + >>>>>>> ? net/ipv4/udp.c????? | 1 + >>>>>>> ? 3 files changed, 3 insertions(+) >>>>>>> >>>>>>> diff --git a/include/linux/net.h b/include/linux/net.h >>>>>>> index 711c3593c3b8..18d942bbdf6e 100644 >>>>>>> --- a/include/linux/net.h >>>>>>> +++ b/include/linux/net.h >>>>>>> @@ -41,6 +41,7 @@ struct net; >>>>>>> ? #define SOCK_NOSPACE??????? 2 >>>>>>> ? #define SOCK_PASSCRED??????? 3 >>>>>>> ? #define SOCK_PASSSEC??????? 4 >>>>>>> +#define SOCK_SUPPORT_ZC??????? 5 >>>>>> >>>>>> Acked-by: Jakub Kicinski <kuba@kernel.org> >>>>>> >>>>>> Any idea on when this will make it to Linus? If within a week we can >>>>>> probably delay: >>>>> >>>>> After a chat with Jens, IIUC he can take it and send out to >>>>> Linus early. So, sounds like a good plan >>>> >>>> Yes, and let's retain the name for now, can always be changed if we need >>>> to make it more granular. I'll ship this off before -rc2. >>> >>> I'm now always getting -EOPNOTSUPP from SENDMSG_ZC for tcp connections... >> >> The problem is that inet_accept() doesn't set SOCK_SUPPORT_ZC... > > Yeah, you're right, accept is not handled, not great. Thanks > for tracking it down. And I'll add a test for it > > >> This hack below fixes it for me, but I'm sure there's a nicer way... >> >> The natural way would be to overload inet_csk_accept for tcp, >> but sk1->sk_prot->accept() is called with sk2->sk_socket == NULL, >> because sock_graft() is called later... > > I think it's acceptable for a fix. sk_is_tcp() sounds a bit better > assuming that sk_type/protocol are set there. > > >> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c >> index 3dd02396517d..0f35f2a32756 100644 >> --- a/net/ipv4/af_inet.c >> +++ b/net/ipv4/af_inet.c >> @@ -736,6 +736,7 @@ EXPORT_SYMBOL(inet_stream_connect); >> * Accept a pending connection. The TCP layer now gives BSD semantics. >> */ >> >> +#include <net/transp_v6.h> >> int inet_accept(struct socket *sock, struct socket *newsock, int flags, >> bool kern) >> { >> @@ -754,6 +755,8 @@ int inet_accept(struct socket *sock, struct socket *newsock, int flags, >> (TCPF_ESTABLISHED | TCPF_SYN_RECV | >> TCPF_CLOSE_WAIT | TCPF_CLOSE))); >> >> + if (sk2->sk_prot == &tcp_prot || sk2->sk_prot == &tcpv6_prot) >> + set_bit(SOCK_SUPPORT_ZC, &newsock->flags); >> sock_graft(sk2, newsock); Hmm, I think we just need to check if the flag was set on the listening socket, otherwise we have a hard coded value again... This also works for me, and seems like a much nicer patch: diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index 3dd02396517d..4728087c42a5 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -754,6 +754,8 @@ int inet_accept(struct socket *sock, struct socket *newsock, int flags, (TCPF_ESTABLISHED | TCPF_SYN_RECV | TCPF_CLOSE_WAIT | TCPF_CLOSE))); + if (test_bit(SOCK_SUPPORT_ZC, &sock->flags)) + set_bit(SOCK_SUPPORT_ZC, &newsock->flags); sock_graft(sk2, newsock); newsock->state = SS_CONNECTED;
diff --git a/include/linux/net.h b/include/linux/net.h index 711c3593c3b8..18d942bbdf6e 100644 --- a/include/linux/net.h +++ b/include/linux/net.h @@ -41,6 +41,7 @@ struct net; #define SOCK_NOSPACE 2 #define SOCK_PASSCRED 3 #define SOCK_PASSSEC 4 +#define SOCK_SUPPORT_ZC 5 #ifndef ARCH_HAS_SOCKET_TYPES /** diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 0c51abeee172..aeb7b9eaddc7 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -457,6 +457,7 @@ void tcp_init_sock(struct sock *sk) WRITE_ONCE(sk->sk_sndbuf, READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_wmem[1])); WRITE_ONCE(sk->sk_rcvbuf, READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[1])); + set_bit(SOCK_SUPPORT_ZC, &sk->sk_socket->flags); sk_sockets_allocated_inc(sk); } EXPORT_SYMBOL(tcp_init_sock); diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index d63118ce5900..03616fc5162c 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1620,6 +1620,7 @@ int udp_init_sock(struct sock *sk) { skb_queue_head_init(&udp_sk(sk)->reader_queue); sk->sk_destruct = udp_destruct_sock; + set_bit(SOCK_SUPPORT_ZC, &sk->sk_socket->flags); return 0; } EXPORT_SYMBOL_GPL(udp_init_sock);
We need an efficient way in io_uring to check whether a socket supports zerocopy with msghdr provided ubuf_info. Add a new flag into the struct socket flags fields. Cc: <stable@vger.kernel.org> # 6.0 Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> --- include/linux/net.h | 1 + net/ipv4/tcp.c | 1 + net/ipv4/udp.c | 1 + 3 files changed, 3 insertions(+)