diff mbox series

[for-6.1,1/3] net: flag sockets supporting msghdr originated zerocopy

Message ID 3dafafab822b1c66308bb58a0ac738b1e3f53f74.1666346426.git.asml.silence@gmail.com (mailing list archive)
State New
Headers show
Series fail io_uring zc with sockets not supporting it | expand

Commit Message

Pavel Begunkov Oct. 21, 2022, 10:16 a.m. UTC
We need an efficient way in io_uring to check whether a socket supports
zerocopy with msghdr provided ubuf_info. Add a new flag into the struct
socket flags fields.

Cc: <stable@vger.kernel.org> # 6.0
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/linux/net.h | 1 +
 net/ipv4/tcp.c      | 1 +
 net/ipv4/udp.c      | 1 +
 3 files changed, 3 insertions(+)

Comments

Jakub Kicinski Oct. 21, 2022, 4:14 p.m. UTC | #1
On Fri, 21 Oct 2022 11:16:39 +0100 Pavel Begunkov wrote:
> We need an efficient way in io_uring to check whether a socket supports
> zerocopy with msghdr provided ubuf_info. Add a new flag into the struct
> socket flags fields.
> 
> Cc: <stable@vger.kernel.org> # 6.0
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>  include/linux/net.h | 1 +
>  net/ipv4/tcp.c      | 1 +
>  net/ipv4/udp.c      | 1 +
>  3 files changed, 3 insertions(+)
> 
> diff --git a/include/linux/net.h b/include/linux/net.h
> index 711c3593c3b8..18d942bbdf6e 100644
> --- a/include/linux/net.h
> +++ b/include/linux/net.h
> @@ -41,6 +41,7 @@ struct net;
>  #define SOCK_NOSPACE		2
>  #define SOCK_PASSCRED		3
>  #define SOCK_PASSSEC		4
> +#define SOCK_SUPPORT_ZC		5

Acked-by: Jakub Kicinski <kuba@kernel.org>

Any idea on when this will make it to Linus? If within a week we can
probably delay:
https://lore.kernel.org/all/dc549a4d5c1d2031c64794c8e12bed55afb85c3e.1666287924.git.pabeni@redhat.com/
and avoid the conflict.
Pavel Begunkov Oct. 22, 2022, 3:57 p.m. UTC | #2
On 10/21/22 17:14, Jakub Kicinski wrote:
> On Fri, 21 Oct 2022 11:16:39 +0100 Pavel Begunkov wrote:
>> We need an efficient way in io_uring to check whether a socket supports
>> zerocopy with msghdr provided ubuf_info. Add a new flag into the struct
>> socket flags fields.
>>
>> Cc: <stable@vger.kernel.org> # 6.0
>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>> ---
>>   include/linux/net.h | 1 +
>>   net/ipv4/tcp.c      | 1 +
>>   net/ipv4/udp.c      | 1 +
>>   3 files changed, 3 insertions(+)
>>
>> diff --git a/include/linux/net.h b/include/linux/net.h
>> index 711c3593c3b8..18d942bbdf6e 100644
>> --- a/include/linux/net.h
>> +++ b/include/linux/net.h
>> @@ -41,6 +41,7 @@ struct net;
>>   #define SOCK_NOSPACE		2
>>   #define SOCK_PASSCRED		3
>>   #define SOCK_PASSSEC		4
>> +#define SOCK_SUPPORT_ZC		5
> 
> Acked-by: Jakub Kicinski <kuba@kernel.org>
> 
> Any idea on when this will make it to Linus? If within a week we can
> probably delay:

After a chat with Jens, IIUC he can take it and send out to
Linus early. So, sounds like a good plan


> https://lore.kernel.org/all/dc549a4d5c1d2031c64794c8e12bed55afb85c3e.1666287924.git.pabeni@redhat.com/
> and avoid the conflict.
Jens Axboe Oct. 22, 2022, 4:07 p.m. UTC | #3
On 10/22/22 9:57 AM, Pavel Begunkov wrote:
> On 10/21/22 17:14, Jakub Kicinski wrote:
>> On Fri, 21 Oct 2022 11:16:39 +0100 Pavel Begunkov wrote:
>>> We need an efficient way in io_uring to check whether a socket supports
>>> zerocopy with msghdr provided ubuf_info. Add a new flag into the struct
>>> socket flags fields.
>>>
>>> Cc: <stable@vger.kernel.org> # 6.0
>>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>>> ---
>>> ? include/linux/net.h | 1 +
>>> ? net/ipv4/tcp.c????? | 1 +
>>> ? net/ipv4/udp.c????? | 1 +
>>> ? 3 files changed, 3 insertions(+)
>>>
>>> diff --git a/include/linux/net.h b/include/linux/net.h
>>> index 711c3593c3b8..18d942bbdf6e 100644
>>> --- a/include/linux/net.h
>>> +++ b/include/linux/net.h
>>> @@ -41,6 +41,7 @@ struct net;
>>> ? #define SOCK_NOSPACE??????? 2
>>> ? #define SOCK_PASSCRED??????? 3
>>> ? #define SOCK_PASSSEC??????? 4
>>> +#define SOCK_SUPPORT_ZC??????? 5
>>
>> Acked-by: Jakub Kicinski <kuba@kernel.org>
>>
>> Any idea on when this will make it to Linus? If within a week we can
>> probably delay:
> 
> After a chat with Jens, IIUC he can take it and send out to
> Linus early. So, sounds like a good plan

Yes, and let's retain the name for now, can always be changed if we need
to make it more granular. I'll ship this off before -rc2.
Stefan Metzmacher Oct. 24, 2022, 12:49 p.m. UTC | #4
Am 22.10.22 um 18:07 schrieb Jens Axboe:
> On 10/22/22 9:57 AM, Pavel Begunkov wrote:
>> On 10/21/22 17:14, Jakub Kicinski wrote:
>>> On Fri, 21 Oct 2022 11:16:39 +0100 Pavel Begunkov wrote:
>>>> We need an efficient way in io_uring to check whether a socket supports
>>>> zerocopy with msghdr provided ubuf_info. Add a new flag into the struct
>>>> socket flags fields.
>>>>
>>>> Cc: <stable@vger.kernel.org> # 6.0
>>>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>>>> ---
>>>> ? include/linux/net.h | 1 +
>>>> ? net/ipv4/tcp.c????? | 1 +
>>>> ? net/ipv4/udp.c????? | 1 +
>>>> ? 3 files changed, 3 insertions(+)
>>>>
>>>> diff --git a/include/linux/net.h b/include/linux/net.h
>>>> index 711c3593c3b8..18d942bbdf6e 100644
>>>> --- a/include/linux/net.h
>>>> +++ b/include/linux/net.h
>>>> @@ -41,6 +41,7 @@ struct net;
>>>> ? #define SOCK_NOSPACE??????? 2
>>>> ? #define SOCK_PASSCRED??????? 3
>>>> ? #define SOCK_PASSSEC??????? 4
>>>> +#define SOCK_SUPPORT_ZC??????? 5
>>>
>>> Acked-by: Jakub Kicinski <kuba@kernel.org>
>>>
>>> Any idea on when this will make it to Linus? If within a week we can
>>> probably delay:
>>
>> After a chat with Jens, IIUC he can take it and send out to
>> Linus early. So, sounds like a good plan
> 
> Yes, and let's retain the name for now, can always be changed if we need
> to make it more granular. I'll ship this off before -rc2.

I'm now always getting -EOPNOTSUPP from SENDMSG_ZC for tcp connections...

metze
Stefan Metzmacher Oct. 24, 2022, 2:12 p.m. UTC | #5
Am 24.10.22 um 14:49 schrieb Stefan Metzmacher:
> Am 22.10.22 um 18:07 schrieb Jens Axboe:
>> On 10/22/22 9:57 AM, Pavel Begunkov wrote:
>>> On 10/21/22 17:14, Jakub Kicinski wrote:
>>>> On Fri, 21 Oct 2022 11:16:39 +0100 Pavel Begunkov wrote:
>>>>> We need an efficient way in io_uring to check whether a socket supports
>>>>> zerocopy with msghdr provided ubuf_info. Add a new flag into the struct
>>>>> socket flags fields.
>>>>>
>>>>> Cc: <stable@vger.kernel.org> # 6.0
>>>>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>>>>> ---
>>>>> ? include/linux/net.h | 1 +
>>>>> ? net/ipv4/tcp.c????? | 1 +
>>>>> ? net/ipv4/udp.c????? | 1 +
>>>>> ? 3 files changed, 3 insertions(+)
>>>>>
>>>>> diff --git a/include/linux/net.h b/include/linux/net.h
>>>>> index 711c3593c3b8..18d942bbdf6e 100644
>>>>> --- a/include/linux/net.h
>>>>> +++ b/include/linux/net.h
>>>>> @@ -41,6 +41,7 @@ struct net;
>>>>> ? #define SOCK_NOSPACE??????? 2
>>>>> ? #define SOCK_PASSCRED??????? 3
>>>>> ? #define SOCK_PASSSEC??????? 4
>>>>> +#define SOCK_SUPPORT_ZC??????? 5
>>>>
>>>> Acked-by: Jakub Kicinski <kuba@kernel.org>
>>>>
>>>> Any idea on when this will make it to Linus? If within a week we can
>>>> probably delay:
>>>
>>> After a chat with Jens, IIUC he can take it and send out to
>>> Linus early. So, sounds like a good plan
>>
>> Yes, and let's retain the name for now, can always be changed if we need
>> to make it more granular. I'll ship this off before -rc2.
> 
> I'm now always getting -EOPNOTSUPP from SENDMSG_ZC for tcp connections...

The problem is that inet_accept() doesn't set SOCK_SUPPORT_ZC...

This hack below fixes it for me, but I'm sure there's a nicer way...

The natural way would be to overload inet_csk_accept for tcp,
but sk1->sk_prot->accept() is called with sk2->sk_socket == NULL,
because sock_graft() is called later...

metze

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 3dd02396517d..0f35f2a32756 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -736,6 +736,7 @@ EXPORT_SYMBOL(inet_stream_connect);
   *	Accept a pending connection. The TCP layer now gives BSD semantics.
   */

+#include <net/transp_v6.h>
  int inet_accept(struct socket *sock, struct socket *newsock, int flags,
  		bool kern)
  {
@@ -754,6 +755,8 @@ int inet_accept(struct socket *sock, struct socket *newsock, int flags,
  		  (TCPF_ESTABLISHED | TCPF_SYN_RECV |
  		  TCPF_CLOSE_WAIT | TCPF_CLOSE)));

+	if (sk2->sk_prot == &tcp_prot || sk2->sk_prot == &tcpv6_prot)
+		set_bit(SOCK_SUPPORT_ZC, &newsock->flags);
  	sock_graft(sk2, newsock);

  	newsock->state = SS_CONNECTED;
Pavel Begunkov Oct. 24, 2022, 5 p.m. UTC | #6
On 10/24/22 15:12, Stefan Metzmacher wrote:
> Am 24.10.22 um 14:49 schrieb Stefan Metzmacher:
>> Am 22.10.22 um 18:07 schrieb Jens Axboe:
>>> On 10/22/22 9:57 AM, Pavel Begunkov wrote:
>>>> On 10/21/22 17:14, Jakub Kicinski wrote:
>>>>> On Fri, 21 Oct 2022 11:16:39 +0100 Pavel Begunkov wrote:
>>>>>> We need an efficient way in io_uring to check whether a socket supports
>>>>>> zerocopy with msghdr provided ubuf_info. Add a new flag into the struct
>>>>>> socket flags fields.
>>>>>>
>>>>>> Cc: <stable@vger.kernel.org> # 6.0
>>>>>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>>>>>> ---
>>>>>> ? include/linux/net.h | 1 +
>>>>>> ? net/ipv4/tcp.c????? | 1 +
>>>>>> ? net/ipv4/udp.c????? | 1 +
>>>>>> ? 3 files changed, 3 insertions(+)
>>>>>>
>>>>>> diff --git a/include/linux/net.h b/include/linux/net.h
>>>>>> index 711c3593c3b8..18d942bbdf6e 100644
>>>>>> --- a/include/linux/net.h
>>>>>> +++ b/include/linux/net.h
>>>>>> @@ -41,6 +41,7 @@ struct net;
>>>>>> ? #define SOCK_NOSPACE??????? 2
>>>>>> ? #define SOCK_PASSCRED??????? 3
>>>>>> ? #define SOCK_PASSSEC??????? 4
>>>>>> +#define SOCK_SUPPORT_ZC??????? 5
>>>>>
>>>>> Acked-by: Jakub Kicinski <kuba@kernel.org>
>>>>>
>>>>> Any idea on when this will make it to Linus? If within a week we can
>>>>> probably delay:
>>>>
>>>> After a chat with Jens, IIUC he can take it and send out to
>>>> Linus early. So, sounds like a good plan
>>>
>>> Yes, and let's retain the name for now, can always be changed if we need
>>> to make it more granular. I'll ship this off before -rc2.
>>
>> I'm now always getting -EOPNOTSUPP from SENDMSG_ZC for tcp connections...
> 
> The problem is that inet_accept() doesn't set SOCK_SUPPORT_ZC...

Yeah, you're right, accept is not handled, not great. Thanks
for tracking it down. And I'll add a test for it


> This hack below fixes it for me, but I'm sure there's a nicer way...
> 
> The natural way would be to overload inet_csk_accept for tcp,
> but sk1->sk_prot->accept() is called with sk2->sk_socket == NULL,
> because sock_graft() is called later...

I think it's acceptable for a fix. sk_is_tcp() sounds a bit better
assuming that sk_type/protocol are set there.


> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index 3dd02396517d..0f35f2a32756 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -736,6 +736,7 @@ EXPORT_SYMBOL(inet_stream_connect);
>    *    Accept a pending connection. The TCP layer now gives BSD semantics.
>    */
> 
> +#include <net/transp_v6.h>
>   int inet_accept(struct socket *sock, struct socket *newsock, int flags,
>           bool kern)
>   {
> @@ -754,6 +755,8 @@ int inet_accept(struct socket *sock, struct socket *newsock, int flags,
>             (TCPF_ESTABLISHED | TCPF_SYN_RECV |
>             TCPF_CLOSE_WAIT | TCPF_CLOSE)));
> 
> +    if (sk2->sk_prot == &tcp_prot || sk2->sk_prot == &tcpv6_prot)
> +        set_bit(SOCK_SUPPORT_ZC, &newsock->flags);
>       sock_graft(sk2, newsock);
> 
>       newsock->state = SS_CONNECTED;
Stefan Metzmacher Oct. 24, 2022, 5:45 p.m. UTC | #7
Am 24.10.22 um 19:00 schrieb Pavel Begunkov:
> On 10/24/22 15:12, Stefan Metzmacher wrote:
>> Am 24.10.22 um 14:49 schrieb Stefan Metzmacher:
>>> Am 22.10.22 um 18:07 schrieb Jens Axboe:
>>>> On 10/22/22 9:57 AM, Pavel Begunkov wrote:
>>>>> On 10/21/22 17:14, Jakub Kicinski wrote:
>>>>>> On Fri, 21 Oct 2022 11:16:39 +0100 Pavel Begunkov wrote:
>>>>>>> We need an efficient way in io_uring to check whether a socket supports
>>>>>>> zerocopy with msghdr provided ubuf_info. Add a new flag into the struct
>>>>>>> socket flags fields.
>>>>>>>
>>>>>>> Cc: <stable@vger.kernel.org> # 6.0
>>>>>>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>>>>>>> ---
>>>>>>> ? include/linux/net.h | 1 +
>>>>>>> ? net/ipv4/tcp.c????? | 1 +
>>>>>>> ? net/ipv4/udp.c????? | 1 +
>>>>>>> ? 3 files changed, 3 insertions(+)
>>>>>>>
>>>>>>> diff --git a/include/linux/net.h b/include/linux/net.h
>>>>>>> index 711c3593c3b8..18d942bbdf6e 100644
>>>>>>> --- a/include/linux/net.h
>>>>>>> +++ b/include/linux/net.h
>>>>>>> @@ -41,6 +41,7 @@ struct net;
>>>>>>> ? #define SOCK_NOSPACE??????? 2
>>>>>>> ? #define SOCK_PASSCRED??????? 3
>>>>>>> ? #define SOCK_PASSSEC??????? 4
>>>>>>> +#define SOCK_SUPPORT_ZC??????? 5
>>>>>>
>>>>>> Acked-by: Jakub Kicinski <kuba@kernel.org>
>>>>>>
>>>>>> Any idea on when this will make it to Linus? If within a week we can
>>>>>> probably delay:
>>>>>
>>>>> After a chat with Jens, IIUC he can take it and send out to
>>>>> Linus early. So, sounds like a good plan
>>>>
>>>> Yes, and let's retain the name for now, can always be changed if we need
>>>> to make it more granular. I'll ship this off before -rc2.
>>>
>>> I'm now always getting -EOPNOTSUPP from SENDMSG_ZC for tcp connections...
>>
>> The problem is that inet_accept() doesn't set SOCK_SUPPORT_ZC...
> 
> Yeah, you're right, accept is not handled, not great. Thanks
> for tracking it down. And I'll add a test for it
> 
> 
>> This hack below fixes it for me, but I'm sure there's a nicer way...
>>
>> The natural way would be to overload inet_csk_accept for tcp,
>> but sk1->sk_prot->accept() is called with sk2->sk_socket == NULL,
>> because sock_graft() is called later...
> 
> I think it's acceptable for a fix. sk_is_tcp() sounds a bit better
> assuming that sk_type/protocol are set there.
> 
> 
>> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
>> index 3dd02396517d..0f35f2a32756 100644
>> --- a/net/ipv4/af_inet.c
>> +++ b/net/ipv4/af_inet.c
>> @@ -736,6 +736,7 @@ EXPORT_SYMBOL(inet_stream_connect);
>>    *    Accept a pending connection. The TCP layer now gives BSD semantics.
>>    */
>>
>> +#include <net/transp_v6.h>
>>   int inet_accept(struct socket *sock, struct socket *newsock, int flags,
>>           bool kern)
>>   {
>> @@ -754,6 +755,8 @@ int inet_accept(struct socket *sock, struct socket *newsock, int flags,
>>             (TCPF_ESTABLISHED | TCPF_SYN_RECV |
>>             TCPF_CLOSE_WAIT | TCPF_CLOSE)));
>>
>> +    if (sk2->sk_prot == &tcp_prot || sk2->sk_prot == &tcpv6_prot)
>> +        set_bit(SOCK_SUPPORT_ZC, &newsock->flags);
>>       sock_graft(sk2, newsock);

Hmm, I think we just need to check if the flag was set on the listening socket,
otherwise we have a hard coded value again...

This also works for me, and seems like a much nicer patch:

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 3dd02396517d..4728087c42a5 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -754,6 +754,8 @@ int inet_accept(struct socket *sock, struct socket *newsock, int flags,
  		  (TCPF_ESTABLISHED | TCPF_SYN_RECV |
  		  TCPF_CLOSE_WAIT | TCPF_CLOSE)));

+	if (test_bit(SOCK_SUPPORT_ZC, &sock->flags))
+		set_bit(SOCK_SUPPORT_ZC, &newsock->flags);
  	sock_graft(sk2, newsock);

  	newsock->state = SS_CONNECTED;
diff mbox series

Patch

diff --git a/include/linux/net.h b/include/linux/net.h
index 711c3593c3b8..18d942bbdf6e 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -41,6 +41,7 @@  struct net;
 #define SOCK_NOSPACE		2
 #define SOCK_PASSCRED		3
 #define SOCK_PASSSEC		4
+#define SOCK_SUPPORT_ZC		5
 
 #ifndef ARCH_HAS_SOCKET_TYPES
 /**
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 0c51abeee172..aeb7b9eaddc7 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -457,6 +457,7 @@  void tcp_init_sock(struct sock *sk)
 	WRITE_ONCE(sk->sk_sndbuf, READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_wmem[1]));
 	WRITE_ONCE(sk->sk_rcvbuf, READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[1]));
 
+	set_bit(SOCK_SUPPORT_ZC, &sk->sk_socket->flags);
 	sk_sockets_allocated_inc(sk);
 }
 EXPORT_SYMBOL(tcp_init_sock);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index d63118ce5900..03616fc5162c 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1620,6 +1620,7 @@  int udp_init_sock(struct sock *sk)
 {
 	skb_queue_head_init(&udp_sk(sk)->reader_queue);
 	sk->sk_destruct = udp_destruct_sock;
+	set_bit(SOCK_SUPPORT_ZC, &sk->sk_socket->flags);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(udp_init_sock);